The EPUB ANT task

The epub Ant task is used to assemble an EPUB file from a given set of source items. It is assumed that most of the source material, such as HTML files and illustrations, has been prepared beforehand. Ant version 1.7 and newer is supported.

The following is an approximate DTD for the task:


<!ELEMENT epub (identifier | type | subject | reference 
  | creator | meta | publisher | source | language 
  | rights | contributor | format | cover | 
	toc | item | date | title | fileset)*>
<!ATTLIST epub
          id ID #IMPLIED
          taskname CDATA #IMPLIED
          identifierid CDATA #IMPLIED
          file CDATA #IMPLIED
          description CDATA #IMPLIED
          workingfolder CDATA #IMPLIED
          includeReferenced %boolean; #IMPLIED>

Note that when the includeReferenced option is used, only items directly referenced from an XHTML file added to the manifest will be included automatically. Generated XHTML files, such as the cover page, will not be searched for additional content. This mechanism can be used to automatically add image files and similar resources.
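
As a rough orientation, the attributes and nested elements listed above might be combined as in the sketch below. It assumes that the epub task has already been made available to Ant (for example through a taskdef, which is not shown) and that the file attribute names the EPUB to produce. The target name and all file names are placeholders.

```xml
<!-- Hypothetical target; names and paths are placeholders. -->
<target name="build-book">
	<!-- Assumes the epub task has been defined elsewhere in the build file. -->
	<epub file="book.epub" includeReferenced="true">
		<title>Building EPUBs</title>
		<language code="en" />
		<!-- At least one item is needed to give the publication content. -->
		<item file="chapter-1.xhtml" />
	</epub>
</target>
```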

Adding “header” information

Certain elements are required in the header of the publication. These include the title of the publication, the identifier and the language code. It is possible to add more than one element of some types.

The following elements can be used:

Element      Required  Description
title        yes       The publication title
identifier   yes       The publication identifier
language     yes       The publication language
publisher    no        Name of the publisher
subject      no        Subject of the publication
creator      no        One or more creators
contributor  no        One or more contributors
date         no        One or more dates
cover        no        The cover page

Publication title

Typically, the title will be a name by which the resource is formally known.

<!ELEMENT title (#PCDATA)>
<!ATTLIST title
          id ID #IMPLIED
          lang CDATA #IMPLIED>
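
Based on the declaration above, a title with a language hint might be written as follows. The title text is only an example.

```xml
<!-- The element content is the title; lang is optional. -->
<title lang="en">Building EPUBs</title>
```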

Publication identifiers

The recommended best practice is to identify the resource by means of a string or number conforming to a formal identification system. These include but are not limited to the Uniform Resource Identifier, the Digital Object Identifier (DOI) and the International Standard Book Number (ISBN).

<!ELEMENT identifier (#PCDATA)>
<!ATTLIST identifier
          id ID #IMPLIED
          scheme CDATA #IMPLIED>

If an identifier is not specified, one will be generated from a random UUID, and a new one will be generated for each run of the script. It is therefore a good idea to specify an identifier: reading systems may use this field as intended and replace an older version of the publication when a newer one is added to the library.
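
A fixed identifier could be declared along these lines. This is a sketch only: the id value, the scheme label and the URL are placeholders, and the URL uses the reserved example.com domain.

```xml
<!-- Placeholder values; substitute the identifier of the real publication. -->
<identifier id="bookid" scheme="URI">http://example.com/books/building-epubs</identifier>
```

Because the value stays the same between builds, a reading system that honours identifiers can treat a rebuilt file as a new version of the same publication.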

Language specification

The recommended best practice is to use RFC 3066 which, in conjunction with ISO 639, defines two- and three-letter primary language tags with optional subtags. Examples include “no” for Norwegian, “en” for English, and “en-GB” for English used in the United Kingdom.

<!ELEMENT language EMPTY>
<!ATTLIST language
          id ID #IMPLIED
          code CDATA #IMPLIED>

If a language is not specified it will be set to “en” for generic English.
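
Since the language element is empty, the tag is given in the code attribute. A minimal example, using one of the tags mentioned above:

```xml
<!-- British English instead of the default "en". -->
<language code="en-GB" />
```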

Publisher

<!ELEMENT publisher (#PCDATA)>
<!ATTLIST publisher
          id ID #IMPLIED
          lang CDATA #IMPLIED>

Publication subject

The subject will typically be represented using keywords, key phrases, or classification codes.

<!ELEMENT subject (#PCDATA)>
<!ATTLIST subject
          id ID #IMPLIED
          lang CDATA #IMPLIED>

Contributors and Creators

Examples of a contributor and creator include a person, an organization, or a service.

<!ELEMENT contributor EMPTY>
<!ATTLIST contributor
          id ID #IMPLIED
          fileAs CDATA #IMPLIED
          name CDATA #IMPLIED
          lang CDATA #IMPLIED
          role CDATA #IMPLIED>

<!ELEMENT creator EMPTY>
<!ATTLIST creator
          id ID #IMPLIED
          fileAs CDATA #IMPLIED
          name CDATA #IMPLIED
          lang CDATA #IMPLIED
          role CDATA #IMPLIED>

Optionally, the fileAs attribute can be used to indicate a formal way of filing the entry, for instance “Last name, first name”.

<creator name="Nomen Nescio" fileAs="Nescio, Nomen" 
  role="aut" />

This tooling will automatically add “Eclipse Committers and Contributors” in the redactor role.

The role attribute uses MARC relator codes to indicate the role of the entity. The complete list is quite long; some of the more typical codes are:

Name                                   Code  Description
Artist                                 art   Use for a person (e.g., a painter) who conceives, and perhaps also implements, an original graphic design or work of art, if specific codes (e.g., [egr], [etr]) are not desired. For book illustrators, prefer Illustrator [ill].
Author                                 aut   Use for a person or corporate body chiefly responsible for the intellectual or artistic content of a work. This term may also be used when more than one person or body bears such responsibility.
Author in quotations or text extracts  aqt   Use for a person whose work is largely quoted or extracted in works to which he or she did not contribute directly. Such quotations are found particularly in exhibition catalogs, collections of photographs, etc.
Author of afterword, colophon, etc.    aft   Use for a person or corporate body responsible for an afterword, postface, colophon, etc. but who is not the chief author of a work.
Author of introduction, etc.           aui   Use for a person or corporate body responsible for an introduction, preface, foreword, or other critical matter, but who is not the chief author.
Collaborator                           clb   Use for a person or corporate body that takes a limited part in the elaboration of a work of another author or that brings complements (e.g., appendices, notes) to the work of another author.
Compiler                               com   Use for a person who produces a work or publication by selecting and putting together material from the works of various persons or bodies.
Editor                                 edt   Use for a person who prepares for publication a work not primarily his/her own, such as by elucidating text, adding introductory or other critical matter, or technically directing an editorial staff.
Illustrator                            ill   Use for the person who conceives, and perhaps also implements, a design or illustration, usually to accompany a written text.
Photographer                           pht   Use for the person or organization responsible for taking photographs, whether they are used in their original form or as reproductions.
Redactor                               red   Use for a person who writes or develops the framework for an item without being intellectually responsible for its content.
Reviewer                               rev   Use for a person or corporate body responsible for the review of a book, motion picture, performance, etc.

Dates

<!ELEMENT date EMPTY>
<!ATTLIST date
          id ID #IMPLIED
          date CDATA #IMPLIED
          event CDATA #IMPLIED>
          

Date of publication, in the format defined by the W3C note “Date and Time Formats” and by ISO 8601, on which it is based. In particular, dates without times are represented in the form YYYY[-MM[-DD]]: a required 4-digit year, an optional 2-digit month, and, if the month is given, an optional 2-digit day of month.

You may also set the event attribute. Legal values are not defined but may include “creation”, “publication” and “modification”.

The epub task will always add a “creation” date using the current date when assembling the epub file.
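
The date format and the event attribute might be used together as sketched below. The dates themselves are made up for illustration, and the event values are taken from the suggestions above.

```xml
<!-- Year only: the month and day are optional. -->
<date date="2010" event="publication" />
<!-- Full date in YYYY-MM-DD form. -->
<date date="2011-03-15" event="modification" />
```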

Types

Type includes terms describing general categories, functions, genres, or aggregation levels for content. The advised best practice is to select a value from a controlled vocabulary. To describe the physical or digital manifestation of the resource, use the format element. There should normally be no need to specify either.

<!ELEMENT type (#PCDATA)>
<!ATTLIST type
          id ID #IMPLIED>

Formats

Use this element to specify the format of the publication. Typically this is the MIME type or the software, hardware, or other equipment needed. The epub task will always set the format to “application/epub+zip” unless a different format is specified.

<!ELEMENT format (#PCDATA)>
<!ATTLIST format
          id ID #IMPLIED
          lang CDATA #IMPLIED>

Source

The publication may be derived from another resource in whole or part. The referenced resource should be identified by means of a string or number conforming to a formal identification system. If the publication is built from a web site it would be a good idea to use the URL of the entry page.

<!ELEMENT source (#PCDATA)>
<!ATTLIST source
          id ID #IMPLIED
          lang CDATA #IMPLIED>

Rights

A statement about rights, or a reference to one. In this specification, the copyright notice and any further rights description should appear directly.
This specification does not address the manner in which a Content Provider specifies to a secure distributor any licensing terms under which readership rights or copies of the content could be sold.

Typically, Rights will contain a rights management statement for the resource, or reference a service providing such information. Rights information often encompasses Intellectual Property Rights (IPR), Copyright, and various Property Rights. If the Rights element is absent, no assumptions may be made about any rights held in or over the resource.

<!ELEMENT rights (#PCDATA)>
<!ATTLIST rights
          id ID #IMPLIED
          lang CDATA #IMPLIED>
          

Coverage

The extent or scope of the publication’s content. The advised best practice is to select a value from a controlled vocabulary; see the Dublin Core Metadata Element Set (http://dublincore.org/documents/2004/12/20/dces/).

Typically, Coverage will include spatial location (a place name or geographic coordinates), temporal period (a period label, date, or date range) or jurisdiction (such as a named administrative entity).

<!ELEMENT coverage (#PCDATA)>
<!ATTLIST coverage
          id ID #IMPLIED
          lang CDATA #IMPLIED>


Relation

Recommended best practice is to identify the referenced resource by means of a string or number conforming to a formal identification system.

<!ELEMENT relation (#PCDATA)>
<!ATTLIST relation
          id ID #IMPLIED
          lang CDATA #IMPLIED>

Meta

This type is used to express arbitrary metadata beyond the data described by the Dublin Core specification.

<!ELEMENT meta EMPTY>
<!ATTLIST meta
          id ID #IMPLIED
          name CDATA #IMPLIED
          content CDATA #IMPLIED>
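
According to the declaration above, a meta entry is a name and content pair. In the sketch below both values are hypothetical and carry no special meaning to the task.

```xml
<!-- Hypothetical metadata entry; name and content are free-form here. -->
<meta name="series" content="Example Series" />
```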

Cover

<!ELEMENT cover EMPTY>
<!ATTLIST cover
          image CDATA #IMPLIED
          title CDATA #IMPLIED>

Adds a cover page using the supplied image file. Use a PNG, SVG or JPEG formatted file. When supplying raster images a dimension of 760x1000 pixels is typical.
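
A cover declaration could look like this. The image path and the title are placeholders, and the path is assumed to point at an existing PNG, SVG or JPEG file.

```xml
<!-- Placeholder path; the image is used to build the cover page. -->
<cover image="images/cover.png" title="Cover Page" />
```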

Adding content

No publication is complete without content, so you will have to add at least one chapter.

Primary content files

Content is added using the item element. At a minimum you will have to specify the file attribute, which points to the file that will be added to the spine. The spine is a structure within the publication that defines the reading order, so the order in which you add items matters. If you are adding other types of files, such as cascading style sheets, you will have to specify the type and whether or not to add the file to the spine.

<!ELEMENT item EMPTY>
<!ATTLIST item
          id ID #IMPLIED
          file CDATA #IMPLIED
          type CDATA #IMPLIED
          page CDATA #IMPLIED
          spine %boolean; #IMPLIED>
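
The following sketch shows two chapters added in reading order, followed by a style sheet that is kept out of the spine. The file names and the id are placeholders, and the type attribute is assumed to take a MIME type.

```xml
<!-- Chapters go into the spine in the order they are listed. -->
<item file="chapter-1.xhtml" />
<item file="chapter-2.xhtml" />
<!-- A style sheet is not part of the reading order, so spine is false. -->
<item id="main-css" file="styles/book.css" type="text/css" spine="false" />
```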

Secondary content files

Files that are not required to be in the spine, and whose MIME type can be determined automatically, may be added to the publication using a nested fileset. This is identical to the fileset type found in Ant, except that you may add the extra dest and lang attributes. The dest attribute can be used to specify the destination sub-folder of the files. If you, for instance, have illustrations in the form of JPEG, PNG or GIF images, this is the most straightforward way to add them.

An identifier will automatically be created for each file added. It has the form <mimetype>-<filename>, so a JPEG file named my_house.jpg will be identified as image-my_house. If you have another file named my_house.gif you will get a conflict, so it would be wise to rename one of the files, or to add both using the item element and specify the identifiers explicitly.

An example of use is shown below:

<fileset dir="${srcdir}" dest="images/" lang="en">
	<include name="*.gif" />
	<include name="*.png" />
	<include name="*.jpg" />
	<include name="*.otf" />
</fileset>

References

The guide allows you to specify the role of each file in the publication. While not required, using this feature is recommended. The guide is basically a list of references to the content files and the roles they play. Some reading systems will, for instance, open a fresh book on the first page that contains text.

<!ELEMENT reference EMPTY>
<!ATTLIST reference
          id ID #IMPLIED
          href CDATA #IMPLIED
          type CDATA #IMPLIED
          title CDATA #IMPLIED>

<reference href="cover.html"
  type="cover" title="Cover Page" />
<reference href="title-page.html" 
  type="title-page" title="Building EPUBs" />
<reference href="introduction.html"
  type="preface" title="Introduction" />

The following types are allowed:

Type              Description
cover             The book cover.
title-page        Page with title, author, publisher, and other metadata.
toc               Table of contents.
index             Back-of-book style index.
glossary          An alphabetical list of terms used in the publication with definitions or explanations.
acknowledgements  Statement acknowledging use of works of other authors.
bibliography      A list of books or other material on a subject.
colophon          A publisher’s emblem on a book.
copyright-page    Subject to or controlled by copyright.
dedication        Address or inscription to a person, cause, etc. as a token of affection or respect.
epigraph          A quotation at the beginning of a book, chapter, etc., suggesting its theme.
foreword          An introductory statement at the beginning of a book, often written by someone other than the author.
loi               A list of illustrations.
lot               A list of tables.
notes             A brief summary or record in writing.
preface           A statement written as an introduction to the publication, typically explaining its scope, intention, method, etc.; foreword.
text              First “real” page of content (e.g. “Chapter 1”).

Table of contents

<!ELEMENT toc EMPTY>
<!ATTLIST toc
          id ID #IMPLIED
          generate %boolean; #IMPLIED
          file CDATA #IMPLIED>

Exactly one toc element is used to declare a table of contents. There are two ways of doing this: either point to a prepared NCX file using the file attribute, or set generate to true. The latter will iterate through all the elements in the spine and build the table of contents from the header elements.

If the file attribute is used, the task will automatically do the housekeeping: the file will be copied into the OEBPS folder of the publication, placed first in the content declaration and properly referenced.

If this element is not used, a table of contents will still be generated in order to satisfy EPUB requirements. However, it will be empty.
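
The two alternatives described above might be written as follows. Only one toc element should be active at a time, so the second variant is shown as a comment. The file name toc.ncx is a placeholder.

```xml
<!-- Variant 1: let the task build the table of contents from the spine. -->
<toc generate="true" />

<!-- Variant 2: use a prepared NCX file instead (placeholder name):
<toc file="toc.ncx" />
-->
```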