The Web-Book structure explained
All of our web books are formatted using HTML and a style sheet. The HTML takes care of the structure (the division of a book into chapters, sections, paragraphs and so on), while the style sheet takes care of the presentation: how each structural component should look on your screen or printer.
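As a minimal sketch of this division of labour, the fragment below keeps the structure in the body and the presentation in a small style sheet. The style rules, the title and the sample text are illustrative assumptions, not the collection's actual style sheet, and the rules are embedded here only to keep the example in one piece.

```html
<head>
<title>Sample book</title>
<!-- Presentation: hypothetical rules, for illustration only -->
<style type="text/css">
  h3 { text-align: center; }
  p  { text-indent: 1.5em; margin: 0; }
</style>
</head>
<body>
<!-- Structure: a chapter, its heading and its paragraphs -->
<div class="chapter">
<h3>CHAPTER I.</h3>
<p>First paragraph of the chapter . . .</p>
<p>Second paragraph of the chapter . . .</p>
</div>
</body>
```

Changing the two style rules alters how the chapter looks without touching the markup inside the body.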
Books typically consist of a number of parts, and these are pretty well described by the Chicago Manual of Style and similar documents, which we follow fairly closely. Essentially, a book is divided into three main parts, called the frontmatter, body and backmatter, and each of these in turn consists of zero or more subdivisions, as follows [2]:
In this collection, the above structure is coded in HTML using the DIV tag along with a "class" attribute whose value is named after the part of the book that the text enclosed by the DIV represents. For example, a preface would be represented as:
<div class="preface"> . . . </div>
Note that a class attribute is most commonly used (in HTML) to refer to a style defined in a style sheet. In our case, this is often not so, and in most cases no style is associated with the class names mentioned above. They are used merely to describe structure, rather than presentation.
In XML, one would simply define a tag as <PREFACE> and use that, but you can't define your own tags in HTML.
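Putting this together, the top-level divisions of a book might nest as sketched below. The class names "frontmatter", "body" and "backmatter" are assumed from the part names given above, and "titlepage" and "appendix" are hypothetical; only "preface" and "chapter" appear in the examples in this section. Whether each top-level part is wrapped in its own DIV is also an assumption.

```html
<!-- Assumed nesting: the outer DIVs and some class names are guesses -->
<div class="frontmatter">
  <div class="titlepage"> . . . </div>   <!-- hypothetical class name -->
  <div class="preface"> . . . </div>
</div>
<div class="body">
  <div class="chapter"> . . . </div>
</div>
<div class="backmatter">
  <div class="appendix"> . . . </div>    <!-- hypothetical class name -->
</div>
```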
In many cases, there will be multiple occurrences of a division (the great majority of novels, for example, contain multiple chapters), and these are distinguished with the "id" attribute:
<div id="chapter1" class="chapter"> . . . </div>
<div id="chapter2" class="chapter"> . . . </div>
. . .
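One practical consequence of giving each division a unique id is that it can be addressed as a link target. The table of contents below is a hypothetical sketch, not necessarily how the collection builds its contents pages.

```html
<!-- Hypothetical table of contents: each link targets a division's id -->
<ul>
<li><a href="#chapter1">Chapter 1</a></li>
<li><a href="#chapter2">Chapter 2</a></li>
</ul>

<div id="chapter1" class="chapter"> . . . </div>
<div id="chapter2" class="chapter"> . . . </div>
```

From another page, the same chapter could be reached by appending the fragment to the file's address, for example book.html#chapter2, where book.html is a placeholder file name.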
Headings are encoded using either H3 or H4 tags. Major headings — the heading for one of the divisions defined above, such as the heading for a preface or a chapter title — are encoded using H3. Minor headings — sub-headings or section headings within a chapter — are encoded using H4. Rarely, I've also used H5 as a sub-sub-heading.
So, for example, the start of chapter 1 of Richard Burton's Pilgrimage to Al-Medinah and Meccah is encoded like this:
<div id="chapter1" class="chapter" title="CHAPTER I.">
<h3>CHAPTER I.</h3>
<h4>TO ALEXANDRIA.</h4>
<p>A few words concerning . . .
Sometimes, I've used H2 for a "super-heading", where a chapter is the first of a major section, usually labelled "Book" or "Volume" or "Part". Such headings are omitted if they were merely an artefact of the printed work, e.g. where the work was printed as two volumes and the chapter numbering was continuous across the two volumes. If numbering of chapters began again from 1 with the second volume, then I've retained the "Volume" header rather than renumbering chapters, in order to avoid confusion with references.
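A chapter that opens a new volume might then look like the following sketch. The heading text and the id value are invented for illustration, and placing the H2 inside the chapter DIV is an assumption.

```html
<!-- Hypothetical: first chapter of a second volume whose numbering restarts at 1 -->
<div id="volume2chapter1" class="chapter">
<h2>VOLUME II.</h2>           <!-- super-heading, kept because numbering restarts -->
<h3>CHAPTER I.</h3>           <!-- major heading -->
<h4>A SECTION HEADING.</h4>   <!-- minor heading -->
<p> . . . </p>
</div>
```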
Other structural components
The most used component of any book is the paragraph, which is simply encoded with the P tag. Mostly the P tag is used without attributes [although I am currently investigating paragraph numbering using the id attribute, which would allow references to the text at the level of the individual paragraph].
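Paragraph numbering of this kind is only under investigation, so the following is purely a sketch of what it could look like. The id scheme (chapter number followed by paragraph number) is hypothetical.

```html
<div id="chapter1" class="chapter">
<h3>CHAPTER I.</h3>
<!-- Hypothetical ids: "c1p1" = chapter 1, paragraph 1. Ids must be unique
     within the document and must begin with a letter in HTML 4.01. -->
<p id="c1p1">First paragraph . . .</p>
<p id="c1p2">Second paragraph . . .</p>
</div>

<!-- A reference to the second paragraph from elsewhere in the same file -->
<p>See <a href="#c1p2">chapter 1, paragraph 2</a>.</p>
```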
Various other components are catered for using class attributes, which can be applied either to a single P tag, or to a DIV tag where the component has multiple paragraphs. The most common of these is the "quote", which is not text in quotation marks but an extended passage quoted within the work and set off from the surrounding text; this is essentially the same as the blockquote. Similar classes exist for verse, the parts of plays, footnotes etc.
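A sketch of both forms follows. Only the class name "quote" is taken from the text above; "verse" is assumed from the mention of a verse class, the use of BR for line breaks is a guess, and the sample text is placeholder.

```html
<!-- Single-paragraph quotation: the class goes on the P tag itself -->
<p class="quote">A short quoted passage . . .</p>

<!-- Multi-paragraph quotation: a DIV wraps the paragraphs -->
<div class="quote">
<p>First paragraph of a longer quoted passage . . .</p>
<p>Second paragraph of the same passage . . .</p>
</div>

<!-- Assumed class name for verse; BR for line breaks is also an assumption -->
<div class="verse">
<p>First line of the stanza,<br>
second line of the stanza.</p>
</div>
```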
As a deliberate policy, our web books use only a limited subset of the full range of HTML encoding. In particular, presentation tags such as B (bold), U (underline) and I (italic) are not used; tags relating to forms and frames are not used; other deprecated tags are not used. Also, although BLOCKQUOTE is used, current practice is to replace this with a <div class="quote"> block.
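For illustration, here is how older markup might be brought into line with this policy. Using EM in place of I is an assumption made for this example; the text above says only that I, B and U are avoided, not what replaces them.

```html
<!-- Older style, no longer preferred -->
<blockquote>
<p>A quoted passage with an <i>emphasised</i> word.</p>
</blockquote>

<!-- Current practice: a DIV with class "quote".
     EM replacing I is an assumption, not stated policy. -->
<div class="quote">
<p>A quoted passage with an <em>emphasised</em> word.</p>
</div>
```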
A DTD will be developed to define exactly what is used.
All of our web books should conform to strict HTML 4.01.
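For reference, a document claiming strict HTML 4.01 conformance begins with the doctype declaration shown below. The rest of the skeleton (character set, title and sample chapter) is a minimal illustration rather than the collection's actual template.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
  "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>Title of the book</title>
</head>
<body>
<!-- In the strict DTD, BODY may contain only block-level elements
     (plus SCRIPT, INS and DEL), so text must sit inside P, DIV, H3, etc. -->
<div id="chapter1" class="chapter">
<h3>CHAPTER I.</h3>
<p> . . . </p>
</div>
</body>
</html>
```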
[2] This list should be regarded as "flexible": it is not definitive, and additional classes might be added as deemed appropriate at the time.