The Design and Construction of eBooks


Steve Thomas

Original publication for eBooks@Adelaide.

Rendered into HTML by Steve Thomas.

Last updated Monday, August 24, 2015 at 16:05.

To the best of our knowledge, the text of this
work is in the “Public Domain” in Australia.
HOWEVER, copyright law varies in other countries, and the work may still be under copyright in the country from which you are accessing this website. It is your responsibility to check the applicable copyright laws in your country before downloading this work.

eBooks@Adelaide
The University of Adelaide Library
University of Adelaide
South Australia 5005

Preface

I’ve been creating ebooks since 1998, building the web site eBooks @ Adelaide from a proof-of-concept single text into today’s popular collection of nearly 3,000 works. In that time (almost 15 years as I write) I have learned a great deal. I blush now to think of the earliest efforts, and today’s books are several orders of magnitude more sophisticated in their design and construction than my first attempts.

In this short book I have attempted to distill everything I’ve learned along the way. It may encourage others to contribute to the world of ebooks: so much the better for the world. There is a great wealth of books out there waiting to be made available in accessible formats.

This book is in two parts. The first gives an introduction to the design and construction of ebooks, from the perspective of my experience with the eBooks @ Adelaide site. Design issues are largely subjective, so this will be a very personal view of what constitutes good ebook design.

The second part deals with the technical details of structure and presentation. It does contain large dollops of HTML and CSS code, and readers who already know these languages will doubtless derive more from it than the novice. But I believe that even those new to coding will find it sufficiently clear to follow. CSS in particular is highly readable.

I must acknowledge the support provided by the University of Adelaide Library, in providing server space for the collection and allowing use of its name. And of course I must also acknowledge the efforts of countless unnamed volunteers who have scanned, transcribed and corrected the raw text of the thousands of works available on the web. Without their work, neither this nor the majority of ebook sites could exist.

PART ONE

Introduction

A Brief History Lesson

I started making ebooks in 1998. I was aware of other e-text projects and had compiled a web page directory of these sites, but I was dissatisfied with their presentation. Most of them used plain text for the books, which is utilitarian but not very inviting from a reader’s viewpoint.

Here’s the venerable Project Gutenberg’s plain text version of David Copperfield, chapter 1:

Screen shot from gutenberg.org, taken 2011-12-30.

It’s not awful. Just . . . dull.

Some sites (actually, most) presented their works in ways which I, personally, found annoying, and they still do: they use coloured backgrounds; the pages are festooned with advertisements, sidebars and other extraneous material; they use ugly fonts, fonts that are too small, and so on. While the plain text sites made no attempt at readability, these sites seemed to actively attack it!

Here’s David again, from another site:

Screen shot from bibliomania.com, taken 2011-12-30.

This was taken recently, but has been unchanged since 1998. It replaces “dull” with “hideous”. And whereas the Project Gutenberg text gave us the entire book in one single file, this goes to the other extreme and breaks the book into many small segments.

And here’s a screen shot from one of the “better” sites, Bartleby.com:

Screen shot from bartleby.com, taken 2011-12-30.

You have to go to the site to get the full horror of this, because that advertisement at the top of the screen is actually animated. So while you are reading, this thing is flashing and wobbling in your face.

Some sites don’t use HTML at all, and expect you to read PDF. That’s OK for printing, but I find PDF too inflexible. PDF is designed for printing, so the text is formatted as it will appear when printed, usually on A4 paper. Unless you have an A4 sized screen, this will not be helpful. And as with the printed page, PDF does not allow you to increase or decrease font size, or any other aspect of presentation. What you see is all there is.


At the same time, the first ebook readers, notably the Rocket eBook, had appeared. While these were interesting, it occurred to me that we all had perfectly good screens on our desks which could be used for reading, if only the work was better formatted.*

* Most of the world seemed to disagree with me at the time, and to this day there is great resistance to the idea of reading a book on your computer, even though many of us do most of our work with such screens. Admittedly, screen quality has improved dramatically over the past decade, and today’s LCD screens are much sharper and clearer than the CRT screens of 1998.

With these things in mind, I set out to explore how one might present a book using HTML in such a way that it was as readable and enjoyable as a printed book. Having proved the concept, I then began adding titles and refining the format, and refinement continues to this day.

The first title publicly promoted was Dickens’s Our Mutual Friend, done in order to tie in with an ABC TV adaptation which was showing at the time.

A Digression

eBooks vs. Print

The Romance of Books

You would think that by now, the “ebook vs. print” debate would be over. Not with a win to one side or the other, but in a draw, with everybody recognising that there are merits in both formats. It’s not difficult to find pro and con lists on the Web (my personal favourite is that you can’t throw an ebook across the room in disgust!) so I won’t replicate them here. But it is worth exploring some differences.

The first of these derives from what I’ll call the Romance of Books, which boils down to the tangible nature of print books. People love (or claim to love) the feel of books, the smell of books, the sight of a wall of shelves filled with books. And all of these things may be true.* But none of this has anything to do with the content of the book, so it’s irrelevant to the discussion. The book as art object, yes, I get that, but mostly there could be 200 blank pages between the oh-so-beautiful leather-bound and hand-tooled covers, and the attraction would still be there. (Granted, there are some books created to be art, such as the output of the Kelmscott Press. But these are the exception.)

* Anyone who claims to love the smell of old leather bindings should be informed about the pure finders of old London, who earned a living collecting dog faeces for the tanners to use in making the leather for books.

Availability

In an age of near-ubiquitous network access, availability is one area in which ebooks have an overwhelming advantage. To someone who has spent many hours in bookshops hunting fruitlessly for a copy of a wanted book, the ability to simply locate a title and download it to an ereader in minutes is like a miracle. Unless you live next door to a large bookshop, the closest you can get in the print world is ordering online; but you still have to wait days or weeks for delivery.

One pleasure that’s lost with ebooks is that of browsing in bookshops, something I still enjoy often. But that has more to do with the Romance of Books than with their content.

Portability

It’s difficult now to imagine, but when William Collins introduced the Pocket Classics series, ca. 1906, it must have seemed revolutionary. Before then, most books would have been large, with solid, expensive bindings, the sort of thing available only to the rich or through libraries. The Pocket Classics, and then the Everyman series and later Penguin paperbacks, put the world of books within reach of, well, Everyman. Moreover, books for the first time became portable. They could easily fit into a coat pocket, and could be read one-handed while commuting.

The ebook responds to this demand for “read anywhere” convenience, and extends it to the point where one can carry a device with a whole library of books that still fits into a coat pocket.

Permanence

Another issue that frequently appears in these discussions is permanence. And yes, acid paper aside, print is certainly durable, with books printed 500 years ago still readable today. Against that we have the intangible nature of the digital book: dropped your ereader and didn’t have a backup? Too bad, goodbye library! Also, a book purchased is yours to keep, for ever, while an ebook may turn out to have been merely “rented” from Amazon if they decide to withdraw the title,¹ or decide that you’ve transgressed one of their rules.²

¹ Amazon Erases Orwell Books From Kindle, New York Times, July 17, 2009.

² Amazon wipes customer’s Kindle and deletes account with no explanation, The Guardian, 22 October 2012.

On the other hand, a print library can be destroyed by fire or flood or vermin, while a collection of Amazon ebooks can be magically resurrected when your broken Kindle is replaced. Also, leaving aside questions of licensing (DRM) for the moment, any digital file, including ebooks, can be copied and recopied without loss, and stored in multiple places. With distributed web storage, there is now no reason to fear the loss of a collection. Files can always be recovered if there are copies.

Adaptability

Once I’ve created an ebook, as a single HTML file, that file can be easily adapted by conversion to other formats: I can split it into chapters for a multi-file ebook for the web; I can convert it into ePub format for compliant ereader devices such as the Kobo, iPad and Nook; I can convert the same file into the proprietary Kindle format. Three different versions, all from the same file.

If I have a Penguin edition of David Copperfield, that’s all it will ever be. I cannot use it to produce a braille, audio or large print version. It is what it is: not very adaptable.

Environmental

There are obvious environmental arguments against print: the destruction of trees to make paper; the energy needed to transport books from printer to bookstore; and the costs of storage. By comparison, ebooks are clean and efficient: no materials are used to create them; they weigh nothing, and require no trucks or stores to make them available.

The truth is more complex: the environmental costs of ebooks are hidden. Electricity is required to transmit them and to present them on a screen, and the power costs of huge data centers such as Amazon’s are a subject of some concern. And on the plus side for print, books amount to significant carbon sequestration over long periods.

Re-usability

No contest here: a print book, used with care, will still wear out, sooner or later, according to the quality of the paper and binding. Whereas an ebook can be read, copied, and distributed infinitely without loss.

Extensibility

Usability and readability issues will be explored in depth in the next chapter. Suffice it to say here that while the design of print books has been refined over 500 years, and may therefore be expected to have reached a state of near perfection, that perfection also means that the format is now an obstacle to expanding functionality.

In contrast, the ebook format is giving us opportunities to extend the idea of the book in ways that are not possible with print. Two obvious examples are search, where the reader is now able to find specific parts of a work using a keyword search; and copy/paste, where the user can easily select portions of a text to copy into other documents. But we can also extend an ebook with the addition of multimedia and reference links. For example, an essay from Virginia Woolf is enhanced with the addition of an audio recording of her reading the essay; Cook’s Journal is enhanced with links from the text to Google Maps.

Size matters . . . only in print.

Working with ebooks, I realised that much of existing publishing practice was a direct response to the problems of print: large works needed to be bound as multiple volumes; small works (short stories, essays, etc.) needed to be bound with other items to make the exercise worthwhile. With an ebook, the size of the work doesn’t matter. Seven pages or 7,000 pages, the result is the same: a digital file.

That’s why, with the ebooks collection, I’ve been “unbundling” some collected editions into their component parts. Take the essays of George Orwell, which range far and wide over many topics, and include extracts from some of his novels. Some of his essays, e.g. Politics and the English Language, stand well on their own, and in fact deserve to stand on their own. But at a mere eleven pages, such an essay will never appear as an edition in print.

As for spreading larger works over several volumes, as print must, that usually makes no sense in the ebook world. For example, Richard Burton’s translation of The Thousand Nights and a Night, originally 16 volumes, including six supplements, is now a single ebook. In the original, stories spanned two separate volumes, so clearly the volumes were simply an artefact of printing.

Other artefacts of print include:

  • small type (= less paper);
  • plates and illustrations divorced from the text;
  • black and white illustrations where colour would be warranted, due to cost (colour images cost no more in ebooks than B&W);
  • footnotes, and (worse) endnotes.*

We’re only just beginning to discover ways in which ebooks can liberate us from the restrictions and limitations of print.

* To see how notes can be handled in ebooks, have a look at The Thousand Nights and a Night.

Design Principles

In progressing from an exploration to a real collection, I was aware of the need to set some standards and design principles.

The first was the choice of format. Obviously, the choice was HTML (and now XHTML), but there were alternatives available, and used for ebook collections at other sites: plain text, as used by Project Gutenberg; Adobe’s PDF (Portable Document Format); TeX, well established for technical papers; and SGML, the more general markup standard from which HTML derives. All of these formats have their advantages; for example, plain text is a clear winner in terms of accessibility. But HTML wins overall in being accessible and offering a great deal of flexibility in formatting, particularly when coupled with CSS (Cascading Style Sheets).

But a few more design principles were also required, and these are:

  • works should be readable on the screen as easily as on paper;
  • works should be unencumbered by advertising, logos, or other material not part of the actual book;
  • as far as possible, the ebook should provide the same features as a print book;
  • the reader should not be unnecessarily constrained by my choices;
  • the reader should be able to download each work in its entirety for reading off-line.

In summary, I wanted to maximise the readability and usability of the ebooks, to make them “just like a real book”.

Legibility

The most important things to get right are legibility and readability. Legibility is straightforward: it refers to how easily one can recognise the glyphs that are used to form individual characters, so we need only to choose a typeface or font that is legible.

As an extreme example, compare

The quick brown fox jumped over the lazy dogs

with

The quick brown fox jumped over the lazy dogs

The first is Georgia, and the second Unifraktur Maguntia. To the extent that we find most legible that with which we are most familiar, the first is much to be preferred.

When I began the project, and until recently, choice of font was also constrained by the likely availability of different fonts on different platforms (operating systems). For example, Palatino is an attractive font, but it’s not generally provided on Windows, so it would be pointless choosing that font for our books. It was essential to choose a font that all or at least most users would have on their systems. This led me to choose Georgia for the serif font, and Verdana for the sans-serif font. Happily, these are both attractive, highly legible fonts, both on the screen and when printed.

Recently, so-called Web fonts have become available and supported in all the major browsers. The fraktur font above is a web font, freely available from Google. Web fonts are used by including a little code in the header of the web page, plus some style sheet magic, making them available to all users without the user needing to do anything. The result of this is that we now have a much greater choice of fonts available to us when creating ebooks, as long as they are read through a browser. When transferring an ebook to an ereader (Kindle, iPad, etc.) any custom fonts are likely to be stripped out, and the text reduced to the ereader’s default.
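
As a rough sketch of what that “little code” might look like: the page asks for the font in its head, then names it in a style rule, with Georgia as the fallback where the web font is unavailable. The style sheet address and the class name fraktur are illustrative assumptions here, not taken from the collection’s own pages.

<head>
<!-- Assumed address: asks Google Fonts for a style sheet describing the font -->
<link rel="stylesheet" href="https://fonts.googleapis.com/css?family=UnifrakturMaguntia">
<style>
/* Hypothetical class name; Georgia and serif are the fallbacks */
.fraktur { font-family: "UnifrakturMaguntia", Georgia, serif; }
</style>
</head>

Any element given class="fraktur" would then be set in the web font, and in Georgia on a device that strips it out.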

Readability

Readability concerns how easily we recognise whole words, sentences, paragraphs, etc. Subjectively, we are looking for harmony and balance between the micro and macro elements of the book. After 500+ years of printing and publishing, we have some pretty clear ideas of what works and what should be avoided. Some key considerations here are: size, measure, leading, alignment, white space, and style.

  1. Size matters. Too small, and reading becomes uncomfortable, or the text illegible. Too large in comparison to the surrounding text, and it becomes a distraction.

    A crucial factor with size is the eyesight of the reader. I can comfortably read 12pt text (with my reading glasses), but I know nothing about other potential readers. The only way to accommodate all readers is to allow the reader to determine their own base font size, using their browser or ereader settings.

    The important thing then becomes the size relationship between different elements of the book: assuming the paragraph is 1em, then every other element (heading, footnote, etc.) is a proportion of that. So a heading size may be defined as 1.3em (or 130%), a footnote as 0.8em, and so on. By defining our sizes in terms of proportions rather than fixed point sizes, the ebook becomes scalable: the reader can enlarge or reduce the text as they please, without losing the subtleties of proportion.

    An em is a unit of measurement equal to the currently specified point size. Historically, the unit was derived from the width of the capital “M” in the given typeface, but this no longer applies.

  2. Measure refers to the length of lines, and it has long been established that lines that are too long make reading more difficult. The eye has to scan rapidly from the end of one line to the start of the next, and the longer the line the longer it takes for the eye to do that, and the more likely it is that the eye will return to the wrong line.

    There is no hard and fast rule about measure, but a line length of between 45 and 75 characters is considered ideal, noting that in a proportional font, the characters have different widths. I have set a maximum width of 33em for body text. After much experimentation, this seems to be about right, and produces a similar result to many printed works.

  3. Leading refers to the spacing between lines (actually the height of the line from the baseline to the baseline of the line above). Without leading, text will appear cramped. Too much and the eyes will have to work too hard to scan the text. Here, leading is set at 140%.

  4. Alignment refers to how the lines are positioned relative to the margins: left, right, centered or justified.

    I have chosen to justify paragraph text, that is to have the spacing of each line adjusted so that all lines are aligned at both the left and right margins. There is no general agreement about how this affects readability. Some argue that the extra spacing that’s inserted to achieve this can be a distraction for the eye, therefore reducing comprehension. This can be true in extreme cases, but in my opinion the benefit of more readily identified blocks of text is worth the cost.

  5. Style refers here not to the style of the writer (humorous, serious, boring, etc.) but to the use of italic, bold and small caps, all useful in directing the reader’s attention to some distinguishing aspect of this portion of text which differs from the text around it.

  6. White space, above all else, is crucial to readability. Just as a sentence would be hard to decipher without the space between words, so too vertical space between elements of the page, margins, and indentation all assist the reader in discriminating between paragraphs, table elements, sections and so forth.
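
The proportions described above could be written into a style sheet along the following lines. This is only a sketch under stated assumptions: the values come from the text (a 33em measure, 140% leading, headings at 1.3em, footnotes at 0.8em, justified paragraphs), but the selectors, in particular the class name footnote, are hypothetical and the collection’s actual style sheet may differ.

<style>
/* No fixed point size anywhere: the reader's own base size is inherited */
body { max-width: 33em;                /* measure */
       font-family: Georgia, serif; }
p { font-size: 1em;                    /* the reference size */
    line-height: 140%;                 /* leading */
    text-align: justify; }             /* alignment */
h3 { font-size: 1.3em; }               /* 130% of the paragraph size */
.footnote { font-size: 0.8em; }        /* hypothetical class name */
</style>

Because every size is relative, a reader who changes the base font size in the browser scales the whole page in proportion.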

Usability issues

Attempting to replicate the print world on a computer screen presents a few challenges, as well as imposing a few constraints.

The Screen

The first problem is the width of the screen. Print books are generally taller than they are wide, while computer screens are generally the opposite (landscape). If the text was not to be spread across the entire width of the window, it had to be constrained. Initially, I used a table with a defined width, but this crude method was quickly superseded by the use of CSS, as noted above (Measure).

The second problem arises from the solution to the first. Having constrained the width of the page, the user is left with a lot of blank space on either side. On my screen, the page takes up about one third of the screen, leaving two thirds blank. By default, this blank space is white, which can create a significant problem of glare for the user, and screen glare is the main reason for users objecting to reading on the screen: it leads quickly to tired eyes or worse. The solution to this problem is to dim down the glare. Initially, I did this by changing the background color to gray, but later replaced the gray with a more interesting but still unobtrusive pattern.

But the glare problem persisted with the page being black text on a white background. Many people still find the white background to be too bright. So I’ve recently changed the style sheet to use an off-white background which reduces the glare. [The color is actually slightly yellow, to mimic old paper.]
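
A minimal sketch of this arrangement, assuming a wrapper division with the hypothetical class name page and illustrative color values (the text does not give the actual ones):

<style>
/* Surround: dimmed so the unused screen width does not glare */
body { background-color: #999999; }           /* illustrative gray */
/* Page: constrained, centered, slightly yellow off-white */
.page { max-width: 33em; margin: 0 auto;
        color: #000000;
        background-color: #fdfbf0; }           /* assumed off-white */
</style>

A background-image rule on body could supply the pattern mentioned above in place of the flat gray.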


Other usability issues are less easy or impossible to resolve:

The Page

The print world has been splitting a text into pages since the invention of the codex. There’s nothing inherently superior about pages; they are only an artefact of the printing process: a printing press can only print so much at a time, and the reader finds it easier to hold a book with smaller pages. Similarly, books tend to present two pages of text at a time, left and right. Again this is simply an outcome of printing and binding a book made up of separate pages.

In the world of the web browser, there’s no simple way to mimic this page structure. A web “page” is a complete document, and the user can scroll the page up and down to view different parts. While it is possible to mimic the printed page in the browser, and even provide a two-page view, this cannot be done without sacrificing some other features. The text has to be split into separate pages, each of which must fit on the screen. The only way to ensure that they fit is to define precisely the size of the text, which then prevents the user from adjusting the font size if they wish.

People often tell me that they prefer turning pages to scrolling. But this is surely only a question of familiarity. Examined dispassionately, pages have a number of problems, not least of which is that they break the flow of the narrative or argument simply because we’ve reached the end of the page. How many times have I turned a page, only to turn back to remind myself of how the sentence began? And how many times, reading a novel, has my eye been tempted to stray across to the right hand page to see what’s coming, ruining the surprise the author had in store for me?

Overall, there’s no particular advantage of paging over scrolling, or vice versa. They’re just two solutions to the problem of presenting a work, each suited to a particular medium.

Pagination

Another usability issue, often mentioned by opponents of ebooks, is the lack of page numbering. As above, this is more about familiarity than utility, since page numbers are irrelevant when there are no pages.

But there is one use case where this complaint has some merit, and that is with referencing. In the print realm, it is common to quote some text and then give a citation which includes the page number of the book. Without the page number, it would be tedious to have to locate the passage if one had to read through an entire chapter to find it. Of course, with a web page, you can simply search for the text on the page and the browser will find it for you, faster than you can turn to the cited page number in a book. But still, people prefer what they already know.

It is possible to add reference points to a work, so that the reader can both cite and locate a particular passage. But doing so has two costs: first, it adds to the overall size of the file; second, it takes more time for the editor (me) to do. So in the case of this collection at least, you will generally not find page numbers or reference points, other than those provided by the table of contents.
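
For illustration only, a reference point could be as simple as an id attribute on a paragraph, which a citation can then address with a fragment identifier. The id value and the file name below are hypothetical, and the collection itself, as noted, generally does not include such points:

<p id="p1">Whether I shall turn out to be the hero of my own life . . . </p>
<!-- A citation elsewhere can link straight to that paragraph -->
<a href="chapter1.html#p1">David Copperfield, chapter 1, paragraph 1</a>

The cost is visible even here: every paragraph gains an attribute, and someone has to add it.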

Printing

A final usability question concerns printing. In spite of the extensive effort I have gone to in order to make the work readable on the screen, there will always be people who want to print it. Well, it’s not my task to tell other people what they can and cannot do, so I have gone to some lengths also to make sure (again using CSS) that a work will look good in print as well as on the screen. The result will not, of course, look quite as polished as a regular published edition. Printing from a browser lacks some of the refinements of print publishing software, such as hyphenation and management of widows and orphans. But the result will do for general purposes.
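
One way this might be done is with a print media block that drops the screen background and lets the text fill the paper. This is a sketch, not the collection’s actual style sheet, and the particular rules are assumptions:

<style>
@media print {
  /* Paper supplies its own margins and color */
  body { background: none; max-width: none; }
  /* Ask the browser not to strand a heading at the foot of a page */
  h3, h4 { page-break-after: avoid; }
}
</style>

Rules inside the block apply only when the page is printed, so the screen presentation is left untouched.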

Editions

I made a conscious design decision that the books would be new editions, rather than trying to replicate existing print editions. There were two reasons for this:

  1. The plain text files I started with typically made no reference to which print edition was used. [Where they did, I have retained that information in the title verso and/or the metadata.]

  2. The task of exact replication of a print edition lies somewhere between hard and impossible, for any but the simplest works. It’s true that most novels have a simple structure involving not much more than division into chapters. But if there are any embellishments, such as illustrations, ornaments, special typography, and footnotes and endnotes, then trying to replicate the look and feel of a print edition becomes an exercise in rapidly diminishing returns. It’s rarely worth the effort, and the result is rarely satisfactory.*

    * For an extreme example, see Tennyson’s Lady Clare, in which the text and images are each wrapped around the other in intimate embrace. It is possible with a great deal of fiddling with positions to achieve a result similar to the original — at least until the reader decides to change the font or font size, at which point it all falls apart.

    I have already discussed above the usability issues with pagination. Trying to replicate this in an ebook is actually counter-productive, because (as with paper) it breaks the flow of the text artificially. The printer has no choice but to do this, but on the web our page is of infinite size, so breaking up the text is unnecessary.

So you should consider* the books in this collection to be distinct, new editions, designed and crafted for the web, rather than being facsimiles of existing paper editions†. Hence the pains I have taken with the title page and verso.

* Most people seem to have trouble with this concept. I once had a discussion with the Open Library people about adding catalogue records for my books to their collection. Things were going well, and I sent them a sample of records, whereupon they asked, “and which edition is this, there’s no mention in the catalogue record?” I carefully explained my position that these were new editions, after which they went quiet and I haven’t heard from them since!

† There are a few works that attempt to reproduce some aspects of a print edition. See for example The Happy Prince, by Oscar Wilde, where the placement of the illustrations was of some importance, and therefore worth the extra effort.

Semantic Structure

The markup (HTML) applied to a book is important for managing the presentation, but it is also important in defining the semantic structure of the work. This means that the structural elements applied, rather than simply dividing components of the work, actually have a meaning that can be identified by some agent (a text analysis program for example). This requires not just identification of chapters and headings (the gross components) but also the fine details. So for example, if a piece of text is a telegram, it is identified as such. A researcher would love for this to be as detailed as possible, but in practice I have tended to limit myself to identifying structures for presentation purposes.

In some cases this semantic structure derives directly from the HTML tag used. E.g. in this code fragment:

<cite>Lewis Carroll</cite>

the cite tag indicates that the content of the tag is a citation, in this case the author Lewis Carroll.

Notice that this says nothing about the appearance of a citation (how it should be presented). It only says “this is a citation”. Any presentation we might want will be applied by our style sheet.

Commonly used semantic HTML tags are em and strong (for emphasis), address, code, div and, most commonly, p.

The p tag is semantic because it says “this is a paragraph”, and div is semantic because it says “this is a distinct part of the text” (which may be a chapter, a letter or a quotation). On its own, a div tag is not terribly instructive, but by adding a class attribute* we can refine the meaning to make it as useful as required, for example:

<div class="telegram">

can be used to specify that the enclosed block represents a telegram.
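
A fuller sketch, with invented telegram text and an assumed style rule (neither is from the collection), shows how the class name carries the meaning while the style sheet decides the look:

<style>
/* Assumed presentation: an indented block in a monospaced face */
div.telegram { margin: 1em 2em; font-family: monospace; }
</style>
<div class="telegram">
<p>ARRIVING TUESDAY STOP MEET ME AT STATION STOP</p>
</div>

Removing the style rule changes nothing about the structure: the block is still identifiable as a telegram by any program that reads the class name.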

So, the design rules I have adopted for structure are these:

  1. minimal subset of HTML, using only semantic tags (p, div, span, em, etc.);

  2. use of class names to define structural elements.

* Note that there are no generally agreed class names for the different parts of a book, but I believe I have chosen names that make sense and are commonly used in the print world.

PART TWO

Technical

The Structure of a Book

The books are formatted using HTML and a style sheet. The HTML takes care of the structure — the division of a book into chapter, section, paragraph etc. — while the style sheet takes care of the presentation — how each structural component should look on your screen or printer.

Major Parts

Books typically consist of a number of parts, and these are pretty well described by the Chicago Manual of Style and similar documents, which we follow fairly closely. Essentially, a book is divided into three parts, called the front matter, body and back matter, and these in turn consist of zero or more parts, as follows:¹

  • Front matter
    • title page
    • title verso (the back or reverse of the title page)
    • dedication
    • contents
    • foreword (an introduction to the work written by someone else)
    • preface (an introduction to the work written by the author)
    • introduction
    • prologue (an introduction to a tale, not written in the author’s voice)
    • acknowledgments
  • Body
    • book, volume, or part
      • chapter
        • section
      • essay
      • letter
      • act
        • scene
      • poem, canto
        • stanza
  • Back matter
    • conclusion
    • epilogue
    • afterword
    • appendix
    • notes
    • glossary
    • bibliography
    • index

In this collection, the above structure is coded in HTML using the DIV tag along with a “class” attribute named after the part which the text enclosed by the DIV represents. For example, a preface would be represented as:

<div class="preface">
     . . .     </div>

Note that a class attribute is intended (in HTML) to refer to a style defined in a style sheet. In our case, this is often not so, and there is in most cases no style associated with the above-mentioned class names. They are used more to describe structure, rather than presentation.

In many cases, there will be multiple occurrences of a division — for example, the great majority of novels contain multiple chapters — and these are distinguished with the “id” attribute.

<div id="chapter1" class="chapter">
     . . . </div>
<div id="chapter2" class="chapter">
     . . . </div>
     . . . 
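
Putting these together, the top level of a complete book file might look something like the following. This is only a sketch: the class names are taken from the list of parts and from the style sheet, but the particular selection and order of divisions, and the id values, are illustrative assumptions rather than a fixed template.

<div class="titlepage">
     . . . </div>
<div class="titleverso">
     . . . </div>
<div class="preface">
     . . . </div>
<div id="chapter1" class="chapter">
     . . . </div>
<div id="chapter2" class="chapter">
     . . . </div>
<div class="notes">
     . . . </div>
<div class="colophon">
     . . . </div>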

[1] This list should be regarded as “flexible”: it is not definitive, and additional classes might be added as deemed appropriate at the time.

Headings

Headings are encoded using either h3 or h4 tags. Major headings — the heading for one of the divisions defined above, such as the heading for a preface or a chapter title — are encoded using h3. Minor headings — sub-headings or section headings within a chapter — are encoded using h4. Rarely, I’ve also used h5 as a sub-sub-heading.

So, for example, the start of chapter 1 of Richard Burton’s Pilgrimage to Al-Medinah and Meccah is encoded like this:

<div id="chapter1" class="chapter" title="CHAPTER I.">
<div class="header">
<h3>CHAPTER I.</h3>
<h4>TO ALEXANDRIA.</h4>
</div>
<p>A few words concerning      . . . 

Sometimes, I’ve used h2 for a “super-heading”, where a chapter is the first of a major section, usually labelled “Book” or “Volume” or “Part”. Such headings are omitted if they were merely an artefact of the printed work, e.g. where the work was printed as two volumes and the chapter numbering was continuous across the two volumes. If numbering of chapters began again from 1 with the second volume, then I’ve retained the “Volume” header rather than renumbering chapters, in order to avoid confusion with references.

Paragraphs

The most used component of any book is the paragraph, which is simply encoded with the p tag. Mostly the p tag is used without attributes, except for the first paragraph of a chapter, where I use the “dropcap” class.
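
A minimal sketch of that convention, using placeholder text. Only the markup is illustrated here; the styling attached to the dropcap class is not reproduced in this chapter, so nothing is assumed about it.

<div id="chapter1" class="chapter">
<h3>Chapter I.</h3>
<p class="dropcap">The opening paragraph of the chapter carries the class.</p>
<p>Every later paragraph is a plain p tag with no attributes.</p>
</div>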

Tables

Tables are quite straightforward, but a bit of a pain to code, because every cell has to be formatted separately. In theory, this shouldn’t be necessary, because the colgroup and col tags should make it possible to style all cells in a column together, but unfortunately, the use of classes on these tags is not yet well supported. For example:

This is the left column. The text should be justified and aligned to the top, while the next column should be right justified and aligned to the bottom, and use 30% of the table width. | 1,000
Second row. | 2,000

So, I’ve defined a number of “atoms”, or simple classes defining single properties to use in styling table cells, which makes it a little less tedious:

.tc { text-align:center; }
.tr { text-align:right; }
.vat { vertical-align:top; }
.vab { vertical-align:bottom; }

So, for example, for a column of numbers, you might code each cell with <td class="tr vab">, which will align the text to the right and bottom of the cell.
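
As an illustration, the two-row sample table described earlier might be marked up roughly as follows, using only the atoms defined above. The 30% column width is left out, since none of the atoms shown covers width; treat this as a sketch rather than the actual source of the sample.

<table>
<tr>
<td class="vat">This is the left column.</td>
<td class="tr vab">1,000</td>
</tr>
<tr>
<td class="vat">Second row.</td>
<td class="tr vab">2,000</td>
</tr>
</table>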

table { margin:1em auto; }
th { font-weight:normal; } /* override browser default */
.bt { border-top:1px solid gray!important; }
.br { border-right:1px solid gray!important; }
.bb { border-bottom:1px solid gray!important; }
.bl { border-left:1px solid gray!important; }

table.tb1 { border:1px solid gray; }
table.tb1 tr td { border:1px dotted gray; }
table.tb1 tr th { border:1px dotted gray; }

table.nb { border:none; }
table.nb tr th { border:none; }
table.nb tr td { border:none; }

Other structural components

Various other components are catered for using class attributes, which can be applied either to a single P tag, or to a DIV tag where the component has multiple paragraphs. The most common of these is the “quote”, which is not text in quotation marks, but text which is quoted within the text; this is essentially the same as the blockquote. Similar classes exist for verse, the parts of plays, footnotes etc.
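
For instance, a quotation might be coded in either of these two ways, depending on its length. The text is placeholder content, and the sketch assumes nothing beyond the quote class itself.

<p class="quote">A quotation consisting of a single paragraph.</p>

<div class="quote">
<p>The first paragraph of a longer quotation.</p>
<p>The second paragraph of the same quotation.</p>
</div>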

Minimal HTML

As a deliberate policy, our web books use only a limited subset of the full range of HTML encoding. In particular, presentation tags such as B (bold), U (underline) and I (italic) are not used; tags relating to forms and frames are not used; other deprecated tags are not used. Also, although BLOCKQUOTE is used, current practice is to replace this with a <div class="quote"> block.

All of our web books should conform to strict HTML5.

Presentation

The style sheet explained

The most important part of our web books — apart from the content itself — is the style sheet. The same style sheet is used by all our web books, although each will use only some of the classes provided.

This explanation assumes that you have at least a passing familiarity with Cascading Style Sheets (CSS) and the CSS2 specification.

Basic formatting

There are three things which contribute most to readability: some white-space in the form of margins to separate the content from the edge of the page; black text on a light background, to achieve good contrast; and the right width of lines, to reduce side-to-side eye movement. Look at just about any printed book, and these are what you see, for the simple reason that this is what works best. This basic formatting is achieved through the body tag.

First, we’ll style our content (body) appropriately: black text on (off‑)white background, readable font [2], and a sensible width (measure), centered in the window:

body {
    background-color:#fcfff6;
    color:#000;
    font-family:Georgia, serif;
    margin:auto;
    max-width:33em;
    padding:3em;
}

And then we reduce the glare of the unused parts of the window by giving them a gray background — or a subtle gray image:

html {
    background:#fdfdfd url("widgets/endpaper.jpg") fixed;
}

This mimics the basic print “standard”, and works for all recent browsers, but we have a few more “tricks” which set our collection apart.

By default, paragraph text is justified [3], meaning the horizontal spacing of text is altered to align the ends of lines to the right margin; the leading (spacing between lines) is increased; and the spacing between paragraphs is subtle but distinct [4]:

p {
    line-height:150%; /* leading */
    margin:0 auto .2em; /* para. spacing */
    text-align:justify;
    text-indent:0;
}

Finally, we distinguish the start of a new paragraph by indenting the first line of each:

p+p { text-indent:2em; }

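The p+p selector ensures that only the second and subsequent paragraphs of each section are indented: it matches a paragraph only when it immediately follows another paragraph. The first paragraph, which follows a heading rather than a paragraph, is not indented.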
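
To make the effect of the p+p selector concrete, here is a small sketch with placeholder text. The text of each paragraph notes which rule applies to it; the section wrapper and heading are illustrative.

<div class="section">
<h4>A section heading</h4>
<p>This paragraph follows a heading, not a p, so p+p does not match and it is not indented.</p>
<p>This paragraph immediately follows a p, so p+p matches and its first line is indented.</p>
<p>So is this one.</p>
</div>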

[2] I originally avoided specifying a specific font, reasoning that the user should be allowed to set their own preferred font through their browser, until I realised that many users had left their browser set on the default of Times New Roman!

[3] Opinion varies on whether justification helps or hinders readability, but look at any print edition and that’s what you will see. Somehow, having a neat right margin to the text adds visual harmony to a work.

[4] The usual browser default is to add a blank line (a 1em margin) between paragraphs, but this is too much, and creates a mental "break" in the flow.

“House” style

Some things are really just “house” style: the editor’s choice. As with the paragraph element, headings, links and so on all have browser defaults, which may not be what we want. So it is essential to define our own styles for all elements at the outset, in order that we don’t get any unfortunate surprises in formatting.

Headings are all centered, and sized appropriately; h1 is used for the book title, h2 for author (on the titlepage), h3 for chapter headings, h4 for section headings, h5 for sub-headings, and h6 for paragraph or minor heads.

h1,h2,h3,h4,h5,h6 {
        margin:1em auto;
        text-align:center;
}
h1,h2,h3,h4 {
        font-weight:bold;
}
h5, h6 {
        font-weight:normal;
}
h3,h4,h5 {
        font-variant:small-caps;
}
h6      { font-style:italic; }
h5 em   { font-variant:normal; }

h1 {    font-size:2em; }
h2 {    font-size:1.4em; }
h3 {    font-size:1.3em; }
h4 {    font-size:1.2em; }
h5, h6 {font-size:1em; }

Exceptions can be made to these defaults in specific circumstances, for example in the table of contents.

The standard style for links — blue, underlined — is intrusive and distracting. To make them less distracting, I change the colour to standard black, and replace the default underline with a subtle (gray) dotted line:

a, a:link, a:visited {
        color:#000;
        border-bottom:1px dotted gray;
        text-decoration:none;
}
a:active, a:hover {
        color:red;
}

The divisions of a book

In our style sheet, we define a number of classes to be used with the div element to delineate the different parts of the text, pretty much following the Chicago Manual of Style.

The major parts have extra space above to separate them from the previous part, and a dotted line at the end to separate them from the next. (This applies only to screen presentation; when printed, they are separated by a page break.)

@media screen {
    .titlepage {
        margin-top:1em;
        margin-bottom:1em;
    }
    .halftitle,
    .titleverso,
    .frontmatter,
        .dedication,
        .contents,
        .foreword,
        .preface,
        .prologue,
        .introduction,
        .acknowledgments,
        .frontispiece,
        .plate,
    .body,
        .volume,
        .book,
        .part,
        .chapter,
        .act,
        .essay,
        .story,
        .canto,
        .page,
    .backmatter,
        .afterword,
        .postscript,
        .epilogue,
        .appendix,
        .notes,
        .glossary,
        .bibliography,
        .index,
    .colophon {
            padding-bottom:1em;
            margin:1em auto;
    }
}

Minor sections are separated with extra spacing — equivalent to three blank lines.

.section { margin-bottom:3em; }

Obviously, not all of these classes are used in every text.

Note that divisions such as “book” and “volume” are artefacts of printing and therefore as a general rule have no relevance to web books. Words such as “book”, “volume” and “part” may however appear as part of structural headings, and therefore may be used where this has importance to the text.

Notice that one may be specific in assigning classes, or use the more generic “frontmatter” and “backmatter”. The result is the same. More descriptive classes would be valuable to anyone who wished to convert the text to some other format such as TEI for textual analysis.

Some of these divisions have additional attributes:

Title Page

The first page of the book, the Title Page, is "special", and therefore has some special styling.

Everything on the titlepage is centered and bold.

.titlepage {
    text-align:center;
    font-weight:bold;
}
.titlepage p {
    text-align:center;
    line-height:140%;
}

The headings have precise uses: h1 = title; h2 = author; h3 = sub-title; h4 = other.

.titlepage h1 { margin-top:2em; margin-bottom:0em; }
.titlepage h2 { margin-top:2em; margin-bottom:0em; }
.titlepage h3 { margin-top:2em; margin-bottom:0em; }
.titlepage h4 { margin-top:2em; margin-bottom:0em; }
.titlepage p  { margin-top:2em; margin-bottom:0em; }
.titlepage hr { display:none; }

Our imprint is italic . . .

.titlepage p.imprint {
    text-align:center;
    font-style:italic;
}
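
A title page using these styles might be coded along the following lines. The headings follow the uses given above (h1 for the title, h2 for the author, h3 for the sub-title, h4 for anything else); the wording and the order of the elements are placeholders chosen for illustration.

<div class="titlepage">
<h1>The Title of the Work</h1>
<h3>A Sub-title</h3>
<h2>Name of the Author</h2>
<h4>Translated by Another Hand</h4>
<p class="imprint">Name of the Imprint</p>
</div>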

Title Verso

The title-verso (back of the title page) section is where we put all the publication details. This is all gray and centered:

.titleverso {
    color:#666;
    font-family:Verdana, sans-serif;
    font-size:.8em;
    margin:auto;
    text-align:center!important;
    width:90%;
}
.titleverso p {
    margin-bottom:1em;
    text-align:center!important;
    text-indent:0;
}
.titleverso p a {
    color:#666; text-decoration:none;
}
.titleverso p a:visited {
    color:#666; text-decoration:none;
}
.titleverso p a:hover {
    color:#f00; text-decoration:underline;
}

Colophon

In printing, the colophon was a statement about the publication and printing of the work, placed at the very end of the book, or more recently on the title verso. Having found a number of works where the end of the book was actually missing, it seemed to me valuable to include a colophon at the end, rather than a simple The End, which seems appropriate only to fiction.

So I include a standard colophon as the last division in every book, to mark the end of the book. The colophon looks like this:

This web edition published by:

eBooks@Adelaide
The University of Adelaide Library
University of Adelaide
South Australia 5005

and is styled like this:

    .colophon p {
        color:#666;
        font-family:Verdana, sans-serif; font-size:.9em; 
        margin:1em auto;
        text-align:center!important; text-indent:0;
    }
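
The markup behind such a colophon could be as simple as this. It is a sketch only: the division into two paragraphs and the use of br for the address lines are assumptions.

<div class="colophon">
<p>This web edition published by:</p>
<p>eBooks@Adelaide<br />
The University of Adelaide Library<br />
University of Adelaide<br />
South Australia 5005</p>
</div>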

That takes care of the basics. In the following chapters, I’ll describe some of the special cases I’ve provided for.

Notes

Notes typically occur, in the print world, as either footnotes (at the bottom of the page) or endnotes (at the end of the chapter, or the end of the book).

In an ebook, footnotes can be problematic: where is the bottom of the page? Usually, this will be the same as the end of the chapter, so that footnotes become synonymous with endnotes. However, if the notes are brief, such as citations, it may make more sense to keep them close to their reference in the text. Long notes can be intrusive and interrupt the flow of the main content, so may be better placed as endnotes.

In my ebooks, the treatment of notes tends to vary depending on circumstances. Where there are few, brief notes, I generally keep these in the body, at the end of the paragraph in which they are referenced.

Wherever they are placed, we give notes a distinctive appearance to make them easily distinguished from body text. They are in a different, and slightly smaller, font. When grouped as endnotes, I also separate them from the text with a horizontal line:

    .footnotes {
        border-top:1pt solid gray;
        font-family:Verdana, sans-serif;
        margin:1em; padding:1em 0;
    }
    .footnotes p { margin-bottom:1em; }
    .footnotes p,
    .footnotes li,
    .footnotes th,
    .footnotes td {
        font-family:Verdana, sans-serif;
        font-size:0.8em; text-indent:0;
    }
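
By way of illustration, a reference in the text and its note in a footnotes block might be linked like this. The id values (ref1, note1) and the two-way linking are hypothetical; any scheme that gives each note a unique id would serve.

<p>Some statement that needs support.<sup><a href="#note1" id="ref1">1</a></sup></p>
     . . .
<div class="footnotes">
<p><a href="#ref1" id="note1">1</a> The citation or comment itself.</p>
</div>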

Ideally, notes should be encoded using the footnote class, but older works used the note class instead, with this CSS:

    .note {
        font-family:Verdana, sans-serif;
        font-size:.8em;
    }
    .note, .note p {
        margin:1em!important;
        text-indent:0;
    }

I’ve also defined an inline note, not much used but sometimes useful (for example, when the note is just a citation), since an inline note can be less intrusive than linking out to a different place in the work.

    .inline-note {
        font-family:Verdana, sans-serif;
        font-size:.8em;
    }

A common usage in very old texts was the marginal note, usually a sort of heading or precis of the content of the paragraph, either in the left or right margin, or indented into the body. I’ve defined two types of marginal notes, for the left and right margins, each indented into the body:

    .sn, .sidenote {
        clear:left; float:left; max-width:20%;
        margin:0.5em 1em 0 0;
    }
    .sn, .sn p, .sidenote, .sidenote p {
        font-size:.8em;
        font-style:italic;
        font-variant:normal;
        line-height:100%;
        text-align:left;
        text-indent:0;
    }
    .mn, .marginal-note {
        clear:right; float:right;
        max-width:20%; margin:0.5em 0 0 1em;
    }
    .mn, .mn p, .marginal-note, .marginal-note p {
        font-size:.8em;
        font-style:italic;
        font-variant:normal;
        line-height:100%;
        text-align:right;
        text-indent:0;
    }
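
In use, the note is placed at the start of the paragraph it belongs to, so that the paragraph text wraps around it. A sketch, assuming the note is coded as a span inside the paragraph (a separate p or div before the paragraph would float in the same way):

<p><span class="sidenote">A short precis</span>This paragraph wraps around a note floated at the left.</p>
<p><span class="marginal-note">Another precis</span>This paragraph wraps around a note floated at the right.</p>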

There’s also a popup note, which exploits the “tool-tips” display used by browsers. This does have the drawback of being available only when reading in the browser; it’s obviously not available if the user prints the work, and it doesn’t work in ereaders.

    .popup-note, abbr, acronym {
        border:1px dotted gray; cursor:help;
    }
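
A sketch of how such a note might be coded, assuming the note text is carried in the title attribute, which is what browsers display as a tool-tip. The wording is placeholder content.

<p>The word <span class="popup-note" title="The text of the note goes here.">annotated</span> carries a popup note.</p>
<p>The same styling applies to abbreviations such as <abbr title="HyperText Markup Language">HTML</abbr>.</p>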

And finally I define a special note for screen display only, used for instructions which only make sense in browser displays:

    @media screen {
        .screen-note { 
            font-size:.9em; text-indent:0;
            margin-left:1em; margin-right:1em;
            border:1pt solid gray; padding:2pt;
        }
    }
    @media print {
        .screen-note { display:none; }
    }

Quotations

Quotation, n: The act of repeating erroneously the words of another.

Ambrose Bierce, The Devil’s Dictionary

General quotation

By quotations, I mean not the use of quotation marks around speech, but blocks of quoted text within the body. The quotation may be small — perhaps even just a single line — or extensive, extending over a number of paragraphs.

You could just use the blockquote tag for these, but using <div class="quote"> instead gives us more control over presentation and semantic structure, because we can easily combine the quote class with other classes.

The CSS simply provides extra spacing around the quotation to distinguish it from the main body text:

    .quote, blockquote { margin:1em!important; }

Inscriptions

For inscriptions and epitaphs, we have:

    .inscription, .inscription p {
        font-variant:small-caps;
        margin:1em;
        text-align:center!important;
        text-indent:0;
        }

In Memory of Beza Wood
Departed this life
Nov. 2, 1837
Aged 45 yrs.

Here lies one Wood
Enclosed in wood
One Wood
Within another.
The outer wood
Is very good:
We cannot praise
The other.

Notices and Headlines

Use notice for advertisements, handbills, etc. It indents like quote, but puts a box with a drop shadow around bold text. Headline is used for news headlines. It’s the same as notice, but without the box.

.notice {
    border:1px solid;
    margin:1em auto;
    padding:1em;
    -moz-box-shadow: 5px 5px 20px 10px #aaa;
    -webkit-box-shadow: 5px 5px 20px 10px #aaa;
    box-shadow: 5px 5px 20px 10px #aaa;
    }
.notice p, .headline, .headline p {
    font-weight:bold;
    text-align:center!important;
    text-indent:0;
    }

For example (from The Poison Belt):

WARNING.

Visitors, Pressmen, and Mendicants
are not encouraged.

G. E. CHALLENGER.
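
The markup for a notice like this one might be as follows. The division into three paragraphs, and the br within the second, are assumptions made to match the layout shown.

<div class="notice">
<p>WARNING.</p>
<p>Visitors, Pressmen, and Mendicants<br />
are not encouraged.</p>
<p>G. E. CHALLENGER.</p>
</div>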

Epigraphs

A special form of quotation is the epigraph, defined in the OED as a “short quotation . . . placed at the commencement of a work, a chapter, etc. to indicate the leading idea or sentiment”.

.epigraph {
    font-size:.9em;
    font-style:italic;
    margin:1em auto;
    text-align:left;
    text-indent:0;
    width:65%;
    }
.epigraph p {
    margin:0 0 0 2em!important;
    text-align:left;
    text-indent:-2em!important;
    }
.epigraph p em {
    font-style:normal;
    font-variant:small-caps;
    }

The epigraph would normally occur in the header of a section, or on the title page. An example can be seen at the head of this chapter.
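
A possible encoding of the epigraph at the head of this chapter is sketched here. Putting the attribution in a paragraph of its own is an assumption; the style sheet does not prescribe how the source should be marked up.

<div class="epigraph">
<p>Quotation, n: The act of repeating erroneously the words of another.</p>
<p>Ambrose Bierce, The Devil’s Dictionary</p>
</div>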

Letters

Probably the most common use of quotation, after verse, is for letters. The class letter may be combined with quote for semantic purposes, but has no specific style associated with it. But there are styles defined for letters that are written or typed. Typed letters are monospaced (just like a typewriter!) while written letters use italics to indicate handwriting:

.typed { font-family:monospace; }
.written {
    font-family:cursive;
    font-style:italic;
    }

And just because we can, we define a style specifically for telegrams:

.telegram {
    font-family:Courier, monospace;
    font-variant:small-caps;
}
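
Because these are all just classes, they can be combined freely on one element. A sketch with placeholder text, assuming the quote class is used as the common base:

<div class="quote letter written">
<p>Dear Sir,</p>
<p>The body of a handwritten letter.</p>
</div>

<div class="quote telegram">
<p>The text of a telegram.</p>
</div>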

Headers

Other standard parts of a work (as defined by the Chicago Manual of Style, for example) are also given classes. As with chapters etc, this is partly done so as to identify the different parts in the file, but also permits application of stylistic differences. The parts currently defined include: dedication, abstract, precis, rubric and colophon.

div.dedication p,
p.dedication { text-align:center; }

Next we have the precis, defined as “a concise or abridged statement; an abstract, a summary”, typically found at the start of a chapter beneath the chapter heading, and summarising the content of the chapter.

div.precis p,
p.precis {
    font-variant:small-caps; font-size:.9em;
    margin-left:1em; margin-right:1em;
}

A variation of this seen in very old books is the rubric, defined as a “heading of a chapter or other section in a book or manuscript, written or printed in red, or otherwise distinguished in lettering; a particular passage or sentence marked in this way” (OED).

div.rubric p,
p.rubric { text-align:center; font-style:italic; }

Another variation is the more straightforward abstract:

div.abstract p,
p.abstract {
    font-style:italic; font-size:.9em;
    margin-left:1em; margin-right:1em;
}

Plays

Hamlet To be, or not to be: that is the question:
Whether ’tis nobler in the mind to suffer
The slings and arrows of outrageous fortune, . . .

Special classes are used for Plays, to delineate the name of the character, her speech, and stage directions:

.act p { margin-left:1em; }

.speaker {
    font-variant:small-caps;
    margin-left:-1em;
}

.stage { font-style:italic; }

We also use div.play and div.act but these have no styles applied and are therefore not mentioned in the stylesheet.
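
The speech at the head of this section might be coded something like this. It is a sketch: the choice of span for the speaker, the br line breaks within the speech, and the placeholder stage direction are all assumptions.

<div class="act">
<p><span class="speaker">Hamlet</span> To be, or not to be: that is the question:<br />
Whether ’tis nobler in the mind to suffer<br />
The slings and arrows of outrageous fortune, . . .</p>
<p class="stage">A stage direction.</p>
</div>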

Poetry

Poetry presents some unique problems, with short, sometimes indented lines, with the whole centered on a page without the actual lines being centered. Somewhat like this:

I must go down to the sea again,

To the lonely sea and sky

I left my shirt and socks there;

I wonder if they’re dry?

Spike Milligan

An early approach to verse, and one I still see occasionally today, was to use the PRE tag. The results are quite ordinary:

        I must go down to the sea again,
          To the lonely sea and sky
        I left my shirt and socks there;
          I wonder if they’re dry?

Better is to define a class for verse, which gives us greater control over formatting: basically verse needs to be left justified and indented:

div.stanza {
    margin:1em auto;
    width:70%;
    }
div.stanza p {
    margin-left:2em;
    text-align:left;
    text-indent:-2em;
    }

This wraps each verse (or stanza) in a div, defining a width of 70% of the body, while the margin auto declares equal margins left and right. The second rule gives each line a hanging indent, so that any line which is too long to fit wraps to the next line and is indented. The 70% width is “about right” for many poems, but if the lines are short we can adjust that to compensate so that the verse is more centered on the page.
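
With these rules, each line of verse is coded as a paragraph of its own inside the stanza. A minimal sketch for the verse quoted above:

<div class="stanza">
<p>I must go down to the sea again,</p>
<p>To the lonely sea and sky</p>
<p>I left my shirt and socks there;</p>
<p>I wonder if they’re dry?</p>
</div>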

I must go down to the sea again,

To the lonely sea and sky

I left my shirt and socks there;

I wonder if they’re dry?

For longer poems, it is sometimes useful to include line numbers (usually only every fifth or tenth line). This will push them unobtrusively to the right margin:

.ln {
    color:gray;
    float:right;
    font-style:italic;
    font-size:.8em;
    margin:0 -2em 0 1em;
    text-align:right;
    text-indent:0;
}
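
One way to apply this is to put the number in a span within the line it numbers, as in the following sketch. Placing the span at the end of the line is an assumption; since the span is floated, putting it at the start of the line should work just as well.

<div class="stanza">
<p>She saw the water-lily bloom,</p>
<p>She saw the helmet and the plume,<span class="ln">40</span></p>
<p>She look’d down to Camelot.</p>
</div>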

She left the web, she left the loom,

She made three paces thro’ the room,

She saw the water-lily bloom,

She saw the helmet and the plume,                    40

She look’d down to Camelot.

Out flew the web and floated wide;

The mirror crack’d from side to side;

“The curse is come upon me,” cried

The Lady of Shalott.

Images

Books can feature a variety of types of images, and I have developed a number of classes to cater for them.

Illustrations

The simplest case is an illustration inserted between paragraphs of text, intended to illustrate some aspect of the text. This can be coded using the “illustration” class, and is styled as follows:

img { border:none; max-width:100%; }

.illustration {
    margin:1em auto;
    max-width:100%;
    text-align:center!important;
}
.illustration p {
    font-size:.9em;
    font-variant:small-caps;
    text-align:center!important;
    text-indent:0!important;
}

Notice that the maximum width is 100%, so that an image doesn’t unintentionally overflow the page, and that the image is centered on the page. We also allow for a text caption, styled to be differentiated from the body text.

[Illustration]

Horus, Son of Isis, introducing the Scribe Ani to Osiris.

[From the Papyrus of Ani. Brit. Mus., Pap., No. 10470.]
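
An illustration with a caption, like the one above, might be coded as follows. The image file name is hypothetical, and the use of two caption paragraphs is an assumption based on the caption as displayed.

<div class="illustration">
<img src="images/ani.jpg" alt="Horus, Son of Isis, introducing the Scribe Ani to Osiris." />
<p>Horus, Son of Isis, introducing the Scribe Ani to Osiris.</p>
<p>[From the Papyrus of Ani. Brit. Mus., Pap., No. 10470.]</p>
</div>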

For semantic purposes, there are also classes named map, figure, frontispiece, and plate. These are styled as for illustration, with the exception that frontispiece and plate will force a page break if printed.

Ornaments

A second class of image is the ornament. Unlike the illustration, which illustrates some point in the text, ornaments are purely decorative. They’re unimportant, except in so far as they add to the aesthetic quality of a work, and generally appear at the start and end of a chapter, or as section dividers. For example, using the “ornament” class:

[Ornament]

For essentially semantic purposes, I also use the more specific class names headpiece and tailpiece for those images specifically used at the start and end of a chapter.

Here’s a sample headpiece:

[Headpiece ornament]

. . . and a sample tailpiece.

[Tailpiece ornament]

Embedded figures

There are also two classes for figures that are embedded within the text, to the left or right, with the text wrapped around them, as shown below.

.figleft, .figright {
    font-size:.9em;
    font-variant:small-caps;
    max-width:50%;
    text-align:center!important;
    text-indent:0!important;
}
.figleft {
    float:left;
    margin:0;
    padding:.5em .5em .5em 0;
}
.figright {
    float:right;
    margin:0;
    padding:.5em 0 .5em .5em;
}

[Figure, floated left. Caption: Fig. 6.]
[Figure, floated right. Caption: Bernhardt as Salome]

Sed ut perspiciatis, unde omnis iste natus error sit voluptatem accusantium doloremque laudantium, totam rem aperiam eaque ipsa, quae ab illo inventore veritatis et quasi architecto beatae vitae dicta sunt, explicabo. Nemo enim ipsam voluptatem, quia voluptas sit, aspernatur aut odit aut fugit, sed quia consequuntur magni dolores eos, qui ratione voluptatem sequi nesciunt, neque porro quisquam est, qui dolorem ipsum, quia dolor sit amet, consectetur, adipisci velit, sed quia non numquam eius modi tempora incidunt, ut labore et dolore magnam aliquam quaerat voluptatem. Ut enim ad minima veniam, quis nostrum exercitationem ullam corporis suscipit laboriosam, nisi ut aliquid ex ea commodi consequatur? Quis autem vel eum iure reprehenderit, qui in ea voluptate velit esse, quam nihil molestiae consequatur, vel illum, qui dolorem eum fugiat, quo voluptas nulla pariatur?
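
An embedded figure is coded immediately before the paragraph that should wrap around it. A sketch, with a hypothetical image file name and a caption paragraph inside the floated block:

<div class="figleft">
<img src="images/fig6.png" alt="Fig. 6." />
<p>Fig. 6.</p>
</div>
<p>Sed ut perspiciatis, unde omnis iste natus error sit voluptatem . . .</p>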

Maths

There is a standard markup language for the presentation of Mathematical equations, MathML, but currently it seems to be well supported only by Firefox. It’s also rather unwieldy. For example, here’s the simple quadratic expression ax² + bx + c expressed in MathML:

<math xmlns="http://www.w3.org/1998/Math/MathML">
    <mrow>
      <mi>a</mi>
      <mo>&InvisibleTimes;</mo>
      <msup>
        <mi>x</mi>
        <mn>2</mn>
      </msup>
      <mo>+</mo>
      <mi>b</mi>
      <mo>&InvisibleTimes; </mo>
      <mi>x</mi>
      <mo>+</mo>
      <mi>c</mi>
    </mrow>
  </math>

Now, I could learn to live with that, but since it’s not yet widely supported, I decided to make my own CSS to present Math equations:

.math {
    display:inline-block;
    text-align:center;
    text-indent:0;
    vertical-align:middle;
    }

and the HTML is simply:

<span class="math">ax<sup>2</sup> + bx + c = 0</span>

which produces: ax² + bx + c = 0

Now, obviously, that’s a rather trivial example, so let’s look at the well-known solution to the quadratic:

<span class="math">x =
    <span class="math">
        <span class="math bb">−b ± √<span
            class="math bt">b<sup>2</sup> − 4ac</span>
        </span><br />
        <span class="math">2a</span>
    </span>
</span>

which produces this very attractive result:

    x = (−b ± √(b² − 4ac)) / 2a

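The salient points to note are:

  • each span with the math class is an inline-block, so these spans can be nested and stacked;
  • the bb (border-bottom) class on the numerator draws the fraction bar;
  • the bt (border-top) class draws the bar over the terms under the root sign;
  • the br tag puts the denominator on a new line beneath the numerator.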

For a real example of the use of this style, see The System of the World, by Isaac Newton.

Multi-file styles

Where books are split into multiple files (e.g. by chapter), each file ends with navigation and document info. sections. The effect of these can be seen in any ebook and need not be explained here.

The document header section should be used at the head of individual parts when a book is divided into multiple files, but NOT on the title page. Normally, only h2 is used here.

div.dochead { text-align:center; }
div.dochead h2 {
    font-family:sans-serif;
    font-weight:normal;
    font-size:1em;
    font-style:normal;
    color:gray;
    }
div.dochead hr { display:none; } /* fix for older pages */

Navigation is used at the end of each part:

div.navigation {
    font-family:sans-serif;
    font-size:.9em;
    text-align:center;
    text-indent:0;
    margin-top:2em;
    border-top:1px dotted gray;
    padding-top:2em;
    }
div.navigation p {
    font-size:.9em;
    text-align:center;
    text-indent:0;
    }
div.navigation a,
div.navigation a:visited {
    border-right:2px solid gray;
    border-bottom:2px solid gray;
    background-color:#ddd;
    color:#333;
    text-decoration:none;
    padding:3px;
    font-size:.9em;
    font-family:sans-serif;
    }

Docinfo is added to the end of each part:

div.docinfo {
    font-family:sans-serif;
    font-size:.9em;
    color:#666;
    background-color:#fff;
    text-align:center;
    }
div.docinfo p { text-align:center; }

div.docinfo p a {
    font-family:sans-serif;
    color:#666; background-color:#fff;
    text-decoration:underline;
    }
div.docinfo p a:visited {
    font-family:sans-serif;
    color:#666; background-color:#fff;
    text-decoration:underline;
    }
div.docinfo p a:hover {
    font-family:sans-serif;
    color:#f00; background-color:#fff;
    text-decoration:underline;
    }

Miscellaneous styles

The style sheet also contains styles for a number of standard HTML tags. These should be self-explanatory.

Rules were used to divide major divisions, but are now deprecated: they are greyed to make them less dominant. As a general principle, anything not part of the actual book text is greyed out.

hr { color:#ddd; background-color:#fff; }

A “bookmark” class is used for links provided for bookmarking purposes, making them “invisible” except when the user hovers the mouse over their text:

a.bm, a.bm:link, a.bm:visited {
    color:black;
    text-decoration:none;
}
a.bm:active, a.bm:hover {
    color:red;
}

Some “position” classes, e.g. for IMG placement:

.left { float:left; }
.right { float:right; }
.center { text-align:center; }
.clear { clear:both; }

Something to put a box around anything:

.border { border:1px solid; padding:1em; }

Citations are underlined, not italic:

cite {
    text-decoration:underline;
    font-style:normal;
}

Lists deserve a little more space between items:

li { margin-top:.5em; }

Superscript is used for note references/numbering. The reduced size ensures that the numbers don’t intrude by increasing line spacing.

sup { font-size:.7em }

A transition class is defined for separations between paragraphs, where a break in the flow is intended, without a chapter or subhead.

.transition { margin:2em 0; text-align:center; }

Transition is now deprecated, as you can get the same effect using div.section without a heading.

Pre-format class — same as pre, but needs br to break lines:

div.pre { font-family:monospace; text-align:left }

Fix for IE5.5-Mac:

pre { font-family:monospace; text-align:left }

When we specify language explicitly, use a bigger font — it looks better:

span[lang] { font-size:larger; }

Devices

The first ebook was created, according to some, in 1971:

On July 4 1971, after being inspired by a free printed copy of the U.S. Declaration of Independence, he decided to type the text into a computer, and to transmit it to other users on the computer network.

“Obituary for Michael Stern Hart”, Project Gutenberg. Viewed 25-05-2015.

However that may be, nothing much happened until the invention of the computer screen and the World Wide Web [1994], at which time all kinds of “online book” projects appeared, seeded by work previously done by countless individuals sharing texts through bulletin boards [BBS].

Still, being tethered to a computer to read books was unsatisfying to most people. So when the early PDA devices, such as the PalmPilot, appeared on the scene [1996], people began to dream of a device for reading books that you could carry with you anywhere. In 1998 the first of many such devices, the Rocket eBook, was launched, to great enthusiasm among a small core of people who had been wishing for such a thing for some time. Later, of course, came the Kindle, which more or less swept all before it: “Amazon released . . . the first generation Kindle device on November 19, 2007, for US$399. It sold out in five and a half hours.”* The future seemed to be Kindle, until . . .

. . . Apple released the iPhone 3G [July 11, 2008], and later the iPad [2010]. Finally, here was a device which was great for books, but also useful for many other things. [Although there remain many Kindle fans, who don’t want to do other things, and prefer the e-ink display of the Kindle to the glare of the iPad.]

So, we now inhabit a world with many options for ebooks, from the screen of a laptop or desktop computer to a smart phone, and it is therefore desirable, if not mandatory, to make any ebook available on all devices. This causes a number of difficulties, in particular the need to accommodate different sizes of screen.

* http://en.wikipedia.org/wiki/Amazon_Kindle#cite_note-Patel-13
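
One way to approach the screen-size problem is a CSS media query, which applies rules only when the display is narrower than a chosen width. What follows is a minimal sketch rather than the rules actually used on the site: the 30em breakpoint and the 1em side margin are arbitrary values assumed for illustration, and the selectors simply relax the fixed widths that the main style sheet gives to the body, stanzas and epigraphs.

```css
/* Assumed breakpoint: 30em is an arbitrary illustrative value */
@media screen and (max-width:30em) {
    /* give the text a little breathing room at the screen edges */
    body { margin:0 1em; max-width:none; }
    /* justified text tends to space badly in a narrow column */
    p { text-align:left; }
    /* let verse and epigraphs use the full width available */
    .stanza, .epigraph { width:auto; }
}
```

On wider screens none of these rules apply, so the normal layout is unaffected.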

Printing

Finally, we specify a few print controls, for anyone who wants a printed copy. This is most useful for printing from the complete (composite) file, and should produce something that looks “just like a real book”!

First add some (arbitrary) sensible constraints:

@media print {
    body {font-size:11pt; font-family:Georgia, serif; }

. . . provide a page break after each major part:

    div.halftitle,
    div.contents {
        /* page-break-before:right; */
        page-break-before:always;
    }
    div.titleverso,
    div.frontmatter,
        div.docinfo,
        div.dedication,
        div.preface,
        div.foreword,
        div.introduction,
        div.acknowledgments,
    div.chapter,
    div.act,
    div.essay,
    div.canto,
    div.backmatter,
        div.afterword,
        div.appendix,
        div.notes,
        div.glossary,
        div.bibliography,
        div.index,
        div.colophon {
            page-break-before:always;
    }
    h3,h4 { page-break-after:avoid; }

. . . force the imprint to the bottom of the title page (this doesn’t always work!):

    div.titlepage p.imprint { margin-top:60%; }

. . . and finally, fix a few “legacy” problems . . .

    div.navigation { display:none; }
    div.titlepage hr { display:none; }
    div.contents hr { display:none; }
    div.docinfo hr { display:none; }
}

We’ll also define a required page size (noting that this is not yet implemented by major browsers):

@page {
    size:6in 9in;
    margin:25mm 20mm 20mm 20mm;
}
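
Paged-media CSS also provides :left and :right page selectors, which allow a wider inner margin (the gutter) on facing pages. The sketch below is only an illustration: the margin values are assumed, and, like the size property above, these selectors depend on support in whatever browser or print tool is used.

```css
/* Margin order is top, right, bottom, left. Values are illustrative only. */
@page :left  { margin:25mm 25mm 20mm 15mm; }  /* gutter on the right of a left-hand page */
@page :right { margin:25mm 15mm 20mm 25mm; }  /* gutter on the left of a right-hand page */
```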

PART THREE

Production

Selection Criteria

The original “project brief” was to demonstrate that books could be formatted in HTML with acceptable results. Having proved that, the next goal was to make accessible “The Great Books”. This was a good goal in the early days of the project, notwithstanding the controversies over what constitutes a great book, because others had already done the work of digitising most of the texts, making them readily available for adding to our collection.

Since those early days, selection has been broadened to encompass several literature genres (gothic, sci-fi, etc.), travel, science; and also books that others have asked for (where possible).

We originally conceived of a great books program, similar to those running elsewhere. This is as good a starting point as any for selecting texts. But the main driver turned out to be the availability of texts. Although we initially expected to be scanning works ourselves, we don’t (yet) have the resources. And there’s no point scanning works when plain text copies are freely available from elsewhere. The intention is to eventually digitise selected works from our own print collection.

Otherwise, the main selection policy is “what interests me” at any particular moment.

Sources

Where do the books come from? If I’d scanned and OCR’d every book myself, there would be perhaps only a hundred or so, rather than the 3,000+ available today. Scanning is a lot of work. Also, you need a copy of a printed edition in order to scan, and despite working in a University library with two million volumes, I don’t have ready access to everything I might want. Proof-reading and correction after OCR are even more time-consuming than scanning. And I’m only one guy, so . . . in most cases, I’ve taken the source text from some other site providing free public domain texts.

Chief of these is the venerable Project Gutenberg, which I’ve been following since they had only a couple of thousand books. They now have over 40,000, thanks largely to the efforts of the Distributed Proofreaders project, which very cleverly taps into the crowd-sourcing power of the internet to do the OCR proof-reading and correction. Neither is perfect: Project Gutenberg has a lot of duplication; sometimes they have incomplete multi-volume works — volume 2 without volume 1, for example; some complex works are a mess that requires considerable effort from me to untangle and produce a well-structured book. Distributed Proofreaders also has some eccentricities that can cause me difficulties, particularly with notes, quotations and verse. But by and large, together they have produced a marvellous resource for constructing ebooks.

I also make good use of the Australian and Canadian versions of Project Gutenberg, for material that’s copyright in the USA and therefore unavailable on the US site.

“Raiders of the Lost Books”

In the early days, I made use of some other early etext collections: Wiretap (still going); MIT’s Classics Archive (now being resurrected from the dead); the ERIS Project (defunct) from Virginia Tech. I subsequently found that many of these works had been added to Project Gutenberg.

When these sources have failed me, I’ve often been able to locate an ebook version of a title on some other site. I have taken these and reformatted with a clear conscience, since the original text is in the public domain, while being (privately) grateful to the site owner for making the work available. Usually, for the reasons stated in the introduction, the formatting on the source site is horrible, so I’m liberating the book from unreadable formats to make it more accessible. Also, many of these sites are “fragile”, in the sense of existing at the whim of the site owner, and may at any time disappear as a result of financial pressures or lack of energy. I have seen a number of sites disappear in this way. (There were some useful sources on the old Geocities site, which sadly vanished when Yahoo decided to shut it down.) So my work is also rescuing ebooks from potential oblivion.

The Internet Archive

When all else fails, I turn to the Internet Archive, which has a huge collection of works, but scanned images only, derived from the original Google Books and Microsoft Books projects in concert with a number of University and large public library collections. This is an amazing resource, with all manner of obscure and almost unobtainable texts. On the downside, it is also incredibly badly organised, and it sometimes requires considerable effort to sort out one volume or edition from another: basically, you have to examine each work to figure it out. Also, some of the scans are better than others (and some are so bad as to be worthless): some of the earliest Google Books project scans are very rough, intended only as material for OCR, whereas later scanning efforts seem to be much better quality. Those from Cornell University are usually of the best quality. But, if you can find a good scan, you’ll probably find that the text version (the raw OCR output) also available from the Internet Archive will be worth using, and I’ve made a few good ebooks from these (after a few hours of correction).

Producing the books

Production can be broken down into a number of steps:

  1. Scanning a print book. This is tedious work, the most demanding part being making sure you don't miss a page. Keeping the pages aligned is also important. In theory, you can scan around 10 pages per minute with a decent scanner and a good pair of hands. So a 300 page novel should take around half an hour. In practice, you'll probably manage that book in a couple of hours, spread over a day. Fortunately, someone else has probably already done the scanning, so you mostly won't need this step.

  2. Conversion to text (OCR). Once the book has been scanned, you need to convert the page images into text. There are plenty of tools to do this, sometimes bundled with scanning software. Personally, I use a Linux application called tesseract which generally does an outstanding job. But whatever software you use, you're going to get errors. OCR is notorious for mixing up "be" with "he", "c" and "o", "th" with "tli", etc. The clarity of the original scan, and the typeface used in the book, are crucial. If the letters are too close together, too small, italicised, you will get more errors. If the page was dirty, or scribbled on or underlined by a reader, then you will get more errors. Unless you can find a cleaner text to scan, you're stuck with fixing these errors. Which brings us to step 3:

  3. Proof-reading/correction. Now that you have the raw text of your book, you need to correct all those OCR errors. This is as simple as it sounds: open the text in your editor of choice, preferably with spell-check highlighting turned on. Then you work through, page by page, fixing all the errors, probably with reference to the original page scans where the correct text is not obvious. This can take ages, so I find it's a good activity for an evening in front of the TV. Probably with a glass of wine.

  4. Adding structure. Once corrected, you can proceed to the next stage, which is to apply some structure to the book. Remember, at this stage you just have a plain-text file, so you will need to go through and identify chapter headings, quotations, verse, and all the other things mentioned in previous sections of this work. And it's at this point that you need to decide whether to retain the original page numbers (useful if your book has an index) or discard them. I usually discard them for straightforward fiction, and keep them for complex non-fiction works. If I'm keeping them, I like to wrap the page number in braces, as {pg32}, which makes them easy to identify later.

  5. Building the ebook. With a clean, well-structured plain text file, we can convert the text to HTML. I use a locally-written Perl script to do this, and then more scripts to split the HTML version into multiple parts, usually by chapter. Finally, a “front section” with table of contents is created linking them all together.

APPENDIX

The complete style sheet

.imprint {
background:
url("")
center no-repeat;
opacity:0.9;
position:absolute;bottom:0;left:0;
margin:0;
width:100%;height:120px;
}
/*
Style Sheet for eBooks@Adelaide web books
Author: Steve Thomas, stephen.thomas@adelaide.edu.au
Version: 2014.10.26
Rights: Public Domain
*/
body	{
	background-color:#fcfff6;
	color:#000;
	font-family:Georgia, serif;
	margin:auto;
	max-width:33em;
}
p	{
	line-height:150%;
	margin:0 auto .2em;
	text-align:justify;
	text-indent:0;
}
p+p	{
	text-indent:2em;
}
span.first {
	text-indent:0;
	text-transform:uppercase;
}
div	{ margin-bottom:1em; position:relative; }
h1,h2,h3,h4,h5,h6 {
	margin:1em auto;
	text-align:center;
}
h1,h2,h3,h4 {
	font-weight:bold;
}
h3,h4,h5 {
	font-variant:small-caps;
}
h5 em	{ font-variant:normal; }
h1	{ font-size:2em; }
h2	{ font-size:1.4em; }
h3	{ font-size:1.3em; }
h4	{ font-size:1.2em; }
h5, h6	{ font-size:1em; font-weight:normal; }
h6	{ font-style:italic; }
a, a:link, a:visited {
	color:#000;
	border-bottom:1px dotted gray;
	text-decoration:none;
}
a:active, a:hover {
	color:red;
}
/* style atoms : classes defining a single style feature */
.fs80	{ font-size:80%!important; }
.fs90	{ font-size:90%!important; }
.fs130	{ font-size:130%!important; }
.fs150	{ font-size:150%!important; }
.fs200	{ font-size:200%!important; }
.lh100	{ line-height:100%!important; }
.lh120	{ line-height:120%!important; }
.lh130	{ line-height:130%!important; }
.lh150	{ line-height:150%!important; }
.lh180	{ line-height:180%!important; }
.lh200	{ line-height:200%!important; }
.fwn	{ font-weight:normal!important; }
.i1	{ padding-left:1em; }
.i2	{ padding-left:2em; }
.i3	{ padding-left:3em; }
.i4	{ padding-left:4em; }
.i5	{ padding-left:5em; }
.i6	{ padding-left:6em; }
.i7	{ padding-left:7em; }
.i8	{ padding-left:8em; }
.i9	{ padding-left:9em; }
.i10	{ padding-left:10em; }
.i11	{ padding-left:11em; }
.i12	{ padding-left:12em; }
.w10	{ width:10%!important; }
.w20	{ width:20%!important; }
.w25	{ width:25%!important; }
.w30	{ width:30%!important; }
.w33	{ width:33%!important; }
.w40	{ width:40%!important; }
.w50	{ width:50%!important; }
.w60	{ width:60%!important; }
.w67	{ width:67%!important; }
.w70	{ width:70%!important; }
.w75	{ width:75%!important; }
.w80	{ width:80%!important; }
.w90	{ width:90%!important; }
.w100	{ width:100%!important; }
.ni	{ text-indent:0!important; }
.in	{ text-indent:2em!important; }
.left	{ float:left; padding-right:1em; }
.right	{ float:right; padding-left:1em; }
.center, .center p
	{ text-align:center!important; }
.clear	{ clear:both; }
.border	{ border:1px solid; padding:1em; }
.dropshadow {
	-moz-box-shadow:5px 5px 25px 10px #888;
	-webkit-box-shadow:5px 5px 25px 10px #888;
	box-shadow:5px 5px 25px 10px #888;
	}
.underlined { text-decoration:underline; }
.hi	{ font-style:italic; }
.hi em	{ font-style:normal; }
.it	{ font-style:italic; } /* deprecated */
.sc	{ font-variant:small-caps; }
.uc	{ text-transform:uppercase; }
del, .del {text-decoration:line-through}
.tl	{ text-align:left!important; }
.tc	{ text-align:center!important; }
.tr	{ text-align:right!important; }
.tj	{ text-align:justify!important; }
.tw	{ letter-spacing:.5em; } /* "text wide" */
.ls	{ letter-spacing:1em; }
sup, sub{
	font-size:.7em;
}
sup	{
	line-height:80%;
}
hr	{
	color:#ddd;
	margin:2em auto;
	width:33%;
}
code	{
	font-family:monospace;
	font-size:110%;
}
pre, .pre {
	font-family:monospace;
	font-size:110%;
	text-align:left;
	text-indent:0;
	white-space:pre-wrap;
}

/* Title page */
#titlepage {
	border:2px solid #00609c;
	height:47em;
	position:relative;
}
#titlepage h1,
#titlepage h2,
#titlepage h3,
#titlepage h4,
#titlepage h5,
#titlepage h6,
#titlepage p {
	font-weight:normal;
}
#titlepage h1,
#titlepage h2 {
	line-height:180%;
}
.titlepage {
	font-weight:bold;
	padding:1em;
	text-align:center!important;
}
.titlepage h1,
.titlepage h2,
.titlepage h3,
.titlepage h4,
.titlepage p {
	margin-top:0;
	margin-bottom:1em;
}
.titlepage h3,
.titlepage h4 {
	font-variant:normal;
}
.titlepage p {
	text-align:center;
	text-indent:0;
}
.titleverso {
	color:#666;
	font-family:Verdana, sans-serif;
	font-size:.8em;
	margin:auto;
	padding-top:3em;
	text-align:center!important;
	width:90%;
}
.titleverso p {
	margin-bottom:1em;
	text-align:center!important;
	text-indent:0;
}
.titleverso p a {
	color:#666;
	text-decoration:none;
}
.titleverso p a:visited {
	color:#666;
	text-decoration:none;
}
.titleverso p a:hover {
	color:#f00;
	text-decoration:underline;
}
.halftitle {
	/* height:47em; */
	padding:10em 0;
}
.halftitle h1,
.halftitle h2,
.halftitle h3,
.halftitle h4,
.halftitle h5,
.halftitle h6 {
	font-size:2em;
	font-weight:bold;
	line-height:2em;
	text-align:center;
}
/* Table of Contents */
.contents h4,
.contents h5,
.contents h6 {
	font-variant:normal;
	font-weight:normal;
	text-align:left!important;
}
.contents h5,
.contents h6 {
	margin-left:1em;
}
.contents p {
	font-size:.9em;
	margin-left:1em;
	margin-bottom:1em;
	text-indent:0!important;
}
#contents p {
	text-indent:0!important;
}
/* Chapter, etc., header */
.header {
	margin-top:0;
	margin-bottom:3em;
}
.section .header { margin:0; }
.header h1,
.header h2,
.header h3,
.header h4,
.header h5 {
	font-variant:small-caps;
	font-weight:bold;
	margin-top:0;
	margin-bottom:1em;
	text-align:center;
}
.header p {
	margin-left:2em;
	text-align:justify;
	text-indent:-2em;
}
.header.modern h2,
.modern h3,
.modern h4,
.modern h5,
.modern h6 {
	font-weight:normal;
	text-align:left;
}
.header.modern h2 { font-size:200%; }
.modern h3 { font-size:180%; }
.modern h4 { font-size:150%; }
.modern h5 { font-size:120%; }
.modern h6 { font-size:110%; }


/* ... and components which should sit within the header */

.rubric, .rubric p {
	font-size:1.1em;
	font-style:italic;
	margin:1em 2em;
	text-align:center;
	text-indent:0;
}
.abstract, .abstract p {
	font-size:.9em;
	font-style:italic;
	margin:1em 2em;
	text-indent:0;
}
.precis h3 {
	font-size:180%;
	font-weight:normal;
	text-align:left;
}
.precis h4 {
	font-size:160%;
	font-weight:normal;
	text-align:left;
}
p.precis, .precis p {
	font-size:130%;
	/* font-variant:small-caps; */
	margin:1em 0 1em 2em;
	text-align:justify;
	text-indent:-2em;
}
.chapter .precis { font-family:Verdana, sans-serif;font-size:100%; }
.section .precis { font-size:130%; }
.epigraph {
	font-size:.9em;
	font-style:italic;
	margin:1em auto;
	text-align:left;
	text-indent:0;
	width:65%;
}
.epigraph p {
	margin:0 0 0 2em!important;
	text-align:left;
	text-indent:-2em!important;
}
.epigraph p em {
	font-style:normal;
	font-variant:small-caps;
}
/* -- */
.runh { font-variant:small-caps; }
.section { clear:both;margin-bottom:3em; }
.bibliography p, .glossary p, .index p {
	font-family:Verdana, sans-serif;
	font-size:0.9em;
	margin-bottom:.5em;
	padding-left:2em;
	text-indent:-2em;
}
.colophon p {
	color:#666;
	font-family:Verdana, sans-serif;
	font-size:.9em;
	text-align:center!important;
	text-indent:0!important;
}
/* NOTES */
.notes, .footnotes,
.note, .footnote,
.inline-note {
	font-family:Verdana, sans-serif;
}
.notes {
	font-size:0.8em;
}
.notes p {
	margin-bottom:1em;
	text-indent:0;
}
.footnotes {
	border-top:1pt solid gray;
	margin:1em;
	padding:1em 0;
}
.footnotes p {
	margin-bottom:1em;
}
.footnotes p,
.footnotes li,
.footnotes h5,
.footnotes th,
.footnotes td {
	font-size:0.8em;
	text-indent:0;
}
.note, .footnote, .inline-note {
	font-size:.8em;
}
.footnote, .note p, p.note {
	margin:1em!important;
	text-indent:0;
}
.sn	{
	clear:left;
	float:left;
	font-size:.7em;
	font-style:italic;
	line-height:110%;
	margin:.5em .5em 0 -1em;
	max-width:7em;
	text-align:left;
	text-indent:0;
}
.mn	{
	clear:right;
	float:right;
	font-size:.7em;
	font-style:italic;
	line-height:110%;
	margin:.5em -1em 0 .5em;
	max-width:7em;
	text-align:right;
	text-indent:0;
}
.popup-note, abbr, acronym {
	border:1px dotted gray;
	cursor:help;
}
.screen-note {
	border:1pt solid gray;
	font-size:.9em;
	margin-left:1em;
	margin-right:1em;
	padding:2pt;
	text-indent:0;
}
/* QUOTES */
blockquote,
.quote { font-size:90%;margin:1em auto;width:90%; }
.letter { margin:1em!important; }
.inscription, .epitaph {
	font-variant:small-caps;
	margin:1em;
	text-align:center!important;
}
.inscription p, .epitaph p {
	margin:0;
	text-align:center!important;
	text-indent:0;
}
.notice {
	margin:1em auto;
	padding:1em;
	-moz-box-shadow:5px 5px 20px 10px #aaa;
	-webkit-box-shadow:5px 5px 20px 10px #aaa;
	box-shadow:5px 5px 20px 10px #aaa;
}
.notice p,
.headline,
.headline p {
	font-weight:bold;
	text-align:center!important;
	text-indent:0;
}
.dedication, .dedication p {
	text-align:center!important;
	text-indent:0;
}
cite, .cite {
	font-variant:small-caps;
	font-style:normal;
}
.quote p.cite,
.stanza p.cite,
.epigraph p.cite,
.epigraph cite {
	text-align:right;
}
.epigraph p.cite:before {
	content:"–";
}
.typed {
	font-family:monospace;
}
.written p {
	font-style:italic;
}
.written p em {
	font-style:normal;
}
.telegram {
	background-color:#F5F5BD;
	border:1px solid gray;
	font-family:Courier, monospace;
	margin:1em auto;
	padding:1em;
	text-align:left;
	text-transform:uppercase;
	width:80%;
}
.telegram p {
	text-align:left;
}
.address, .address p {
	margin:1em 0 1em 2em;
	text-align:left;
	text-indent:-2em;
}
.signed, .signed p,
.dateline, .dateline p {
	text-transform:capitalize;
	text-align:right;
	font-style:italic;
}
.salut	{
	font-variant:small-caps;
	text-indent:0;
}
/* PLAYS */
.act p {
	text-indent:0;
	margin:0 0 .5em 1em;
	margin-top:0;
	margin-left:1em;
	margin-bottom:.5em;
}
.scene { margin-bottom:3em; }
.speaker {
	float:left;
	font-variant:small-caps;
	margin-left:-1em;
	margin-right:1em;
}
/*
span.speaker { margin-left:-1em; }
li span.speaker,
div.act p.stage span.speaker,
p span.stage span.speaker {
	margin-left:0;
}
*/
.speech { margin-left:1em; } /* DEPRECATE */
.stage,
.stage p {
	font-style:italic;
	text-indent:0;
}
li span.speaker,
span.name,
.stage span.speaker,
.stage em {
	font-style:normal;
	font-variant:small-caps;
}
/* POETRY */
div.song {
	font-style:normal;
	}
div.song p {
	line-height:150%;
	margin-bottom:1em;
	text-align:left;
	text-indent:0;
	}
.stanza {
	margin:1em auto;
	width:70%;
	position:relative;
	}
.stanza p,
.hang /* hanging indent */
	{
	text-align:left;
	text-indent:-2em!important;
	margin:0 0 0 2em!important;
}
.stanza p.dropcap:first-letter {
	float:none;
	margin-left:0;
	text-align:left;
	text-indent:0;
}
.stanza .speaker {float:none;}
.couplet,
.verse {
	margin:1em auto;
	max-width:80%;
	text-align:left;
	text-indent:0;
}
.couplet p,
.verse p {
	margin:0;
	text-align:left;
	text-indent:0;
}
p.verse,
p.stanza {
	margin:1em auto;
	max-width:80%;
	text-align:left;
	text-indent:0;
}
.chorus,
.refrain {
	font-style:italic;
	margin:1em auto;
	max-width:70%;
}
.chorus p,
.refrain p {
	margin:0 0 0 4em;
	text-align:left;
	text-indent:-2em;
}
/* verse line number */
.ln {
	color:gray;
	float:right;
	font-style:italic;
	font-size:.8em;
	margin:0 -2em 0 1em;
	text-align:right;
	text-indent:0;
}
/* NOTE: vln requires enclosing div to be position:relative */
span.vln {
	color:gray;
	font-size:.8em;
	position:absolute;
	top:auto;
	right:-1.5em;
	text-align:right;
}
/* ILLUSTRATIONS */
img	{
	border:none;
	max-width:100%;
}
figure,
.map,
.figure,
.plate,
.frontispiece,
.illustration {
	margin:1em auto;
	max-width:100%;
	text-align:center;
	text-indent:0;
}
figure, .figure, .figleft, .figright {
	font-family:sans-serif;
	font-size:.9em;
	text-align:center;
	text-indent:0;
}
figure p,
figcaption,
.map p,
.figure p,
.plate p,
.frontispiece p,
.illustration p {
	font-family:sans-serif;
	font-size:.9em;
	text-align:center!important;
	text-indent:0!important;
}
.ornament {
	margin:1em auto;
	max-width:100%;
	text-align:center!important;
}
.headpiece {
	margin:auto;
	max-width:100%;
	text-align:center!important;
}
.tailpiece {
	margin:1em auto;
	text-align:center!important;
	width:66%;
}
.initial {
	float:left;
	margin:0;
	padding:0 0.5em 0 0;
}
.figleft {
	float:left;
	margin:0;
	padding:.5em .5em 0 0;
}
.figright {
	float:right;
	margin:0;
	padding:.5em 0 0 .5em;
}
/* TABLES */
caption	{
	background-color:inherit;
	font-variant:small-caps;
	margin:1em auto;
}
table	{ margin:1em auto; }
th	{ font-weight:normal; } /* override browser default */
table.tb1 {
	background-color:#fcfff6; /* in case it overflows the margins */
	border:1px solid gray;
	border-collapse:collapse;
}
table.tb1 tr th {
	border-right:1px solid gray;
	border-bottom:1px solid gray;
	padding:0.5em;
}
table.tb1 tr td {
	border-right:1px dotted gray;
	border-bottom:1px dotted gray;
	padding:0.5em;
}
table.nb,
table.nb tr th,
table.nb tr td
{ border:none; }
/* table cell atoms */
.bt	{ border-top:1px solid gray!important; }
.br	{ border-right:1px solid gray!important; }
.bb	{ border-bottom:1px solid gray!important; }
.bl	{ border-left:1px solid gray!important; }
.vat	{ vertical-align:top; }
.vab	{ vertical-align:bottom; }
/* LISTS */
li	{ margin-top:.5em; }
ol p	{
	text-align:justify;
	text-indent:0;
	margin-bottom:1em;
}
ol.nv	{
	color:#999;
	font-style:italic;
}
ol.nv p {
	color:#000;
	text-align:left;
	font-style:normal;
}
ol.upper-roman { list-style-type:upper-roman; }
ol.lower-roman { list-style-type:lower-roman; }
ol.upper-alpha { list-style-type:upper-alpha; }
ol.lower-alpha { list-style-type:lower-alpha; }
ul.bracketed /* list with a left-side line */
{
	border-left:1px solid gray;
	list-style-type:none;
	padding-left:1em;
}
/* list without bullets */
ul.nb
{
	list-style-type:none;
	padding-left:2em;
}
ul.nb li
{
	padding-left:2em;
	text-indent:-2em!important;
}
ul.nb li ul
{
	padding-left:0;
	text-indent:-2em;
}
/* ugly kludge to align sidenotes on a list */
.nb span.sn { margin-left:-4em; }
.dropcap {
	text-indent:0;
}
.dropcap:first-letter {
	float:left;
	font-size:5em;
	line-height:90%;
	padding-right:2px;
}
.dropcap img {
	float:left;
}
/* Close each chapter etc. with a decorative Aldine Leaf */
#contents:after,
.preface:after,
.prologue:after,
.epilogue:after,
.introduction:after,
.canto:after,
.essay:after,
.chapter:after {
	content:"❦";
	display:block;
	font-family:Georgia, "DejaVu Sans";
	font-size:2em;
	margin-top:1em;
	margin-bottom:2em;
	text-align:center;
}
.antiqua {
	font-family:'Uncial Antiqua', Georgia, serif!important;
	font-variant:normal!important;
	font-size:140%;
}
.math {display:inline-block;line-height:110%;text-align:center;text-indent:0;vertical-align:middle; }
.ib {display:inline-block;text-indent:0;vertical-align:middle;}
/* here because make-mobi objects */
span[lang=ar] { font-size:larger; }
*[lang=la] { font-variant:small-caps; }
/* End of this style sheet */
/* META stuff */
@media screen {
.dochead {
	border-bottom:1px solid gray;
	margin:0 0 1em 0!important;
	text-align:center!important;
	}
.dochead h1 {
	color:gray;
	font-family:Helvetica, Verdana, sans-serif;
	font-size:1em;
	font-style:normal;
	font-weight:normal;
	margin:0!important;
	}
.dochead h2 {
	color:gray;
	font-family:Helvetica, Verdana, sans-serif;
	font-size:1em;
	font-style:normal;
	font-weight:normal;
	}
.docfoot {
	border-top:1px solid gray;
	padding-top:1em;
/*
	position:fixed;
	bottom:0;
	background-color:#333!important;
	margin-left:-4em;
	margin-bottom:0;
	padding:0 1em;
	text-align:center;
	width:39em;
	z-index:10;
*/
	color:#666;
	font-family:Verdana, sans-serif;
	margin-bottom:3em;
	text-align:center!important;
	}
.docfoot p {
	font-size:.7em;
	text-align:center!important;
	text-indent:0;
	}
.docfoot p a:visited { color:#666; }
.docfoot p a:hover { color:#f00; }
.nav a	{ display:none; }
.docinfo {
	color:#666;
	font-family:Verdana, sans-serif;
	font-size:.7em;
	text-align:center!important;
	}
.docinfo p {
	text-align:center!important;
	text-indent:0;
	}
.docinfo p a:visited { color:#666; }
.docinfo p a:hover { color:#f00; }

}

/* meta stuff */
/* Special for Copyright notice */
#copyright {
	margin:1em 0 50em;
	padding:1em;
	border:1px solid red;
}
#copyright p {
	margin-bottom:1em;
	text-indent:0;
}

The Original Lorem Ipsum

Sed ut perspiciatis, unde omnis iste natus error sit voluptatem accusantium doloremque laudantium, totam rem aperiam eaque ipsa, quae ab illo inventore veritatis et quasi architecto beatae vitae dicta sunt, explicabo. Nemo enim ipsam voluptatem, quia voluptas sit, aspernatur aut odit aut fugit, sed quia consequuntur magni dolores eos, qui ratione voluptatem sequi nesciunt, neque porro quisquam est, qui dolorem ipsum, quia dolor sit amet, consectetur, adipisci velit, sed quia non numquam eius modi tempora incidunt, ut labore et dolore magnam aliquam quaerat voluptatem. Ut enim ad minima veniam, quis nostrum exercitationem ullam corporis suscipit laboriosam, nisi ut aliquid ex ea commodi consequatur? Quis autem vel eum iure reprehenderit, qui in ea voluptate velit esse, quam nihil molestiae consequatur, vel illum, qui dolorem eum fugiat, quo voluptas nulla pariatur?

At vero eos et accusamus et iusto odio dignissimos ducimus, qui blanditiis praesentium voluptatum deleniti atque corrupti, quos dolores et quas molestias excepturi sint, obcaecati cupiditate non provident, similique sunt in culpa, qui officia deserunt mollitia animi, id est laborum et dolorum fuga. Et harum quidem rerum facilis est et expedita distinctio. Nam libero tempore, cum soluta nobis est eligendi optio, cumque nihil impedit, quo minus id, quod maxime placeat, facere possimus, omnis voluptas assumenda est, omnis dolor repellendus. Temporibus autem quibusdam et aut officiis debitis aut rerum necessitatibus saepe eveniet, ut et voluptates repudiandae sint et molestiae non recusandae. Itaque earum rerum hic tenetur a sapiente delectus, ut aut reiciendis voluptatibus maiores alias consequatur aut perferendis doloribus asperiores repellat.

Fonts

It used to be the case that “available fonts” meant those available on the user’s computer. Since we don’t know what fonts the user has on their computer, or even which operating system they are using, ebook design was restricted to using only those fonts likely to be available on all platforms. That’s a bit limiting, although happily there are good fonts such as Georgia and Verdana that are widely available and look good, both on the screen and when printed.

More recently, we’ve seen the emergence of web fonts, which are fonts that can be automatically downloaded along with the HTML file, CSS style sheets, and JavaScript files.
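
In CSS, a web font is declared with an @font-face rule and then used like any other font. The following is a minimal sketch, not the declaration used on the site: the file path is hypothetical and stands in for wherever the font file is actually hosted. The fallback fonts are those the main style sheet already names for the antiqua class.

```css
/* Hypothetical path: substitute the real location of the font file */
@font-face {
    font-family:'Uncial Antiqua';
    src:url("fonts/UncialAntiqua-Regular.woff") format("woff");
    font-weight:normal;
    font-style:normal;
}
/* Georgia and serif act as fallbacks if the download fails */
.antiqua { font-family:'Uncial Antiqua', Georgia, serif; }
```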

Almendra

At vero eos et accusamus et iusto odio dignissimos ducimus, qui blanditiis praesentium voluptatum deleniti atque corrupti, quos dolores et quas molestias excepturi sint, obcaecati cupiditate non provident

Uncial Antiqua

similique sunt in culpa, qui officia deserunt mollitia animi, id est laborum et dolorum fuga. Et harum quidem rerum facilis est et expedita distinctio. Nam libero tempore, cum soluta nobis est eligendi optio,

UnifrakturMaguntia

necessitatibus saepe eveniet, ut et voluptates repudiandae sint et molestiae non recusandae. Itaque earum rerum hic tenetur a sapiente delectus, ut aut reiciendis voluptatibus maiores alias consequatur aut perferendis doloribus asperiores repellat.

Carter One

cumque nihil impedit, quo minus id, quod maxime placeat, facere possimus, omnis voluptas assumenda est, omnis dolor repellendus. Temporibus autem quibusdam et aut officiis debitis aut rerum

Crushed

Sed ut perspiciatis, unde omnis iste natus error sit voluptatem accusantium doloremque laudantium, totam rem aperiam eaque ipsa, quae ab illo inventore veritatis et quasi architecto beatae vitae dicta sunt,

Dr Sugiyama

quia dolor sit amet, consectetur, adipisci velit, sed quia non numquam eius modi tempora incidunt, ut labore et dolore magnam aliquam quaerat voluptatem. Ut enim ad minima veniam, quis nostrum exercitationem ullam

Limelight

explicabo. Nemo enim ipsam voluptatem, quia voluptas sit, aspernatur aut odit aut fugit, sed quia consequuntur magni dolores eos, qui ratione voluptatem sequi nesciunt, neque porro quisquam est, qui dolorem ipsum,

Orbitron

corporis suscipit laboriosam, nisi ut aliquid ex ea commodi consequatur? Quis autem vel eum iure reprehenderit, qui in ea voluptate velit esse, quam nihil molestiae consequatur, vel illum, qui dolorem eum fugiat, quo voluptas nulla pariatur?

Oswald

quia dolor sit amet, consectetur, adipisci velit, sed quia non numquam eius modi tempora incidunt, ut labore et dolore magnam aliquam quaerat voluptatem. Ut enim ad minima veniam, quis nostrum exercitationem ullam

Philosopher

explicabo. Nemo enim ipsam voluptatem, quia voluptas sit, aspernatur aut odit aut fugit, sed quia consequuntur magni dolores eos, qui ratione voluptatem sequi nesciunt, neque porro quisquam est, qui dolorem ipsum,

Pinyon Script

necessitatibus saepe eveniet, ut et voluptates repudiandae sint et molestiae non recusandae. Itaque earum rerum hic tenetur a sapiente delectus, ut aut reiciendis voluptatibus maiores alias consequatur aut perferendis doloribus asperiores repellat.

Playfair Display

quia dolor sit amet, consectetur, adipisci velit, sed quia non numquam eius modi tempora incidunt, ut labore et dolore magnam aliquam quaerat voluptatem. Ut enim ad minima veniam, quis nostrum exercitationem ullam

Smythe

corporis suscipit laboriosam, nisi ut aliquid ex ea commodi consequatur? Quis autem vel eum iure reprehenderit, qui in ea voluptate velit esse, quam nihil molestiae consequatur, vel illum, qui dolorem eum fugiat, quo voluptas nulla pariatur?

HTML5

HTML5 introduces a number of new features which promise to simplify some things and to make it possible to add richer experiences to web pages (including ebooks).

Unfortunately, it has taken some time for browsers to catch up, and we have only just (2014) reached a point where I can reliably assume that most users have a browser (even Internet Explorer) which can cope with what HTML5 has to offer. I have very recently “upgraded” all of our books to HTML5, but have not gone as far as replacing all the structural div elements with the new article, section, and header elements. To be honest, I am not convinced that these add any particular value other than the supposed benefits of semantic search. However, it will be a trivial exercise to convert the div elements into the appropriate new elements the next time I'm refreshing the 3300+ books in the collection.
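As a rough illustration of how such a conversion might be scripted, here is a minimal Python sketch. It assumes the structural divs are identified by class names; the names in CLASS_TO_ELEMENT are hypothetical, since the actual class names used in the collection are not given here. It also assumes the markup is well formed, with every div properly closed.

```python
import re

# Hypothetical mapping from div class names to HTML5 elements.
# The real class names used in the collection are an assumption here.
CLASS_TO_ELEMENT = {"chapter": "article", "section": "section", "header": "header"}

DIV_TAG = re.compile(r'<div\b([^>]*)>|</div\s*>', re.IGNORECASE)
CLASS_ATTR = re.compile(r'\bclass\s*=\s*"([^"]*)"')


def convert_divs(html):
    """Rename structural divs, keeping their attributes and nesting intact."""
    stack = []  # element name chosen for each div that is currently open

    def replace(match):
        if match.group(0).startswith("</"):
            # Close with whatever name the matching opening tag was given.
            name = stack.pop() if stack else "div"
            return "</%s>" % name
        attrs = match.group(1)
        name = "div"  # divs with no mapped class are left unchanged
        found = CLASS_ATTR.search(attrs)
        if found:
            for cls in found.group(1).split():
                if cls in CLASS_TO_ELEMENT:
                    name = CLASS_TO_ELEMENT[cls]
                    break
        stack.append(name)
        return "<%s%s>" % (name, attrs)

    return DIV_TAG.sub(replace, html)


sample = '<div class="section"><div class="note"><p>Text</p></div></div>'
print(convert_divs(sample))
```

The stack is what lets each closing tag take the name of its own opening tag, so a plain div nested inside a converted section still closes as a div. A production version would more likely use a real HTML parser than regular expressions.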

Techniques

When OCR goes bad

OCR does a remarkable job of converting page images into text, provided that your image is of optimal quality. Nothing will save you when the original pages are of dismal quality; you’re going to have lots of errors, and it may not be worth attempting to create a text version from them.

But more often, I find that most pages are good, and the OCR result is therefore acceptable, with just a few pages where the entire page is corrupted. This is usually because the original scan skewed the page beyond the limits of the OCR software. In such a case, you may be able to rescue the text by de-skewing the image for a page and re-doing the OCR for that page.

  1. Download the original page scan (e.g. from archive.org).
  2. Straighten the page in an image editing program.
  3. Run these commands (Linux command line; convert is part of ImageMagick):
    convert page.jpg page.tif
    tesseract page.tif page
  4. Insert the resulting page.txt into the ebook, replacing the corrupted text.
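When several pages need the same treatment, step 3 can be wrapped in a small script. The following Python sketch simply builds and runs the two commands shown above for a given image file. The function names are illustrative only, and it assumes that both convert (ImageMagick) and tesseract are installed and on the PATH, and that the page has already been straightened.

```python
import subprocess
from pathlib import Path


def ocr_commands(image_path):
    """Build the two commands from step 3 for one straightened page image."""
    image = Path(image_path)
    tif = image.with_suffix(".tif")
    base = image.with_suffix("")  # tesseract adds .txt to this output base
    return [
        ["convert", str(image), str(tif)],
        ["tesseract", str(tif), str(base)],
    ]


def redo_ocr(image_path):
    """Run the commands and return the recognised text for the page."""
    for command in ocr_commands(image_path):
        # check=True stops at the first command that fails.
        subprocess.run(command, check=True)
    return Path(image_path).with_suffix(".txt").read_text(encoding="utf-8")


# Example: show what would be run for page.jpg, without running it.
for command in ocr_commands("page.jpg"):
    print(" ".join(command))
```

Calling redo_ocr("page.jpg") would then produce page.tif and page.txt beside the image, ready for step 4.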

Unicode Character Charts

I find it convenient, when editing, to have ready access to tables of various Unicode character sets. I can quickly find the character glyph I need and copy-paste it into the book I'm working on.

The glyphs in these tables are shown at twice normal size, for easy identification. They are not available in all fonts, so proceed with caution if not using Georgia. A ⚡ indicates the glyph is not present in the current font.
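If a character you need is not in the tables, a chart for any Unicode range can be generated on demand. This is a minimal Python sketch using the standard unicodedata module; the function name chart_rows and the choice of columns are my own assumptions, not part of how these tables were built.

```python
import unicodedata


def chart_rows(first, last):
    """Return (glyph, code point, HTML entity, name) for each named character."""
    rows = []
    for code_point in range(first, last + 1):
        glyph = chr(code_point)
        name = unicodedata.name(glyph, "")
        if not name:
            continue  # skip code points that have no character name
        rows.append((glyph, "U+%04X" % code_point, "&#x%X;" % code_point, name))
    return rows


# Example: the four basic arrows, U+2190 to U+2193.
for row in chart_rows(0x2190, 0x2193):
    print("  ".join(row))
```

The numeric entity in the third column can be pasted into HTML when the editor or font in use cannot display the glyph itself.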

Punctuation:
… . . . ·
« » ‘ ’ “ ” † ‡ 〃 ¡ ¿ ¶ §
mspace = [ ]
French:
À Á Â Ç È É Ê Î Ô Û
à á â ç è é ê î ô û
Other Western European:
Ã Ā Ä Å Æ Ǣ Ë Ē Ì Í Ï Ī Ð Ñ Ò Ó Õ Ö Ō Œ Ø Ù Ú Ü Ū Ý Ȳ Þ ß
ã ä ā å æ ǣ ë ē ì í ï ī ð ñ ò ó õ ö ō œ ø ù ú ü ū ý ȳ þ ÿ
Greek:
Α Β Γ Δ Ε Ζ Η Θ Ι Κ Λ Μ Ν
Ξ Ο Π Ρ Σ Τ Υ Φ Χ Ψ Ω
α β γ δ ε ζ η θ ι κ λ μ ν
ξ ο π ρ ς σ τ υ φ χ ψ ω
Ά Έ Ή Ί Ό Ύ Ώ
ΐ Ϊ Ϋ ά έ ή ί ΰ
ϊ ϋ ό ύ ώ
Ϗ ϐ ϑ ϒ ϓ ϔ ϕ ϖ ϗ
ϰ ϱ ϲ ϳ ϴ ϵ ϶
Currency:
¢ £ ¤ ¥ €
Superscripts and Subscripts:
M ⁰ ¹ ² ³ ⁴ ⁵ ⁶ ⁷ ⁸ ⁹ ⁺ ⁻ ⁼ ⁽ ⁾ ⁿ
M ₀ ₁ ₂ ₃ ₄ ₅ ₆ ₇ ₈ ₉ ₊ ₋ ₌ ₍ ₎
Maths:
¼ ⅓ ½ ¾
+ ± × ÷ √ ∛ ∜
= ≠ ≡ ≤ ≥ ¬
∀ ∁ ∂ ∃ ∄ ∅ ∆ ∇ ∈ ∉ ∊ ∋ ∌ ∍ ∝ ∞ ∧ ∨ ∩ ∪
∏ ∐ ∑ ∫

Large Braces



At vero eos et accusamus et iusto odio dignissimos ducimus,



right side
Symbols:
™ © ®
🕱
Technical:
°
Arrows:
← ↑ → ↓
Weather:
Misc:
Religious and Astrological:
Games:
🂡 🂢 🂣 🂤 🂥 🂦 🂧 🂨 🂩 🂪 🂫 🂬 🂭 🂮
🂱 🂲 🂳 🂴 🂵 🂶 🂷 🂸 🂹 🂺 🂻 🂼 🂽 🂾
🃁 🃂 🃃 🃄 🃅 🃆 🃇 🃈 🃉 🃊 🃋 🃌 🃍 🃎
🃑 🃒 🃓 🃔 🃕 🃖 🃗 🃘 🃙 🃚 🃛 🃜 🃝 🃞 🃟 🂠
Music:
𝄀 𝄁 𝄂 𝄃 𝄄 𝄅 𝄆 𝄇 𝄈 𝄡 𝄐 𝄑 𝄒 𝄞 𝄢 𝄫
Aldine Leaf:
Dingbats:
Bracket pieces:

Glossary

Alignment — the relationship between the text and the margins: text may be aligned to the left margin, the right margin, centered between the margins, or justified (adjusted with the addition of white space) to align with both left and right margins.

Foreword — an introductory essay written by someone other than the author.

Half Title — a recto page bearing the title of a book, preceding the title page proper.
In the early days of book publishing, books were left with the printed sheets folded but uncut until ordered, with an expectation that the purchaser would arrange binding. The half title was intended to provide a disposable page that would protect the title page from damage prior to binding.

Leading — the height of the line from the baseline to the baseline of the line above. The name derives from the small pieces of lead the printer inserted above letters to provide extra spacing between lines.

Leaf — a paper sheet making a page of a paper book, comprising the recto and verso sides.

Measure — the width of a line.

Preface — an introductory essay written by the author.

Prologue — an introduction to a tale, not written in the author’s voice.

Recto — the right-hand page of a paper book.

Verso — the left-hand page of a paper book; the reverse side of the recto page. Hence Titleverso, the verso of the title page.

References

HTML
HTML5, http://www.w3.org/TR/html5/

CSS
Cascading Style Sheets Level 2 Revision 1 (CSS 2.1) Specification, http://www.w3.org/TR/2011/REC-CSS2-20110607

Book Design
http://en.wikipedia.org/wiki/Book_design

Printing a Book with CSS: Boom!
http://www.alistapart.com/articles/boom

Project Gutenberg
http://gutenberg.org

Gutenberg Australia
http://gutenberg.net.au

Project Gutenberg Canada
http://gutenberg.ca

Distributed Proofreaders
http://www.pgdp.net/

Internet Archive
http://archive.org

Using semantic elements to mark up structure
http://www.w3.org/TR/WCAG20-TECHS/G115

This web edition published by:

eBooks@Adelaide
The University of Adelaide Library
University of Adelaide
South Australia 5005