Project Gutenberg Catalogue Project
This is an adjunct project to eBooks @ Adelaide, aiming to create a file of MARC records for items in Project Gutenberg.
PG makes available a catalogue file in RDF format. A custom script I wrote parses this file and converts each RDF record into a MARC 21 record.
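To give a feel for what such a conversion involves, here is a minimal sketch in Python. It is not the script used for this project. The input dictionary, its keys (`creator`, `title`, `subject`, `identifier`), the function name `dc_to_marc_fields` and the sample values are all assumptions made for illustration. The MARC tags used (100, 245, 650, 856) are standard MARC 21 bibliographic fields, but the choice of which Dublin Core element goes where is only one plausible mapping.

```python
from typing import Any, Dict, List, Tuple

# A MARC field is modelled here as (tag, indicators, [(subfield code, value), ...]).
# This is a simplified, hypothetical representation, not a real MARC library type.
Field = Tuple[str, str, List[Tuple[str, str]]]


def dc_to_marc_fields(record: Dict[str, Any]) -> List[Field]:
    """Map a Dublin Core style record (a plain dict) to a list of MARC fields."""
    fields: List[Field] = []

    creator = record.get("creator")
    if creator:
        # 100: main entry, personal name; first indicator 1 = surname entry.
        fields.append(("100", "1 ", [("a", creator)]))

    title = record.get("title")
    if title:
        # 245: title statement; first indicator 1 when a 1XX main entry exists.
        first_indicator = "1" if creator else "0"
        fields.append(("245", first_indicator + "0", [("a", title)]))

    for subject in record.get("subject", []):
        # 650: topical subject; second indicator 4 = source not specified.
        fields.append(("650", " 4", [("a", subject)]))

    url = record.get("identifier")
    if url:
        # 856: electronic location; indicators 4 0 = HTTP, the resource itself.
        fields.append(("856", "40", [("u", url)]))

    return fields


# Hypothetical sample record; the values are placeholders, not real PG data.
sample = {
    "creator": "Surname, Forename",
    "title": "Example Title",
    "subject": ["Fiction", "Sea stories"],
    "identifier": "https://example.org/ebooks/1",
}
sample_fields = dc_to_marc_fields(sample)
```

Even in this small sketch the mapping requires decisions that the RDF data does not make for you, such as whether a creator string is really a personal name in surname-first form.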
Get the MARC records, split into groups of 1,000 records by PG identifier.
Download the COMPLETE MARC-format file (zip file format, ~5MB)
There is no simple one-to-one relationship between the RDF format (which uses Dublin Core) and MARC. Furthermore, the information recorded in the RDF records is sometimes eccentric. My script attempts to make some intelligent guesses about the data, and sometimes the result will be sub-optimal.
Specific problems include:
The main title sometimes contains line breaks. I've interpreted the first line break as marking the beginning of a subtitle and ignored any others. This seems to work for most items, but there are anomalies. I've used a colon to separate title and subtitle, but sometimes a semicolon might be better.
Ideally, one would have a single MARC record for each multi-volume work, rather than one record for each volume. That's definitely one for the "too hard basket" at this stage.
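The title rule described above (first line break starts the subtitle, later line breaks are ignored, colon as separator) can be sketched roughly as follows. This is an illustration in Python, not the actual script. The function names `split_title` and `format_title` are hypothetical, and "ignoring" a line break is assumed here to mean joining the remaining lines with a single space.

```python
from typing import Optional, Tuple


def split_title(raw_title: str) -> Tuple[str, Optional[str]]:
    """Split a raw title at its first line break into (main title, subtitle)."""
    # Break on any line ending, trim whitespace, and drop empty lines.
    parts = [part.strip() for part in raw_title.splitlines()]
    parts = [part for part in parts if part]
    if not parts:
        return "", None
    if len(parts) == 1:
        return parts[0], None
    # Assumption: line breaks after the first are replaced by single spaces.
    return parts[0], " ".join(parts[1:])


def format_title(raw_title: str, separator: str = ":") -> str:
    """Join main title and subtitle with the given separator (colon by default)."""
    main, subtitle = split_title(raw_title)
    if subtitle is None:
        return main
    return f"{main}{separator} {subtitle}"
```

For example, `format_title("First line\nsecond line\nthird line")` returns `"First line: second line third line"`. Passing `separator=";"` covers the cases where a semicolon reads better, although deciding which cases those are still needs a human eye.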
Having said that, I believe the resulting MARC file is of a standard that makes it usable in any library catalogue.
I welcome any feedback about how this project may be improved.