Open Editions Specification 1.0
Each edition must:
- Be a single git repository, hosted on GitHub.
- Be named following a pattern in which author is the author’s name, and shortTitle is a shortened version of the title.
- Be included in this repository as a submodule (for now).
- Have a header.xml in TEI XML, containing all its metadata.
- Have a README.md file describing the edition.
- Have a .github/workflows/edition.yaml file which will validate the XML.
- Have a mirror repo at Zenodo, where it will receive a DOI and have a stable identifier, for posterity.
- This requires that the edition have a release, and a tag. Use semantic versioning for the tags, e.g. v0.1.0.
The header.xml must:
- Be a valid TEI XML file, as validated by the official TEI RNG files.
- Contain a list of xincludes to other files in the repository, if the edition is in more than one file.
- Contain links to the source repository, since the source repository is the authoritative source of information such as version history.
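As a rough illustration of the requirements above, a header.xml might be laid out as follows. This is a sketch under assumptions, not the project's actual template: the title, author, publisher, file names, and repository URL are all placeholders, and it assumes XIncludes are resolved before validation against the TEI RNG schema.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical header.xml sketch; all names and URLs are placeholders. -->
<TEI xmlns="http://www.tei-c.org/ns/1.0"
     xmlns:xi="http://www.w3.org/2001/XInclude">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>Example Title</title>
        <author>Example Author</author>
      </titleStmt>
      <publicationStmt>
        <publisher>Example Publisher</publisher>
        <!-- Link back to the authoritative source repository (placeholder URL). -->
        <ptr target="https://github.com/example-org/example-edition"/>
      </publicationStmt>
      <sourceDesc>
        <p>Transcribed from an example print edition.</p>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <!-- One xi:include per file when the edition spans several files. -->
      <xi:include href="chapter01.xml"/>
      <xi:include href="chapter02.xml"/>
    </body>
  </text>
</TEI>
```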
Dialogue is marked up with <said>, using the @who attribute, as in <said who="Stephen Dedalus">.
- For the moment, full names should be used, where known. (In the future we hope to shift to XML IDs; see #19.)
- You can check the list of characters currently used with make chars, a command in the Makefile. (This requires that you have ag (The Silver Searcher) installed, and of course GNU make. See Chris Foster’s comment in issue 19.)
If a character quotes direct speech within her speech, we’re encoding it like this:
If direct speech is recalled in interior monologue or (occasionally) represented in the third-person narrative using italics, we’re encoding it like this:
For more on this, see the discussion in issue #20.
When it is not clear who is speaking, you can mark up your guesses using the certainty tag, like this:
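One possible shape for such a guess, assuming the standard TEI <certainty> element is what is meant, is sketched here. The xml:id, the speech text, and the confidence level are illustrative placeholders, not taken from any edition.

```xml
<!-- Sketch only: the xml:id, speaker guess, and cert value are illustrative. -->
<said xml:id="s_example" who="Stephen Dedalus">
  <!-- Records medium confidence in the value of the @who attribute above. -->
  <certainty target="#s_example" match="@who" locus="value" cert="medium"/>
  Example line of speech.
</said>
```

Here @locus="value" says that the doubt concerns the attribute's value, and @cert takes one of high, medium, low, or unknown.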
Distinctive words can be marked up using <foreign>, depending on the language. When using <foreign>, be sure to specify the language with xml:lang, as in xml:lang="en", where “en” is the two-letter language code.
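A minimal sketch of this markup, using an invented sentence rather than a line from any edition, could look like this:

```xml
<!-- Illustrative sentence; "fr" is the two-letter code for French. -->
<p>He muttered <foreign xml:lang="fr">tout le monde</foreign> under his breath.</p>
```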
For lengthy notes on a single word, it might be better to put these in a separate file. See the word notes in the Ulysses text for an example.
Here’s one generated from the database of joycewords.com:
<note resp="Ronan Crowley" type="analysis" xml:id="151777-bedlock">
  <term>bedlock</term>
  <analysis>The OED recognises “bedlock” n. as a nonce-word modelled, reasonably enough, after “wedlock.” As it happens, the coinage is not Joyce’s own. He found it in James Huneker’s Painted Veils (1920), which has “whether in wedlock or concubinage—bedlock is the ultimate outcome.”</analysis>
  <time>2017-12-06 19:57:28</time>
  <bibl>“Born out of bedlock hereditary epilepsy is present, the consequence of unbridled lust.” (U 15.1777–78)</bibl>
</note>
Note the format of the xml id, which combines the line reference 151777 (episode 15, line 1777) with the word “bedlock” appearing in that line.
Titles of Works
Although the <title> tag seems to be used for titles of works in the real world, it might also be applied to fictional works. The TEI documentation for <title> gives information about its attribute level, which can be used to distinguish between “levels” of publications such as journals, series, or monographs. This would not cover all the types present in Ulysses, so maybe we could use the attribute type along with a few types:
- artwork: <title type="artwork">The Bath of the Nymph</title> over the bed.
- book: smudged pages. <title type="book">Ruby: the Pride of the Ring.</title> Hello.
- magazine: number of <title type="magazine">Photo Bits</title>
- newspaper: The <title type="newspaper">Evening Telegraph</title>
- oratorio: Dead March from <title type="oratorio">Saul</title>.
- pantomime: in the pantomime of <title type="pantomime">Turko the Terrible</title>
- play: Could I go to see <title type="play">Leah</title> tonight, I wonder.
- poem: <title type="poem">Art thou real, my ideal?</title> it was called by Louis J Walsh, Magherafelt
- short story: Our prize titbit: <title type="short story">Matcham’s Masterstroke</title>
- song: <title type="song">Là ci darem</title> with J. C. Doyle, she said, and <title type="song">Love's Old Sweet Song</title>
Subtitles can be handled, following the TEI suggestions for
Titles are rendered as italicized by default, so if a given title isn’t italicized in the text, mark it up with
Cross-references show intertextualities between two texts in our collection, or in the same text. For instance, here are a few repeating instances within Ulysses:
<link xml:id="agenbite" target="u01_telemachus.xml#lb_010481_agenbite u09_scylla.xml#lb_090196_agenbite u09_scylla.xml#lb_090809 u10_wandering_rocks.xml#lb_100875 u10_wandering_rocks.xml#lb_100879"/>
<note target="agenbite">
  Agenbite of inwit: Middle English for, "again bites the inner wit" (referring to the guilt of conscience). Also the title of a confessional prose work written in a Kentish dialect of Middle English.
  <ref target="https://en.wikipedia.org/wiki/Ayenbite_of_Inwyt"></ref>
</note>
<link xml:id="twig-skirt" target="u10_wandering_rocks.xml#lb_100201 u10_wandering_rocks.xml#lb_100440 u14_oxen.xml#lb_141158"/>
<link xml:id="lemon-soap" target="u05_lotus-eaters.xml#lb_050513 u17_ithaca.xml#lb_170232"/>
Whenever possible, we should maintain line numbers that keep the text aligned with its canonical edition. This is already done for Portrait, Ulysses, and Dubliners.
Place Names and Geotagging
Place names should be geotagged, using <location><geo> and latitude/longitude:
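A hedged sketch of what such an entry could look like follows. It assumes the coordinates live in a TEI <place> element (normally inside a <listPlace>), which is an assumption about this project. The xml:id is a placeholder and the coordinates are approximate. By TEI default, <geo> holds latitude then longitude, separated by a space.

```xml
<!-- Hypothetical entry; coordinates are approximate, given as "latitude longitude". -->
<place xml:id="example_place">
  <placeName>Dublin</placeName>
  <location>
    <geo>53.35 -6.26</geo>
  </location>
</place>
```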
Use of a secondary language should be noted, using the xml id.
Free Indirect Discourse
We’re still not quite sure how to mark this up. <seg type="fid"> will have to do, for now. Michelle Qiu has used who attributes, as well.
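As a provisional illustration, the <seg type="fid"> approach might be applied like this. The sentences are invented for the example and do not come from any edition.

```xml
<!-- Illustrative only: the narration and the free indirect discourse are made up. -->
<p>She sat at the window. <seg type="fid">Everything would be different now.</seg></p>
```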
We link literary texts to criticism that discusses them. Some of this is done manually, and some automatically, using Text-matcher.
Markup details to follow.
We link our GitHub repositories to Zenodo for archiving. Zenodo pulls in our latest releases and stores them at the CERN Data Centre. It also gives them DOIs (digital object identifiers), making them easier to cite.