Open Editions Specification 1.0

Each edition must:

The header.xml must:


Dialogue Attribution

Dialogue is marked up with <said>, with the speaker named in the @who attribute, as in <said who="Stephen Dedalus">.

If a character quotes direct speech within their own speech, we encode it like this:

<said who="Stephen Dedalus">―You said,</said> Stephen answered, 
<said who="Stephen Dedalus"><said who="Buck Mulligan" rend="italics">O, 
it's only Dedalus whose mother is beastly dead</said>.</said>
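
As a rough illustration of how this nesting could be read back out, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. The wrapping <p>, the function name speakers_with_depth, and the absence of a TEI namespace are all assumptions made for the example; in a real edition file the tag names would probably need to be namespace-qualified.

```python
import xml.etree.ElementTree as ET

# Sample adapted from the nested-quotation example above, wrapped in a
# hypothetical <p> so that it parses as one well-formed fragment.
SAMPLE = (
    '<p><said who="Stephen Dedalus">You said,</said> Stephen answered, '
    '<said who="Stephen Dedalus"><said who="Buck Mulligan" rend="italics">'
    "O, it's only Dedalus whose mother is beastly dead</said>.</said></p>"
)


def speakers_with_depth(element, depth=0):
    """Yield (who, depth) for each <said>; depth counts enclosing <said> elements."""
    for child in element:
        if child.tag == "said":
            yield child.get("who"), depth
            # Anything inside this <said> is quoted speech, one level deeper.
            yield from speakers_with_depth(child, depth + 1)
        else:
            yield from speakers_with_depth(child, depth)


root = ET.fromstring(SAMPLE)
for who, depth in speakers_with_depth(root):
    print("  " * depth + who)
```

A depth of 0 marks the character who is actually speaking, and anything deeper is speech being quoted.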

If direct speech is recalled in interior monologue or (occasionally) represented in the third-person narrative using italics, we encode it like this:

she was one of those good souls who had always to be told twice 
<said who="Father Conmee" direct="false" rend="italics">bless you, my child,</said> 
that they have been absolved, <said who="Father Conmee" direct="false" rend="italics">pray for me</said>.

For more on this, see the discussion in issue #20.

When it is not clear who is speaking, you can mark up your guesses using the certainty tag, like this:

<lb n="060004"/><said xml:id="060004-a" who="Cunningham">―Come on, Simon.</said>
<certainty target="#060004-a" match="@who" locus="value" assertedValue="Power" degree="0.5">
    <desc>It's unclear here whether it's Cunningham or Power speaking.</desc>
</certainty>
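
To show how such guesses might be consumed downstream, the following Python sketch collects, for each <said> with an xml:id, the encoded speaker and any alternative recorded in a <certainty> element. The function name candidate_speakers, the wrapping <p>, and the closing tags in the sample are assumptions for illustration, and the TEI namespace is again omitted.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:id under the predefined XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical well-formed fragment based on the example above.
SAMPLE = """<p>
<lb n="060004"/><said xml:id="060004-a" who="Cunningham">Come on, Simon.</said>
<certainty target="#060004-a" match="@who" locus="value" assertedValue="Power" degree="0.5">
    <desc>It's unclear here whether it's Cunningham or Power speaking.</desc>
</certainty>
</p>"""


def candidate_speakers(root):
    """Map each <said> xml:id to a list of (speaker, degree) candidates.

    The encoded @who comes first with degree None; alternatives asserted
    by <certainty match="@who"> follow with their stated degree.
    """
    candidates = {
        said.get(XML_ID): [(said.get("who"), None)]
        for said in root.iter("said")
        if said.get(XML_ID)
    }
    for cert in root.iter("certainty"):
        target = cert.get("target", "").lstrip("#")
        if cert.get("match") == "@who" and target in candidates:
            degree = cert.get("degree")
            candidates[target].append(
                (cert.get("assertedValue"), float(degree) if degree else None)
            )
    return candidates


print(candidate_speakers(ET.fromstring(SAMPLE)))
```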

Distinctive Words

Distinctive words can be marked up using <distinct> or <foreign>, depending on the language. If you use <foreign>, be sure to specify the language with the xml:lang attribute, as in xml:lang="en", where “en” is the two-letter language code.

For lengthy notes on a single word, it might be better to put these in a separate file. See the word notes in the Ulysses text for an example.

Here’s one generated from the database of

<note resp="Ronan Crowley" type="analysis" xml:id="151777-bedlock">
<analysis>The OED recognises “bedlock” n. as a nonce-word modelled, 
reasonably enough, after “wedlock.” As it happens, the coinage is 
not Joyce’s own. He found it in James Huneker’s Painted Veils (1920), 
which has “whether in wedlock or concubinage—bedlock is the ultimate outcome.”</analysis>
<time>2017-12-06 19:57:28</time>
<bibl>“Born out of bedlock hereditary epilepsy is present, the 
consequence of unbridled lust.” (U 15.1777–78)</bibl>
</note>

Note the format of the xml:id, which combines the line number, 151777, with the word “bedlock” that appears in that line.
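
A small Python sketch of how such an identifier could be taken apart is given below. It assumes, judging from the examples in this document (151777 alongside U 15.1777, and lb numbers such as 060004), that the number is a two-digit episode followed by a four-digit line; that reading, and the names WordNoteId and parse_word_note_id, are assumptions rather than part of the specification.

```python
import re
from typing import NamedTuple


class WordNoteId(NamedTuple):
    episode: int
    line: int
    word: str


# Assumed layout: two-digit episode, four-digit line, a hyphen, then the word.
_ID_PATTERN = re.compile(r"^(\d{2})(\d{4})-(.+)$")


def parse_word_note_id(xml_id: str) -> WordNoteId:
    """Split an id such as '151777-bedlock' into episode, line, and word."""
    match = _ID_PATTERN.match(xml_id)
    if match is None:
        raise ValueError(f"not a word-note id: {xml_id!r}")
    episode, line, word = match.groups()
    return WordNoteId(int(episode), int(line), word)


print(parse_word_note_id("151777-bedlock"))
```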

Titles of Works

Although the <title> tag is typically used for titles of real-world works, it can also be applied to fictional works. The TEI documentation for <title> describes its level attribute, which can be used to distinguish between “levels” of publication such as journals, series, or monographs, but this would not cover all the types present in Ulysses, so we could perhaps use the type attribute along with a few types:

Subtitles can be handled, following the TEI suggestions for @type, with <title type="sub">.

Titles are rendered in italics by default, so if a given title isn’t italicized in the text, mark it up with <title rend="none">.


Cross-references show intertextualities between two texts in our collection, or within the same text. For instance, here are a few repeated phrases within Ulysses:

    <link xml:id="agenbite" target="u01_telemachus.xml#lb_010481_agenbite u09_scylla.xml#lb_090196_agenbite u09_scylla.xml#lb_090809 u10_wandering_rocks.xml#lb_100875 u10_wandering_rocks.xml#lb_100879"/>
    <note target="#agenbite">
        Agenbite of inwit: Middle English for "again bites the inner wit" (referring to the guilt of conscience). 
        Also the title of a confessional prose work written in a Kentish dialect of Middle English.
        <ref target=""></ref>
    </note>
    <link xml:id="twig-skirt" target="u10_wandering_rocks.xml#lb_100201 u10_wandering_rocks.xml#lb_100440 u14_oxen.xml#lb_141158"/>
    <link xml:id="lemon-soap" target="u05_lotus-eaters.xml#lb_050513 u17_ithaca.xml#lb_170232"/>
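
Each target attribute above is a whitespace-separated list of pointers of the form file#fragment. As a minimal sketch, assuming that shape holds for every pointer, the list could be split in Python like this (split_targets is a hypothetical helper name):

```python
def split_targets(target: str):
    """Split a whitespace-separated @target value into (filename, fragment) pairs."""
    pairs = []
    for pointer in target.split():
        # Everything before the first '#' is the file; the rest is the anchor id.
        filename, _, fragment = pointer.partition("#")
        pairs.append((filename, fragment))
    return pairs


lemon_soap = "u05_lotus-eaters.xml#lb_050513 u17_ithaca.xml#lb_170232"
for filename, fragment in split_targets(lemon_soap):
    print(filename, fragment)
```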

Line Numbering

Whenever possible, we should keep line numbers aligned with those of the text's canonical edition. This is already done for Portrait, Ulysses, and Dubliners.

Place Names and Geotagging

Place names should be geotagged, using <location><geo> and latitude/longitude:

They were caught near the <place><location><geo>53.290670 -6.535060</geo></location><placeName>Hill of Lyons</placeName></place>.
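
For illustration, the coordinates in a <geo> element could be read back out with a short Python sketch like the one below. It assumes the content is always "latitude longitude" separated by whitespace, as in the example above; the wrapping <p>, the function name places, and the missing TEI namespace are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: the example sentence above wrapped in a <p>.
SAMPLE = (
    "<p>They were caught near the <place><location>"
    "<geo>53.290670 -6.535060</geo></location>"
    "<placeName>Hill of Lyons</placeName></place>.</p>"
)


def places(root):
    """Yield (name, latitude, longitude) for each <place> that has a <geo>."""
    for place in root.iter("place"):
        geo = place.findtext("location/geo")
        if geo is None:
            continue
        latitude, longitude = (float(part) for part in geo.split())
        yield place.findtext("placeName"), latitude, longitude


for name, latitude, longitude in places(ET.fromstring(SAMPLE)):
    print(name, latitude, longitude)
```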


Use of a secondary language should be noted, using the xml:lang attribute.

<lb n="010005"/><said who="bm"><quote xml:lang="la">Introibo ad altare Dei.</quote></said>
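
As a sketch of how passages in a secondary language might be collected, the Python below lists every element that carries xml:lang together with its text. The wrapping <p> and the function name passages_by_language are assumptions, and the TEI namespace is omitted for brevity.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical fragment based on the example above.
SAMPLE = (
    '<p><lb n="010005"/><said who="bm">'
    '<quote xml:lang="la">Introibo ad altare Dei.</quote></said></p>'
)


def passages_by_language(root):
    """Return (language code, text) for every element with an xml:lang attribute."""
    return [
        (element.get(XML_LANG), "".join(element.itertext()).strip())
        for element in root.iter()
        if element.get(XML_LANG)
    ]


print(passages_by_language(ET.fromstring(SAMPLE)))
```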

Free Indirect Discourse

We’re still not quite sure how to mark this up. For now, <seg type="fid"> will have to do. Michelle Qiu adds who attributes as well.

<seg type="fid" who="#Celia">Celia was conscious of some mental strength when she really applied herself to argument.</seg>


We link literary texts to the criticism that discusses them. Some of this is done manually, and some automatically, using Text-matcher.

Markup details to follow.

Zenodo Archiving

We link our GitHub repositories to Zenodo for archiving. Zenodo pulls in our latest releases and stores them at the CERN Data Centre. It also assigns them DOIs (digital object identifiers), making them easier to cite.