Encoding guidelines for e-ditiones

2. The encoding

2.1. Headers encoding

First, we chose to identify two different types of headers:

manuscripts
printed documents

Both headers are mostly the same: they contain a <fileDesc>, an <encodingDesc> and a <revisionDesc>. The only difference between them is the addition of the <msDesc>, used for the description of a manuscript.

2.1.1. The <fileDesc>

This part of the header contains at least five parts:

the <titleStmt>
the <editionStmt>
the <extent>
the <publicationStmt>
the <sourceDesc>

2.1.1.1. The <titleStmt>

This part is essential for the presentation of the encoded document. It has to contain at least one <title> and one <author>.

<titleStmt> <title>Bérénice</title> <author>Jean Racine</author> </titleStmt>

2.1.1.2. The <editionStmt>

This part contains the name of the editor and the date of encoding.

<editionStmt> <edition> <date>30/04/2020</date> </edition> <respStmt> <persName>Simon Gabay</persName> <resp>Éditeur scientifique</resp> </respStmt> </editionStmt>

2.1.1.3. The <extent>

This part indicates the size of the work and contains the number of words and pages (considering that the number of pages equals the number of <pb> elements).
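
As a rough illustration of how these two figures could be derived, the following Python sketch counts the <pb> elements and the words of a TEI <text>. The function name count_extent and the sample document are hypothetical, and counting words by splitting on whitespace is an assumption: the guidelines do not state how words are counted.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical sample, not taken from the corpus.
SAMPLE = """<text xmlns="http://www.tei-c.org/ns/1.0" xml:id="EDI_0000">
  <body>
    <pb n="1"/>
    <p><lb/>Premiere page du texte
    <pb n="2"/><lb/>seconde page</p>
  </body>
</text>"""


def count_extent(text_element):
    """Return (pages, words) for a TEI <text> element."""
    # One page per <pb>, as stated in the guidelines.
    pages = sum(1 for _ in text_element.iter(f"{{{TEI_NS}}}pb"))
    # Assumption: a word is any whitespace-separated run of characters.
    # A word cut by <lb break="no"/> is therefore counted twice here,
    # and the content of <fw> is included in the total.
    words = len(" ".join(text_element.itertext()).split())
    return pages, words


pages, words = count_extent(ET.fromstring(SAMPLE))
print(pages, words)
```

The resulting values could then be recorded in the <extent>, for example in <measure> elements; that choice is left open here.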

2.1.1.4. The <publicationStmt>

This part contains an <authority> element with the name of the project and an <availability> element with its status and the <licence>.

<publicationStmt> <authority>e-ditiones</authority> <availability status="restricted"> <licence target="https://creativecommons.org/licenses/by/4.0/">Attribution 4.0 International (CC BY 4.0)</licence> </availability> </publicationStmt>

2.1.1.5. The <sourceDesc>

This part contains one or more bibliographical descriptions, which include standard TEI elements such as <author>, <title> or <date>.

<sourceDesc> <bibl> <author>Jean Racine</author> <title>Oeuvres</title> <publisher>Jean Ribou</publisher> <pubPlace>Paris</pubPlace> <date when="1676">1676</date> <ptr target="https://gallica.bnf.fr/ark:/12148/bpt6k990581p"/> </bibl> </sourceDesc>

2.1.1.5.1. The <msDesc>

As already mentioned, manuscripts are a special case. To describe the document, we have to use the <msDesc> element. To ensure a good encoding, several elements are recommended:

<msIdentifier>, which contains the information used to properly identify the manuscript
<msIdentifier> <country>Etats-Unis</country> <settlement>Princeton</settlement> <institution>Princeton University Library</institution> <repository>Manuscripts Division, Department of Rare Books and Special Collections</repository> <collection>John Hinsdale Scheide Collection of Three Centuries of French History</collection> <idno type="shelfmark">C0710, vol. 3</idno> </msIdentifier>
<msContent>, which contains information about the intellectual content of the manuscript
<physDesc>, which contains information about the physical description of the document, such as the <objectDesc> or the <bindingDesc>
<physDesc> <objectDesc> <supportDesc> <support> <objectType rend="composite">composite repository</objectType> <material>papier.</material> </support> <extent> <measure unit="page" n="unk"/> </extent> <foliation>Pages aren't numbered</foliation> </supportDesc> </objectDesc> <bindingDesc> <binding> <p/> </binding> </bindingDesc> </physDesc>
<history>, which contains information about the history of the manuscript
<additional>, which contains further information about the document, such as <surrogates> or bibliographical information (<bibl>)
<additional> <surrogates> <graphic source="local" url="chemin"/> </surrogates> <listBibl> <listBibl type="L1"> <bibl>La Fayette, <title>Œuvres complètes</title>, C. Esmein-Sarrazin (éd.), Paris: Gallimard, lettre n°70-1.</bibl> </listBibl> </listBibl> </additional>
<msPart>, which may contain the preceding elements, is used to describe a specific part of the encoded manuscript

2.1.2. The <encodingDesc>

This part describes the relationship between the encoded text and its source. It might contain:

<projectDesc>, which describes the project
<editorialDecl>, which contains information about the editorial principles, such as <correction>, <normalization> or <interpretation>

<encodingDesc> <projectDesc> <p>Creation of NLP tools for 17th-century French</p> </projectDesc> <editorialDecl> <correction> <p>Very minor corrections, usually tagged.</p> </correction> <hyphenation> <p>Kept, encoded with <gi>c</gi> </p> </hyphenation> <normalization> <p>None</p> </normalization> <quotation> <p>Original</p> </quotation> <punctuation> <p>Original</p> </punctuation> <interpretation> <p>None</p> </interpretation> <segmentation> <p>Text is divided in <list> <item>sentences encoded with <gi>s</gi> </item> <item>sub-sentences encoded with <gi>seg</gi> (most of the time based on colons and semicolons)</item> </list> and </p> </segmentation> </editorialDecl> </encodingDesc>

2.1.3. The <revisionDesc>

This last part of the header contains information about at least one <change> made during the production of the document. The when attribute is used to specify the date of the event.

<revisionDesc> <change when="2020-04-30">Add documentation</change> </revisionDesc>

2.2. Text encoding

After the OCR of the text, its encoding will be completed in three phases:

Level 1: the encoding will distinguish form and content
Level 2: we will add semantic information
Level 3: we will add linguistic information

Please note that at each level, all existing elements are still used and new elements are added to the existing ones.

2.2.1. First level

The purpose of the first level is to distinguish between form and content. To do that, we chose to use only a few elements. First, at all levels, our edition must contain a <text> element with the following namespace: @xmlns="http://www.tei-c.org/ns/1.0". This namespace declaration is needed for the document to validate against the TEI schema.

Then, at this level of encoding, all the text is included in the <body> and in a single <p>. Some information is added at this point. Concerning the content of the text, the <fw> element contains information such as titles, pagination or editor's notes. The other additions concern the form of the text: we decided to use the <pb> and <lb> elements. The first one, <pb>, marks the point where a new page begins; it can be used to check the transcription and to compare our edition with a reference edition. The second one, <lb>, marks the point where a new line begins; it provides graphical information and can be used in an automatic encoding process. It has two required attributes: @break and @rend. If a word is cut at the end of a line, @break with the value "no" makes it possible to re-establish the complete word and treat it as a single token. @rend records which mark is used (a dash or a hyphen, for example).

<text xml:id="EDI_0001"> <body> <lb/> <fw>A MONSEIGNEVR LE DVC D’ESPERNON. <lb/>Lettre I.</fw> <p> <lb/>Monseignevr, Quand ie ne ſerois pas nay cõme ie ſuis, voſtre <lb/>tres-humble ſeruiteur, il faudroit que ie fuſſe mauuais <lb/>François pour ne me reſioüir pas des contẽtemens de voſtre maiſon, <lb/>puis que ce ſont des felicités publiques.</p> </body> </text>
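
To make the role of @break more concrete, here is a minimal Python sketch of how a word cut at the end of a line could be re-established. It assumes that an <lb> with break="no" glues the two halves together and that any other <lb> separates words. The function name plain_text is hypothetical, the sample is adapted from the verse of L'Illusion comique quoted in these guidelines, and the content of <fw> is not filtered out.

```python
import re
import xml.etree.ElementTree as ET

GLUE = "\uE000"  # private-use character marking a word-internal line break

SAMPLE = (
    "<p><lb/>La nuit qu'il entretient ſur cet af "
    '<lb break="no" rend="¬"/>freux ſeiour '
    "<lb/>N'ouurant ſon voile</p>"
)


def plain_text(element):
    """Flatten an element to text, rejoining words split by <lb break="no"/>."""
    pieces = []

    def walk(node):
        if node.text:
            pieces.append(node.text)
        for child in node:
            # Compare local names so that namespaced documents also work.
            if child.tag.rsplit("}", 1)[-1] == "lb":
                pieces.append(GLUE if child.get("break") == "no" else " ")
            else:
                walk(child)
            if child.tail:
                pieces.append(child.tail)

    walk(element)
    text = "".join(pieces)
    # Drop the marker together with the whitespace around it.
    text = re.sub(rf"\s*{GLUE}\s*", "", text)
    # Normalize the remaining whitespace to single spaces.
    return " ".join(text.split())


print(plain_text(ET.fromstring(SAMPLE)))
```

With this sample, "af" and "freux" are rejoined as "affreux", which can then be treated as one token.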

2.2.2. Second level

At this level of encoding, we manually add some semantic information. Since we want to use, as mentioned before, a minimal set of elements, we decided to employ only common elements. However, for texts such as plays or letters, the use of a few specific elements is recommended.

2.2.2.1. Common elements

It is possible to use the following elements:

Element	Text type	Note
<front>	any prefatory matter
<div>	any text subdivision	type, n and xml:id are required
<back>	any type of appendix
<head>	any type of heading	this can be used to clarify <fw>
<list> and <item>	any type of list	n and xml:id are required
<orgName>, <persName> and <placeName>	any type of organisation, person or place	this can be useful for entity search
<l> and <lg>	any type of line or line group
<note>	any type of note	it can be used for a note by the author, the editor or, rarely, added during the encoding

2.2.2.2. Specific elements

There are only two exceptions, drama and letters.

2.2.2.2.1. Drama

If the encoded text is a play, three additional elements may be used:

Element	Text type	Note
<sp>	contains a speech	n and xml:id are required
<stage>	any stage direction	e.g. useful to study spoken words
<speaker>	any speaker in a speech

<text xml:id="EDI_0001"> <body> <div type="letter" xml:id="EDI_0001-1" n="1"> <head>A <persName>MONSEIGNEVR LE DVC D’ESPERNON</persName>. <lb/>Lettre I.</head> <p n="1" xml:id="EDI_0001-1-1"> <persName>Monseignevr</persName>, Quand ie ne ſerois pas nay cõme ie ſuis, voſtre tres-humble ſeruiteur, il faudroit que ie fuſſe mauuais François pour ne me reſioüir pas des contẽtemens de voſtre <orgName>maiſon</orgName>, puis que ce ſont des felicités publiques. <lb/>Nous auõs ſçeu l’heureux ſuccés du voyage que vous auez fait en <placeName>Bearn</placeName> </p> </div> </body> </text>

Example of a letter

<body> <div type="play" xml:id="EDI_0002-1" n="1"> <head> <lb/>L’ILLVSION <lb/>COMIQVE <lb/>COMEDIE</head> <div type="act" xml:id="EDI_0002-1-1" n="1"> <lb/> <head>ACTE PREMIER.</head> <div type="scene" xml:id="EDI_0002-1-1-1" n="1"> <lb/> <head>SCENE PREMIERE.</head> <lb/> <stage> <persName>PRIDAMANT</persName>, <persName>DORANTE</persName>.</stage> <lb/> <sp n="1" xml:id="EDI_0002-1-1-1-1"> <speaker>DORANTE.</speaker> <p n="1" xml:id="EDI_0002-1-1-1-1-1"> <lb/>CE grand Mage dont l'art commande <lb/>à la nature <lb/>N'a choiſi pour palais que cette grotte <lb/>obſcure; <lb/>La nuit qu'il entretient ſur cet af <lb break="no" rend="¬"/>freux ſeiour <lb/>N'ouurant ſon voile espais qu'aux raions d’vn <lb/>fauxiour, <fw> <lb/>A <lb/>2 L’ILLVSION COMIQ.</fw> <lb/>De leur eſclat douteux n'admet en ces lieux ſombres <lb/>Que ce qu'en peut ſouffrir le commerce des ombres. <lb/>N'auances pas, ſon art au pied de ce Rocher <lb/>A mis dequoy punir qui s'en oſe approcher, <lb/>Et cette large boucbe eſt vn mur inuiſible, <lb/>Ou l'air en ſa faueur deuient inacceßible, <lb/>Et luy fait vn rampart dont les funestes bords <lb/>Sur vn peu de poußiere eſtalent mille morts. <lb/>Ialoux de ſon repos plus que de ſa deffenſe <lb/>Il perd qui l'importune ainſi que qui l'offence, <lb/>Si bien que ceux qu'amene vn curieux deſir <lb/>Pour conſulter <persName>Alcandre</persName> attendent ſon loiſir, <lb/>Chaque iour il ſe monſtre, &amp; nous touchons à l'heure <lb/>Que pour ſe diuertir il ſort de ſa demeure.</p> </sp> </div> </div> </div> </body>

Example of a speech

2.2.2.2.2. Letters

If the encoded text is a letter, two additional elements may be used:

Element	Text type	Note
<opener>	any text at the start of a letter	e.g. a salutation or a dateline
<closer>	any text at the end of a letter	e.g. a salutation or a dateline

2.2.3. Third level

This level of encoding is produced automatically. In order to add linguistic information, the original version of the text is normalized with the following elements: <choice>, <orig> and <reg>. Then, in order to tokenize and lemmatize the text, we decided to split it with <seg> and <w>. The first one, <seg>, is used to represent any segmentation of the text. Note that sentences and clauses remain our basic units, but we recommend splitting a long sentence into several segments. The <w> element is used to mark a single token. Regarding punctuation, we decided to treat the marks as tokens: first, because more precision would not be useful for our analysis and second, because this choice keeps our encoding compatible with ELTeC.

<p n="1" xml:id="EDI_0002-1-1-1-1-1"> <choice> <orig> <seg> <w>N</w> <w>'</w> <w>a</w> <w>choiſi</w> <w>pour</w> <w>palais</w> <w>que</w> <w>cette</w> <w>grotte</w> </seg> </orig> <reg> <w>N</w> <w>'</w> <w>a</w> <w>choisi</w> <w>pour</w> <w>palais</w> <w>que</w> <w>cette</w> <w>grotte</w> </reg> </choice> </p>
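
The tokenization step can be sketched in Python as follows. This is only an illustration under stated assumptions, not the project's actual pipeline: the regular expression treats every run of letters or digits as one token and every punctuation mark as a token of its own, so a hyphenated form such as "tres-humble" would yield three tokens. The function name tokenize is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: a token is either a run of word characters
# or a single character that is neither a word character nor whitespace.
TOKEN = re.compile(r"\w+|[^\w\s]")


def tokenize(segment_text):
    """Wrap each token of a segment in a <w> element inside a <seg>."""
    seg = ET.Element("seg")
    for token in TOKEN.findall(segment_text):
        w = ET.SubElement(seg, "w")
        w.text = token
    return seg


seg = tokenize("N'a choiſi pour palais que cette grotte")
print(ET.tostring(seg, encoding="unicode"))
```

Applied to the line above, the sketch yields the same nine tokens as the <orig> part of the example, with the apostrophe as a separate <w>.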

2.3. The attributes

We decided to define a closed set of attributes that can be used for the encoding. There are only three of them:

xml:id
n
type

Please note that all of them are required.

2.3.1. xml:id

This attribute is used to identify the document or its subdivisions. Earlier in this document, we presented the way to properly generate identifiers.

xml:id is required on several elements and at different levels:

Element	Level of encoding
<text>	all levels
<div>	levels 2 and 3
<p>	levels 2 and 3
<sp>	levels 2 and 3
<lg>	levels 2 and 3
<l>	levels 2 and 3
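
The examples in these guidelines suggest that an identifier is built from the identifier of the parent, a hyphen and the value of n (for instance EDI_0002-1-1 for the first act of the play EDI_0002-1). The Python sketch below applies this pattern. The pattern is inferred from the examples only and is therefore an assumption; the function name assign_ids is hypothetical, and <l> is left out because its numbering follows the page rather than the parent.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
IDENTIFIED = {"div", "sp", "p", "lg"}

SAMPLE = """<text xml:id="EDI_0002"><body>
<div type="play" n="1"><div type="act" n="1"><div type="scene" n="1">
<sp n="1"><speaker>DORANTE.</speaker><p n="1">CE grand Mage</p></sp>
</div></div></div>
</body></text>"""


def assign_ids(element, parent_id):
    """Give each numbered descendant an xml:id of the form parent_id-n."""
    for child in element:
        if child.tag in IDENTIFIED and child.get("n"):
            child_id = f"{parent_id}-{child.get('n')}"
            child.set(XML_ID, child_id)
            assign_ids(child, child_id)
        else:
            # Elements without an identifier (e.g. <body>) pass the
            # identifier of the nearest identified ancestor down the tree.
            assign_ids(child, parent_id)


root = ET.fromstring(SAMPLE)
assign_ids(root, root.get(XML_ID))
print(next(root.iter("p")).get(XML_ID))
```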

2.3.2. n

This attribute is used to number an element, from the second level of encoding onwards. Child elements of a node are numbered incrementally, starting with 1.

Note that there are two exceptions:

<pb>: numbering starts at the beginning of the edition and continues until its end
<l>: numbering (re)starts at the beginning of each page

Note: In this way, it is possible to compare our edition with a reference edition.

Elements	Numbering starts at
<div>	parent node
<sp>	parent node
<p>	parent node
<lg>	parent node
<l>	each new page
<pb>	beginning of the edition
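
The numbering rules can be summarized with a short Python sketch. It is a sketch under assumptions: the function name number_elements is hypothetical, the sample is invented, and the use of a single counter shared by <div>, <sp>, <p> and <lg> under the same parent is one possible reading of the rule, which does not say whether each element type has its own sequence.

```python
import xml.etree.ElementTree as ET

PER_PARENT = {"div", "sp", "p", "lg"}

# Invented sample: two pages, three verse lines, two divisions.
SAMPLE = """<body>
<pb/>
<div><lg><l>a</l><l>b</l></lg>
<pb/>
<lg><l>c</l></lg></div>
<div><p>d</p></div>
</body>"""


def number_elements(root):
    """Set @n on pages, lines and structural elements, in document order."""
    state = {"page": 0, "line": 0}

    def walk(node):
        position = 0  # restarts for the children of every parent node
        for child in node:
            if child.tag == "pb":
                state["page"] += 1   # continuous over the whole edition
                state["line"] = 0    # line numbering restarts on each page
                child.set("n", str(state["page"]))
            elif child.tag == "l":
                state["line"] += 1
                child.set("n", str(state["line"]))
            elif child.tag in PER_PARENT:
                position += 1
                child.set("n", str(position))
            walk(child)

    walk(root)
    return root


root = number_elements(ET.fromstring(SAMPLE))
print([l.get("n") for l in root.iter("l")])
```

In the sample, the third <l> receives n="1" again because a <pb> precedes it, while the second <pb> receives n="2".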

2.3.3. type

This attribute is used to specify the type of the current <div>.

Note that only the following predefined values may be used for this attribute.

Value	Use case
titlePage	in the <front>, used for the title page of the work
privilege	in the <front>, used for the privilege of the work
castList	in the <front>, used for the cast list
liminal	in the <front>, used for any liminal part of the work
play	used at the beginning of a new play
act	used at the beginning of a new act
scene	used at the beginning of a new scene
part	used for any part of the work
subPart	used for any subpart (child of a type="part") of the work
letter	used for any letter
collection	used for any type of collection

2.3.4. Recap table for attributes

	<text>	<div>	<lg>	<l>	<sp>	<p>	<pb>
xml:id	required	required	required	required	required	required	not required
n	not required	required	required	required	required	required	required
type	not required	required	not required	not required	not required	not required	not required
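
The recap table, together with the closed list of type values given earlier, lends itself to an automatic check. The following Python sketch is one hypothetical way to write it: the function name check, the message wording and the sample are all assumptions, and the check only makes sense for documents at the second or third level of encoding.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Required attributes per element, transcribed from the recap table.
REQUIRED = {
    "text": (XML_ID,),
    "div": (XML_ID, "n", "type"),
    "lg": (XML_ID, "n"),
    "l": (XML_ID, "n"),
    "sp": (XML_ID, "n"),
    "p": (XML_ID, "n"),
    "pb": ("n",),
}

# Predefined values of @type on <div>.
DIV_TYPES = {
    "titlePage", "privilege", "castList", "liminal", "play", "act",
    "scene", "part", "subPart", "letter", "collection",
}

# Invented sample with two deliberate mistakes.
SAMPLE = """<text xml:id="EDI_0001"><body>
<div type="letter" n="1"><p n="1" xml:id="EDI_0001-1-1">x</p></div>
<div type="chapter" n="2" xml:id="EDI_0001-2"/>
</body></text>"""


def check(root):
    """Return a list of messages describing attribute problems."""
    problems = []
    for element in root.iter():
        for attribute in REQUIRED.get(element.tag, ()):
            if element.get(attribute) is None:
                name = "xml:id" if attribute == XML_ID else attribute
                problems.append(f"{element.tag}: missing {name}")
        div_type = element.get("type")
        if element.tag == "div" and div_type is not None and div_type not in DIV_TYPES:
            problems.append(f"div: type '{div_type}' is not among the predefined values")
    return problems


for problem in check(ET.fromstring(SAMPLE)):
    print(problem)
```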

Appendix A Encoding specifications

Appendix A.1 Elements

Appendix A.1.1 <body>

<body> (text body) contains the whole body of a single unitary text, excluding any front or back matter. [4. Default Text Structure]
Module	textstructure
Attributes	att.declaring (@decls) att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) att.global.rendition (@rend, @style, @rendition) att.global.change (@change) att.global.responsibility (@cert, @resp)
Contained by	textstructure: text
May contain	core: lb p pb transcr: fw
Example	<body> <l>Nu scylun hergan hefaenricaes uard</l> <l>metudæs maecti end his modgidanc</l> <l>uerc uuldurfadur sue he uundra gihuaes</l> <l>eci dryctin or astelidæ</l> <l>he aerist scop aelda barnum</l> <l>heben til hrofe haleg scepen.</l> <l>tha middungeard moncynnæs uard</l> <l>eci dryctin æfter tiadæ</l> <l>firum foldu frea allmectig</l> <trailer>primo cantauit Cædmon istud carmen.</trailer> </body>
Content model	<content> <sequence minOccurs="1" maxOccurs="1"> <classRef key="model.global" minOccurs="0" maxOccurs="unbounded"/> <sequence minOccurs="0" maxOccurs="1"> <classRef key="model.divTop"/> <alternate minOccurs="0" maxOccurs="unbounded"> <classRef key="model.global"/> <classRef key="model.divTop"/> </alternate> </sequence> <sequence minOccurs="0" maxOccurs="1"> <classRef key="model.divGenLike"/> <alternate minOccurs="0" maxOccurs="unbounded"> <classRef key="model.global"/> <classRef key="model.divGenLike"/> </alternate> </sequence> <alternate minOccurs="1" maxOccurs="1"> <sequence minOccurs="1" maxOccurs="unbounded"> <classRef key="model.divLike"/> <alternate minOccurs="0" maxOccurs="unbounded"> <classRef key="model.global"/> <classRef key="model.divGenLike"/> </alternate> </sequence> <sequence minOccurs="1" maxOccurs="unbounded"> <classRef key="model.div1Like"/> <alternate minOccurs="0" maxOccurs="unbounded"> <classRef key="model.global"/> <classRef key="model.divGenLike"/> </alternate> </sequence> <sequence minOccurs="1" maxOccurs="1"> <sequence minOccurs="1" maxOccurs="unbounded"> <classRef key="model.common"/> <classRef key="model.global" minOccurs="0" maxOccurs="unbounded"/> </sequence> <alternate minOccurs="0" maxOccurs="1"> <sequence minOccurs="1" maxOccurs="unbounded"> <classRef key="model.divLike"/> <alternate minOccurs="0" maxOccurs="unbounded"> <classRef key="model.global"/> <classRef key="model.divGenLike"/> </alternate> </sequence> <sequence minOccurs="1" maxOccurs="unbounded"> <classRef key="model.div1Like"/> <alternate minOccurs="0" maxOccurs="unbounded"> <classRef key="model.global"/> <classRef key="model.divGenLike"/> </alternate> </sequence> </alternate> </sequence> </alternate> <sequence minOccurs="0" maxOccurs="unbounded"> <classRef key="model.divBottom"/> <classRef key="model.global" minOccurs="0" maxOccurs="unbounded"/> </sequence> </sequence> </content>
Schema Declaration	element body { att.global.attribute.xmlid, att.global.attribute.n, att.global.attribute.xmllang, att.global.attribute.xmlbase, att.global.attribute.xmlspace, att.global.rendition.attribute.rend, att.global.rendition.attribute.style, att.global.rendition.attribute.rendition, att.global.change.attribute.change, att.global.responsibility.attribute.cert, att.global.responsibility.attribute.resp, att.declaring.attributes, ( model.global*, ( model.divTop, ( model.global | model.divTop )* )?, ( model.divGenLike, ( model.global | model.divGenLike )* )?, ( ( model.divLike, ( model.global | model.divGenLike )* )+ | ( model.div1Like, ( model.global | model.divGenLike )* )+ | ( ( model.common, model.global* )+, ( ( model.divLike, ( model.global | model.divGenLike )* )+ | ( model.div1Like, ( model.global | model.divGenLike )* )+ )? ) ), ( model.divBottom, model.global* )* ) }

Appendix A.1.2 <choice>

<choice> groups a number of alternative encodings for the same point in a text. [3.4. Simple Editorial Changes]
Module	core
Attributes	att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) att.global.rendition (@rend, @style, @rendition) att.global.change (@change) att.global.responsibility (@cert, @resp)
Member of	model.pPart.editorial
Contained by	core: choice corr hi p sic transcr: fw
May contain	core: choice corr sic
Note	Because the children of a <choice> element all represent alternative ways of encoding the same sequence, it is natural to think of them as mutually exclusive. However, there may be cases where a full representation of a text requires the alternative encodings to be considered as parallel. Note also that <choice> elements may self-nest. Where the purpose of an encoding is to record multiple witnesses of a single work, rather than to identify multiple possible encoding decisions at a given point, the <app> element and associated elements discussed in section 12.1. The Apparatus Entry, Readings, and Witnesses should be preferred.
Example	An American encoding of Gulliver's Travels which retains the British spelling but also provides a version regularized to American spelling might be encoded as follows. <p>Lastly, That, upon his solemn oath to observe all the above articles, the said man-mountain shall have a daily allowance of meat and drink sufficient for the support of <choice> <sic>1724</sic> <corr>1728</corr> </choice> of our subjects, with free access to our royal person, and other marks of our <choice> <orig>favour</orig> <reg>favor</reg> </choice>.</p>
Content model	<content> <alternate minOccurs="2" maxOccurs="unbounded"> <classRef key="model.choicePart"/> <elementRef key="choice"/> </alternate> </content>
Schema Declaration	element choice { att.global.attribute.xmlid, att.global.attribute.n, att.global.attribute.xmllang, att.global.attribute.xmlbase, att.global.attribute.xmlspace, att.global.rendition.attribute.rend, att.global.rendition.attribute.style, att.global.rendition.attribute.rendition, att.global.change.attribute.change, att.global.responsibility.attribute.cert, att.global.responsibility.attribute.resp, ( model.choicePart | choice )+ }

Appendix A.1.3 <corr>

<corr> (correction) contains the correct form of a passage apparently erroneous in the copy text. [3.4.1. Apparent Errors]
Module	core
Attributes	att.editLike (@instant) att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) att.global.rendition (@rend, @style, @rendition) att.global.change (@change) att.global.responsibility (@cert, @resp)
Member of	model.choicePart model.pPart.transcriptional
Contained by	core: choice corr hi p sic transcr: fw
May contain	core: choice corr hi lb pb sic transcr: fw character data
Example	If all that is desired is to call attention to the fact that the copy text has been corrected, <corr> may be used alone: I don't know, Juan. It's so far in the past now — how <corr>can we</corr> prove or disprove anyone's theories?
Example	It is also possible, using the <choice> and <sic> elements, to provide an uncorrected reading: I don't know, Juan. It's so far in the past now — how <choice> <sic>we can</sic> <corr>can we</corr> </choice> prove or disprove anyone's theories?
Content model	<content> <macroRef key="macro.paraContent"/> </content>
Schema Declaration	element corr { att.global.attribute.xmlid, att.global.attribute.n, att.global.attribute.xmllang, att.global.attribute.xmlbase, att.global.attribute.xmlspace, att.global.rendition.attribute.rend, att.global.rendition.attribute.style, att.global.rendition.attribute.rendition, att.global.change.attribute.change, att.global.responsibility.attribute.cert, att.global.responsibility.attribute.resp, att.editLike.attributes, macro.paraContent }

Appendix A.1.4 <fw>

<fw> (forme work) contains a running head (e.g. a header, footer), catchword, or similar material appearing on the current page. [11.6. Headers, Footers, and Similar Matter]
Module	transcr
Attributes	att.placement (@place) att.written (@hand) att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) att.global.rendition (@rend, @style, @rendition) att.global.change (@change) att.global.responsibility (@cert, @resp)
Member of	model.milestoneLike
Contained by	core: corr hi p sic textstructure: body text transcr: fw
May contain	core: choice corr hi lb pb sic transcr: fw character data
Note	Where running heads are consistent throughout a chapter or section, it is usually more convenient to relate them to the chapter or section, e.g. by use of the rend attribute. The <fw> element is intended for cases where the running head changes from page to page, or where details of page layout and the internal structure of the running heads are of paramount importance.
Example	<fw type="sig" place="bottom">C3</fw>
Content model	<content> <macroRef key="macro.phraseSeq"/> </content>
Schema Declaration	element fw { att.global.attribute.xmlid, att.global.attribute.n, att.global.attribute.xmllang, att.global.attribute.xmlbase, att.global.attribute.xmlspace, att.global.rendition.attribute.rend, att.global.rendition.attribute.style, att.global.rendition.attribute.rendition, att.global.change.attribute.change, att.global.responsibility.attribute.cert, att.global.responsibility.attribute.resp, att.placement.attributes, att.written.attributes, macro.phraseSeq }

Appendix A.1.5 <hi>

<hi> (highlighted) marks a word or phrase as graphically distinct from the surrounding text, for reasons concerning which no claim is made. [3.3.2.2. Emphatic Words and Phrases 3.3.2. Emphasis, Foreign Words, and Unusual Language]
Module	core
Attributes	att.written (@hand) att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) att.global.rendition (@rend, @style, @rendition) att.global.change (@change) att.global.responsibility (@cert, @resp)
Member of	model.hiLike
Contained by	core: corr hi p sic transcr: fw
May contain	core: choice corr hi lb pb sic transcr: fw character data
Example	<hi rend="gothic">And this Indenture further witnesseth</hi> that the said <hi rend="italic">Walter Shandy</hi>, merchant, in consideration of the said intended marriage ...
Content model	<content> <macroRef key="macro.paraContent"/> </content>
Schema Declaration	element hi { att.global.attribute.xmlid, att.global.attribute.n, att.global.attribute.xmllang, att.global.attribute.xmlbase, att.global.attribute.xmlspace, att.global.rendition.attribute.rend, att.global.rendition.attribute.style, att.global.rendition.attribute.rendition, att.global.change.attribute.change, att.global.responsibility.attribute.cert, att.global.responsibility.attribute.resp, att.written.attributes, macro.paraContent }

Appendix A.1.6 <lb>

<lb> (line beginning) marks the beginning of a new (typographic) line in some edition or version of a text. [3.10.3. Milestone Elements 7.2.5. Speech Contents]
Module	core
Attributes	att.edition (@ed, @edRef) att.spanning (@spanTo) att.breaking (@break) att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) att.global.rendition (@rend, @style, @rendition) att.global.change (@change) att.global.responsibility (@cert, @resp)
Member of	model.milestoneLike
Contained by	core: corr hi p sic textstructure: body text transcr: fw
May contain	Empty element
Note	By convention, <lb> elements should appear at the point in the text where a new line starts. The n attribute, if used, indicates the number or other value associated with the text between this point and the next <lb> element, typically the sequence number of the line within the page, or other appropriate unit. This element is intended to be used for marking actual line breaks on a manuscript or printed page, at the point where they occur; it should not be used to tag structural units such as lines of verse (for which the <l> element is available) except in circumstances where structural units cannot otherwise be marked. The type attribute may be used to characterize the line break in any respect. The more specialized attributes break, ed, or edRef should be preferred when the intent is to indicate whether or not the line break is word-breaking, or to note the source from which it derives.
Example	This example shows typographical line breaks within metrical lines, where they occur at different places in different editions: <l>Of Mans First Disobedience,<lb ed="1674"/> and<lb ed="1667"/> the Fruit</l> <l>Of that Forbidden Tree, whose<lb ed="1667 1674"/> mortal tast</l> <l>Brought Death into the World,<lb ed="1667"/> and all<lb ed="1674"/> our woe,</l>
Example	This example encodes typographical line breaks as a means of preserving the visual appearance of a title page. The break attribute is used to show that the line break does not (as elsewhere) mark the start of a new word. <titlePart> <lb/>With Additions, ne-<lb break="no"/>ver before Printed. </titlePart>
Content model	<content> <empty/> </content>
Schema Declaration	element lb { att.global.attribute.xmlid, att.global.attribute.n, att.global.attribute.xmllang, att.global.attribute.xmlbase, att.global.attribute.xmlspace, att.global.rendition.attribute.rend, att.global.rendition.attribute.style, att.global.rendition.attribute.rendition, att.global.change.attribute.change, att.global.responsibility.attribute.cert, att.global.responsibility.attribute.resp, att.edition.attributes, att.spanning.attributes, att.breaking.attributes, empty }

Appendix A.1.7 <p>

<p> (paragraph) marks paragraphs in prose. [3.1. Paragraphs 7.2.5. Speech Contents]

Module	core
Attributes	att.declaring (@decls) att.written (@hand) att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) att.global.rendition (@rend, @style, @rendition) att.global.change (@change) att.global.responsibility (@cert, @resp)
part	specifies whether or not its parent element is fragmented in some way, typically by some other overlapping structure: for example a speech which is divided between two or more verse stanzas, a paragraph which is split across a page division, a verse line which is divided between two speakers.
Derived from	att.fragmentable
Status	Optional
Datatype	teidata.enumerated
Legal values are:	N [Default]
Member of	model.pLike
Contained by	textstructure: body
May contain	core: choice corr hi lb pb sic transcr: fw character data
Example	<p>Hallgerd was outside. <q>There is blood on your axe,</q> she said. <q>What have you done?</q> </p> <p> <q>I have now arranged that you can be married a second time,</q> replied Thjostolf. </p> <p> <q>Then you must mean that Thorvald is dead,</q> she said. </p> <p> <q>Yes,</q> said Thjostolf. <q>And now you must think up some plan for me.</q> </p>
Schematron	<s:report test="ancestor::tei:l[not(.//tei:note//tei:p[. = current()])]"> Abstract model violation: Lines may not contain higher-level structural elements such as div, p, or ab. </s:report>
Content model	<content> <macroRef key="macro.paraContent"/> </content>

Schema Declaration

element p
{
   att.global.attribute.xmlid,
   att.global.attribute.n,
   att.global.attribute.xmllang,
   att.global.attribute.xmlbase,
   att.global.attribute.xmlspace,
   att.global.rendition.attribute.rend,
   att.global.rendition.attribute.style,
   att.global.rendition.attribute.rendition,
   att.global.change.attribute.change,
   att.global.responsibility.attribute.cert,
   att.global.responsibility.attribute.resp,
   att.declaring.attributes,
   att.written.attributes,
   attribute part { "N" }?,
   macro.paraContent
}

Appendix A.1.8 <pb>

<pb> (page beginning) marks the beginning of a new page in a paginated document. [3.10.3. Milestone Elements]
Module	core
Attributes	att.edition (@ed, @edRef) att.spanning (@spanTo) att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) att.global.rendition (@rend, @style, @rendition) att.global.facs (@facs) att.global.change (@change) att.global.responsibility (@cert, @resp)
Member of	model.milestoneLike
Contained by	core: corr hi p sic textstructure: body text transcr: fw
May contain	Empty element
Note	A <pb> element should appear at the start of the page which it identifies. The global n attribute indicates the number or other value associated with this page. This will normally be the page number or signature printed on it, since the physical sequence number is implicit in the presence of the <pb> element itself. The type attribute may be used to characterize the page break in any respect. The more specialized attributes break, ed, or edRef should be preferred when the intent is to indicate whether or not the page break is word-breaking, or to note the source from which it derives.
Example	Page numbers may vary in different editions of a text. <p> ... <pb n="145" ed="ed2"/> <!-- Page 145 in edition "ed2" starts here --> ... <pb n="283" ed="ed1"/> <!-- Page 283 in edition "ed1" starts here--> ... </p>
Example	A page break may be associated with a facsimile image of the page it introduces by means of the facs attribute <body> <pb n="1" facs="page1.png"/> <!-- page1.png contains an image of the page; the text it contains is encoded here --> <p> <!-- ... --> </p> <pb n="2" facs="page2.png"/> <!-- similarly, for page 2 --> <p> <!-- ... --> </p> </body>
Content model	<content> <empty/> </content>
Schema Declaration	element pb { att.global.attribute.xmlid, att.global.attribute.n, att.global.attribute.xmllang, att.global.attribute.xmlbase, att.global.attribute.xmlspace, att.global.rendition.attribute.rend, att.global.rendition.attribute.style, att.global.rendition.attribute.rendition, att.global.facs.attribute.facs, att.global.change.attribute.change, att.global.responsibility.attribute.cert, att.global.responsibility.attribute.resp, att.edition.attributes, att.spanning.attributes, empty }

Appendix A.1.9 <sic>

<sic> (Latin for thus or so) contains text reproduced although apparently incorrect or inaccurate. [3.4.1. Apparent Errors]
Module	core
Attributes	att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) att.global.rendition (@rend, @style, @rendition) att.global.change (@change) att.global.responsibility (@cert, @resp)
Member of	model.choicePart model.pPart.transcriptional
Contained by	core: choice corr hi p sic transcr: fw
May contain	core: choice corr hi lb pb sic transcr: fw character data
Example	for his nose was as sharp as a pen, and <sic>a Table</sic> of green fields.
Example	If all that is desired is to call attention to the apparent problem in the copy text, <sic> may be used alone: I don't know, Juan. It's so far in the past now — how <sic>we can</sic> prove or disprove anyone's theories?
Example	It is also possible, using the <choice> and <corr> elements, to provide a corrected reading: I don't know, Juan. It's so far in the past now — how <choice> <sic>we can</sic> <corr>can we</corr> </choice> prove or disprove anyone's theories?
Example	for his nose was as sharp as a pen, and <choice> <sic>a Table</sic> <corr>a' babbld</corr> </choice> of green fields.
Content model	<content> <macroRef key="macro.paraContent"/> </content>
Schema Declaration	element sic { att.global.attribute.xmlid, att.global.attribute.n, att.global.attribute.xmllang, att.global.attribute.xmlbase, att.global.attribute.xmlspace, att.global.rendition.attribute.rend, att.global.rendition.attribute.style, att.global.rendition.attribute.rendition, att.global.change.attribute.change, att.global.responsibility.attribute.cert, att.global.responsibility.attribute.resp, macro.paraContent }

Appendix A.1.10 <text>

<text> contains a single text of any kind, whether unitary or composite, for example a poem or drama, a collection of essays, a novel, a dictionary, or a corpus sample. [4. Default Text Structure 15.1. Varieties of Composite Text]
Module	textstructure
Attributes	att.declaring (@decls) att.written (@hand) att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) att.global.rendition (@rend, @style, @rendition) att.global.change (@change) att.global.responsibility (@cert, @resp)
Contained by	—
May contain	core: lb pb textstructure: body transcr: fw
Note	This element should not be used to represent a text which is inserted at an arbitrary point within the structure of another, for example as in an embedded or quoted narrative; the <floatingText> is provided for this purpose.
Example	<text> <front> <docTitle> <titlePart>Autumn Haze</titlePart> </docTitle> </front> <body> <l>Is it a dragonfly or a maple leaf</l> <l>That settles softly down upon the water?</l> </body> </text>
Example	The body of a text may be replaced by a group of nested texts, as in the following schematic: <text> <front> <!-- front matter for the whole group --> </front> <group> <text> <!-- first text --> </text> <text> <!-- second text --> </text> </group> </text>
Content model	<content> <sequence minOccurs="1" maxOccurs="1"> <classRef key="model.global" minOccurs="0" maxOccurs="unbounded"/> <sequence minOccurs="0" maxOccurs="1"> <elementRef key="front"/> <classRef key="model.global" minOccurs="0" maxOccurs="unbounded"/> </sequence> <alternate minOccurs="1" maxOccurs="1"> <elementRef key="body"/> <elementRef key="group"/> </alternate> <classRef key="model.global" minOccurs="0" maxOccurs="unbounded"/> <sequence minOccurs="0" maxOccurs="1"> <elementRef key="back"/> <classRef key="model.global" minOccurs="0" maxOccurs="unbounded"/> </sequence> </sequence> </content>
Schema Declaration	element text { att.global.attribute.xmlid, att.global.attribute.n, att.global.attribute.xmllang, att.global.attribute.xmlbase, att.global.attribute.xmlspace, att.global.rendition.attribute.rend, att.global.rendition.attribute.style, att.global.rendition.attribute.rendition, att.global.change.attribute.change, att.global.responsibility.attribute.cert, att.global.responsibility.attribute.resp, att.declaring.attributes, att.written.attributes, ( model.global*, ( front, model.global* )?, ( body | group ), model.global*, ( back, model.global* )? ) }

Appendix A.2 Model classes

Appendix A.2.1 model.choicePart

model.choicePart groups elements (other than <choice> itself) which can be used within a <choice> alternation.
Module	tei
Used by	choice
Members	corr sic

Appendix A.2.2 model.common

model.common groups common chunk- and inter-level elements.
Module	tei
Used by	body
Members	model.divPart[model.lLike model.pLike[p]] model.inter[model.biblLike model.egLike model.labelLike model.listLike model.oddDecl model.qLike[model.quoteLike] model.stageLike]
Note	This class defines the set of chunk- and inter-level elements; it is used in many content models, including those for textual divisions.

Appendix A.2.3 model.divBottom

model.divBottom groups elements appearing at the end of a text division.
Module	tei
Used by	body
Members	model.divBottomPart model.divWrapper

Appendix A.2.4 model.divPart

model.divPart groups paragraph-level elements appearing directly within divisions.
Module	tei
Used by	model.common
Members	model.lLike model.pLike[p]
Note	Note that this element class does not include members of the model.inter class, which can appear either within or between paragraph-level items.

Appendix A.2.5 model.divTop

model.divTop groups elements appearing at the beginning of a text division.
Module	tei
Used by	body
Members	model.divTopPart[model.headLike] model.divWrapper

Appendix A.2.6 model.divTopPart

model.divTopPart groups elements which can occur only at the beginning of a text division.
Module	tei
Used by	model.divTop
Members	model.headLike

Appendix A.2.7 model.global

model.global groups elements which may appear at any point within a TEI text.
Module	tei
Used by	body macro.paraContent macro.phraseSeq text
Members	model.global.edit model.global.meta model.milestoneLike[fw lb pb] model.noteLike
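As an illustration: because <text>, <body> and macro.paraContent all admit model.global, the milestone elements <pb>, <lb> and <fw> may be placed between paragraphs as well as inside them. The fragment below is a minimal sketch with invented content; it is not taken from an e-ditiones file.

```xml
<body>
  <!-- milestones placed directly inside <body>, before the first paragraph -->
  <pb n="1"/>
  <fw>BERENICE.</fw>
  <p>Premier vers de la page,
    <lb/>second vers, interrompu par
    <!-- the same kind of milestone inside paragraph content -->
    <pb n="2"/>un changement de page.</p>
</body>
```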

Appendix A.2.8 model.hiLike

model.hiLike groups phrase-level elements which are typographically distinct but to which no specific function can be attributed.
Module	tei
Used by	model.highlighted model.limitedPhrase
Members	hi

Appendix A.2.9 model.highlighted

model.highlighted groups phrase-level elements which are typographically distinct.
Module	tei
Used by	model.phrase
Members	model.emphLike model.hiLike[hi]

Appendix A.2.10 model.inter

model.inter groups elements which can appear either within or between paragraph-like elements.
Module	tei
Used by	macro.paraContent model.common
Members	model.biblLike model.egLike model.labelLike model.listLike model.oddDecl model.qLike[model.quoteLike] model.stageLike

Appendix A.2.11 model.limitedPhrase

model.limitedPhrase groups phrase-level elements excluding those elements primarily intended for transcription of existing sources.
Module	tei
Used by
Members	model.emphLike model.hiLike[hi] model.pPart.data[model.addressLike model.dateLike model.measureLike model.nameLike[model.nameLike.agent model.offsetLike model.placeStateLike[model.placeNamePart]]] model.pPart.editorial[choice] model.pPart.msdesc model.phrase.xml model.ptrLike

Appendix A.2.12 model.milestoneLike

model.milestoneLike groups milestone-style elements used to represent reference systems.
Module	tei
Used by	model.global
Members	fw lb pb

Appendix A.2.13 model.nameLike

model.nameLike groups elements which name or refer to a person, place, or organization.
Module	tei
Used by	model.pPart.data
Members	model.nameLike.agent model.offsetLike model.placeStateLike[model.placeNamePart]
Note	A superset of the naming elements that may appear in datelines, addresses, statements of responsibility, etc.

Appendix A.2.14 model.pLike

model.pLike groups paragraph-like elements.
Module	tei
Used by	model.divPart
Members	p

Appendix A.2.15 model.pPart.data

model.pPart.data groups phrase-level elements containing names, dates, numbers, measures, and similar data.
Module	tei
Used by	model.limitedPhrase model.phrase
Members	model.addressLike model.dateLike model.measureLike model.nameLike[model.nameLike.agent model.offsetLike model.placeStateLike[model.placeNamePart]]

Appendix A.2.16 model.pPart.edit

model.pPart.edit groups phrase-level elements for simple editorial correction and transcription.
Module	tei
Used by	model.phrase
Members	model.pPart.editorial[choice] model.pPart.transcriptional[corr sic]

Appendix A.2.17 model.pPart.editorial

model.pPart.editorial groups phrase-level elements for simple editorial interventions that may be useful both in transcribing and in authoring.
Module	tei
Used by	model.limitedPhrase model.pPart.edit
Members	choice

Appendix A.2.18 model.pPart.transcriptional

model.pPart.transcriptional groups phrase-level elements used for editorial transcription of pre-existing source materials.
Module	tei
Used by	model.pPart.edit
Members	corr sic

Appendix A.2.19 model.phrase

model.phrase groups elements which can occur at the level of individual words or phrases.
Module	tei
Used by	macro.paraContent macro.phraseSeq
Members	model.graphicLike model.highlighted[model.emphLike model.hiLike[hi]] model.lPart model.pPart.data[model.addressLike model.dateLike model.measureLike model.nameLike[model.nameLike.agent model.offsetLike model.placeStateLike[model.placeNamePart]]] model.pPart.edit[model.pPart.editorial[choice] model.pPart.transcriptional[corr sic]] model.pPart.msdesc model.phrase.xml model.ptrLike model.segLike model.specDescLike
Note	This class of elements can occur within paragraphs, list items, lines of verse, etc.

Appendix A.2.20 model.placeStateLike

model.placeStateLike groups elements which describe changing states of a place.
Module	tei
Used by	model.nameLike
Members	model.placeNamePart

Appendix A.2.21 model.qLike

model.qLike groups elements related to highlighting which can appear either within or between chunk-level elements.
Module	tei
Used by	macro.phraseSeq model.inter
Members	model.quoteLike

Appendix A.3 Attribute classes

Appendix A.3.1 att.breaking

att.breaking provides an attribute to indicate whether or not the element concerned is considered to mark the end of an orthographic token in the same way as whitespace.

Module

tei

Members

lb pb

Attributes

break

indicates whether or not the element bearing this attribute should be considered to mark the end of an orthographic token in the same way as whitespace.

Status	Recommended
Datatype	teidata.enumerated
Sample values include
yes	the element bearing this attribute is considered to mark the end of any adjacent orthographic token irrespective of the presence of any adjacent whitespace
no	the element bearing this attribute is considered not to mark the end of any adjacent orthographic token irrespective of the presence of any adjacent whitespace
maybe	the encoding does not take any position on this issue.
In the following lines from the Dream of the Rood, linebreaks occur in the middle of the words lāðost and reord-berendum. <ab> ...eƿesa tome iu icƿæs ȝeƿorden ƿita heardoſt . leodum la<lb break="no"/> ðost ærþan ichim lifes ƿeȝ rihtne ȝerymde reord be<lb break="no"/> rendum hƿæt me þaȝeƿeorðode ƿuldres ealdor ofer... </ab>

Appendix A.3.2 att.declaring

att.declaring provides attributes for elements which may be independently associated with a particular declarable element within the header, thus overriding the inherited default for that element.

Module

tei

Members

body p text

Attributes

decls

identifies one or more declarable elements within the header, which are understood to apply to the element bearing this attribute and its content.

Status	Optional
Datatype	1–∞ occurrences of teidata.pointer separated by whitespace

Note

The rules governing the association of declarable elements with individual parts of a TEI text are fully defined in chapter 15.3. Associating Contextual Information with a Text.
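A minimal sketch of how decls might be used follows. It assumes that the header declares two <editorialDecl> elements and that the schema in use keeps <editorialDecl> and its default attribute; neither assumption is confirmed by this appendix, and the identifiers and text are invented.

```xml
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <!-- fileDesc and other required parts omitted for brevity -->
    <encodingDesc>
      <editorialDecl xml:id="ed-diplomatic" default="true">
        <p>Spelling and punctuation are kept as in the source.</p>
      </editorialDecl>
      <editorialDecl xml:id="ed-normalized" default="false">
        <p>Spelling is normalized.</p>
      </editorialDecl>
    </encodingDesc>
  </teiHeader>
  <text>
    <body>
      <!-- no decls: the default declaration (ed-diplomatic) applies -->
      <p>Que le iour recommence</p>
      <!-- decls overrides the inherited default for this paragraph only -->
      <p decls="#ed-normalized">Que le jour recommence</p>
    </body>
  </text>
</TEI>
```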

Appendix A.3.3 att.editLike

att.editLike provides attributes describing the nature of an encoded scholarly intervention or interpretation of any kind.

Module

tei

Members

corr

Attributes

instant

indicates whether this is an instant revision or not.

Status	Optional
Datatype	teidata.xTruthValue
Default	false

Note

The members of this attribute class are typically used to represent any kind of editorial intervention in a text, for example a correction or interpretation, or to date or localize manuscripts etc.

Each pointer on the source (if present) corresponding to a witness or witness group should reference a bibliographic citation such as a <witness>, <msDesc>, or <bibl> element, or another external bibliographic citation, documenting the source concerned.

Appendix A.3.4 att.edition

att.edition provides attributes identifying the source edition from which some encoded feature derives.

Module

tei

Members

lb pb

Attributes

ed

(edition) supplies a sigil or other arbitrary identifier for the source edition in which the associated feature (for example, a page, column, or line break) occurs at this point in the text.

Status	Optional
Datatype	1–∞ occurrences of teidata.word separated by whitespace

edRef

(edition reference) provides a pointer to the source edition in which the associated feature (for example, a page, column, or line break) occurs at this point in the text.

Status	Optional
Datatype	1–∞ occurrences of teidata.pointer separated by whitespace

Example

<l>Of Mans First Disobedience,<lb ed="1674"/> and<lb ed="1667"/> the Fruit</l> <l>Of that Forbidden Tree, whose<lb ed="1667 1674"/> mortal tast</l> <l>Brought Death into the World,<lb ed="1667"/> and all<lb ed="1674"/> our woe,</l>

Example

<listBibl> <bibl xml:id="stapledon1937"> <author>Olaf Stapledon</author>, <title>Starmaker</title>, <publisher>Methuen</publisher>, <date>1937</date> </bibl> <bibl xml:id="stapledon1968"> <author>Olaf Stapledon</author>, <title>Starmaker</title>, <publisher>Dover</publisher>, <date>1968</date> </bibl> </listBibl> <p>Looking into the future aeons from the supreme moment of the cosmos, I saw the populations still with all their strength maintaining the<pb n="411" edRef="#stapledon1968"/>essentials of their ancient culture, still living their personal lives in zest and endless novelty of action, … I saw myself still preserving, though with increasing difficulty, my lucid con-<pb n="291" edRef="#stapledon1937"/>sciousness;</p>

Appendix A.3.5 att.global.change

att.global.change supplies the change attribute, allowing its member elements to specify one or more states or revision campaigns with which they are associated.

Module

transcr

Members

att.global[body choice corr fw hi lb p pb sic text]

Attributes

change

points to one or more <change> elements documenting a state or revision campaign to which the element bearing this attribute and its children have been assigned by the encoder.

Status	Optional
Datatype	1–∞ occurrences of teidata.pointer separated by whitespace
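The following sketch shows one way the change attribute might point to <change> elements. It assumes the revision campaigns are listed in a <listChange> inside <creation>; the identifiers and wording are invented for illustration.

```xml
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <!-- fileDesc omitted for brevity -->
    <profileDesc>
      <creation>
        <listChange>
          <change xml:id="stage-draft">First draft.</change>
          <change xml:id="stage-revision">Later authorial revision.</change>
        </listChange>
      </creation>
    </profileDesc>
  </teiHeader>
  <text>
    <body>
      <!-- each paragraph is assigned to one documented campaign -->
      <p change="#stage-draft">Text written during the first draft.</p>
      <p change="#stage-revision">Text added during the revision.</p>
    </body>
  </text>
</TEI>
```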

Appendix A.3.6 att.global.facs

att.global.facs provides an attribute used to express correspondence between an element containing transcribed text and all or part of an image representing that text.

Module

transcr

Members

att.global[body choice corr fw hi lb p pb sic text]

Attributes

facs

(facsimile) points to all or part of an image which corresponds with the content of the element.

Status	Optional
Datatype	1–∞ occurrences of teidata.pointer separated by whitespace

Appendix A.3.7 att.global.rendition

att.global.rendition provides rendering attributes common to all elements in the TEI encoding scheme.

Module

tei

Members

att.global[body choice corr fw hi lb p pb sic text]

Attributes

rend

(rendition) indicates how the element in question was rendered or presented in the source text.

Status	Optional
Datatype	1–∞ occurrences of teidata.word separated by whitespace
<head rend="align(center) case(allcaps)"> <lb/>To The <lb/>Duchesse <lb/>of <lb/>Newcastle, <lb/>On Her <lb/> <hi rend="case(mixed)">New Blazing-World</hi>. </head>
Note	These Guidelines make no binding recommendations for the values of the rend attribute; the characteristics of visual presentation vary too much from text to text and the decision to record or ignore individual characteristics varies too much from project to project. Some potentially useful conventions are noted from time to time at appropriate points in the Guidelines. The values of the rend attribute are a set of sequence-indeterminate individual tokens separated by whitespace.

style

contains an expression in some formal style definition language which defines the rendering or presentation used for this element in the source text

Status	Optional
Datatype	teidata.text
<head style="text-align: center; font-variant: small-caps"> <lb/>To The <lb/>Duchesse <lb/>of <lb/>Newcastle, <lb/>On Her <lb/> <hi style="font-variant: normal">New Blazing-World</hi>. </head>
Note	Unlike the attribute values of rend, which uses whitespace as a separator, the style attribute may contain whitespace. This attribute is intended for recording inline stylistic information concerning the source, not any particular output. The formal language in which values for this attribute are expressed may be specified using the <styleDefDecl> element in the TEI header. If style and rendition are both present on an element, then style overrides or complements rendition. style should not be used in conjunction with rend, because the latter does not employ a formal style definition language.

rendition

points to a description of the rendering or presentation used for this element in the source text.

Status	Optional
Datatype	1–∞ occurrences of teidata.pointer separated by whitespace
<head rendition="#ac #sc"> <lb/>To The <lb/>Duchesse <lb/>of <lb/>Newcastle, <lb/>On Her <lb/> <hi rendition="#normal">New Blazing-World</hi>. </head> <rendition xml:id="sc" scheme="css">font-variant: small-caps</rendition> <rendition xml:id="normal" scheme="css">font-variant: normal</rendition> <rendition xml:id="ac" scheme="css">text-align: center</rendition>
Note	The rendition attribute is used in a very similar way to the class attribute defined for XHTML but with the important distinction that its function is to describe the appearance of the source text, not necessarily to determine how that text should be presented on screen or paper. If rendition is used to refer to a style definition in a formal language like CSS, it is recommended that it not be used in conjunction with rend. Where both rendition and rend are supplied, the latter is understood to override or complement the former. Each URI provided should indicate a <rendition> element defining the intended rendition in terms of some appropriate style language, as indicated by the scheme attribute.

Appendix A.3.8 att.global.responsibility

att.global.responsibility provides attributes indicating the agent responsible for some aspect of the text, the markup or something asserted by the markup, and the degree of certainty associated with it.

Module

tei

Members

att.global[body choice corr fw hi lb p pb sic text]

Attributes

cert

(certainty) signifies the degree of certainty associated with the intervention or interpretation.

Status	Optional
Datatype	teidata.probCert

resp

(responsible party) indicates the agency responsible for the intervention or interpretation, for example an editor or transcriber.

Status	Optional
Datatype	1–∞ occurrences of teidata.pointer separated by whitespace
Note	To reduce the ambiguity of a resp pointing directly to a person or organization, we recommend that resp be used to point not to an agent (<person> or <org>) but to a <respStmt>, <author>, <editor> or similar element which clarifies the exact role played by the agent. Pointing to multiple <respStmt>s allows the encoder to specify clearly each of the roles played in part of a TEI file (creating, transcribing, encoding, editing, proofing etc.).

Example

Blessed are the <choice> <sic>cheesemakers</sic> <corr resp="#editor" cert="high">peacemakers</corr> </choice>: for they shall be called the children of God.

Example

<lg> <l>Punkes, Panders, baſe extortionizing sla<choice> <sic>n</sic> <corr resp="#JENS1_transcriber">u</corr> </choice>es,</l> </lg> <respStmt xml:id="JENS1_transcriber"> <resp when="2014">Transcriber</resp> <name>Janelle Jenstad</name> </respStmt>

Appendix A.3.9 att.global.source

att.global.source provides an attribute used by elements to point to an external source.

Module

tei

Members

att.global[body choice corr fw hi lb p pb sic text]

Attributes

source

specifies the source from which some aspect of this element is drawn.

Status

Optional

Datatype

1–∞ occurrences of teidata.pointer separated by whitespace

Note

The source attribute points to an external source. When used on elements describing schema components such as <schemaSpec> or <moduleRef> it identifies the source from which declarations for the components of the object being defined may be obtained.

On other elements it provides a pointer to the bibliographical source from which a quotation or citation is drawn.

In either case, the location may be provided using any form of URI, for example an absolute URI, a relative URI, or private scheme URI that is expanded to an absolute URI as documented in a <prefixDef>.

If more than one location is specified, the default assumption is that the required source should be obtained by combining the resources indicated.

Example

<p> As Willard McCarty (<bibl xml:id="mcc_2012">2012, p.2</bibl>) tells us, <quote source="#mcc_2012">‘Collaboration’ is a problematic and should be a contested term.</quote> </p>

Example

<p> <quote source="#chicago_15_ed">Grammatical theories are in flux, and the more we learn, the less we seem to know.</quote> </p> <bibl xml:id="chicago_15_ed"> <title level="m">The Chicago Manual of Style</title>, <edition>15th edition</edition>. <pubPlace>Chicago</pubPlace>: <publisher>University of Chicago Press</publisher> (<date>2003</date>), <biblScope unit="page">p.147</biblScope>. </bibl>

Example

Include in the schema an element named <p> available from the TEI P5 2.0.1 release.
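A declaration matching this description might look like the following sketch. The value `tei:2.0.1` assumes a private `tei:` URI scheme that resolves to a published TEI P5 release; check it against the TEI Guidelines before relying on it.

```xml
<!-- sketch: take the declaration of <p> from the TEI P5 2.0.1 release -->
<elementRef key="p" source="tei:2.0.1"/>
```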

Example

Create a schema using components taken from the file mycompiledODD.xml.
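A schema specification matching this description might look like the following sketch. The ident value `myODD` is a hypothetical name chosen for illustration; only the file name mycompiledODD.xml comes from the description above.

```xml
<!-- sketch: components are drawn from the local compiled ODD file -->
<schemaSpec ident="myODD" source="mycompiledODD.xml">
  <!-- further declarations selecting the components required -->
</schemaSpec>
```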

Appendix A.3.10 att.placement

att.placement provides attributes for describing where on the source page or object a textual element appears.

Module

tei

Members

Attributes

place

specifies where this item is placed.

Status	Recommended
Datatype	1–∞ occurrences of teidata.enumerated separated by whitespace
Suggested values include:
below	below the line
bottom	at the foot of the page
margin	in the margin (left, right, or both)
top	at the top of the page
opposite	on the opposite, i.e. facing, page
overleaf	on the other side of the leaf
above	above the line
end	at the end of e.g. chapter or volume.
inline	within the body of the text.
inspace	in a predefined space, for example left by an earlier scribe.
<add place="margin">[An addition written in the margin]</add> <add place="bottom opposite">[An addition written at the foot of the current page and also on the facing page]</add>
<note place="bottom">Ibid, p.7</note>

Appendix A.3.11 att.spanning

att.spanning provides attributes for elements which delimit a span of text by pointing mechanisms rather than by enclosing it.

Module

tei

Members

lb pb

Attributes

spanTo

indicates the end of a span initiated by the element bearing this attribute.

Status	Optional
Datatype	teidata.pointer
Schematron	The @spanTo attribute must point to an element following the current element <sch:rule context="tei:*[@spanTo]"> <sch:assert test="id(substring(@spanTo,2)) and following::*[@xml:id=substring(current()/@spanTo,2)]">The element indicated by @spanTo (<sch:value-of select="@spanTo"/>) must follow the current element <sch:name/> </sch:assert> </sch:rule>

Note

The span is defined as running in document order from the start of the content of the pointing element to the end of the content of the element pointed to by the spanTo attribute (if any). If no value is supplied for the attribute, the assumption is that the span is coextensive with the pointing element. If no content is present, the assumption is that the starting point of the span is immediately following the element itself.

Appendix A.3.12 att.written

att.written provides an attribute to indicate the hand in which the content of an element was written in the source being transcribed.

Module

tei

Members

fw hi p text

Attributes

hand

points to a <handNote> element describing the hand considered responsible for the content of the element concerned.

Status	Optional
Datatype	teidata.pointer
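A minimal sketch of the hand attribute in use follows. It assumes the hands are described in a <handNotes> element in the profile description; a <handDesc> inside <msDesc> would be an alternative location. The identifiers and descriptions are invented.

```xml
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <!-- fileDesc omitted for brevity -->
    <profileDesc>
      <handNotes>
        <handNote xml:id="hand-main">Main scribal hand.</handNote>
        <handNote xml:id="hand-corrector">Later corrector.</handNote>
      </handNotes>
    </profileDesc>
  </teiHeader>
  <!-- the whole text is attributed to the main hand -->
  <text hand="#hand-main">
    <body>
      <p>Paragraph in the main hand.</p>
      <!-- hand on <p> identifies a different hand for this paragraph -->
      <p hand="#hand-corrector">Paragraph added by the corrector.</p>
    </body>
  </text>
</TEI>
```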

Appendix A.4 Macros

Appendix A.4.1 macro.paraContent

macro.paraContent (paragraph content) defines the content of paragraphs and similar elements.
Module	tei
Used by	corr hi p sic
Content model	<content> <alternate minOccurs="0" maxOccurs="unbounded"> <textNode/> <classRef key="model.gLike"/> <classRef key="model.phrase"/> <classRef key="model.inter"/> <classRef key="model.global"/> <elementRef key="lg"/> <classRef key="model.lLike"/> </alternate> </content>
Declaration	macro.paraContent = ( text | model.gLike | model.phrase | model.inter | model.global | lg | model.lLike )*

Appendix A.4.2 macro.phraseSeq

macro.phraseSeq (phrase sequence) defines a sequence of character data and phrase-level elements.
Module	tei
Used by	fw
Content model	<content> <alternate minOccurs="0" maxOccurs="unbounded"> <textNode/> <classRef key="model.gLike"/> <classRef key="model.qLike"/> <classRef key="model.phrase"/> <classRef key="model.global"/> </alternate> </content>
Declaration	macro.phraseSeq = ( text | model.gLike | model.qLike | model.phrase | model.global )*

Appendix A.5 Datatypes

Appendix A.5.1 teidata.certainty

teidata.certainty defines the range of attribute values expressing a degree of certainty.
Module	tei
Used by	teidata.probCert
Content model	<content> <valList type="closed"> <valItem ident="high"/> <valItem ident="medium"/> <valItem ident="low"/> <valItem ident="unknown"/> </valList> </content>
Declaration	teidata.certainty = "high" | "medium" | "low" | "unknown"
Note	Certainty may be expressed by one of the predefined symbolic values high, medium, or low. The value unknown should be used in cases where the encoder does not wish to assert an opinion about the matter.

Appendix A.5.2 teidata.enumerated

teidata.enumerated defines the range of attribute values expressed as a single XML name taken from a list of documented possibilities.
Module	tei
Used by	Element: p/@part
Content model	<content> <dataRef key="teidata.word"/> </content>
Declaration	teidata.enumerated = teidata.word
Note	Attributes using this datatype must contain a single ‘word’ which contains only letters, digits, punctuation characters, or symbols: thus it cannot include whitespace. Typically, the list of documented possibilities will be provided (or exemplified) by a value list in the associated attribute specification, expressed with a <valList> element.

Appendix A.5.3 teidata.language

teidata.language defines the range of attribute values used to identify a particular combination of human language and writing system.
Module	tei
Used by
Content model	<content> <alternate minOccurs="1" maxOccurs="1"> <dataRef name="language"/> <valList> <valItem ident=""/> </valList> </alternate> </content>
Declaration	teidata.language = xsd:language | ( "" )
Note	The values for this attribute are language ‘tags’ as defined in BCP 47. Currently BCP 47 comprises RFC 5646 and RFC 4647; over time, other IETF documents may succeed these as the best current practice. A ‘language tag’, per BCP 47, is assembled from a sequence of components or subtags separated by the hyphen character (-, U+002D). The tag is made of the following subtags, in the following order. Every subtag except the first is optional. If present, each occurs only once, except the fourth and fifth components (variant and extension), which are repeatable.
language	The IANA-registered code for the language. This is almost always the same as the ISO 639 2-letter language code if there is one. The list of available registered language subtags can be found at http://www.iana.org/assignments/language-subtag-registry. It is recommended that this code be written in lower case.
script	The ISO 15924 code for the script. These codes consist of 4 letters, and it is recommended they be written with an initial capital, the other three letters in lower case. The canonical list of codes is maintained by the Unicode Consortium, and is available at http://unicode.org/iso15924/iso15924-codes.html. The IETF recommends this code be omitted unless it is necessary to make a distinction you need.
region	Either an ISO 3166 country code or a UN M.49 region code that is registered with IANA (not all such codes are registered, e.g. UN codes for economic groupings or codes for countries for which there is already an ISO 3166 2-letter code are not registered). The former consist of 2 letters, and it is recommended they be written in upper case; the list of codes can be searched or browsed at https://www.iso.org/obp/ui/#search/code/. The latter consist of 3 digits; the list of codes can be found at http://unstats.un.org/unsd/methods/m49/m49.htm.
variant	An IANA-registered variation. These codes are used to indicate additional, well-recognized variations that define a language or its dialects that are not covered by other available subtags.
extension	An extension has the format of a single letter followed by a hyphen followed by additional subtags. These exist to allow for future extension to BCP 47, but as of this writing no such extensions are in use.
private use	An extension that uses the initial subtag of the single letter x (i.e., starts with `x-`) has no meaning except as negotiated among the parties involved. These should be used with great care, since they interfere with the interoperability that use of RFC 4646 is intended to promote. In order for a document that makes use of these subtags to be TEI-conformant, a corresponding <language> element must be present in the TEI header.
There are two exceptions to the above format. First, there are language tags in the IANA registry that do not match the above syntax, but are present because they have been ‘grandfathered’ from previous specifications. Second, an entire language tag can consist of only a private use subtag. These tags start with `x-`, and do not need to follow any further rules established by the IETF and endorsed by these Guidelines. Like all language tags that make use of private use subtags, the language in question must be documented in a corresponding <language> element in the TEI header.
Examples include
sn	Shona
zh-TW	Taiwanese
zh-Hant-HK	Chinese written in traditional script as used in Hong Kong
en-SL	English as spoken in Sierra Leone
pl	Polish
es-MX	Spanish as spoken in Mexico
es-419	Spanish as spoken in Latin America
The W3C Internationalization Activity has published a useful introduction to BCP 47, Language tags in HTML and XML.
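As a brief illustration, language tags of this kind typically appear as values of xml:lang. The sketch below assumes a French text containing a Latin phrase, with both languages listed in <langUsage>; the content is invented.

```xml
<TEI xmlns="http://www.tei-c.org/ns/1.0" xml:lang="fr">
  <teiHeader>
    <!-- fileDesc omitted for brevity -->
    <profileDesc>
      <langUsage>
        <language ident="fr">French</language>
        <language ident="la">Latin</language>
      </langUsage>
    </profileDesc>
  </teiHeader>
  <text>
    <body>
      <!-- xml:lang is inherited from <TEI> unless overridden -->
      <p>Il répondit : <hi xml:lang="la">veni, vidi, vici</hi>.</p>
    </body>
  </text>
</TEI>
```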

Appendix A.5.4 teidata.name

teidata.name defines the range of attribute values expressed as an XML Name.
Module	tei
Used by
Content model	<content> <dataRef name="Name"/> </content>
Declaration	teidata.name = xsd:Name
Note	Attributes using this datatype must contain a single word which follows the rules defining a legal XML name (see http://www.w3.org/TR/REC-xml/#dt-name): for example they cannot include whitespace or begin with digits.

Appendix A.5.5 teidata.numeric

teidata.numeric defines the range of attribute values used for numeric values.
Module	tei
Used by
Content model	<content> <alternate minOccurs="1" maxOccurs="1"> <dataRef name="double"/> <dataRef name="token" restriction="(\-?[\d]+/\-?[\d]+)"/> <dataRef name="decimal"/> </alternate> </content>
Declaration	teidata.numeric = xsd:double | token { pattern = "(\-?[\d]+/\-?[\d]+)" } | xsd:decimal
Note	Any numeric value, represented as a decimal number, in floating point format, or as a ratio. To represent a floating point number, expressed in scientific notation, ‘E notation’, a variant of ‘exponential notation’, may be used. In this format, the value is expressed as two numbers separated by the letter E. The first number, the significand (sometimes called the mantissa) is given in decimal format, while the second is an integer. The value is obtained by multiplying the mantissa by 10 the number of times indicated by the integer. Thus the value represented in decimal notation as 1000.0 might be represented in scientific notation as 1E3. A value expressed as a ratio is represented by two integer values separated by a solidus (/) character. Thus, the value represented in decimal notation as 0.5 might be represented as a ratio by the string 1/2.

Appendix A.5.6 teidata.pointer

teidata.pointer defines the range of attribute values used to provide a single URI, absolute or relative, pointing to some other resource, either within the current document or elsewhere.
Module	tei
Used by
Content model	<content> <dataRef name="anyURI"/> </content>
Declaration	teidata.pointer = xsd:anyURI
Note	The range of syntactically valid values is defined by RFC 3986 Uniform Resource Identifier (URI): Generic Syntax. Note that the values themselves are encoded using RFC 3987 Internationalized Resource Identifiers (IRIs) mapping to URIs. For example, `https://secure.wikimedia.org/wikipedia/en/wiki/%` is encoded as `https://secure.wikimedia.org/wikipedia/en/wiki/%25` while `http://موقع.وزارة-الاتصالات.مصر/` is encoded as `http://xn--4gbrim.xn----rmckbbajlc6dj7bxne2c.xn--wgbh1c/`
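
The two kinds of mapping shown in the note (percent-encoding in the path, and an ASCII form for a non-ASCII host name) can be sketched with the Python standard library. This is only an illustration under stated assumptions: the helper names are hypothetical, and Python's built-in `idna` codec implements the older IDNA 2003 rules, which may differ from what a given processor applies.

```python
from urllib.parse import quote


def percent_encode_path(path: str) -> str:
    """Percent-encode characters that are not allowed literally in a URI path.

    Slashes are kept. Note that an already encoded path would be encoded
    a second time, so this must only be applied to raw values.
    """
    return quote(path, safe="/")


def host_to_ascii(host: str) -> str:
    """Convert an internationalized host name to its ASCII (Punycode) form."""
    return host.encode("idna").decode("ascii")


print(percent_encode_path("/wikipedia/en/wiki/%"))
print(host_to_ascii("münchen.de"))
```

The host `münchen.de` is an invented example chosen because its ASCII form is short.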

Appendix A.5.7 teidata.probCert

teidata.probCert defines a range of attribute values which can be expressed either as a numeric probability or as a coded certainty value.
Module	tei
Used by
Content model	<content> <alternate minOccurs="1" maxOccurs="1"> <dataRef key="teidata.probability"/> <dataRef key="teidata.certainty"/> </alternate> </content>
Declaration	teidata.probCert = teidata.probability | teidata.certainty

Appendix A.5.8 teidata.probability

teidata.probability defines the range of attribute values expressing a probability.
Module	tei
Used by	teidata.probCert
Content model	<content> <dataRef name="double"/> </content>
Declaration	teidata.probability = xsd:double
Note	Probability is expressed as a real number between 0 and 1; 0 representing certainly false and 1 representing certainly true.
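
Since xsd:double alone does not enforce the range stated in the note, a consumer may want to check it explicitly. The sketch below does so for teidata.probCert values. It assumes that the coded certainty values are `high`, `medium`, `low` and `unknown`; teidata.certainty is not specified in this section, so that list should be verified against the schema actually in use. The function name `parse_prob_cert` is hypothetical.

```python
# Assumed value list for teidata.certainty (not defined in this section).
CERTAINTY_VALUES = {"high", "medium", "low", "unknown"}


def parse_prob_cert(value: str):
    """Return a certainty code as a string, or a probability as a float.

    Raises ValueError if the value is neither a known code nor a number
    between 0 and 1 inclusive.
    """
    if value in CERTAINTY_VALUES:
        return value
    probability = float(value)  # raises ValueError for non-numeric input
    if not 0.0 <= probability <= 1.0:
        raise ValueError(f"probability out of range: {value!r}")
    return probability


print(parse_prob_cert("0.75"))
print(parse_prob_cert("high"))
```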

Appendix A.5.9 teidata.temporal.w3c

teidata.temporal.w3c defines the range of attribute values expressing a temporal expression such as a date, a time, or a combination of them, that conform to the W3C XML Schema Part 2: Datatypes Second Edition specification.
Module	tei
Used by
Content model	<content> <alternate minOccurs="1" maxOccurs="1"> <dataRef name="date"/> <dataRef name="gYear"/> <dataRef name="gMonth"/> <dataRef name="gDay"/> <dataRef name="gYearMonth"/> <dataRef name="gMonthDay"/> <dataRef name="time"/> <dataRef name="dateTime"/> </alternate> </content>
Declaration	teidata.temporal.w3c = xsd:date | xsd:gYear | xsd:gMonth | xsd:gDay | xsd:gYearMonth | xsd:gMonthDay | xsd:time | xsd:dateTime
Note	If it is likely that the value used is to be compared with another, then a time zone indicator should always be included, and only the dateTime representation should be used.
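
To make the eight alternatives concrete, the sketch below classifies a string by its lexical shape. It is a simplification: the patterns check only the layout of digits and separators, not whether the month or day is in range, and the names `PATTERNS` and `classify` are hypothetical.

```python
import re

# Optional time zone indicator: "Z" or a +hh:mm / -hh:mm offset.
TZ = r"(?:Z|[+-][0-9]{2}:[0-9]{2})?"
YEAR = r"-?[0-9]{4,}"
TIME = r"[0-9]{2}:[0-9]{2}:[0-9]{2}(?:\.[0-9]+)?"

# Lexical shapes only; value ranges are not validated.
PATTERNS = {
    "date": re.compile(YEAR + r"-[0-9]{2}-[0-9]{2}" + TZ),
    "gYear": re.compile(YEAR + TZ),
    "gMonth": re.compile(r"--[0-9]{2}" + TZ),
    "gDay": re.compile(r"---[0-9]{2}" + TZ),
    "gYearMonth": re.compile(YEAR + r"-[0-9]{2}" + TZ),
    "gMonthDay": re.compile(r"--[0-9]{2}-[0-9]{2}" + TZ),
    "time": re.compile(TIME + TZ),
    "dateTime": re.compile(YEAR + r"-[0-9]{2}-[0-9]{2}T" + TIME + TZ),
}


def classify(value: str):
    """Return the name of the first matching W3C datatype, or None."""
    for name, pattern in PATTERNS.items():
        if pattern.fullmatch(value):
            return name
    return None


for value in ["1676", "2020-04-30", "--04-30", "2020-04-30T12:00:00Z", "30/04/2020"]:
    print(value, classify(value))
```

A day-first string such as `30/04/2020` matches none of the shapes, so it would have to be normalized before being used as the value of an attribute of this datatype.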

Appendix A.5.10 teidata.text

teidata.text defines the range of attribute values used to express some kind of identifying string as a single sequence of Unicode characters possibly including whitespace.
Module	tei
Used by
Content model	<content> <dataRef name="string"/> </content>
Declaration	teidata.text = string
Note	Attributes using this datatype must contain a single ‘token’ in which whitespace and other punctuation characters are permitted.

Appendix A.5.11 teidata.truthValue

teidata.truthValue defines the range of attribute values used to express a truth value.
Module	tei
Used by
Content model	<content> <dataRef name="boolean"/> </content>
Declaration	teidata.truthValue = xsd:boolean
Note	The possible values of this datatype are 1 or true, or 0 or false. This datatype applies only to cases where uncertainty is inappropriate; if the attribute concerned may have a value other than true or false, e.g. unknown or inapplicable, it should have the extended version of this datatype: teidata.xTruthValue.

Appendix A.5.12 teidata.word

teidata.word defines the range of attribute values expressed as a single word or token.
Module	tei
Used by	teidata.enumerated
Content model	<content> <dataRef name="token" restriction="[^\p{C}\p{Z}]+"/> </content>
Declaration	teidata.word = token { pattern = "[^\p{C}\p{Z}]+" }
Note	Attributes using this datatype must contain a single ‘word’ which contains only letters, digits, punctuation characters, or symbols: thus it cannot include whitespace.
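
Python's `re` module does not support the `\p{C}` and `\p{Z}` classes used in the pattern, so the sketch below approximates the same test with `unicodedata`: every character must fall outside the "Other" (C) and "Separator" (Z) general categories. The function name `is_tei_word` is hypothetical, and results may differ slightly from a schema processor built on another Unicode version.

```python
import unicodedata


def is_tei_word(value: str) -> bool:
    """Return True if value is non-empty and has no C* or Z* characters.

    C* covers control, format and unassigned characters; Z* covers
    spaces and line or paragraph separators.
    """
    if not value:
        return False
    return all(unicodedata.category(ch)[0] not in "CZ" for ch in value)


print(is_tei_word("zh-Hant-HK"))  # True
print(is_tei_word("two words"))   # False: contains a space
```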

Appendix A.5.13 teidata.xTruthValue

teidata.xTruthValue (extended truth value) defines the range of attribute values used to express a truth value which may be unknown.
Module	tei
Used by
Content model	<content> <alternate minOccurs="1" maxOccurs="1"> <dataRef name="boolean"/> <valList> <valItem ident="unknown"/> <valItem ident="inapplicable"/> </valList> </alternate> </content>
Declaration	teidata.xTruthValue = xsd:boolean | ( "unknown" | "inapplicable" )
Note	In cases where uncertainty is inappropriate, use the datatype teidata.truthValue.
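
The relation between the plain and the extended truth value can be sketched as a single parser. The function name `parse_x_truth_value` is hypothetical; it maps the four xsd:boolean spellings to Python booleans and passes the two extra codes through unchanged.

```python
def parse_x_truth_value(value: str):
    """Return True, False, "unknown" or "inapplicable".

    Raises ValueError for any other string.
    """
    if value in ("true", "1"):
        return True
    if value in ("false", "0"):
        return False
    if value in ("unknown", "inapplicable"):
        return value
    raise ValueError(f"not an extended truth value: {value!r}")


for value in ["true", "0", "unknown"]:
    print(value, parse_x_truth_value(value))
```

A parser for the non-extended datatype would keep only the first two branches.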

Encoding guidelines for e-ditiones - level 1

Table of contents

1. Introduction

1.1. The corpus

1.2. General principles

1.3. Document identifiers

2. The encoding

2.1. Headers encoding

2.1.1. The <fileDesc>

2.1.1.1. The <titleStmt>

2.1.1.2. The <editionStmt>

2.1.1.3. The <extent>

2.1.1.4. The <publicationStmt>

2.1.1.5. The <sourceDesc>

2.1.1.5.1. The <msDesc>

2.1.2. The <encodingDesc>

2.1.3. The <revisionDesc>

2.2. Text encoding

2.2.1. First level

2.2.2. Second level

2.2.2.1. Common elements

2.2.2.2. Specific elements

2.2.2.2.1. Drama

2.2.2.2.2. Letters

2.2.3. Third level

2.3. The attributes

2.3.1. xml:id

2.3.2. n

2.3.3. type

2.3.4. Recap table for attributes

Appendix A Encoding specifications

Appendix A.1 Elements

Appendix A.1.1 <body>

Appendix A.1.2 <choice>

Appendix A.1.3 <corr>

Appendix A.1.4 <fw>

Appendix A.1.5 <hi>

Appendix A.1.6 <lb>

Appendix A.1.7 <p>

Appendix A.1.8 <pb>

Appendix A.1.9 <sic>

Appendix A.1.10 <text>

Appendix A.2 Model classes

Appendix A.2.1 model.choicePart

Appendix A.2.2 model.common

Appendix A.2.3 model.divBottom

Appendix A.2.4 model.divPart

Appendix A.2.5 model.divTop

Appendix A.2.6 model.divTopPart

Appendix A.2.7 model.global

Appendix A.2.8 model.hiLike

Appendix A.2.9 model.highlighted

Appendix A.2.10 model.inter

Appendix A.2.11 model.limitedPhrase

Appendix A.2.12 model.milestoneLike

Appendix A.2.13 model.nameLike

Appendix A.2.14 model.pLike

Appendix A.2.15 model.pPart.data

Appendix A.2.16 model.pPart.edit

Appendix A.2.17 model.pPart.editorial

Appendix A.2.18 model.pPart.transcriptional

Appendix A.2.19 model.phrase

Appendix A.2.20 model.placeStateLike

Appendix A.2.21 model.qLike

Appendix A.3 Attribute classes

Appendix A.3.1 att.breaking

Appendix A.3.2 att.declaring

Appendix A.3.3 att.editLike

Appendix A.3.4 att.edition

Appendix A.3.5 att.global.change

Appendix A.3.6 att.global.facs

Appendix A.3.7 att.global.rendition

Appendix A.3.8 att.global.responsibility

Appendix A.3.9 att.global.source

Appendix A.3.10 att.placement

Appendix A.3.11 att.spanning

Appendix A.3.12 att.written

Appendix A.4 Macros

Appendix A.4.1 macro.paraContent

Appendix A.4.2 macro.phraseSeq