Encoding guidelines for e-ditiones - level 1

1. Introduction

This document provides the specific encoding principles for manuscripts and printed documents of the project e-ditiones.

1.1. The corpus

The project e-ditiones aims to encode several 17th-century French manuscripts and printed documents and, later, to present them in a digital library. We chose various literary genres, such as drama, letters and novels.

1.2. General principles

Since the corpus contains two major types of texts, printed documents and manuscripts, we decided to separate the metadata from the text. This makes it possible to create two schemas, one specific to printed documents and the other to manuscripts.

Please note that we are still working on the best way to form a complete file.

Another essential principle of the project is to use a minimal set of elements.

Representation of the use of ODD-chaining in the e-ditiones project
Figure 1. ODD-chaining

1.3. Document identifiers

One of our priorities is to identify each text clearly once it is encoded. Each document receives a unique identifier consisting of the first three letters of the project name, an underscore and a four-digit serial number. The identifier of the first encoded text is therefore EDI_0001.

For every subdivision, such as a chapter, act, speech, paragraph or line, concatenate the identifier of the parent subdivision, a dash and a new number. For example, if the encoded text is a play, the identifier EDI_0001-1-3-4-5 reads as the fifth speech of the fourth scene of the third act of the first play in the document with the identifier EDI_0001.

If there is front matter, such as a cast list, we add a 0 between the identifier of the parent subdivision and the new number. For example, EDI_0001-0-1 indicates the first subdivision of the front of the document EDI_0001. In this way, the position of a subdivision in the text is immediately apparent.

This method might appear a little complicated, but it ensures that every single line or paragraph can be clearly identified.
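As an illustration, the identifier scheme described above could be laid out as follows in an encoded play. This is only a sketch: the element structure and the type values are borrowed from later sections of these guidelines, omitted parts are marked by comments, and the content of the document is hypothetical.

```xml
<text xml:id="EDI_0001">
 <front>
  <!-- the 0 marks the front: this is its first subdivision -->
  <div type="castList" xml:id="EDI_0001-0-1" n="1">
   <!-- cast list -->
  </div>
 </front>
 <body>
  <!-- first play of the document -->
  <div type="play" xml:id="EDI_0001-1" n="1">
   <!-- acts 1 and 2 omitted -->
   <div type="act" xml:id="EDI_0001-1-3" n="3">
    <!-- scenes 1 to 3 omitted -->
    <div type="scene" xml:id="EDI_0001-1-3-4" n="4">
     <!-- speeches 1 to 4 omitted -->
     <sp xml:id="EDI_0001-1-3-4-5" n="5">
      <!-- fifth speech of the scene -->
     </sp>
    </div>
   </div>
  </div>
 </body>
</text>
```

Each identifier repeats the identifier of its parent, so the path from the document down to the speech can be read directly from the value of xml:id.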

2. The encoding

2.1. Headers encoding

First, we chose to identify two different types of headers:

  • the header for printed documents
  • the header for manuscripts

Both headers are mostly the same: they contain a <fileDesc>, an <encodingDesc> and a <revisionDesc>. The only difference between them is the addition of the <msDesc>, used for the description of a manuscript.

<teiHeader>
 <fileDesc>
  <!-- information about the encoded document -->
 </fileDesc>
 <encodingDesc>
  <!-- information about the relationship between the encoded text and its source -->
 </encodingDesc>
 <revisionDesc>
  <!-- revision information -->
 </revisionDesc>
</teiHeader>

2.1.1. The <fileDesc>

This part of the header contains at least five other parts :

  • the <titleStmt>
  • the <editionStmt>
  • the <extent>
  • the <publicationStmt>
  • the <sourceDesc>

2.1.1.1. The <titleStmt>

This part is essential for the presentation of the encoded document. It has to contain at least one <title> and one <author>.

<titleStmt>
 <title>Bérénice</title>
 <author>Jean Racine</author>
</titleStmt>

2.1.1.2. The <editionStmt>

This part contains the name of the editor and the date of encoding.

<editionStmt>
 <edition>
  <date>30/04/2020</date>
 </edition>
 <respStmt>
  <persName>Simon Gabay</persName>
  <resp>Éditeur scientifique</resp>
 </respStmt>
</editionStmt>

2.1.1.3. The <extent>

This part indicates the size of the work: it contains the number of words and the number of pages (the number of pages being taken to equal the number of <pb> elements).

<extent>
 <measure unit="words">12543</measure>
 <measure unit="pages">99</measure>
</extent>

2.1.1.4. The <publicationStmt>

This part contains an <authority> element with the name of the project and an <availability> element with its status and the <licence>.

<publicationStmt>
 <authority>e-ditiones</authority>
 <availability status="restricted">
  <licence target="https://creativecommons.org/licenses/by/4.0/">Attribution 4.0 International (CC BY 4.0)</licence>
 </availability>
</publicationStmt>

2.1.1.5. The <sourceDesc>

This part contains one or more bibliographical descriptions, which include standard TEI elements such as <author>, <title> or <date>.

<sourceDesc>
 <bibl>
  <author>Jean Racine</author>
  <title>Oeuvres</title>
  <publisher>Jean Ribou</publisher>
  <pubPlace>Paris</pubPlace>
  <date when="1676">1676</date>
  <ptr target="https://gallica.bnf.fr/ark:/12148/bpt6k990581p"/>
 </bibl>
</sourceDesc>

2.1.1.5.1. The <msDesc>

As already mentioned, manuscripts are a special case. To describe the document, we have to use the <msDesc> element. To ensure a good encoding, several elements are recommended:

  • <msIdentifier>, which contains the information used to properly identify the manuscript

    <msIdentifier>
     <country>Etats-Unis</country>
     <settlement>Princeton</settlement>
     <institution>Princeton University Library</institution>
     <repository>Manuscripts Division, Department of Rare Books and Special Collections</repository>
     <collection>John Hinsdale Scheide Collection of Three Centuries of French History</collection>
     <idno type="shelfmark">C0710, vol. 3</idno>
    </msIdentifier>

  • <msContents>, which contains information about the intellectual content of the manuscript
  • <physDesc>, which contains information about the physical description of the document, such as the <objectDesc> or the <bindingDesc>

    <physDesc>
     <objectDesc>
      <supportDesc>
       <support>
        <objectType rend="composite">composite repository</objectType>
        <material>papier.</material>
       </support>
       <extent>
        <measure unit="page" n="unk"/>
       </extent>
       <foliation>Pages aren't numbered</foliation>
      </supportDesc>
     </objectDesc>
     <bindingDesc>
      <binding>
       <p/>
      </binding>
     </bindingDesc>
    </physDesc>

  • <history>, which contains information about the history of the manuscript
  • <additional>, which contains further information about the document, such as <surrogates> or bibliographical information (<bibl>)

    <additional>
     <surrogates>
      <!-- add images here -->
      <graphic source="local" url="chemin"/>
     </surrogates>
     <listBibl>
      <listBibl type="L1">
       <bibl>La Fayette, <title>Œuvres complètes</title>, C. Esmein-Sarrazin (éd.), Paris: Gallimard, lettre n°70-1.</bibl>
      </listBibl>
     </listBibl>
    </additional>

  • <msPart>, which may contain the preceding elements and is used to describe a specific part of the encoded manuscript
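The elements listed above could be combined as in the following skeleton, which shows the order in which TEI expects them inside <msDesc>. It is a sketch, not the project's reference header: the content is replaced by comments, the element for the intellectual content is spelled <msContents> as in the TEI Guidelines, and <msItem> and <origin> are standard TEI children that these guidelines do not mention, so their availability in the project schema is an assumption.

```xml
<sourceDesc>
 <msDesc>
  <msIdentifier>
   <!-- country, settlement, institution, repository, collection, idno -->
  </msIdentifier>
  <msContents>
   <msItem>
    <author><!-- author of the text --></author>
    <title><!-- title of the text --></title>
   </msItem>
  </msContents>
  <physDesc>
   <!-- objectDesc, bindingDesc -->
  </physDesc>
  <history>
   <origin>
    <p><!-- where and when the manuscript was produced --></p>
   </origin>
  </history>
  <additional>
   <!-- surrogates, listBibl -->
  </additional>
  <msPart>
   <msIdentifier>
    <!-- identifier of this part of the manuscript -->
   </msIdentifier>
   <!-- msContents, physDesc, history and additional may be repeated here -->
  </msPart>
 </msDesc>
</sourceDesc>
```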

2.1.2. The <encodingDesc>

This part describes the relationship between the encoded text and its source. It may contain:

  • <projectDesc>, which describes the project
  • <editorialDecl>, which contains information about the editorial principles, such as <correction>, <normalization> or <interpretation>

<encodingDesc>
 <projectDesc>
  <p>Creation of NLP tools for 17th-century French</p>
 </projectDesc>
 <editorialDecl>
  <correction>
   <p>Very minor corrections, usually tagged.</p>
  </correction>
  <hyphenation>
   <p>Kept, encoded with <gi>c</gi></p>
  </hyphenation>
  <normalization>
   <p>None</p>
  </normalization>
  <quotation>
   <p>Original</p>
  </quotation>
  <punctuation>
   <p>Original</p>
  </punctuation>
  <interpretation>
   <p>None</p>
  </interpretation>
  <segmentation>
   <p>Text is divided in <list>
     <item>sentences encoded with <gi>s</gi></item>
     <item>sub-sentences encoded with <gi>seg</gi> (most of the time based on colons and semicolons)</item>
    </list> and </p>
  </segmentation>
 </editorialDecl>
</encodingDesc>

2.1.3. The <revisionDesc>

This last part of the header contains information about at least one <change> made during the production of the document. The when attribute is used to specify the date of the event, in ISO format (YYYY-MM-DD).

<revisionDesc>
 <change when="2020-04-30">Add documentation</change>
</revisionDesc>

2.2. Text encoding

After the OCR of the text, its encoding is completed in three phases:

  • the first level
  • the second level
  • the third level

Please note that at each level, all existing elements are still used and new elements are added to the existing ones.

2.2.1. First level

The purpose of the first level is to distinguish between form and content. To do that, we chose to use only a few elements. First, at all levels our edition must contain a <text> element with the following namespace: @xmlns="http://www.tei-c.org/ns/1.0". The namespace is what allows the file to be validated against the TEI schema.

Then, at this level of encoding, all the text is included in the <body> and in a single <p>. Some information is added at this point. Concerning the content of the text, the element <fw> contains information such as the title, the pagination or editor's notes. The other additions concern the form of the text: we decided to use the elements <pb> and <lb>.

The first one, <pb>, marks the point where a new page begins. It is useful for checking the transcription and also for comparing our edition with a reference edition.

The second one, <lb>, marks the point where a new line begins. It provides graphical information and can be used in an automatic encoding process. It has two required attributes: break and rend. If a word is cut at the end of a line, break with the value "no" makes it possible to restore the complete word and to treat it as a single token; rend records which mark is used (a dash or a hyphen, for example).

<text xml:id="EDI_0001">
 <body>
  <lb/>
  <fw>A MONSEIGNEVR LE DVC D’ESPERNON. <lb/>Lettre I.</fw>
  <p>
   <lb/>Monseignevr, Quand ie ne ſerois pas nay cõme ie ſuis, voſtre
   <lb/>tres-humble ſeruiteur, il faudroit que ie fuſſe mauuais
   <lb/>François pour ne me reſioüir pas des contẽtemens de voſtre maiſon,
   <lb/>puis que ce ſont des felicités publiques.</p>
 </body>
</text>
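The use of break and rend on <lb> can be sketched on a word that is cut at the end of a line. The fragment below reuses a line of L'Illusion comique quoted elsewhere in these guidelines; wrapping it in a bare first-level <p> is an assumption made for the sake of the illustration.

```xml
<text xml:id="EDI_0002">
 <body>
  <p>
   <lb/>La nuit qu'il entretient ſur cet af
   <!-- break="no": the line break falls inside a word; rend records the mark printed at the end of the line -->
   <lb break="no" rend="¬"/>freux ſeiour
  </p>
 </body>
</text>
```

Because the second <lb> carries break="no", a later process can join "af" and "freux" and treat "affreux" as a single token, while the first <lb> separates two words as usual.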

2.2.2. Second level

At this level of encoding, we manually add some semantic information. Since we want to use a minimal set of elements, as mentioned before, we decided to employ only common elements. Nevertheless, for texts such as plays or letters, the use of a few specific elements is recommended.

2.2.2.1. Common elements

The following elements may be used:

Element | Text type | Note
<front> | any prefatory matter |
<div> | any text subdivision | type, n and xml:id are required
<back> | any type of appendix |
<head> | any type of heading | this can be used to clarify <fw>
<list> and <item> | any type of list | n and xml:id are required
<orgName>, <persName> and <placeName> | any type of person, place or organisation | this can be useful for entity search
<l> and <lg> | any type of line or line group |
<note> | any type of note | it can be used for a note by the author, the editor or, rarely, added during the encoding

2.2.2.2. Specific elements

There are only two exceptions, drama and letters.

2.2.2.2.1. Drama

If the encoded text is a play, three additional elements are allowed:

Element | Text type | Note
<sp> | contains a speech | n and xml:id are required
<stage> | any stage direction | e.g. useful to study spoken words
<speaker> | any speaker in a speech |

<text xml:id="EDI_0001">
 <body>
  <div type="letter" xml:id="EDI_0001-1" n="1">
   <head>A <persName>MONSEIGNEVR LE DVC D’ESPERNON</persName>.
    <lb/>Lettre I.</head>
   <p n="1" xml:id="EDI_0001-1-1">
    <persName>Monseignevr</persName>, Quand ie ne ſerois pas nay cõme ie ſuis, voſtre tres-humble ſeruiteur, il faudroit que ie fuſſe mauuais François pour ne me reſioüir pas des contẽtemens de voſtre
    <orgName>maiſon</orgName>, puis que ce ſont des felicités publiques.
    <lb/>Nous auõs ſçeu l’heureux ſuccés du voyage que vous auez fait en
    <placeName>Bearn</placeName>
   </p>
  </div>
 </body>
</text>

Example of a letter


<body>
 <div type="play" xml:id="EDI_0002-1" n="1">
  <head>
   <lb/>L’ILLVSION <lb/>COMIQVE <lb/>COMEDIE</head>
  <div type="act" xml:id="EDI_0002-1-1" n="1">
   <lb/>
   <head>ACTE PREMIER.</head>
   <div type="scene" xml:id="EDI_0002-1-1-1" n="1">
    <lb/>
    <head>SCENE PREMIERE.</head>
    <lb/>
    <stage>
     <persName>PRIDAMANT</persName>,
     <persName>DORANTE</persName>.</stage>
    <lb/>
    <sp n="1" xml:id="EDI_0002-1-1-1-1">
     <speaker>DORANTE.</speaker>
     <p n="1" xml:id="EDI_0002-1-1-1-1-1">
      <lb/>CE grand Mage dont l'art commande
      <lb/>à la nature
      <lb/>N'a choiſi pour palais que cette grotte
      <lb/>obſcure;
      <lb/>La nuit qu'il entretient ſur cet af
      <lb break="no" rend="¬"/>freux ſeiour
      <lb/>N'ouurant ſon voile espais qu'aux raions d’vn
      <lb/>fauxiour,
      <fw>
       <lb/>A
       <lb/>2 L’ILLVSION COMIQ.</fw>
      <lb/>De leur eſclat douteux n'admet en ces lieux ſombres
      <lb/>Que ce qu'en peut ſouffrir le commerce des ombres.
      <lb/>N'auances pas, ſon art au pied de ce Rocher
      <lb/>A mis dequoy punir qui s'en oſe approcher,
      <lb/>Et cette large boucbe eſt vn mur inuiſible,
      <lb/>Ou l'air en ſa faueur deuient inacceßible,
      <lb/>Et luy fait vn rampart dont les funestes bords
      <lb/>Sur vn peu de poußiere eſtalent mille morts.
      <lb/>Ialoux de ſon repos plus que de ſa deffenſe
      <lb/>Il perd qui l'importune ainſi que qui l'offence,
      <lb/>Si bien que ceux qu'amene vn curieux deſir
      <lb/>Pour conſulter <persName>Alcandre</persName> attendent ſon loiſir,
      <lb/>Chaque iour il ſe monſtre, &amp; nous touchons à l'heure
      <lb/>Que pour ſe diuertir il ſort de ſa demeure.</p>
    </sp>
   </div>
  </div>
 </div>
</body>

Example of a speech

2.2.2.2.2. Letters

If the encoded text is a letter, two additional elements are allowed:

Element | Text type | Note
<opener> | any text at the start of a letter | e.g. a salutation or a dateline
<closer> | any text at the end of a letter | e.g. a salutation or a dateline
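A possible use of these two elements is sketched below on the letter already quoted in these guidelines. Placing the opening salutation in <opener> is our reading of the table above rather than a documented project decision, and the content of <closer> is left as a comment because the end of the letter is not reproduced here.

```xml
<div type="letter" xml:id="EDI_0001-1" n="1">
 <head>Lettre I.</head>
 <opener>
  <persName>Monseignevr</persName>,
 </opener>
 <p n="1" xml:id="EDI_0001-1-1">Quand ie ne ſerois pas nay cõme ie ſuis, voſtre tres-humble ſeruiteur, il faudroit que ie fuſſe mauuais François pour ne me reſioüir pas des contẽtemens de voſtre maiſon, puis que ce ſont des felicités publiques.</p>
 <closer>
  <!-- closing salutation, signature or dateline -->
 </closer>
</div>
```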

2.2.3. Third level

This level of encoding is done automatically. In order to add linguistic information, the original version of the text is normalized with the following elements: <choice>, <orig> and <reg>. Then, in order to tokenize and lemmatize the text, we decided to split it with <seg> and <w>. The first one, <seg>, is used to represent any segmentation of the text. Note that sentences and clauses remain our basic units, but we recommend splitting a long sentence into several segments. The <w> element is used to mark a single token. Regarding punctuation, we decided to treat the marks as tokens: first, because more precision would not be useful for our analysis and, second, because this choice keeps our encoding compatible with ELTeC.

<p n="1" xml:id="EDI_0002-1-1-1-1-1">
 <choice>
  <orig>
   <seg>
    <w>N</w>
    <w>'</w>
    <w>a</w>
    <w>choiſi</w>
    <w>pour</w>
    <w>palais</w>
    <w>que</w>
    <w>cette</w>
    <w>grotte</w>
   </seg>
  </orig>
  <reg>
   <w>N</w>
   <w>'</w>
   <w>a</w>
   <w>choisi</w>
   <w>pour</w>
   <w>palais</w>
   <w>que</w>
   <w>cette</w>
   <w>grotte</w>
  </reg>
 </choice>
</p>

2.3. The attributes

We decided to define a closed set of attributes that can be used for the encoding. There are only three of them:

  • xml:id
  • n
  • type

Please note that all of them are required.

2.3.1. xml:id

This attribute is used to identify the document or its subdivisions. Earlier in this document, we presented the way to properly generate identifiers.

xml:id is required on several elements and at different levels:

Element | Level of encoding
<text> | all levels
<div> | levels 2 and 3
<p> | levels 2 and 3
<sp> | levels 2 and 3
<lg> | levels 2 and 3
<l> | levels 2 and 3

2.3.2. n

This attribute is used to number its element, from the second level of encoding onwards. The child elements of a node are numbered incrementally, starting with 1.

Note that there are two exceptions:

  • <pb>: numbering starts at the beginning of the edition and continues until its end
  • <l>: numbering (re)starts at the beginning of each page

Note: In this way, it is possible to compare our edition with a reference edition.

Element | Numbering starts at
<div> | parent node
<sp> | parent node
<p> | parent node
<lg> | parent node
<l> | each new page
<pb> | beginning of the edition
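The difference between numbering from the parent node and numbering from the beginning of the edition can be sketched as follows. The document EDI_0003 and its content are hypothetical, and <l> is left out of the sketch.

```xml
<body>
 <div type="part" xml:id="EDI_0003-1" n="1">
  <pb n="1"/>
  <p xml:id="EDI_0003-1-1" n="1"><!-- first paragraph of part 1 --></p>
  <p xml:id="EDI_0003-1-2" n="2"><!-- second paragraph of part 1 --></p>
 </div>
 <div type="part" xml:id="EDI_0003-2" n="2">
  <!-- pb continues from the previous page: it does not restart with the new div -->
  <pb n="2"/>
  <!-- p restarts at 1 inside its new parent -->
  <p xml:id="EDI_0003-2-1" n="1"><!-- first paragraph of part 2 --></p>
 </div>
</body>
```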

2.3.3. type

This attribute is used to specify the type of the current <div>.

Note that only the following predefined values may be used for this attribute.

Value | Use case
titlePage | in the <front>, used for the title page of the work
privilege | in the <front>, used for the privilege of the work
castList | in the <front>, used for the cast list
liminal | in the <front>, used for any liminal part of the work
play | used at the beginning of a new play
act | used at the beginning of a new act
scene | used at the beginning of a new scene
part | used for any part of the work
subPart | used for any subpart (child of a type="part") of the work
letter | used for any letter
collection | used for any type of collection
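A few of these values are combined in the sketch below, which shows front matter followed by a part and its subparts. The document EDI_0004 is hypothetical and the content of each <div> is replaced by a comment.

```xml
<text xml:id="EDI_0004">
 <front>
  <div type="titlePage" xml:id="EDI_0004-0-1" n="1"><!-- title page --></div>
  <div type="privilege" xml:id="EDI_0004-0-2" n="2"><!-- privilege --></div>
 </front>
 <body>
  <div type="part" xml:id="EDI_0004-1" n="1">
   <!-- subPart is only used as a child of a div with type="part" -->
   <div type="subPart" xml:id="EDI_0004-1-1" n="1"><!-- first subpart --></div>
   <div type="subPart" xml:id="EDI_0004-1-2" n="2"><!-- second subpart --></div>
  </div>
 </body>
</text>
```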

2.3.4. Recap table for attributes

Attribute | <text> | <div> | <lg> | <l> | <sp> | <p> | <pb>
xml:id | required | required | required | required | required | required | not required
n | not required | required | required | required | required | required | required
type | not required | required | not required | not required | not required | not required | not required

Appendix A Encoding specifications

Appendix A.1 Elements

Appendix A.1.1 <body>

<body> (text body) contains the whole body of a single unitary text, excluding any front or back matter. [4. Default Text Structure]
Module: textstructure
Attributes: att.declaring (@decls) att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) att.global.rendition (@rend, @style, @rendition) att.global.change (@change) att.global.responsibility (@cert, @resp)
Contained by
textstructure: text
May contain
core: lb p pb
transcr: fw
Example
<body>
 <l>Nu scylun hergan hefaenricaes uard</l>
 <l>metudæs maecti end his modgidanc</l>
 <l>uerc uuldurfadur sue he uundra gihuaes</l>
 <l>eci dryctin or astelidæ</l>
 <l>he aerist scop aelda barnum</l>
 <l>heben til hrofe haleg scepen.</l>
 <l>tha middungeard moncynnæs uard</l>
 <l>eci dryctin æfter tiadæ</l>
 <l>firum foldu frea allmectig</l>
 <trailer>primo cantauit Cædmon istud carmen.</trailer>
</body>
Content model
<content>
 <sequence minOccurs="1" maxOccurs="1">
  <classRef key="model.global"
   minOccurs="0" maxOccurs="unbounded"/>
  <sequence minOccurs="0" maxOccurs="1">
   <classRef key="model.divTop"/>
   <alternate minOccurs="0"
    maxOccurs="unbounded">
    <classRef key="model.global"/>
    <classRef key="model.divTop"/>
   </alternate>
  </sequence>
  <sequence minOccurs="0" maxOccurs="1">
   <classRef key="model.divGenLike"/>
   <alternate minOccurs="0"
    maxOccurs="unbounded">
    <classRef key="model.global"/>
    <classRef key="model.divGenLike"/>
   </alternate>
  </sequence>
  <alternate minOccurs="1" maxOccurs="1">
   <sequence minOccurs="1"
    maxOccurs="unbounded">
    <classRef key="model.divLike"/>
    <alternate minOccurs="0"
     maxOccurs="unbounded">
     <classRef key="model.global"/>
     <classRef key="model.divGenLike"/>
    </alternate>
   </sequence>
   <sequence minOccurs="1"
    maxOccurs="unbounded">
    <classRef key="model.div1Like"/>
    <alternate minOccurs="0"
     maxOccurs="unbounded">
     <classRef key="model.global"/>
     <classRef key="model.divGenLike"/>
    </alternate>
   </sequence>
   <sequence minOccurs="1" maxOccurs="1">
    <sequence minOccurs="1"
     maxOccurs="unbounded">
     <classRef key="model.common"/>
     <classRef key="model.global"
      minOccurs="0" maxOccurs="unbounded"/>
    </sequence>
    <alternate minOccurs="0" maxOccurs="1">
     <sequence minOccurs="1"
      maxOccurs="unbounded">
      <classRef key="model.divLike"/>
      <alternate minOccurs="0"
       maxOccurs="unbounded">
       <classRef key="model.global"/>
       <classRef key="model.divGenLike"/>
      </alternate>
     </sequence>
     <sequence minOccurs="1"
      maxOccurs="unbounded">
      <classRef key="model.div1Like"/>
      <alternate minOccurs="0"
       maxOccurs="unbounded">
       <classRef key="model.global"/>
       <classRef key="model.divGenLike"/>
      </alternate>
     </sequence>
    </alternate>
   </sequence>
  </alternate>
  <sequence minOccurs="0"
   maxOccurs="unbounded">
   <classRef key="model.divBottom"/>
   <classRef key="model.global"
    minOccurs="0" maxOccurs="unbounded"/>
  </sequence>
 </sequence>
</content>
    
Schema Declaration
element body
{
   att.global.attribute.xmlid,
   att.global.attribute.n,
   att.global.attribute.xmllang,
   att.global.attribute.xmlbase,
   att.global.attribute.xmlspace,
   att.global.rendition.attribute.rend,
   att.global.rendition.attribute.style,
   att.global.rendition.attribute.rendition,
   att.global.change.attribute.change,
   att.global.responsibility.attribute.cert,
   att.global.responsibility.attribute.resp,
   att.declaring.attributes,
   (
      model.global*,
      ( model.divTop, ( model.global | model.divTop )* )?,
      ( model.divGenLike, ( model.global | model.divGenLike )* )?,
      (
         ( model.divLike, ( model.global | model.divGenLike )* )+
       | ( model.div1Like, ( model.global | model.divGenLike )* )+
       | (
            ( model.common, model.global* )+,
            (
               ( model.divLike, ( model.global | model.divGenLike )* )+
             | ( model.div1Like, ( model.global | model.divGenLike )* )+
            )?
         )
      ),
      ( model.divBottom, model.global* )*
   )
}

Appendix A.1.2 <choice>

<choice> groups a number of alternative encodings for the same point in a text. [3.4. Simple Editorial Changes]
Module: core
Attributes: att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) att.global.rendition (@rend, @style, @rendition) att.global.change (@change) att.global.responsibility (@cert, @resp)
Member of
Contained by
transcr: fw
May contain
Note

Because the children of a <choice> element all represent alternative ways of encoding the same sequence, it is natural to think of them as mutually exclusive. However, there may be cases where a full representation of a text requires the alternative encodings to be considered as parallel.

Note also that <choice> elements may self-nest.

Where the purpose of an encoding is to record multiple witnesses of a single work, rather than to identify multiple possible encoding decisions at a given point, the <app> element and associated elements discussed in section 12.1. The Apparatus Entry, Readings, and Witnesses should be preferred.

Example
An American encoding of Gulliver's Travels which retains the British spelling but also provides a version regularized to American spelling might be encoded as follows.
<p>Lastly, That, upon his solemn oath to observe all the above articles, the said man-mountain shall have a daily allowance of meat and drink sufficient for the support of <choice>   <sic>1724</sic>   <corr>1728</corr>  </choice> of our subjects, with free access to our royal person, and other marks of our <choice>   <orig>favour</orig>   <reg>favor</reg>  </choice>.</p>
Content model
<content>
 <alternate minOccurs="2"
  maxOccurs="unbounded">
  <classRef key="model.choicePart"/>
  <elementRef key="choice"/>
 </alternate>
</content>
    
Schema Declaration
element choice
{
   att.global.attribute.xmlid,
   att.global.attribute.n,
   att.global.attribute.xmllang,
   att.global.attribute.xmlbase,
   att.global.attribute.xmlspace,
   att.global.rendition.attribute.rend,
   att.global.rendition.attribute.style,
   att.global.rendition.attribute.rendition,
   att.global.change.attribute.change,
   att.global.responsibility.attribute.cert,
   att.global.responsibility.attribute.resp,
   ( model.choicePart | choice )+
}

Appendix A.1.3 <corr>

<corr> (correction) contains the correct form of a passage apparently erroneous in the copy text. [3.4.1. Apparent Errors]
Module: core
Attributes: att.editLike (@instant) att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) att.global.rendition (@rend, @style, @rendition) att.global.change (@change) att.global.responsibility (@cert, @resp)
Member of
Contained by
transcr: fw
May contain
transcr: fw
character data
Example
If all that is desired is to call attention to the fact that the copy text has been corrected, <corr> may be used alone:
I don't know, Juan. It's so far in the past now — how <corr>can we</corr> prove or disprove anyone's theories?
Example
It is also possible, using the <choice> and <sic> elements, to provide an uncorrected reading:
I don't know, Juan. It's so far in the past now — how <choice>  <sic>we can</sic>  <corr>can we</corr> </choice> prove or disprove anyone's theories?
Content model
<content>
 <macroRef key="macro.paraContent"/>
</content>
    
Schema Declaration
element corr
{
   att.global.attribute.xmlid,
   att.global.attribute.n,
   att.global.attribute.xmllang,
   att.global.attribute.xmlbase,
   att.global.attribute.xmlspace,
   att.global.rendition.attribute.rend,
   att.global.rendition.attribute.style,
   att.global.rendition.attribute.rendition,
   att.global.change.attribute.change,
   att.global.responsibility.attribute.cert,
   att.global.responsibility.attribute.resp,
   att.editLike.attributes,
   macro.paraContent
}

Appendix A.1.4 <fw>

<fw> (forme work) contains a running head (e.g. a header, footer), catchword, or similar material appearing on the current page. [11.6. Headers, Footers, and Similar Matter]
Module: transcr
Attributes: att.placement (@place) att.written (@hand) att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) att.global.rendition (@rend, @style, @rendition) att.global.change (@change) att.global.responsibility (@cert, @resp)
Member of
Contained by
core: corr hi p sic
textstructure: body text
transcr: fw
May contain
transcr: fw
character data
Note

Where running heads are consistent throughout a chapter or section, it is usually more convenient to relate them to the chapter or section, e.g. by use of the rend attribute. The <fw> element is intended for cases where the running head changes from page to page, or where details of page layout and the internal structure of the running heads are of paramount importance.

Example
<fw type="sig" place="bottom">C3</fw>
Content model
<content>
 <macroRef key="macro.phraseSeq"/>
</content>
    
Schema Declaration
element fw
{
   att.global.attribute.xmlid,
   att.global.attribute.n,
   att.global.attribute.xmllang,
   att.global.attribute.xmlbase,
   att.global.attribute.xmlspace,
   att.global.rendition.attribute.rend,
   att.global.rendition.attribute.style,
   att.global.rendition.attribute.rendition,
   att.global.change.attribute.change,
   att.global.responsibility.attribute.cert,
   att.global.responsibility.attribute.resp,
   att.placement.attributes,
   att.written.attributes,
   macro.phraseSeq
}

Appendix A.1.5 <hi>

<hi> (highlighted) marks a word or phrase as graphically distinct from the surrounding text, for reasons concerning which no claim is made. [3.3.2.2. Emphatic Words and Phrases 3.3.2. Emphasis, Foreign Words, and Unusual Language]
Module: core
Attributes: att.written (@hand) att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) att.global.rendition (@rend, @style, @rendition) att.global.change (@change) att.global.responsibility (@cert, @resp)
Member of
Contained by
core: corr hi p sic
transcr: fw
May contain
transcr: fw
character data
Example
<hi rend="gothic">And this Indenture further witnesseth</hi> that the said <hi rend="italic">Walter Shandy</hi>, merchant, in consideration of the said intended marriage ...
Content model
<content>
 <macroRef key="macro.paraContent"/>
</content>
    
Schema Declaration
element hi
{
   att.global.attribute.xmlid,
   att.global.attribute.n,
   att.global.attribute.xmllang,
   att.global.attribute.xmlbase,
   att.global.attribute.xmlspace,
   att.global.rendition.attribute.rend,
   att.global.rendition.attribute.style,
   att.global.rendition.attribute.rendition,
   att.global.change.attribute.change,
   att.global.responsibility.attribute.cert,
   att.global.responsibility.attribute.resp,
   att.written.attributes,
   macro.paraContent
}

Appendix A.1.6 <lb>

<lb> (line beginning) marks the beginning of a new (typographic) line in some edition or version of a text. [3.10.3. Milestone Elements 7.2.5. Speech Contents]
Module: core
Attributes: att.edition (@ed, @edRef) att.spanning (@spanTo) att.breaking (@break) att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) att.global.rendition (@rend, @style, @rendition) att.global.change (@change) att.global.responsibility (@cert, @resp)
Member of
Contained by
core: corr hi p sic
textstructure: body text
transcr: fw
May contain
Empty element
Note

By convention, <lb> elements should appear at the point in the text where a new line starts. The n attribute, if used, indicates the number or other value associated with the text between this point and the next <lb> element, typically the sequence number of the line within the page, or other appropriate unit. This element is intended to be used for marking actual line breaks on a manuscript or printed page, at the point where they occur; it should not be used to tag structural units such as lines of verse (for which the <l> element is available) except in circumstances where structural units cannot otherwise be marked.

The type attribute may be used to characterize the line break in any respect. The more specialized attributes break, ed, or edRef should be preferred when the intent is to indicate whether or not the line break is word-breaking, or to note the source from which it derives.

Example
This example shows typographical line breaks within metrical lines, where they occur at different places in different editions:
<l>Of Mans First Disobedience,<lb ed="1674"/> and<lb ed="1667"/> the Fruit</l> <l>Of that Forbidden Tree, whose<lb ed="1667 1674"/> mortal tast</l> <l>Brought Death into the World,<lb ed="1667"/> and all<lb ed="1674"/> our woe,</l>
Example
This example encodes typographical line breaks as a means of preserving the visual appearance of a title page. The break attribute is used to show that the line break does not (as elsewhere) mark the start of a new word.
<titlePart>  <lb/>With Additions, ne-<lb break="no"/>ver before Printed. </titlePart>
Content model
<content>
 <empty/>
</content>
    
Schema Declaration
element lb
{
   att.global.attribute.xmlid,
   att.global.attribute.n,
   att.global.attribute.xmllang,
   att.global.attribute.xmlbase,
   att.global.attribute.xmlspace,
   att.global.rendition.attribute.rend,
   att.global.rendition.attribute.style,
   att.global.rendition.attribute.rendition,
   att.global.change.attribute.change,
   att.global.responsibility.attribute.cert,
   att.global.responsibility.attribute.resp,
   att.edition.attributes,
   att.spanning.attributes,
   att.breaking.attributes,
   empty
}

Appendix A.1.7 <p>

<p> (paragraph) marks paragraphs in prose. [3.1. Paragraphs 7.2.5. Speech Contents]
Module: core
Attributes: att.declaring (@decls) att.written (@hand) att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) att.global.rendition (@rend, @style, @rendition) att.global.change (@change) att.global.responsibility (@cert, @resp)
part: specifies whether or not its parent element is fragmented in some way, typically by some other overlapping structure: for example a speech which is divided between two or more verse stanzas, a paragraph which is split across a page division, a verse line which is divided between two speakers.
  Derived from: att.fragmentable
  Status: Optional
  Datatype: teidata.enumerated
  Legal values are: N [Default]
Member of
Contained by
textstructure: body
May contain
transcr: fw
character data
Example
<p>Hallgerd was outside. <q>There is blood on your axe,</q> she said. <q>What have you done?</q>
</p>
<p>
 <q>I have now arranged that you can be married a second time,</q> replied Thjostolf.
</p>
<p>
 <q>Then you must mean that Thorvald is dead,</q> she said.
</p>
<p>
 <q>Yes,</q> said Thjostolf. <q>And now you must think up some plan for me.</q>
</p>
Schematron
<s:report test="not(ancestor::tei:floatingText) and (ancestor::tei:p or ancestor::tei:ab) and not(parent::tei:exemplum |parent::tei:item |parent::tei:note |parent::tei:q |parent::tei:quote |parent::tei:remarks |parent::tei:said |parent::tei:sp |parent::tei:stage |parent::tei:cell |parent::tei:figure )"> Abstract model violation: Paragraphs may not occur inside other paragraphs or ab elements. </s:report>
Schematron
<s:report test="ancestor::tei:l[not(.//tei:note//tei:p[. = current()])]"> Abstract model violation: Lines may not contain higher-level structural elements such as div, p, or ab. </s:report>
Content model
<content>
 <macroRef key="macro.paraContent"/>
</content>
    
Schema Declaration
element p
{
   att.global.attribute.xmlid,
   att.global.attribute.n,
   att.global.attribute.xmllang,
   att.global.attribute.xmlbase,
   att.global.attribute.xmlspace,
   att.global.rendition.attribute.rend,
   att.global.rendition.attribute.style,
   att.global.rendition.attribute.rendition,
   att.global.change.attribute.change,
   att.global.responsibility.attribute.cert,
   att.global.responsibility.attribute.resp,
   att.declaring.attributes,
   att.written.attributes,
   attribute part { "N" }?,
   macro.paraContent
}
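
As a hedged illustration of how the declaration above might combine with the identifier scheme of section 1.3, the following sketch shows two paragraphs of a hypothetical prose text. The identifier values assume, for illustration only, that paragraphs are numbered directly under the document identifier; the French wording is invented and not taken from the corpus.

```xml
<body>
 <!-- hypothetical identifiers: identifier of the upper subdivision, a dash, then a serial number -->
 <p xml:id="EDI_0001-1" n="1">Premier paragraphe du texte.</p>
 <p xml:id="EDI_0001-2" n="2">Second paragraphe du texte.</p>
</body>
```

The n attribute simply repeats the serial number here; whether the project records both is an assumption of this sketch.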

Appendix A.1.8 <pb>

<pb> (page beginning) marks the beginning of a new page in a paginated document. [3.10.3. Milestone Elements]
Module: core
Attributes: att.edition (@ed, @edRef) att.spanning (@spanTo) att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) att.global.rendition (@rend, @style, @rendition) att.global.facs (@facs) att.global.change (@change) att.global.responsibility (@cert, @resp)
Member of
Contained by
core: corr hi p sic
textstructure: body text
transcr: fw
May contain: Empty element
Note

A <pb> element should appear at the start of the page which it identifies. The global n attribute indicates the number or other value associated with this page. This will normally be the page number or signature printed on it, since the physical sequence number is implicit in the presence of the <pb> element itself.

The type attribute may be used to characterize the page break in any respect. The more specialized attributes break, ed, or edRef should be preferred when the intent is to indicate whether or not the page break is word-breaking, or to note the source from which it derives.

Example: Page numbers may vary in different editions of a text.
<p> ... <pb n="145" ed="ed2"/> <!-- Page 145 in edition "ed2" starts here --> ... <pb n="283" ed="ed1"/> <!-- Page 283 in edition "ed1" starts here --> ... </p>
Example: A page break may be associated with a facsimile image of the page it introduces by means of the facs attribute.
<body>
 <pb n="1" facs="page1.png"/> <!-- page1.png contains an image of the page; the text it contains is encoded here -->
 <p> <!-- ... --> </p>
 <pb n="2" facs="page2.png"/> <!-- similarly, for page 2 -->
 <p> <!-- ... --> </p>
</body>
Content model
<content>
 <empty/>
</content>
    
Schema Declaration
element pb
{
   att.global.attribute.xmlid,
   att.global.attribute.n,
   att.global.attribute.xmllang,
   att.global.attribute.xmlbase,
   att.global.attribute.xmlspace,
   att.global.rendition.attribute.rend,
   att.global.rendition.attribute.style,
   att.global.rendition.attribute.rendition,
   att.global.facs.attribute.facs,
   att.global.change.attribute.change,
   att.global.responsibility.attribute.cert,
   att.global.responsibility.attribute.resp,
   att.edition.attributes,
   att.spanning.attributes,
   empty
}
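
Because <pb> is a milestone and is listed above as contained by <p>, a page break can also fall in the middle of a paragraph. The following is a minimal sketch of that case; the image file name and the French wording are hypothetical.

```xml
<p>Le début du paragraphe se trouve sur la page douze
 <!-- n gives the number printed on the new page; facs points to a hypothetical image file -->
 <pb n="13" facs="EDI_0001_p013.jpg"/>
 et sa fin sur la page treize.</p>
```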

Appendix A.1.9 <sic>

<sic> (Latin for thus or so) contains text reproduced although apparently incorrect or inaccurate. [3.4.1. Apparent Errors]
Module: core
Attributes: att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) att.global.rendition (@rend, @style, @rendition) att.global.change (@change) att.global.responsibility (@cert, @resp)
Member of
Contained by
transcr: fw
May contain
transcr: fw
character data
Example
for his nose was as sharp as a pen, and <sic>a Table</sic> of green fields.
Example: If all that is desired is to call attention to the apparent problem in the copy text, <sic> may be used alone:
I don't know, Juan. It's so far in the past now — how <sic>we can</sic> prove or disprove anyone's theories?
Example: It is also possible, using the <choice> and <corr> elements, to provide a corrected reading:
I don't know, Juan. It's so far in the past now — how <choice>  <sic>we can</sic>  <corr>can we</corr> </choice> prove or disprove anyone's theories?
Example
for his nose was as sharp as a pen, and <choice>  <sic>a Table</sic>  <corr>a' babbld</corr> </choice> of green fields.
Content model
<content>
 <macroRef key="macro.paraContent"/>
</content>
    
Schema Declaration
element sic
{
   att.global.attribute.xmlid,
   att.global.attribute.n,
   att.global.attribute.xmllang,
   att.global.attribute.xmlbase,
   att.global.attribute.xmlspace,
   att.global.rendition.attribute.rend,
   att.global.rendition.attribute.style,
   att.global.rendition.attribute.rendition,
   att.global.change.attribute.change,
   att.global.responsibility.attribute.cert,
   att.global.responsibility.attribute.resp,
   macro.paraContent
}
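
The pairing of <sic> with <corr> inside <choice> can be sketched for a French source as follows. The sentence and the misprint (a turned letter, "n" for "u") are invented for illustration and do not come from the corpus.

```xml
<p>Il luy parla d'<choice>
  <!-- reading found in the source, apparently a compositor's error -->
  <sic>amonr</sic>
  <!-- editorial correction supplied by the encoder -->
  <corr>amour</corr>
 </choice> toute la soirée.</p>
```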

Appendix A.1.10 <text>

<text> contains a single text of any kind, whether unitary or composite, for example a poem or drama, a collection of essays, a novel, a dictionary, or a corpus sample. [4. Default Text Structure 15.1. Varieties of Composite Text]
Module: textstructure
Attributes: att.declaring (@decls) att.written (@hand) att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) att.global.rendition (@rend, @style, @rendition) att.global.change (@change) att.global.responsibility (@cert, @resp)
Contained by
May contain
core: lb pb
textstructure: body
transcr: fw
Note

This element should not be used to represent a text which is inserted at an arbitrary point within the structure of another, for example as in an embedded or quoted narrative; the <floatingText> is provided for this purpose.

Example
<text>  <front>   <docTitle>    <titlePart>Autumn Haze</titlePart>   </docTitle>  </front>  <body>   <l>Is it a dragonfly or a maple leaf</l>   <l>That settles softly down upon the water?</l>  </body> </text>
Example: The body of a text may be replaced by a group of nested texts, as in the following schematic:
<text>  <front> <!-- front matter for the whole group -->  </front>  <group>   <text> <!-- first text -->   </text>   <text> <!-- second text -->   </text>  </group> </text>
Content model
<content>
 <sequence minOccurs="1" maxOccurs="1">
  <classRef key="model.global"
   minOccurs="0" maxOccurs="unbounded"/>
  <sequence minOccurs="0" maxOccurs="1">
   <elementRef key="front"/>
   <classRef key="model.global"
    minOccurs="0" maxOccurs="unbounded"/>
  </sequence>
  <alternate minOccurs="1" maxOccurs="1">
   <elementRef key="body"/>
   <elementRef key="group"/>
  </alternate>
  <classRef key="model.global"
   minOccurs="0" maxOccurs="unbounded"/>
  <sequence minOccurs="0" maxOccurs="1">
   <elementRef key="back"/>
   <classRef key="model.global"
    minOccurs="0" maxOccurs="unbounded"/>
  </sequence>
 </sequence>
</content>
    
Schema Declaration
element text
{
   att.global.attribute.xmlid,
   att.global.attribute.n,
   att.global.attribute.xmllang,
   att.global.attribute.xmlbase,
   att.global.attribute.xmlspace,
   att.global.rendition.attribute.rend,
   att.global.rendition.attribute.style,
   att.global.rendition.attribute.rendition,
   att.global.change.attribute.change,
   att.global.responsibility.attribute.cert,
   att.global.responsibility.attribute.resp,
   att.declaring.attributes,
   att.written.attributes,
   (
      model.global*,
      ( front, model.global* )?,
      ( body | group ),
      model.global*,
      ( back, model.global* )?
   )
}
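
The "May contain" list above names only <lb>, <pb>, <body> and <fw> as children of <text> in this customization. Under that reading, a minimal text could be sketched as below. Placing the document identifier on <text> is an assumption of this example, not a rule stated in these guidelines.

```xml
<!-- hypothetical: the document identifier is carried by the text element -->
<text xml:id="EDI_0001">
 <body>
  <pb n="1"/>
  <p>Premier paragraphe.</p>
 </body>
</text>
```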

Appendix A.2 Model classes

Appendix A.2.1 model.choicePart

model.choicePart groups elements (other than <choice> itself) which can be used within a <choice> alternation.
Module: tei
Used by
Members: corr sic

Appendix A.2.2 model.common

model.common groups common chunk- and inter-level elements.
Module: tei
Used by
Members: model.divPart[model.lLike model.pLike[p]] model.inter[model.biblLike model.egLike model.labelLike model.listLike model.oddDecl model.qLike[model.quoteLike] model.stageLike]
Note

This class defines the set of chunk- and inter-level elements; it is used in many content models, including those for textual divisions.

Appendix A.2.3 model.divBottom

model.divBottom groups elements appearing at the end of a text division.
Module: tei
Used by
Members: model.divBottomPart model.divWrapper

Appendix A.2.4 model.divPart

model.divPart groups paragraph-level elements appearing directly within divisions.
Module: tei
Used by
Members: model.lLike model.pLike[p]
Note

Note that this element class does not include members of the model.inter class, which can appear either within or between paragraph-level items.

Appendix A.2.5 model.divTop

model.divTop groups elements appearing at the beginning of a text division.
Module: tei
Used by
Members: model.divTopPart[model.headLike] model.divWrapper

Appendix A.2.6 model.divTopPart

model.divTopPart groups elements which can occur only at the beginning of a text division.
Module: tei
Used by
Members: model.headLike

Appendix A.2.7 model.global

model.global groups elements which may appear at any point within a TEI text.
Module: tei
Used by
Members: model.global.edit model.global.meta model.milestoneLike[fw lb pb] model.noteLike

Appendix A.2.8 model.hiLike

model.hiLike groups phrase-level elements which are typographically distinct but to which no specific function can be attributed.
Module: tei
Used by
Members: hi

Appendix A.2.9 model.highlighted

model.highlighted groups phrase-level elements which are typographically distinct.
Module: tei
Used by
Members: model.emphLike model.hiLike[hi]

Appendix A.2.10 model.inter

model.inter groups elements which can appear either within or between paragraph-like elements.
Module: tei
Used by
Members: model.biblLike model.egLike model.labelLike model.listLike model.oddDecl model.qLike[model.quoteLike] model.stageLike

Appendix A.2.11 model.limitedPhrase

model.limitedPhrase groups phrase-level elements excluding those elements primarily intended for transcription of existing sources.
Module: tei
Used by
Members: model.emphLike model.hiLike[hi] model.pPart.data[model.addressLike model.dateLike model.measureLike model.nameLike[model.nameLike.agent model.offsetLike model.placeStateLike[model.placeNamePart]]] model.pPart.editorial[choice] model.pPart.msdesc model.phrase.xml model.ptrLike

Appendix A.2.12 model.milestoneLike

model.milestoneLike groups milestone-style elements used to represent reference systems.
Module: tei
Used by
Members: fw lb pb

Appendix A.2.13 model.nameLike

model.nameLike groups elements which name or refer to a person, place, or organization.
Module: tei
Used by
Members: model.nameLike.agent model.offsetLike model.placeStateLike[model.placeNamePart]
Note

A superset of the naming elements that may appear in datelines, addresses, statements of responsibility, etc.

Appendix A.2.14 model.pLike

model.pLike groups paragraph-like elements.
Module: tei
Used by
Members: p

Appendix A.2.15 model.pPart.data

model.pPart.data groups phrase-level elements containing names, dates, numbers, measures, and similar data.
Module: tei
Used by
Members: model.addressLike model.dateLike model.measureLike model.nameLike[model.nameLike.agent model.offsetLike model.placeStateLike[model.placeNamePart]]

Appendix A.2.16 model.pPart.edit

model.pPart.edit groups phrase-level elements for simple editorial correction and transcription.
Module: tei
Used by
Members: model.pPart.editorial[choice] model.pPart.transcriptional[corr sic]

Appendix A.2.17 model.pPart.editorial

model.pPart.editorial groups phrase-level elements for simple editorial interventions that may be useful both in transcribing and in authoring.
Module: tei
Used by
Members: choice

Appendix A.2.18 model.pPart.transcriptional

model.pPart.transcriptional groups phrase-level elements used for editorial transcription of pre-existing source materials.
Module: tei
Used by
Members: corr sic

Appendix A.2.19 model.phrase

model.phrase groups elements which can occur at the level of individual words or phrases.
Module: tei
Used by
Members: model.graphicLike model.highlighted[model.emphLike model.hiLike[hi]] model.lPart model.pPart.data[model.addressLike model.dateLike model.measureLike model.nameLike[model.nameLike.agent model.offsetLike model.placeStateLike[model.placeNamePart]]] model.pPart.edit[model.pPart.editorial[choice] model.pPart.transcriptional[corr sic]] model.pPart.msdesc model.phrase.xml model.ptrLike model.segLike model.specDescLike
Note

This class of elements can occur within paragraphs, list items, lines of verse, etc.

Appendix A.2.20 model.placeStateLike

model.placeStateLike groups elements which describe changing states of a place.
Module: tei
Used by
Members: model.placeNamePart

Appendix A.2.21 model.qLike

model.qLike groups elements related to highlighting which can appear either within or between chunk-level elements.
Module: tei
Used by
Members: model.quoteLike

Appendix A.3 Attribute classes

Appendix A.3.1 att.breaking

att.breaking provides an attribute to indicate whether or not the element concerned is considered to mark the end of an orthographic token in the same way as whitespace.
Module: tei
Members: lb pb
Attributes
break: indicates whether or not the element bearing this attribute should be considered to mark the end of an orthographic token in the same way as whitespace.
Status: Recommended
Datatype: teidata.enumerated
Sample values include
yes
the element bearing this attribute is considered to mark the end of any adjacent orthographic token irrespective of the presence of any adjacent whitespace
no
the element bearing this attribute is considered not to mark the end of any adjacent orthographic token irrespective of the presence of any adjacent whitespace
maybe
the encoding does not take any position on this issue.
In the following lines from the Dream of the Rood, linebreaks occur in the middle of the words lāðost and reord-berendum.
<ab> ...eƿesa tome iu icƿæs ȝeƿorden ƿita heardoſt . leodum la<lb break="no"/> ðost ærþan ichim lifes ƿeȝ rihtne ȝerymde reord be<lb break="no"/> rendum hƿæt me þaȝeƿeorðode ƿuldres ealdor ofer... </ab>
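
For a printed French text, the same mechanism could be sketched as follows: the first line break splits the word "commencement" and therefore carries break="no", while the second falls between two words. The sentence is invented for illustration.

```xml
<!-- break="no": the line ends inside a word, so no token boundary is implied -->
<!-- break="yes": the line ends between two words -->
<p>Le commence<lb break="no"/>ment de l'histoire<lb break="yes"/>fut heureux.</p>
```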

Appendix A.3.2 att.declaring

att.declaring provides attributes for elements which may be independently associated with a particular declarable element within the header, thus overriding the inherited default for that element.
Module: tei
Members: body p text
Attributes
decls: identifies one or more declarable elements within the header, which are understood to apply to the element bearing this attribute and its content.
Status: Optional
Datatype: 1–∞ occurrences of teidata.pointer separated by whitespace
Note

The rules governing the association of declarable elements with individual parts of a TEI text are fully defined in chapter 15.3. Associating Contextual Information with a Text.

Appendix A.3.3 att.editLike

att.editLike provides attributes describing the nature of an encoded scholarly intervention or interpretation of any kind.
Module: tei
Members: corr
Attributes
instant: indicates whether this is an instant revision or not.
Status: Optional
Datatype: teidata.xTruthValue
Default: false
Note

The members of this attribute class are typically used to represent any kind of editorial intervention in a text, for example a correction or interpretation, or to date or localize manuscripts etc.

Each pointer on the source (if present) corresponding to a witness or witness group should reference a bibliographic citation such as a <witness>, <msDesc>, or <bibl> element, or another external bibliographic citation, documenting the source concerned.

Appendix A.3.4 att.edition

att.edition provides attributes identifying the source edition from which some encoded feature derives.
Module: tei
Members: lb pb
Attributes
ed: (edition) supplies a sigil or other arbitrary identifier for the source edition in which the associated feature (for example, a page, column, or line break) occurs at this point in the text.
Status: Optional
Datatype: 1–∞ occurrences of teidata.word separated by whitespace
edRef: (edition reference) provides a pointer to the source edition in which the associated feature (for example, a page, column, or line break) occurs at this point in the text.
Status: Optional
Datatype: 1–∞ occurrences of teidata.pointer separated by whitespace
Example
<l>Of Mans First Disobedience,<lb ed="1674"/> and<lb ed="1667"/> the Fruit</l> <l>Of that Forbidden Tree, whose<lb ed="1667 1674"/> mortal tast</l> <l>Brought Death into the World,<lb ed="1667"/> and all<lb ed="1674"/> our woe,</l>
Example
<listBibl>
 <bibl xml:id="stapledon1937">
  <author>Olaf Stapledon</author>, <title>Starmaker</title>, <publisher>Methuen</publisher>, <date>1937</date>
 </bibl>
 <bibl xml:id="stapledon1968">
  <author>Olaf Stapledon</author>, <title>Starmaker</title>, <publisher>Dover</publisher>, <date>1968</date>
 </bibl>
</listBibl>
<p>Looking into the future aeons from the supreme moment of the cosmos, I saw the populations still with all their strength maintaining the<pb n="411" edRef="#stapledon1968"/>essentials of their ancient culture, still living their personal lives in zest and endless novelty of action, … I saw myself still preserving, though with increasing difficulty, my lucid con-<pb n="291" edRef="#stapledon1937"/>sciousness;</p>

Appendix A.3.5 att.global.change

att.global.change supplies the change attribute, allowing its member elements to specify one or more states or revision campaigns with which they are associated.
Module: transcr
Members: att.global[body choice corr fw hi lb p pb sic text]
Attributes
change: points to one or more <change> elements documenting a state or revision campaign to which the element bearing this attribute and its children have been assigned by the encoder.
Status: Optional
Datatype: 1–∞ occurrences of teidata.pointer separated by whitespace

Appendix A.3.6 att.global.facs

att.global.facs provides an attribute used to express correspondence between an element containing transcribed text and all or part of an image representing that text.
Module: transcr
Members: att.global[body choice corr fw hi lb p pb sic text]
Attributes
facs: (facsimile) points to all or part of an image which corresponds with the content of the element.
Status: Optional
Datatype: 1–∞ occurrences of teidata.pointer separated by whitespace

Appendix A.3.7 att.global.rendition

att.global.rendition provides rendering attributes common to all elements in the TEI encoding scheme.
Module: tei
Members: att.global[body choice corr fw hi lb p pb sic text]
Attributes
rend: (rendition) indicates how the element in question was rendered or presented in the source text.
Status: Optional
Datatype: 1–∞ occurrences of teidata.word separated by whitespace
<head rend="align(center) case(allcaps)">  <lb/>To The <lb/>Duchesse <lb/>of <lb/>Newcastle, <lb/>On Her <lb/>  <hi rend="case(mixed)">New Blazing-World</hi>. </head>
Note

These Guidelines make no binding recommendations for the values of the rend attribute; the characteristics of visual presentation vary too much from text to text and the decision to record or ignore individual characteristics varies too much from project to project. Some potentially useful conventions are noted from time to time at appropriate points in the Guidelines. The values of the rend attribute are a set of sequence-indeterminate individual tokens separated by whitespace.

style: contains an expression in some formal style definition language which defines the rendering or presentation used for this element in the source text.
Status: Optional
Datatype: teidata.text
<head style="text-align: center; font-variant: small-caps">  <lb/>To The <lb/>Duchesse <lb/>of <lb/>Newcastle, <lb/>On Her <lb/>  <hi style="font-variant: normal">New Blazing-World</hi>. </head>
Note

Unlike the attribute values of rend, which uses whitespace as a separator, the style attribute may contain whitespace. This attribute is intended for recording inline stylistic information concerning the source, not any particular output.

The formal language in which values for this attribute are expressed may be specified using the <styleDefDecl> element in the TEI header.

If style and rendition are both present on an element, then style overrides or complements rendition. style should not be used in conjunction with rend, because the latter does not employ a formal style definition language.

rendition: points to a description of the rendering or presentation used for this element in the source text.
Status: Optional
Datatype: 1–∞ occurrences of teidata.pointer separated by whitespace
<head rendition="#ac #sc">  <lb/>To The <lb/>Duchesse <lb/>of <lb/>Newcastle, <lb/>On Her <lb/>  <hi rendition="#normal">New Blazing-World</hi>. </head> <rendition xml:id="sc"  scheme="css">font-variant: small-caps</rendition> <rendition xml:id="normal"  scheme="css">font-variant: normal</rendition> <rendition xml:id="ac"  scheme="css">text-align: center</rendition>
Note

The rendition attribute is used in a very similar way to the class attribute defined for XHTML but with the important distinction that its function is to describe the appearance of the source text, not necessarily to determine how that text should be presented on screen or paper.

If rendition is used to refer to a style definition in a formal language like CSS, it is recommended that it not be used in conjunction with rend. Where both rendition and rend are supplied, the latter is understood to override or complement the former.

Each URI provided should indicate a <rendition> element defining the intended rendition in terms of some appropriate style language, as indicated by the scheme attribute.

Appendix A.3.8 att.global.responsibility

att.global.responsibility provides attributes indicating the agent responsible for some aspect of the text, the markup or something asserted by the markup, and the degree of certainty associated with it.
Module: tei
Members: att.global[body choice corr fw hi lb p pb sic text]
Attributes
cert: (certainty) signifies the degree of certainty associated with the intervention or interpretation.
Status: Optional
Datatype: teidata.probCert
resp: (responsible party) indicates the agency responsible for the intervention or interpretation, for example an editor or transcriber.
Status: Optional
Datatype: 1–∞ occurrences of teidata.pointer separated by whitespace
Note

To reduce the ambiguity of a resp pointing directly to a person or organization, we recommend that resp be used to point not to an agent (<person> or <org>) but to a <respStmt>, <author>, <editor> or similar element which clarifies the exact role played by the agent. Pointing to multiple <respStmt>s allows the encoder to specify clearly each of the roles played in part of a TEI file (creating, transcribing, encoding, editing, proofing etc.).

Example
Blessed are the <choice> <sic>cheesemakers</sic> <corr resp="#editor" cert="high">peacemakers</corr> </choice>: for they shall be called the children of God.
Example
<lg>  <l>Punkes, Panders, baſe extortionizing    sla<choice>    <sic>n</sic>    <corr resp="#JENS1_transcriber">u</corr>   </choice>es,</l> </lg> <respStmt xml:id="JENS1_transcriber">  <resp when="2014">Transcriber</resp>  <name>Janelle Jenstad</name> </respStmt>

Appendix A.3.9 att.global.source

att.global.source provides an attribute used by elements to point to an external source.
Module: tei
Members: att.global[body choice corr fw hi lb p pb sic text]
Attributes
source: specifies the source from which some aspect of this element is drawn.
Status: Optional
Datatype: 1–∞ occurrences of teidata.pointer separated by whitespace
Note

The source attribute points to an external source. When used on elements describing schema components such as <schemaSpec> or <moduleRef> it identifies the source from which declarations for the components of the object being defined may be obtained.

On other elements it provides a pointer to the bibliographical source from which a quotation or citation is drawn.

In either case, the location may be provided using any form of URI, for example an absolute URI, a relative URI, or private scheme URI that is expanded to an absolute URI as documented in a <prefixDef>.

If more than one location is specified, the default assumption is that the required source should be obtained by combining the resources indicated.

Example
<p> As Willard McCarty (<bibl xml:id="mcc_2012">2012, p.2</bibl>) tells us, <quote source="#mcc_2012">‘Collaboration’ is a problematic and should be a contested    term.</quote> </p>
Example
<p>  <quote source="#chicago_15_ed">Grammatical theories are in flux, and the more we learn, the    less we seem to know.</quote> </p> <bibl xml:id="chicago_15_ed">  <title level="m">The Chicago Manual of Style</title>, <edition>15th edition</edition>. <pubPlace>Chicago</pubPlace>: <publisher>University of    Chicago Press</publisher> (<date>2003</date>), <biblScope unit="page">p.147</biblScope>. </bibl>
Example
<elementRef key="p" source="tei:2.0.1"/>
Include in the schema an element named <p> available from the TEI P5 2.0.1 release.
Example
<schemaSpec ident="myODD"  source="mycompiledODD.xml"/>
Create a schema using components taken from the file mycompiledODD.xml.

Appendix A.3.10 att.placement

att.placement provides attributes for describing where on the source page or object a textual element appears.
Module: tei
Members: fw
Attributes
place: specifies where this item is placed.
Status: Recommended
Datatype: 1–∞ occurrences of teidata.enumerated separated by whitespace
Suggested values include:
below
below the line
bottom
at the foot of the page
margin
in the margin (left, right, or both)
top
at the top of the page
opposite
on the opposite, i.e. facing, page
overleaf
on the other side of the leaf
above
above the line
end
at the end of e.g. chapter or volume.
inline
within the body of the text.
inspace
in a predefined space, for example left by an earlier scribe.
<add place="margin">[An addition written in the margin]</add> <add place="bottom opposite">[An addition written at the foot of the current page and also on the facing page]</add>
<note place="bottom">Ibid, p.7</note>
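
Since <fw> is the only member of this class in the present schema, a sketch closer to the project's use might look as follows. It assumes that <fw> may appear directly in <body>, as its membership of model.milestoneLike suggests; the page number and catchword are invented.

```xml
<body>
 <pb n="12"/>
 <!-- page number printed at the top of the page -->
 <fw place="top">12</fw>
 <p>Texte de la page.</p>
 <!-- catchword printed at the foot of the page -->
 <fw place="bottom">Et</fw>
</body>
```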

Appendix A.3.11 att.spanning

att.spanning provides attributes for elements which delimit a span of text by pointing mechanisms rather than by enclosing it.
Module: tei
Members: lb pb
Attributes
spanTo: indicates the end of a span initiated by the element bearing this attribute.
Status: Optional
Datatype: teidata.pointer
Schematron: The @spanTo attribute must point to an element following the current element.
<sch:rule context="tei:*[@spanTo]"> <sch:assert test="id(substring(@spanTo,2)) and following::*[@xml:id=substring(current()/@spanTo,2)]">The element indicated by @spanTo (<sch:value-of select="@spanTo"/>) must follow the current element <sch:name/> </sch:assert> </sch:rule>
Note

The span is defined as running in document order from the start of the content of the pointing element to the end of the content of the element pointed to by the spanTo attribute (if any). If no value is supplied for the attribute, the assumption is that the span is coextensive with the pointing element. If no content is present, the assumption is that the starting point of the span is immediately following the element itself.
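
A minimal sketch of the pointing mechanism, assuming that one <lb> is used to mark the start of a span and a later <lb> its end. The identifiers are hypothetical and do not follow the scheme of section 1.3; the target must follow the pointing element for the Schematron rule above to pass.

```xml
<p>
 <!-- the span starts immediately after this empty element -->
 <lb xml:id="lb_start" spanTo="#lb_end"/>premiere ligne du passage
 <!-- the element designated by spanTo closes the span and comes later in document order -->
 <lb xml:id="lb_end"/>suite du texte
</p>
```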

Appendix A.3.12 att.written

att.written provides an attribute to indicate the hand in which the content of an element was written in the source being transcribed.
Module: tei
Members: fw hi p text
Attributes
hand: points to a <handNote> element describing the hand considered responsible for the content of the element concerned.
Status: Optional
Datatype: teidata.pointer

Appendix A.4 Macros

Appendix A.4.1 macro.paraContent

macro.paraContent (paragraph content) defines the content of paragraphs and similar elements.
Module: tei
Used by
Content model
<content>
 <alternate minOccurs="0"
  maxOccurs="unbounded">
  <textNode/>
  <classRef key="model.gLike"/>
  <classRef key="model.phrase"/>
  <classRef key="model.inter"/>
  <classRef key="model.global"/>
  <elementRef key="lg"/>
  <classRef key="model.lLike"/>
 </alternate>
</content>
    
Declaration
macro.paraContent =
   (
      text
    | model.gLike
    | model.phrase
    | model.inter
    | model.global
    | lg
    | model.lLike
   )*
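
In this customization, the macro lets a paragraph mix character data with phrase-level elements such as <hi> and <choice> and with milestones such as <lb>. The sketch below illustrates that mix; the rend value "italic" is an assumed convention, since no binding values for rend are prescribed, and the wording is invented.

```xml
<p>Texte simple, <hi rend="italic">mots en italique</hi>,
 <!-- choice is a member of model.phrase through model.pPart.editorial -->
 <choice><sic>errenr</sic><corr>erreur</corr></choice>
 <!-- lb is a member of model.global through model.milestoneLike -->
 <lb/>puis une nouvelle ligne.</p>
```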

Appendix A.4.2 macro.phraseSeq

macro.phraseSeq (phrase sequence) defines a sequence of character data and phrase-level elements.
Module: tei
Used by
Content model
<content>
 <alternate minOccurs="0"
  maxOccurs="unbounded">
  <textNode/>
  <classRef key="model.gLike"/>
  <classRef key="model.qLike"/>
  <classRef key="model.phrase"/>
  <classRef key="model.global"/>
 </alternate>
</content>
    
Declaration
macro.phraseSeq =
   ( text | model.gLike | model.qLike | model.phrase | model.global )*

Appendix A.5 Datatypes

Appendix A.5.1 teidata.certainty

teidata.certainty defines the range of attribute values expressing a degree of certainty.
Module: tei
Used by
Content model
<content>
 <valList type="closed">
  <valItem ident="high"/>
  <valItem ident="medium"/>
  <valItem ident="low"/>
  <valItem ident="unknown"/>
 </valList>
</content>
    
Declaration
teidata.certainty = "high" | "medium" | "low" | "unknown"
Note

Certainty may be expressed by one of the predefined symbolic values high, medium, or low. The value unknown should be used in cases where the encoder does not wish to assert an opinion about the matter.

Appendix A.5.2 teidata.enumerated

teidata.enumerated defines the range of attribute values expressed as a single XML name taken from a list of documented possibilities.
Module: tei
Used by
Element:
  • p/@part
Content model
<content>
 <dataRef key="teidata.word"/>
</content>
    
Declaration
teidata.enumerated = teidata.word
Note

Attributes using this datatype must contain a single ‘word’ which contains only letters, digits, punctuation characters, or symbols: thus it cannot include whitespace.

Typically, the list of documented possibilities will be provided (or exemplified) by a value list in the associated attribute specification, expressed with a <valList> element.

Appendix A.5.3 teidata.language

teidata.language defines the range of attribute values used to identify a particular combination of human language and writing system.
Module: tei
Used by
Content model
<content>
 <alternate minOccurs="1" maxOccurs="1">
  <dataRef name="language"/>
  <valList>
   <valItem ident=""/>
  </valList>
 </alternate>
</content>
    
Declaration
teidata.language = xsd:language | ( "" )
Note

The values for this attribute are language ‘tags’ as defined in BCP 47. Currently BCP 47 comprises RFC 5646 and RFC 4647; over time, other IETF documents may succeed these as the best current practice.

A ‘language tag’, per BCP 47, is assembled from a sequence of components or subtags separated by the hyphen character (-, U+002D). The tag is made of the following subtags, in the following order. Every subtag except the first is optional. If present, each occurs only once, except the fourth and fifth components (variant and extension), which are repeatable.

language
The IANA-registered code for the language. This is almost always the same as the ISO 639 2-letter language code if there is one. The list of available registered language subtags can be found at http://www.iana.org/assignments/language-subtag-registry. It is recommended that this code be written in lower case.
script
The ISO 15924 code for the script. These codes consist of 4 letters, and it is recommended they be written with an initial capital, the other three letters in lower case. The canonical list of codes is maintained by the Unicode Consortium, and is available at http://unicode.org/iso15924/iso15924-codes.html. The IETF recommends this code be omitted unless it is necessary to make a distinction you need.
region
Either an ISO 3166 country code or a UN M.49 region code that is registered with IANA (not all such codes are registered, e.g. UN codes for economic groupings or codes for countries for which there is already an ISO 3166 2-letter code are not registered). The former consist of 2 letters, and it is recommended they be written in upper case; the list of codes can be searched or browsed at https://www.iso.org/obp/ui/#search/code/. The latter consist of 3 digits; the list of codes can be found at http://unstats.un.org/unsd/methods/m49/m49.htm.
variant
An IANA-registered variation. These codes are used to indicate additional, well-recognized variations that define a language or its dialects that are not covered by other available subtags.
extension
An extension has the format of a single letter followed by a hyphen followed by additional subtags. These exist to allow for future extension to BCP 47, but as of this writing no such extensions are in use.
private use
An extension that uses the initial subtag of the single letter x (i.e., starts with x-) has no meaning except as negotiated among the parties involved. These should be used with great care, since they interfere with the interoperability that use of BCP 47 is intended to promote. In order for a document that makes use of these subtags to be TEI-conformant, a corresponding <language> element must be present in the TEI header.

There are two exceptions to the above format. First, there are language tags in the IANA registry that do not match the above syntax, but are present because they have been ‘grandfathered’ from previous specifications.

Second, an entire language tag can consist of only a private use subtag. These tags start with x-, and do not need to follow any further rules established by the IETF and endorsed by these Guidelines. Like all language tags that make use of private use subtags, the language in question must be documented in a corresponding <language> element in the TEI header.

Examples include

sn
Shona
zh-TW
Chinese as used in Taiwan
zh-Hant-HK
Chinese written in traditional script as used in Hong Kong
en-SL
English as spoken in Sierra Leone
pl
Polish
es-MX
Spanish as spoken in Mexico
es-419
Spanish as spoken in Latin America

The W3C Internationalization Activity has published a useful introduction to BCP 47, Language tags in HTML and XML.
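
The subtag order described above can be sketched as a regular expression. This is a simplified illustration, not a full BCP 47 parser: it assumes a 2- or 3-letter language subtag and does not handle grandfathered tags, extended language subtags, or tags that consist only of a private use subtag. The function name split_language_tag is hypothetical.

```python
import re

# Simplified pattern following the order given in the note:
# language, then optional script, region, variants, extensions, private use.
# Assumption: grandfathered tags and private-use-only tags are not covered.
LANG_TAG = re.compile(
    r"^(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}|\d{3}))?"
    r"(?P<variants>(?:-(?:[A-Za-z0-9]{5,8}|\d[A-Za-z0-9]{3}))*)"
    r"(?P<extensions>(?:-[0-9A-WY-Za-wy-z](?:-[A-Za-z0-9]{2,8})+)*)"
    r"(?:-(?P<private>x(?:-[A-Za-z0-9]{1,8})+))?$"
)


def split_language_tag(tag):
    """Return a dict of subtags, or None if the tag does not fit the pattern."""
    match = LANG_TAG.match(tag)
    if match is None:
        return None
    parts = match.groupdict()
    # Variants are repeatable, so they are returned as a list.
    parts["variants"] = [v for v in parts["variants"].split("-") if v]
    # Extensions are kept as one raw string without the leading hyphen.
    parts["extensions"] = parts["extensions"].lstrip("-")
    return parts
```

For instance, split_language_tag("zh-Hant-HK") yields the language zh, the script Hant and the region HK, while a string containing whitespace is rejected.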

Appendix A.5.4 teidata.name

teidata.name defines the range of attribute values expressed as an XML Name.
Module tei
Used by
Content model
<content>
 <dataRef name="Name"/>
</content>
    
Declaration
teidata.name = xsd:Name
Note

Attributes using this datatype must contain a single word which follows the rules defining a legal XML name (see http://www.w3.org/TR/REC-xml/#dt-name): for example they cannot include whitespace or begin with digits.

Appendix A.5.5 teidata.numeric

teidata.numeric defines the range of attribute values used for numeric values.
Module tei
Used by
Content model
<content>
 <alternate minOccurs="1" maxOccurs="1">
  <dataRef name="double"/>
  <dataRef name="token"
   restriction="(\-?[\d]+/\-?[\d]+)"/>
  <dataRef name="decimal"/>
 </alternate>
</content>
    
Declaration
teidata.numeric =
   xsd:double | token { pattern = "(\-?[\d]+/\-?[\d]+)" } | xsd:decimal
Note

Any numeric value, represented as a decimal number, in floating point format, or as a ratio.

To represent a floating point number in scientific notation, ‘E notation’, a variant of ‘exponential notation’, may be used. In this format, the value is expressed as two numbers separated by the letter E. The first number, the significand (sometimes called the mantissa), is given in decimal format, while the second is an integer. The value is obtained by multiplying the significand by 10 raised to the power given by the integer. Thus the value represented in decimal notation as 1000.0 might be represented in scientific notation as 1E3.

A value expressed as a ratio is represented by two integer values separated by a solidus (/) character. Thus, the value represented in decimal notation as 0.5 might be represented as a ratio by the string 1/2.
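
As a rough illustration of the three forms, the following sketch reads a value either as a ratio or as a floating point number. The function name parse_tei_numeric is hypothetical, and Python's float() is used as a stand-in for xsd:double and xsd:decimal even though it accepts some strings that those types do not.

```python
import re
from fractions import Fraction

# The ratio pattern from the declaration, applied to the whole string.
RATIO = re.compile(r"-?\d+/-?\d+")


def parse_tei_numeric(value):
    """Return a Fraction for a ratio, otherwise a float.

    Raises ValueError if the string is not numeric, and
    ZeroDivisionError if a ratio has a zero denominator.
    """
    value = value.strip()
    if RATIO.fullmatch(value):
        numerator, denominator = value.split("/")
        return Fraction(int(numerator), int(denominator))
    # Assumption: float() approximates xsd:double and xsd:decimal here.
    return float(value)
```

With this sketch, "1/2" is read as the fraction one half, "1E3" as 1000.0 and "10E3" as 10000.0.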

Appendix A.5.6 teidata.pointer

teidata.pointer defines the range of attribute values used to provide a single URI, absolute or relative, pointing to some other resource, either within the current document or elsewhere.
Module tei
Used by
Content model
<content>
 <dataRef name="anyURI"/>
</content>
    
Declaration
teidata.pointer = xsd:anyURI
Note

The range of syntactically valid values is defined by RFC 3986 Uniform Resource Identifier (URI): Generic Syntax. Note that the values themselves are encoded using RFC 3987 Internationalized Resource Identifiers (IRIs) mapping to URIs. For example, https://secure.wikimedia.org/wikipedia/en/wiki/% is encoded as https://secure.wikimedia.org/wikipedia/en/wiki/%25 while http://موقع.وزارة-الاتصالات.مصر/ is encoded as http://xn--4gbrim.xn----rmckbbajlc6dj7bxne2c.xn--wgbh1c/
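
The percent-encoding step in the first example can be reproduced with Python's standard library. This is only a sketch of escaping a single path segment; the helper name escape_path_segment is hypothetical, and the conversion of internationalized host names is not covered here.

```python
from urllib.parse import quote


def escape_path_segment(segment):
    """Percent-encode one path segment, escaping every reserved character."""
    # safe="" means that no character is exempt, so "%" itself becomes "%25".
    return quote(segment, safe="")
```

Applied to the last segment of the first example, the single character % becomes %25.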

Appendix A.5.7 teidata.probCert

teidata.probCert defines a range of attribute values which can be expressed either as a numeric probability or as a coded certainty value.
Module tei
Used by
Content model
<content>
 <alternate minOccurs="1" maxOccurs="1">
  <dataRef key="teidata.probability"/>
  <dataRef key="teidata.certainty"/>
 </alternate>
</content>
    
Declaration
teidata.probCert = teidata.probability | teidata.certainty

Appendix A.5.8 teidata.probability

teidata.probability defines the range of attribute values expressing a probability.
Module tei
Used by
Content model
<content>
 <dataRef name="double"/>
</content>
    
Declaration
teidata.probability = xsd:double
Note

Probability is expressed as a real number between 0 and 1; 0 representing certainly false and 1 representing certainly true.
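
The declaration itself only requires an xsd:double, so the range from 0 to 1 has to be checked separately. A minimal sketch of such a check, with the hypothetical name parse_probability, might look like this:

```python
def parse_probability(value):
    """Return the probability as a float, or raise ValueError if out of range."""
    probability = float(value)
    # NaN fails both comparisons, so it is rejected as well.
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie between 0 and 1: %r" % value)
    return probability
```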

Appendix A.5.9 teidata.temporal.w3c

teidata.temporal.w3c defines the range of attribute values expressing a temporal expression such as a date, a time, or a combination of them, that conform to the W3C XML Schema Part 2: Datatypes Second Edition specification.
Module tei
Used by
Content model
<content>
 <alternate minOccurs="1" maxOccurs="1">
  <dataRef name="date"/>
  <dataRef name="gYear"/>
  <dataRef name="gMonth"/>
  <dataRef name="gDay"/>
  <dataRef name="gYearMonth"/>
  <dataRef name="gMonthDay"/>
  <dataRef name="time"/>
  <dataRef name="dateTime"/>
 </alternate>
</content>
    
Declaration
teidata.temporal.w3c =
   xsd:date
 | xsd:gYear
 | xsd:gMonth
 | xsd:gDay
 | xsd:gYearMonth
 | xsd:gMonthDay
 | xsd:time
 | xsd:dateTime
Note

If it is likely that the value used is to be compared with another, then a time zone indicator should always be included, and only the dateTime representation should be used.
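
The reason for this recommendation can be illustrated with Python's datetime module, used here as a stand-in for an XML Schema processor. The function names are hypothetical, and the sketch assumes offsets written as +hh:mm, since older Python versions do not accept the Z suffix in fromisoformat.

```python
from datetime import datetime


def same_instant(first, second):
    """Compare two dateTime strings that both carry a UTC offset."""
    return datetime.fromisoformat(first) == datetime.fromisoformat(second)


def can_be_ordered(first, second):
    """Return False when one value has a time zone and the other does not."""
    try:
        datetime.fromisoformat(first) < datetime.fromisoformat(second)
    except TypeError:
        # Python refuses to order a zoned value against an unzoned one.
        return False
    return True
```

Two values with different offsets, such as 2020-05-27T10:00:00+02:00 and 2020-05-27T08:00:00+00:00, are recognized as the same instant, whereas a value without a time zone cannot be ordered against either of them.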

Appendix A.5.10 teidata.text

teidata.text defines the range of attribute values used to express some kind of identifying string as a single sequence of Unicode characters possibly including whitespace.
Module tei
Used by
Content model
<content>
 <dataRef name="string"/>
</content>
    
Declaration
teidata.text = string
Note

Attributes using this datatype must contain a single ‘token’ in which whitespace and other punctuation characters are permitted.

Appendix A.5.11 teidata.truthValue

teidata.truthValue defines the range of attribute values used to express a truth value.
Module tei
Used by
Content model
<content>
 <dataRef name="boolean"/>
</content>
    
Declaration
teidata.truthValue = xsd:boolean
Note

The possible values of this datatype are 1 or true, or 0 or false.

This datatype applies only for cases where uncertainty is inappropriate; if the attribute concerned may have a value other than true or false, e.g. unknown, or inapplicable, it should have the extended version of this datatype: teidata.xTruthValue.

Appendix A.5.12 teidata.word

teidata.word defines the range of attribute values expressed as a single word or token.
Module tei
Used by
Content model
<content>
 <dataRef name="token"
  restriction="[^\p{C}\p{Z}]+"/>
</content>
    
Declaration
teidata.word = token { pattern = "[^\p{C}\p{Z}]+" }
Note

Attributes using this datatype must contain a single ‘word’ which contains only letters, digits, punctuation characters, or symbols: thus it cannot include whitespace.
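
The pattern excludes the Unicode categories C (control, format and other special code points) and Z (separators, including spaces). Python's re module does not support \p{...} classes, so the sketch below checks the categories with unicodedata instead. The function name is_tei_word is hypothetical, and the whitespace normalization that an XML Schema processor applies to a token beforehand is not reproduced.

```python
import unicodedata


def is_tei_word(value):
    """Return True if value is non-empty and has no C or Z category characters."""
    return len(value) > 0 and all(
        # The first letter of the category is the major class, e.g. "Zs" -> "Z".
        unicodedata.category(char)[0] not in ("C", "Z")
        for char in value
    )
```

Under this check an identifier such as EDI_0001-1-3 passes, while a value containing a space or a tab does not.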

Appendix A.5.13 teidata.xTruthValue

teidata.xTruthValue (extended truth value) defines the range of attribute values used to express a truth value which may be unknown.
Module tei
Used by
Content model
<content>
 <alternate minOccurs="1" maxOccurs="1">
  <dataRef name="boolean"/>
  <valList>
   <valItem ident="unknown"/>
   <valItem ident="inapplicable"/>
  </valList>
 </alternate>
</content>
    
Declaration
teidata.xTruthValue = xsd:boolean | ( "unknown" | "inapplicable" )
Note

In cases where uncertainty is inappropriate, use the datatype teidata.truthValue.
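
One way a processing script might read such a value is to map it onto a three-state result. The following is only a sketch: the function name parse_x_truth_value is hypothetical, and folding both unknown and inapplicable into None is a simplification that loses the distinction between them.

```python
def parse_x_truth_value(value):
    """Map an extended truth value to True, False, or None."""
    if value in ("true", "1"):
        return True
    if value in ("false", "0"):
        return False
    # Assumption: both non-boolean values are treated as "no answer".
    if value in ("unknown", "inapplicable"):
        return None
    raise ValueError("not an extended truth value: %r" % value)
```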

Alexandre Bartz. Date: 2020-05-27