This document sets out the encoding principles used for the manuscripts and printed documents of the e-ditiones project.
The e-ditiones project aims to encode several 17th-century French manuscripts and printed documents and, later, to present them in a digital library. We chose various literary genres, such as drama, letters and novels.
Since we deal with two major types of texts, printed documents and manuscripts, we decided to separate the metadata from the text. This allows us to create two schemas, one specific to printed documents and the other to manuscripts.
Please note that we are still working on the best way to form a complete file.
Another essential principle of the project is to use a minimal set of elements.
One of our priorities is to clearly identify each text once it is encoded. We chose to give each document a unique identifier consisting of the first three letters of the project name, an underscore and a four-digit serial number. The identifier of the first encoded text will therefore be EDI_0001.
For every subdivision below the document, such as a chapter, act, speech, paragraph or line, you simply concatenate the identifier of the upper subdivision with a dash and a new number. For example, if the encoded text is a play, the identifier EDI_0001-1-3-4-5 can be understood as the fifth speech of the fourth scene of the third act of the first play in the document with the identifier EDI_0001.
If there is front matter, such as a cast list, we add a 0 between the identifier of the upper subdivision and the new number. For example, EDI_0001-0-1 indicates the first subdivision of the front matter of the document EDI_0001. In this way, the position of a subdivision in the text is immediately apparent.
This may seem somewhat complicated, but it ensures that every single line or paragraph can be clearly identified.
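As a rough illustration of the identifier scheme, the sketch below shows how identifiers might be distributed over a play preceded by a cast list. All textual content is a placeholder, and carrying the document identifier on <text> is an assumption made for this illustration; only the pattern of the identifiers follows the rules described above.

```xml
<!-- Sketch only: placeholder content; xml:id on <text> is an assumption -->
<text xmlns="http://www.tei-c.org/ns/1.0" xml:id="EDI_0001">
  <front>
    <!-- the 0 marks the front matter; this is its first subdivision -->
    <div type="castList" n="1" xml:id="EDI_0001-0-1">
      <head>Placeholder heading of the cast list</head>
      <p>...</p>
    </div>
  </front>
  <body>
    <div type="play" n="1" xml:id="EDI_0001-1">
      <!-- acts 1 and 2 omitted -->
      <div type="act" n="3" xml:id="EDI_0001-1-3">
        <!-- scenes 1 to 3 omitted -->
        <div type="scene" n="4" xml:id="EDI_0001-1-3-4">
          <!-- speeches 1 to 4 omitted -->
          <!-- fifth speech of scene 4, act 3, play 1 -->
          <sp n="5" xml:id="EDI_0001-1-3-4-5">
            <speaker>...</speaker>
            <p>...</p>
          </sp>
        </div>
      </div>
    </div>
  </body>
</text>
```

Reading an identifier from right to left gives the path from the element back up to the document.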
First, we chose to distinguish two types of header, one for printed documents and one for manuscripts.
Both headers are largely the same: they contain a <fileDesc>, an <encodingDesc> and a <revisionDesc>. The only difference between them is the addition of the <msDesc> element, used to describe a manuscript.
This part of the header contains at least five other parts :
This part is essential for the presentation of the encoded document. It has to contain at least one <title> and one <author>.
This part contains the name of the editor and the date of encoding.
This part indicates the size of the work and contains the number of words and pages (the number of pages being taken as equal to the number of <pb> elements).
This part contains an <authority> element with the name of the project and an <availability> element with its status and the <licence>.
This part contains one or more bibliographical descriptions, which include standard TEI elements such as <author>, <title> or <date>.
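The five parts described above might be assembled as in the sketch below. The names of the five parts are not spelled out in this document, so mapping them to <titleStmt>, <editionStmt>, <extent>, <publicationStmt> and <sourceDesc> is an assumption based on the standard TEI header. All names, dates, counts and the availability status are placeholders.

```xml
<!-- Sketch only: every value below is a placeholder -->
<fileDesc xmlns="http://www.tei-c.org/ns/1.0">
  <titleStmt>
    <title>Title of the work</title>
    <author>Name of the author</author>
  </titleStmt>
  <editionStmt>
    <edition>Digital edition, <date when="2020-01-01">1 January 2020</date></edition>
    <respStmt>
      <resp>Encoding</resp>
      <persName>Name of the editor</persName>
    </respStmt>
  </editionStmt>
  <extent>
    <!-- the page count is taken to be the number of <pb> elements -->
    <measure unit="words" quantity="12000">12000 words</measure>
    <measure unit="pages" quantity="96">96 pages</measure>
  </extent>
  <publicationStmt>
    <authority>e-ditiones</authority>
    <availability status="restricted">
      <licence>Text of, or reference to, the licence</licence>
    </availability>
  </publicationStmt>
  <sourceDesc>
    <bibl>
      <author>Name of the author</author>
      <title>Title of the source</title>
      <date when="1650">1650</date>
    </bibl>
  </sourceDesc>
</fileDesc>
```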
As mentioned above, manuscripts are a special case. To describe the document, we have to use the <msDesc> element. To ensure a good encoding, several elements are recommended:
<msIdentifier>, which contains the information needed to properly identify the manuscript
<physDesc>, which contains the physical description of the document, such as the <objectDesc> or the <bindingDesc>
<additional>, which contains further information about the document, such as <surrogates> or bibliographical information (<bibl>)
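A minimal manuscript description using the recommended elements could look like the sketch below. The values are placeholders rather than a description of any real manuscript, and the choice of child elements inside <msIdentifier> is an assumption. In standard TEI, <bibl> cannot be a direct child of <additional>, so the sketch wraps it in <listBibl>.

```xml
<!-- Sketch only: placeholder values, not a real manuscript -->
<msDesc xmlns="http://www.tei-c.org/ns/1.0">
  <msIdentifier>
    <settlement>City</settlement>
    <repository>Name of the holding institution</repository>
    <idno>Shelfmark</idno>
  </msIdentifier>
  <physDesc>
    <objectDesc>
      <p>Short description of the support and layout.</p>
    </objectDesc>
    <bindingDesc>
      <p>Short description of the binding.</p>
    </bindingDesc>
  </physDesc>
  <additional>
    <surrogates>
      <p>Reference to a digital facsimile, if any.</p>
    </surrogates>
    <listBibl>
      <bibl>Bibliographical reference about the manuscript.</bibl>
    </listBibl>
  </additional>
</msDesc>
```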
This part describes the relationship between the encoded text and its source. It might contain :
This last part of the header contains information about at least one <change> made during the production of the document. The when attribute is used to specify the date of the event.
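For instance, a revision history with two entries might be recorded as follows. This is only a sketch: the dates and descriptions are placeholders, and listing the most recent change first is a common TEI convention rather than a rule stated in this document.

```xml
<!-- Sketch only: placeholder dates and descriptions -->
<revisionDesc xmlns="http://www.tei-c.org/ns/1.0">
  <change when="2020-02-01">Semantic elements added.</change>
  <change when="2020-01-15">Transcription checked against the source.</change>
</revisionDesc>
```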
After the OCR of the text, its encoding will be completed in three phases :
Please note that at each level, all existing elements are still used and new elements are added to the existing ones.
The purpose of the first level is to distinguish between form and content. To do so, we use only a few elements. First, at all levels, our edition must contain a <text> element declaring the TEI namespace, xmlns="http://www.tei-c.org/ns/1.0", which is required for the file to validate against the TEI schema.
Then, at this level of encoding, all the text is included in the <body> and in a single <p>. Some information is added at this point. Concerning the content of the text, the element <fw> contains information such as titles, pagination or editor's notes. The other information added concerns the form of the text, for which we use the elements <pb> and <lb>. The first one, <pb>, marks the point where a new page begins; it can be used to check the transcription and also to compare our edition with a reference edition. The second one, <lb>, marks the point where a new line begins; it provides graphical information and can be used by an automatic encoding process. It has two required attributes, @break and @rend. If a word is cut at the end of a line, @break with the value "no" makes it possible to reassemble the complete word and treat it as a single token. @rend shows which mark is used (a dash or a hyphen, for example).
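A first-level file might therefore look like the sketch below. The transcribed words are invented for the illustration, and the @rend values "hyphen" and "none", the value "yes" for @break on ordinary line breaks, the use of @place on <fw> and the omission of the hyphen character from the transcription are all assumptions, since this document does not list the permitted values.

```xml
<!-- Sketch only: invented text; @rend values are hypothetical -->
<text xmlns="http://www.tei-c.org/ns/1.0">
  <body>
    <p>
      <pb n="1"/>
      <fw place="top">Placeholder running title</fw>
      <lb break="yes" rend="none"/>Ie ne sçay si ie dois conti
      <!-- break="no": "conti" and "nuer" form the single token "continuer" -->
      <lb break="no" rend="hyphen"/>nuer ce discours.
      <pb n="2"/>
      <fw place="top">2</fw>
      <lb break="yes" rend="none"/>...
    </p>
  </body>
</text>
```

Because every line break states whether it splits a word, a later automatic step can rebuild complete tokens without guessing.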
At this level of encoding, we manually add some semantic information. Since we want to use a minimal set of elements, as mentioned before, we decided to employ only common elements. Nevertheless, for texts such as plays or letters, the use of a few specific elements is recommended.
It is possible to use the following elements :
Element | Text type | Note |
<front> | any prefatory matter | |
<div> | any text subdivision | type, n and xml:id are required |
<back> | any type of appendix | |
<head> | any type of heading | this can be used to clarify <fw> |
<list> and <item> | any type of list | n and xml:id are required |
<persName>, <placeName> and <orgName> | any type of person, place or organisation | this can be useful for entity search |
<l> and <lg> | any type of line or line group | |
<note> | any type of note | it can be used for a note by the author, the editor or, rarely, one added during the encoding |
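By way of illustration, a subdivision encoded with some of these common elements could look like the following sketch. The content is a placeholder, and the identifier assumes a third document of the corpus, EDI_0003.

```xml
<!-- Sketch only: placeholder content, hypothetical document EDI_0003 -->
<div xmlns="http://www.tei-c.org/ns/1.0" type="part" n="1" xml:id="EDI_0003-1">
  <head>Placeholder heading</head>
  <p>Placeholder sentence mentioning <persName>a person</persName>,
    <placeName>a place</placeName> and <orgName>an organisation</orgName>.
    <note>Placeholder note by the editor.</note></p>
  <lg>
    <l>Placeholder first line of verse</l>
    <l>Placeholder second line of verse</l>
  </lg>
</div>
```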
There are only two exceptions, drama and letters.
If the encoded text is a play, three additional elements may be used:
Element | Text type | Note |
<sp> | any speech | n and xml:id are required |
<stage> | any stage direction | e.g. useful for studying spoken words separately |
<speaker> | any speaker in a speech | |
Example of a letter
Example of a speech
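A speech might be encoded along the lines of the sketch below. The speaker name and text are placeholders, and the identifiers simply follow the scheme presented earlier for the first speech of a fourth scene.

```xml
<!-- Sketch only: placeholder speaker and text -->
<div xmlns="http://www.tei-c.org/ns/1.0" type="scene" n="4" xml:id="EDI_0001-1-3-4">
  <head>Placeholder scene heading</head>
  <stage>Placeholder stage direction.</stage>
  <sp n="1" xml:id="EDI_0001-1-3-4-1">
    <speaker>Name of the speaker</speaker>
    <l>Placeholder first line of the speech</l>
    <l>Placeholder second line of the speech</l>
  </sp>
</div>
```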
If the encoded text is a letter, two additional elements may be used:
Element | Text type | Note |
<opener> | any text at the start of a letter | e.g. a salutation or a dateline |
<closer> | any text at the end of a letter | e.g. a salutation or a dateline |
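A letter could then be encoded roughly as in the sketch below. The text is a placeholder, and the identifier assumes the first letter of a collection in a hypothetical document EDI_0002.

```xml
<!-- Sketch only: placeholder text, hypothetical document EDI_0002 -->
<div xmlns="http://www.tei-c.org/ns/1.0" type="letter" n="1" xml:id="EDI_0002-1-1">
  <opener>
    <placeName>Place of writing</placeName>, date of writing.
    Placeholder salutation,
  </opener>
  <p>Placeholder body of the letter.</p>
  <closer>Placeholder closing salutation.</closer>
</div>
```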
This level of encoding is done automatically. In order to add linguistic information, the original version of the text is normalized with the following elements: <choice>, <orig> and <reg>. Then, in order to tokenize and lemmatize the text, we split it with <seg> and <w>. The first one, <seg>, is used to represent any segmentation of the text. Note that sentences and clauses remain our basic units, but we recommend splitting a long sentence into several segments. The second one, <w>, is used to mark a single token. Regarding punctuation, we decided to treat punctuation marks as tokens: first, because greater precision would not be useful for our analysis, and second, because this choice keeps our encoding compatible with ELTeC.
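The result of this third level might resemble the sketch below. The sentence is invented for the illustration, and placing <choice> inside <w> (rather than <w> inside <orig> and <reg>) is an assumption, since this document does not specify the nesting.

```xml
<!-- Sketch only: invented sentence; nesting of <choice> in <w> is an assumption -->
<p xmlns="http://www.tei-c.org/ns/1.0">
  <seg>
    <w><choice><orig>Ie</orig><reg>Je</reg></choice></w>
    <w>ne</w>
    <w><choice><orig>sçay</orig><reg>sais</reg></choice></w>
    <w>rien</w>
    <!-- punctuation marks are treated as tokens -->
    <w>.</w>
  </seg>
</p>
```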
We decided to define a closed set of attributes that can be used for the encoding. There are only three of them: xml:id, n and type.
Please note that all of them are required.
This attribute is used to identify the document or its subdivisions. Earlier in this document, we presented the way to properly generate identifiers.
xml:id is required on several elements and at different levels:
This attribute is used, from the second level of encoding onwards, to record the number of its element. The child elements of a node are numbered incrementally, starting with 1.
Note that there are two exceptions :
Note: In this way, it is possible to compare our edition with a reference edition.
This attribute is used to specify the type of the current <div>.
Note that this attribute is restricted to the following predefined values.
Value | Use case |
titlePage | in the <front>, used for the title page of the work |
privilege | in the <front>, used for the privilege of the work |
castList | in the <front>, used for the cast list |
liminal | in the <front>, used for any liminal part of the work |
play | used at the beginning of a new play |
act | used at the beginning of a new act |
scene | used at the beginning of a new scene |
part | used for any part of the work |
subPart | used for any subpart (child of a type="part") of the work |
letter | used for any letter |
collection | used for any type of collection |
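To show how some of these values combine, the sketch below outlines a collection of letters with its front matter. The document EDI_0002 and all content are placeholders, and the identifiers follow the scheme presented at the beginning of this document.

```xml
<!-- Sketch only: hypothetical document EDI_0002, placeholder content -->
<text xmlns="http://www.tei-c.org/ns/1.0" xml:id="EDI_0002">
  <front>
    <div type="titlePage" n="1" xml:id="EDI_0002-0-1">
      <p>...</p>
    </div>
    <div type="privilege" n="2" xml:id="EDI_0002-0-2">
      <p>...</p>
    </div>
  </front>
  <body>
    <div type="collection" n="1" xml:id="EDI_0002-1">
      <div type="letter" n="1" xml:id="EDI_0002-1-1">
        <p>...</p>
      </div>
      <div type="letter" n="2" xml:id="EDI_0002-1-2">
        <p>...</p>
      </div>
    </div>
  </body>
</text>
```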
<body> (text body) contains the whole body of a single unitary text, excluding any front or back matter. [4. Default Text Structure] | |
Module | textstructure |
Attributes | Attributes att.declaring (@decls) att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) att.global.rendition (@rend, @style, @rendition) att.global.change (@change) att.global.responsibility (@cert, @resp) |
Contained by | textstructure: text |
May contain | |
Example | <body>
<l>Nu scylun hergan hefaenricaes uard</l>
<l>metudæs maecti end his modgidanc</l>
<l>uerc uuldurfadur sue he uundra gihuaes</l>
<l>eci dryctin or astelidæ</l>
<l>he aerist scop aelda barnum</l>
<l>heben til hrofe haleg scepen.</l>
<l>tha middungeard moncynnæs uard</l>
<l>eci dryctin æfter tiadæ</l>
<l>firum foldu frea allmectig</l>
<trailer>primo cantauit Cædmon istud carmen.</trailer>
</body> |
Content model | <content> <sequence minOccurs="1" maxOccurs="1"> <classRef key="model.global" minOccurs="0" maxOccurs="unbounded"/> <sequence minOccurs="0" maxOccurs="1"> <classRef key="model.divTop"/> <alternate minOccurs="0" maxOccurs="unbounded"> <classRef key="model.global"/> <classRef key="model.divTop"/> </alternate> </sequence> <sequence minOccurs="0" maxOccurs="1"> <classRef key="model.divGenLike"/> <alternate minOccurs="0" maxOccurs="unbounded"> <classRef key="model.global"/> <classRef key="model.divGenLike"/> </alternate> </sequence> <alternate minOccurs="1" maxOccurs="1"> <sequence minOccurs="1" maxOccurs="unbounded"> <classRef key="model.divLike"/> <alternate minOccurs="0" maxOccurs="unbounded"> <classRef key="model.global"/> <classRef key="model.divGenLike"/> </alternate> </sequence> <sequence minOccurs="1" maxOccurs="unbounded"> <classRef key="model.div1Like"/> <alternate minOccurs="0" maxOccurs="unbounded"> <classRef key="model.global"/> <classRef key="model.divGenLike"/> </alternate> </sequence> <sequence minOccurs="1" maxOccurs="1"> <sequence minOccurs="1" maxOccurs="unbounded"> <classRef key="model.common"/> <classRef key="model.global" minOccurs="0" maxOccurs="unbounded"/> </sequence> <alternate minOccurs="0" maxOccurs="1"> <sequence minOccurs="1" maxOccurs="unbounded"> <classRef key="model.divLike"/> <alternate minOccurs="0" maxOccurs="unbounded"> <classRef key="model.global"/> <classRef key="model.divGenLike"/> </alternate> </sequence> <sequence minOccurs="1" maxOccurs="unbounded"> <classRef key="model.div1Like"/> <alternate minOccurs="0" maxOccurs="unbounded"> <classRef key="model.global"/> <classRef key="model.divGenLike"/> </alternate> </sequence> </alternate> </sequence> </alternate> <sequence minOccurs="0" maxOccurs="unbounded"> <classRef key="model.divBottom"/> <classRef key="model.global" minOccurs="0" maxOccurs="unbounded"/> </sequence> </sequence> </content> |
Schema Declaration | element body { att.global.attribute.xmlid, att.global.attribute.n, att.global.attribute.xmllang, att.global.attribute.xmlbase, att.global.attribute.xmlspace, att.global.rendition.attribute.rend, att.global.rendition.attribute.style, att.global.rendition.attribute.rendition, att.global.change.attribute.change, att.global.responsibility.attribute.cert, att.global.responsibility.attribute.resp, att.declaring.attributes, ( model.global*, ( model.divTop, ( model.global | model.divTop )* )?, ( model.divGenLike, ( model.global | model.divGenLike )* )?, ( ( model.divLike, ( model.global | model.divGenLike )* )+ | ( model.div1Like, ( model.global | model.divGenLike )* )+ | ( ( model.common, model.global* )+, ( ( model.divLike, ( model.global | model.divGenLike )* )+ | ( model.div1Like, ( model.global | model.divGenLike )* )+ )? ) ), ( model.divBottom, model.global* )* ) } |
<choice> groups a number of alternative encodings for the same point in a text. [3.4. Simple Editorial Changes] | |
Module | core |
Attributes | Attributes att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) att.global.rendition (@rend, @style, @rendition) att.global.change (@change) att.global.responsibility (@cert, @resp) |
Member of | |
Contained by | |
May contain | |
Note | Because the children of a <choice> element all represent alternative ways of encoding the same sequence, it is natural to think of them as mutually exclusive. However, there may be cases where a full representation of a text requires the alternative encodings to be considered as parallel. Note also that <choice> elements may self-nest. Where the purpose of an encoding is to record multiple witnesses of a single work, rather than to identify multiple possible encoding decisions at a given point, the <app> element and associated elements discussed in section 12.1. The Apparatus Entry, Readings, and Witnesses should be preferred. |
Example | An American encoding of Gulliver's Travels which retains the British spelling but also provides a version regularized to American spelling might be encoded as follows. <p>Lastly, That, upon his solemn oath to observe all the above
articles, the said man-mountain shall have a daily allowance of
meat and drink sufficient for the support of <choice>
<sic>1724</sic>
<corr>1728</corr>
</choice> of our subjects,
with free access to our royal person, and other marks of our
<choice>
<orig>favour</orig>
<reg>favor</reg>
</choice>.</p> |
Content model | <content> <alternate minOccurs="2" maxOccurs="unbounded"> <classRef key="model.choicePart"/> <elementRef key="choice"/> </alternate> </content> |
Schema Declaration | element choice { att.global.attribute.xmlid, att.global.attribute.n, att.global.attribute.xmllang, att.global.attribute.xmlbase, att.global.attribute.xmlspace, att.global.rendition.attribute.rend, att.global.rendition.attribute.style, att.global.rendition.attribute.rendition, att.global.change.attribute.change, att.global.responsibility.attribute.cert, att.global.responsibility.attribute.resp, ( model.choicePart | choice )+ } |
<corr> (correction) contains the correct form of a passage apparently erroneous in the copy text. [3.4.1. Apparent Errors] | |
Module | core |
Attributes | Attributes att.editLike (@instant) att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) att.global.rendition (@rend, @style, @rendition) att.global.change (@change) att.global.responsibility (@cert, @resp) |
Member of | |
Contained by | |
May contain | |
Example | If all that is desired is to call attention to the fact that the copy text has been corrected, <corr> may be used alone: I don't know,
Juan. It's so far in the past now — how <corr>can we</corr> prove
or disprove anyone's theories? |
Example | It is also possible, using the <choice> and <sic> elements, to provide an uncorrected reading: I don't know, Juan. It's so far in the past now —
how <choice>
<sic>we can</sic>
<corr>can we</corr>
</choice> prove or
disprove anyone's theories? |
Content model | <content> <macroRef key="macro.paraContent"/> </content> |
Schema Declaration | element corr { att.global.attribute.xmlid, att.global.attribute.n, att.global.attribute.xmllang, att.global.attribute.xmlbase, att.global.attribute.xmlspace, att.global.rendition.attribute.rend, att.global.rendition.attribute.style, att.global.rendition.attribute.rendition, att.global.change.attribute.change, att.global.responsibility.attribute.cert, att.global.responsibility.attribute.resp, att.editLike.attributes, macro.paraContent } |
<fw> (forme work) contains a running head (e.g. a header, footer), catchword, or similar material appearing on the current page. [11.6. Headers, Footers, and Similar Matter] | |
Module | transcr |
Attributes | Attributes att.placement (@place) att.written (@hand) att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) att.global.rendition (@rend, @style, @rendition) att.global.change (@change) att.global.responsibility (@cert, @resp) |
Member of | |
Contained by | |
May contain | |
Note | Where running heads are consistent throughout a chapter or section, it is usually more convenient to relate them to the chapter or section, e.g. by use of the rend attribute. The <fw> element is intended for cases where the running head changes from page to page, or where details of page layout and the internal structure of the running heads are of paramount importance. |
Example | <fw type="sig" place="bottom">C3</fw> |
Content model | <content> <macroRef key="macro.phraseSeq"/> </content> |
Schema Declaration | element fw { att.global.attribute.xmlid, att.global.attribute.n, att.global.attribute.xmllang, att.global.attribute.xmlbase, att.global.attribute.xmlspace, att.global.rendition.attribute.rend, att.global.rendition.attribute.style, att.global.rendition.attribute.rendition, att.global.change.attribute.change, att.global.responsibility.attribute.cert, att.global.responsibility.attribute.resp, att.placement.attributes, att.written.attributes, macro.phraseSeq } |
<hi> (highlighted) marks a word or phrase as graphically distinct from the surrounding text, for reasons concerning which no claim is made. [3.3.2.2. Emphatic Words and Phrases 3.3.2. Emphasis, Foreign Words, and Unusual Language] | |
Module | core |
Attributes | Attributes att.written (@hand) att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) att.global.rendition (@rend, @style, @rendition) att.global.change (@change) att.global.responsibility (@cert, @resp) |
Member of | |
Contained by | |
May contain | |
Example | <hi rend="gothic">And this Indenture further witnesseth</hi>
that the said <hi rend="italic">Walter Shandy</hi>, merchant,
in consideration of the said intended marriage ... |
Content model | <content> <macroRef key="macro.paraContent"/> </content> |
Schema Declaration | element hi { att.global.attribute.xmlid, att.global.attribute.n, att.global.attribute.xmllang, att.global.attribute.xmlbase, att.global.attribute.xmlspace, att.global.rendition.attribute.rend, att.global.rendition.attribute.style, att.global.rendition.attribute.rendition, att.global.change.attribute.change, att.global.responsibility.attribute.cert, att.global.responsibility.attribute.resp, att.written.attributes, macro.paraContent } |
<lb> (line beginning) marks the beginning of a new (typographic) line in some edition or version of a text. [3.10.3. Milestone Elements 7.2.5. Speech Contents] | |
Module | core |
Attributes | Attributes att.edition (@ed, @edRef) att.spanning (@spanTo) att.breaking (@break) att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) att.global.rendition (@rend, @style, @rendition) att.global.change (@change) att.global.responsibility (@cert, @resp) |
Member of | |
Contained by | |
May contain | Empty element |
Note | By convention, <lb> elements should appear at the point in the text where a new line starts. The n attribute, if used, indicates the number or other value associated with the text between this point and the next <lb> element, typically the sequence number of the line within the page, or other appropriate unit. This element is intended to be used for marking actual line breaks on a manuscript or printed page, at the point where they occur; it should not be used to tag structural units such as lines of verse (for which the <l> element is available) except in circumstances where structural units cannot otherwise be marked. The type attribute may be used to characterize the line break in any respect. The more specialized attributes break, ed, or edRef should be preferred when the intent is to indicate whether or not the line break is word-breaking, or to note the source from which it derives. |
Example | This example shows typographical line breaks within metrical lines, where they occur at different places in different editions: <l>Of Mans First Disobedience,<lb ed="1674"/> and<lb ed="1667"/> the Fruit</l>
<l>Of that Forbidden Tree, whose<lb ed="1667 1674"/> mortal tast</l>
<l>Brought Death into the World,<lb ed="1667"/> and all<lb ed="1674"/> our woe,</l> |
Example | This example encodes typographical line breaks as a means of preserving the visual appearance of a title page. The break attribute is used to show that the line break does not (as elsewhere) mark the start of a new word. <titlePart>
<lb/>With Additions, ne-<lb break="no"/>ver before Printed.
</titlePart> |
Content model | <content> <empty/> </content> |
Schema Declaration | element lb { att.global.attribute.xmlid, att.global.attribute.n, att.global.attribute.xmllang, att.global.attribute.xmlbase, att.global.attribute.xmlspace, att.global.rendition.attribute.rend, att.global.rendition.attribute.style, att.global.rendition.attribute.rendition, att.global.change.attribute.change, att.global.responsibility.attribute.cert, att.global.responsibility.attribute.resp, att.edition.attributes, att.spanning.attributes, att.breaking.attributes, empty } |
<p> (paragraph) marks paragraphs in prose. [3.1. Paragraphs 7.2.5. Speech Contents] | |||||||||||
Module | core | ||||||||||
Attributes | Attributes att.declaring (@decls) att.written (@hand) att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) att.global.rendition (@rend, @style, @rendition) att.global.change (@change) att.global.responsibility (@cert, @resp)
| ||||||||||
Member of | |||||||||||
Contained by | textstructure: body | ||||||||||
May contain | |||||||||||
Example | <p>Hallgerd was outside. <q>There is blood on your axe,</q> she said. <q>What have you
done?</q>
</p>
<p>
<q>I have now arranged that you can be married a second time,</q> replied Thjostolf.
</p>
<p>
<q>Then you must mean that Thorvald is dead,</q> she said.
</p>
<p>
<q>Yes,</q> said Thjostolf. <q>And now you must think up some plan for me.</q>
</p> | ||||||||||
Schematron |
<s:report test="not(ancestor::tei:floatingText) and (ancestor::tei:p or ancestor::tei:ab)
and not(parent::tei:exemplum |parent::tei:item |parent::tei:note |parent::tei:q
|parent::tei:quote |parent::tei:remarks |parent::tei:said |parent::tei:sp
|parent::tei:stage |parent::tei:cell |parent::tei:figure )"> Abstract model violation: Paragraphs may not occur inside other paragraphs or ab elements.
</s:report> | ||||||||||
Schematron |
<s:report test="ancestor::tei:l[not(.//tei:note//tei:p[. = current()])]"> Abstract model violation: Lines may not contain higher-level structural elements such as div, p, or ab.
</s:report> | ||||||||||
Content model | <content> <macroRef key="macro.paraContent"/> </content> | ||||||||||
Schema Declaration | element p { att.global.attribute.xmlid, att.global.attribute.n, att.global.attribute.xmllang, att.global.attribute.xmlbase, att.global.attribute.xmlspace, att.global.rendition.attribute.rend, att.global.rendition.attribute.style, att.global.rendition.attribute.rendition, att.global.change.attribute.change, att.global.responsibility.attribute.cert, att.global.responsibility.attribute.resp, att.declaring.attributes, att.written.attributes, attribute part { "N" }?, macro.paraContent } |
<pb> (page beginning) marks the beginning of a new page in a paginated document. [3.10.3. Milestone Elements] | |
Module | core |
Attributes | Attributes att.edition (@ed, @edRef) att.spanning (@spanTo) att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) att.global.rendition (@rend, @style, @rendition) att.global.facs (@facs) att.global.change (@change) att.global.responsibility (@cert, @resp) |
Member of | |
Contained by | |
May contain | Empty element |
Note | A <pb> element should appear at the start of the page which it identifies. The global n attribute indicates the number or other value associated with this page. This will normally be the page number or signature printed on it, since the physical sequence number is implicit in the presence of the <pb> element itself. The type attribute may be used to characterize the page break in any respect. The more specialized attributes break, ed, or edRef should be preferred when the intent is to indicate whether or not the page break is word-breaking, or to note the source from which it derives. |
Example | Page numbers may vary in different editions of a text. <p> ... <pb n="145" ed="ed2"/>
<!-- Page 145 in edition "ed2" starts here --> ... <pb n="283" ed="ed1"/>
<!-- Page 283 in edition "ed1" starts here--> ... </p> |
Example | A page break may be associated with a facsimile image of the page it introduces by means of the facs attribute <body>
<pb n="1" facs="page1.png"/>
<!-- page1.png contains an image of the page;
the text it contains is encoded here -->
<p>
<!-- ... -->
</p>
<pb n="2" facs="page2.png"/>
<!-- similarly, for page 2 -->
<p>
<!-- ... -->
</p>
</body> |
Content model | <content> <empty/> </content> |
Schema Declaration | element pb { att.global.attribute.xmlid, att.global.attribute.n, att.global.attribute.xmllang, att.global.attribute.xmlbase, att.global.attribute.xmlspace, att.global.rendition.attribute.rend, att.global.rendition.attribute.style, att.global.rendition.attribute.rendition, att.global.facs.attribute.facs, att.global.change.attribute.change, att.global.responsibility.attribute.cert, att.global.responsibility.attribute.resp, att.edition.attributes, att.spanning.attributes, empty } |
<sic> (Latin for thus or so) contains text reproduced although apparently incorrect or inaccurate. [3.4.1. Apparent Errors] | |
Module | core |
Attributes | Attributes att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) att.global.rendition (@rend, @style, @rendition) att.global.change (@change) att.global.responsibility (@cert, @resp) |
Member of | |
Contained by | |
May contain | |
Example | for his nose was as sharp as
a pen, and <sic>a Table</sic> of green fields. |
Example | If all that is desired is to call attention to the apparent problem in the copy text, <sic> may be used alone: I don't know, Juan. It's so far in the past now
— how <sic>we can</sic> prove or disprove anyone's theories? |
Example | It is also possible, using the <choice> and <corr> elements, to provide a corrected reading: I don't know, Juan. It's so far in the past now
— how <choice>
<sic>we can</sic>
<corr>can we</corr>
</choice> prove or disprove anyone's theories? |
Example | for his nose was as sharp as
a pen, and <choice>
<sic>a Table</sic>
<corr>a' babbld</corr>
</choice> of green fields. |
Content model | <content> <macroRef key="macro.paraContent"/> </content> |
Schema Declaration | element sic { att.global.attribute.xmlid, att.global.attribute.n, att.global.attribute.xmllang, att.global.attribute.xmlbase, att.global.attribute.xmlspace, att.global.rendition.attribute.rend, att.global.rendition.attribute.style, att.global.rendition.attribute.rendition, att.global.change.attribute.change, att.global.responsibility.attribute.cert, att.global.responsibility.attribute.resp, macro.paraContent } |
<text> contains a single text of any kind, whether unitary or composite, for example a poem or drama, a collection of essays, a novel, a dictionary, or a corpus sample. [4. Default Text Structure 15.1. Varieties of Composite Text] | |
Module | textstructure |
Attributes | Attributes att.declaring (@decls) att.written (@hand) att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) att.global.rendition (@rend, @style, @rendition) att.global.change (@change) att.global.responsibility (@cert, @resp) |
Contained by | — |
May contain | |
Note | This element should not be used to represent a text which is inserted at an arbitrary point within the structure of another, for example as in an embedded or quoted narrative; the <floatingText> is provided for this purpose. |
Example | <text>
<front>
<docTitle>
<titlePart>Autumn Haze</titlePart>
</docTitle>
</front>
<body>
<l>Is it a dragonfly or a maple leaf</l>
<l>That settles softly down upon the water?</l>
</body>
</text> |
Example | The body of a text may be replaced by a group of nested texts, as in the following schematic: <text>
<front>
<!-- front matter for the whole group -->
</front>
<group>
<text>
<!-- first text -->
</text>
<text>
<!-- second text -->
</text>
</group>
</text> |
Content model | <content> <sequence minOccurs="1" maxOccurs="1"> <classRef key="model.global" minOccurs="0" maxOccurs="unbounded"/> <sequence minOccurs="0" maxOccurs="1"> <elementRef key="front"/> <classRef key="model.global" minOccurs="0" maxOccurs="unbounded"/> </sequence> <alternate minOccurs="1" maxOccurs="1"> <elementRef key="body"/> <elementRef key="group"/> </alternate> <classRef key="model.global" minOccurs="0" maxOccurs="unbounded"/> <sequence minOccurs="0" maxOccurs="1"> <elementRef key="back"/> <classRef key="model.global" minOccurs="0" maxOccurs="unbounded"/> </sequence> </sequence> </content> |
Schema Declaration | element text { att.global.attribute.xmlid, att.global.attribute.n, att.global.attribute.xmllang, att.global.attribute.xmlbase, att.global.attribute.xmlspace, att.global.rendition.attribute.rend, att.global.rendition.attribute.style, att.global.rendition.attribute.rendition, att.global.change.attribute.change, att.global.responsibility.attribute.cert, att.global.responsibility.attribute.resp, att.declaring.attributes, att.written.attributes, ( model.global*, ( front, model.global* )?, ( body | group ), model.global*, ( back, model.global* )? ) } |
model.common groups common chunk- and inter-level elements. | |
Module | tei |
Used by | |
Members | model.divPart[model.lLike model.pLike[p]] model.inter[model.biblLike model.egLike model.labelLike model.listLike model.oddDecl model.qLike[model.quoteLike] model.stageLike] |
Note | This class defines the set of chunk- and inter-level elements; it is used in many content models, including those for textual divisions. |
model.divBottom groups elements appearing at the end of a text division. | |
Module | tei |
Used by | |
Members | model.divBottomPart model.divWrapper |
model.divPart groups paragraph-level elements appearing directly within divisions. | |
Module | tei |
Used by | |
Members | model.lLike model.pLike[p] |
Note | Note that this element class does not include members of the model.inter class, which can appear either within or between paragraph-level items. |
model.divTop groups elements appearing at the beginning of a text division. | |
Module | tei |
Used by | |
Members | model.divTopPart[model.headLike] model.divWrapper |
model.divTopPart groups elements which can occur only at the beginning of a text division. | |
Module | tei |
Used by | |
Members | model.headLike |
model.global groups elements which may appear at any point within a TEI text. | |
Module | tei |
Used by | |
Members | model.global.edit model.global.meta model.milestoneLike[fw lb pb] model.noteLike |
model.hiLike groups phrase-level elements which are typographically distinct but to which no specific function can be attributed. | |
Module | tei |
Used by | |
Members | hi |
model.highlighted groups phrase-level elements which are typographically distinct. | |
Module | tei |
Used by | |
Members | model.emphLike model.hiLike[hi] |
model.inter groups elements which can appear either within or between paragraph-like elements. | |
Module | tei |
Used by | |
Members | model.biblLike model.egLike model.labelLike model.listLike model.oddDecl model.qLike[model.quoteLike] model.stageLike |
model.limitedPhrase groups phrase-level elements excluding those elements primarily intended for transcription of existing sources. | |
Module | tei |
Used by | |
Members | model.emphLike model.hiLike[hi] model.pPart.data[model.addressLike model.dateLike model.measureLike model.nameLike[model.nameLike.agent model.offsetLike model.placeStateLike[model.placeNamePart]]] model.pPart.editorial[choice] model.pPart.msdesc model.phrase.xml model.ptrLike |
model.nameLike groups elements which name or refer to a person, place, or organization. | |
Module | tei |
Used by | |
Members | model.nameLike.agent model.offsetLike model.placeStateLike[model.placeNamePart] |
Note | A superset of the naming elements that may appear in datelines, addresses, statements of responsibility, etc. |
model.pLike groups paragraph-like elements. | |
Module | tei |
Used by | |
Members | p |
model.pPart.data groups phrase-level elements containing names, dates, numbers, measures, and similar data. | |
Module | tei |
Used by | |
Members | model.addressLike model.dateLike model.measureLike model.nameLike[model.nameLike.agent model.offsetLike model.placeStateLike[model.placeNamePart]] |
model.pPart.edit groups phrase-level elements for simple editorial correction and transcription. | |
Module | tei |
Used by | |
Members | model.pPart.editorial[choice] model.pPart.transcriptional[corr sic] |
model.pPart.editorial groups phrase-level elements for simple editorial interventions that may be useful both in transcribing and in authoring. | |
Module | tei |
Used by | |
Members | choice |
model.phrase groups elements which can occur at the level of individual words or phrases. | |
Module | tei |
Used by | |
Members | model.graphicLike model.highlighted[model.emphLike model.hiLike[hi]] model.lPart model.pPart.data[model.addressLike model.dateLike model.measureLike model.nameLike[model.nameLike.agent model.offsetLike model.placeStateLike[model.placeNamePart]]] model.pPart.edit[model.pPart.editorial[choice] model.pPart.transcriptional[corr sic]] model.pPart.msdesc model.phrase.xml model.ptrLike model.segLike model.specDescLike |
Note | This class of elements can occur within paragraphs, list items, lines of verse, etc. |
model.placeStateLike groups elements which describe changing states of a place. | |
Module | tei |
Used by | |
Members | model.placeNamePart |
model.qLike groups elements related to highlighting which can appear either within or between chunk-level elements. | |
Module | tei |
Used by | |
Members | model.quoteLike |
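The bracket notation in the Members rows (for example `model.pPart.edit[model.pPart.editorial[choice] model.pPart.transcriptional[corr sic]]`) can be read as a nested lookup from class to elements. The following is a minimal sketch of that lookup, assuming a hand-copied subset of the memberships listed above; the names `MEMBERS` and `elements_of` are illustrative and not part of the TEI or of the project schema.

```python
# Class membership copied from the Members rows above (subset only).
# Classes that list no concrete element in this customization are left out.
MEMBERS = {
    "model.divPart": ["model.lLike", "model.pLike"],
    "model.pLike": ["p"],
    "model.global": ["model.global.edit", "model.global.meta",
                     "model.milestoneLike", "model.noteLike"],
    "model.milestoneLike": ["fw", "lb", "pb"],
    "model.highlighted": ["model.emphLike", "model.hiLike"],
    "model.hiLike": ["hi"],
    "model.pPart.edit": ["model.pPart.editorial", "model.pPart.transcriptional"],
    "model.pPart.editorial": ["choice"],
    "model.pPart.transcriptional": ["corr", "sic"],
}


def elements_of(name):
    """Return the sorted element names reachable from a class or element name."""
    if not name.startswith("model."):
        return [name]  # already an element
    found = set()
    for member in MEMBERS.get(name, []):  # unlisted classes count as empty
        found.update(elements_of(member))
    return sorted(found)


print(elements_of("model.pPart.edit"))  # ['choice', 'corr', 'sic']
print(elements_of("model.global"))      # ['fw', 'lb', 'pb']
```

Classes with no listed members, such as model.emphLike, resolve to an empty list, which matches the minimal set of elements kept in this customization.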
att.breaking provides an attribute to indicate whether or not the element concerned is considered to mark the end of an orthographic token in the same way as whitespace. | |
Module | tei |
Members | lb pb |
att.declaring provides attributes for elements which may be independently associated with a particular declarable element within the header, thus overriding the inherited default for that element. | |
Module | tei |
Members | body p text |
Note | The rules governing the association of declarable elements with individual parts of a TEI text are fully defined in chapter 15.3. Associating Contextual Information with a Text. |
att.editLike provides attributes describing the nature of an encoded scholarly intervention or interpretation of any kind. | |
Module | tei |
Members | corr |
Note | The members of this attribute class are typically used to represent any kind of editorial intervention in a text, for example a correction or interpretation, or to date or localize manuscripts etc. Each pointer on the source (if present) corresponding to a witness or witness group should reference a bibliographic citation such as a <witness>, <msDesc>, or <bibl> element, or another external bibliographic citation, documenting the source concerned. |
att.edition provides attributes identifying the source edition from which some encoded feature derives. | |
Module | tei |
Members | lb pb |
Example | <l>Of Mans First Disobedience,<lb ed="1674"/> and<lb ed="1667"/> the Fruit</l>
<l>Of that Forbidden Tree, whose<lb ed="1667 1674"/> mortal tast</l>
<l>Brought Death into the World,<lb ed="1667"/> and all<lb ed="1674"/> our woe,</l> |
Example | <listBibl>
<bibl xml:id="stapledon1937">
<author>Olaf Stapledon</author>,
<title>Starmaker</title>, <publisher>Methuen</publisher>, <date>1937</date>
</bibl>
<bibl xml:id="stapledon1968">
<author>Olaf Stapledon</author>,
<title>Starmaker</title>, <publisher>Dover</publisher>, <date>1968</date>
</bibl>
</listBibl>
<p>Looking into the future aeons from the supreme moment of
the cosmos, I saw the populations still with all their
strength maintaining the<pb n="411" edRef="#stapledon1968"/>essentials of their ancient culture,
still living their personal lives in zest and endless
novelty of action, … I saw myself still
preserving, though with increasing difficulty, my lucid
con-<pb n="291" edRef="#stapledon1937"/>sciousness;</p> |
att.global.change supplies the change attribute, allowing its member elements to specify one or more states or revision campaigns with which they are associated. | |
Module | transcr |
Members | att.global[body choice corr fw hi lb p pb sic text] |
att.global.facs provides an attribute used to express correspondence between an element containing transcribed text and all or part of an image representing that text. | |
Module | transcr |
Members | att.global[body choice corr fw hi lb p pb sic text] |
att.global.rendition provides rendering attributes common to all elements in the TEI encoding scheme. | |
Module | tei |
Members | att.global[body choice corr fw hi lb p pb sic text] |
att.global.responsibility provides attributes indicating the agent responsible for some aspect of the text, the markup or something asserted by the markup, and the degree of certainty associated with it. | |
Module | tei |
Members | att.global[body choice corr fw hi lb p pb sic text] |
Example | Blessed are the
<choice>
<sic>cheesemakers</sic>
<corr resp="#editor" cert="high">peacemakers</corr>
</choice>: for they shall be called the children of God. |
Example | <lg>
<l>Punkes, Panders, baſe extortionizing
sla<choice>
<sic>n</sic>
<corr resp="#JENS1_transcriber">u</corr>
</choice>es,</l>
</lg>
<respStmt xml:id="JENS1_transcriber">
<resp when="2014">Transcriber</resp>
<name>Janelle Jenstad</name>
</respStmt> |
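To illustrate how the `resp` and `cert` attributes on <corr> can be consumed downstream, here is a hedged sketch using Python's standard `xml.etree.ElementTree`. It assumes the first example above is wrapped in a <p> with no namespace; in a real TEI file the tags would carry the TEI namespace (`http://www.tei-c.org/ns/1.0`) and the tag tests would need adjusting. The function names `reading` and `responsibilities` are illustrative only.

```python
import xml.etree.ElementTree as ET

# The first example above, wrapped in <p> so that it is a well-formed document.
SAMPLE = (
    "<p>Blessed are the <choice><sic>cheesemakers</sic>"
    '<corr resp="#editor" cert="high">peacemakers</corr></choice>'
    ": for they shall be called the children of God.</p>"
)


def reading(xml_text, prefer="corr"):
    """Flatten the text, keeping only the preferred child of each <choice>."""
    root = ET.fromstring(xml_text)

    def walk(node):
        parts = [node.text or ""]
        for child in node:
            if child.tag == "choice":
                kept = child.find(prefer)
                if kept is not None:
                    parts.append("".join(kept.itertext()))
            else:
                parts.append(walk(child))
            parts.append(child.tail or "")  # text following the child element
        return "".join(parts)

    return walk(root)


def responsibilities(xml_text):
    """List (resp, cert, corrected text) for every <corr> in the fragment."""
    root = ET.fromstring(xml_text)
    return [(c.get("resp"), c.get("cert"), "".join(c.itertext()))
            for c in root.iter("corr")]


print(reading(SAMPLE))                 # corrected reading
print(reading(SAMPLE, prefer="sic"))   # reading as found in the source
print(responsibilities(SAMPLE))        # [('#editor', 'high', 'peacemakers')]
```

Keeping both readings in the file and choosing one at display time is the point of <choice>: the same encoding can serve a diplomatic and a normalized view.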
att.global.source provides an attribute used by elements to point to an external source. | |
Module | tei |
Members | att.global[body choice corr fw hi lb p pb sic text] |
Example | <p> As Willard McCarty (<bibl xml:id="mcc_2012">2012, p.2</bibl>) tells us, <quote source="#mcc_2012">‘Collaboration’ is a problematic and should be a contested
term.</quote>
</p> |
Example | <p>
<quote source="#chicago_15_ed">Grammatical theories are in flux, and the more we learn, the
less we seem to know.</quote>
</p>
<bibl xml:id="chicago_15_ed">
<title level="m">The Chicago Manual of Style</title>,
<edition>15th edition</edition>. <pubPlace>Chicago</pubPlace>: <publisher>University of
Chicago Press</publisher> (<date>2003</date>), <biblScope unit="page">p.147</biblScope>.
</bibl> |
Example | <elementRef key="p" source="tei:2.0.1"/> Include in the schema an element named <p> available from the TEI P5 2.0.1 release. |
Example | <schemaSpec ident="myODD"
source="mycompiledODD.xml"/> Create a schema using components taken from the file mycompiledODD.xml. |
att.placement provides attributes for describing where on the source page or object a textual element appears. | |
Module | tei |
Members | fw |
att.spanning provides attributes for elements which delimit a span of text by pointing mechanisms rather than by enclosing it. | |
Module | tei |
Members | lb pb |
Note | The span is defined as running in document order from the start of the content of the pointing element to the end of the content of the element pointed to by the spanTo attribute (if any). If no value is supplied for the attribute, the assumption is that the span is coextensive with the pointing element. If no content is present, the assumption is that the starting point of the span is immediately following the element itself. |
att.written provides an attribute to indicate the hand in which the content of an element was written in the source being transcribed. | |
Module | tei |
Members | fw hi p text |
macro.paraContent (paragraph content) defines the content of paragraphs and similar elements. | |
Module | tei |
Used by | |
Content model | <content> <alternate minOccurs="0" maxOccurs="unbounded"> <textNode/> <classRef key="model.gLike"/> <classRef key="model.phrase"/> <classRef key="model.inter"/> <classRef key="model.global"/> <elementRef key="lg"/> <classRef key="model.lLike"/> </alternate> </content> |
Declaration | macro.paraContent = ( text | model.gLike | model.phrase | model.inter | model.global | lg | model.lLike )* |
macro.phraseSeq (phrase sequence) defines a sequence of character data and phrase-level elements. | |
Module | tei |
Used by | |
Content model | <content> <alternate minOccurs="0" maxOccurs="unbounded"> <textNode/> <classRef key="model.gLike"/> <classRef key="model.qLike"/> <classRef key="model.phrase"/> <classRef key="model.global"/> </alternate> </content> |
Declaration | macro.phraseSeq = ( text | model.gLike | model.qLike | model.phrase | model.global )* |
teidata.certainty defines the range of attribute values expressing a degree of certainty. | |
Module | tei |
Used by | |
Content model | <content> <valList type="closed"> <valItem ident="high"/> <valItem ident="medium"/> <valItem ident="low"/> <valItem ident="unknown"/> </valList> </content> |
Declaration | teidata.certainty = "high" | "medium" | "low" | "unknown" |
Note | Certainty may be expressed by one of the predefined symbolic values high, medium, or low. The value unknown should be used in cases where the encoder does not wish to assert an opinion about the matter. |
teidata.enumerated defines the range of attribute values expressed as a single XML name taken from a list of documented possibilities. | |
Module | tei |
Used by | |
Content model | <content> <dataRef key="teidata.word"/> </content> |
Declaration | teidata.enumerated = teidata.word |
Note | Attributes using this datatype must contain a single ‘word’ which contains only letters, digits, punctuation characters, or symbols: thus it cannot include whitespace. Typically, the list of documented possibilities will be provided (or exemplified) by a value list in the associated attribute specification, expressed with a <valList> element. |
teidata.language defines the range of attribute values used to identify a particular combination of human language and writing system. | |
Module | tei |
Used by | |
Content model | <content> <alternate minOccurs="1" maxOccurs="1"> <dataRef name="language"/> <valList> <valItem ident=""/> </valList> </alternate> </content> |
Declaration | teidata.language = xsd:language | ( "" ) |
Note | The values for this attribute are language ‘tags’ as defined in BCP 47. Currently BCP 47 comprises RFC 5646 and RFC 4647; over time, other IETF documents may succeed these as the best current practice. A ‘language tag’, per BCP 47, is assembled from a sequence of components or subtags separated by the hyphen character (-, U+002D). The tag is made of the following subtags, in the following order: language, script, region, variant, extension, and private use. Every subtag except the first is optional. If present, each occurs only once, except the fourth and fifth components (variant and extension), which are repeatable.
There are two exceptions to the above format. First, there are language tags in the IANA registry that do not match the above syntax, but are present because they have been ‘grandfathered’ from previous specifications. Second, an entire language tag can consist of only a private use subtag. These tags start with x-.
The W3C Internationalization Activity has published a useful introduction to BCP 47, Language tags in HTML and XML. |
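The declaration `xsd:language | ( "" )` is looser than BCP 47 itself: it only checks the lexical shape of the tag. The sketch below assumes the lexical pattern given for `xsd:language` in XML Schema Part 2 (subtags of one to eight characters, the first one alphabetic); the function name `is_teidata_language` is illustrative. It does not check subtags against the IANA registry.

```python
import re

# Lexical pattern of xsd:language: hyphen-separated subtags of 1 to 8
# characters, the first one made of ASCII letters only.
XSD_LANGUAGE = re.compile(r"[a-zA-Z]{1,8}(-[a-zA-Z0-9]{1,8})*")


def is_teidata_language(value):
    """True if value matches xsd:language or is the empty string."""
    return value == "" or XSD_LANGUAGE.fullmatch(value) is not None


for tag in ["fr", "fr-FR", "", "fr_FR"]:
    print(repr(tag), is_teidata_language(tag))
```

A tag such as `fr_FR` is rejected because the separator must be a hyphen; a well-formed but unregistered tag would still pass, so registry validation remains a separate step.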
teidata.name defines the range of attribute values expressed as an XML Name. | |
Module | tei |
Used by | |
Content model | <content> <dataRef name="Name"/> </content> |
Declaration | teidata.name = xsd:Name |
Note | Attributes using this datatype must contain a single word which follows the rules defining a legal XML name (see http://www.w3.org/TR/REC-xml/#dt-name): for example they cannot include whitespace or begin with digits. |
teidata.numeric defines the range of attribute values used for numeric values. | |
Module | tei |
Used by | |
Content model | <content> <alternate minOccurs="1" maxOccurs="1"> <dataRef name="double"/> <dataRef name="token" restriction="(\-?[\d]+/\-?[\d]+)"/> <dataRef name="decimal"/> </alternate> </content> |
Declaration | teidata.numeric = xsd:double | token { pattern = "(\-?[\d]+/\-?[\d]+)" } | xsd:decimal |
Note | Any numeric value, represented as a decimal number, in floating point format, or as a ratio. To represent a floating point number, expressed in scientific notation, ‘E notation’, a variant of ‘exponential notation’, may be used. In this format, the value is expressed as two numbers separated by the letter E. The first number, the significand (sometimes called the mantissa) is given in decimal format, while the second is an integer. The value is obtained by multiplying the mantissa by 10 the number of times indicated by the integer. Thus the value represented in decimal notation as 1000.0 might be represented in scientific notation as 1E3. A value expressed as a ratio is represented by two integer values separated by a solidus (/) character. Thus, the value represented in decimal notation as 0.5 might be represented as a ratio by the string 1/2. |
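The three alternatives of teidata.numeric (double, ratio, decimal) can be told apart with a short parser. This is a sketch only: `parse_numeric` is a hypothetical helper, the ratio pattern is the one from the declaration above, and Python's `float()` is used as an approximation of `xsd:double` and `xsd:decimal` (it accepts a few spellings, such as surrounding whitespace, that a schema validator may treat differently).

```python
import re
from fractions import Fraction

# Ratio pattern taken from the declaration: optional sign, digits, solidus.
RATIO = re.compile(r"-?\d+/-?\d+")


def parse_numeric(value):
    """Parse a teidata.numeric string.

    Returns a Fraction for a ratio, otherwise a float. Raises ValueError for
    non-numeric input and ZeroDivisionError for a ratio with denominator 0.
    """
    if RATIO.fullmatch(value):
        numerator, denominator = value.split("/")
        return Fraction(int(numerator), int(denominator))
    return float(value)


print(parse_numeric("1E3"))         # 1000.0
print(parse_numeric("0.5"))         # 0.5
print(parse_numeric("1/2"))         # 1/2
print(float(parse_numeric("1/2")))  # 0.5
```

Returning a `Fraction` keeps a ratio exact until a decimal value is actually needed.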
teidata.pointer defines the range of attribute values used to provide a single URI, absolute or relative, pointing to some other resource, either within the current document or elsewhere. | |
Module | tei |
Used by | |
Content model | <content> <dataRef name="anyURI"/> </content> |
Declaration | teidata.pointer = xsd:anyURI |
Note | The range of syntactically valid values is defined by RFC 3986 Uniform Resource Identifier (URI): Generic Syntax. Note that the values themselves are encoded using RFC 3987 Internationalized Resource Identifiers (IRIs) mapping to URIs. For example, |
teidata.probCert defines a range of attribute values which can be expressed either as a numeric probability or as a coded certainty value. | |
Module | tei |
Used by | |
Content model | <content> <alternate minOccurs="1" maxOccurs="1"> <dataRef key="teidata.probability"/> <dataRef key="teidata.certainty"/> </alternate> </content> |
Declaration | teidata.probCert = teidata.probability | teidata.certainty |
teidata.probability defines the range of attribute values expressing a probability. | |
Module | tei |
Used by | |
Content model | <content> <dataRef name="double"/> </content> |
Declaration | teidata.probability = xsd:double |
Note | Probability is expressed as a real number between 0 and 1; 0 representing certainly false and 1 representing certainly true. |
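A value of type teidata.probCert is either one of the four certainty keywords or a probability. The sketch below combines both checks; the helper names are illustrative. Note that the declaration itself only requires an `xsd:double`: the 0 to 1 range enforced here comes from the note above, not from the schema.

```python
# Closed value list of teidata.certainty.
CERTAINTY = {"high", "medium", "low", "unknown"}


def is_probability(value):
    """True if value parses as a number between 0 and 1 inclusive."""
    try:
        number = float(value)
    except ValueError:
        return False
    return 0.0 <= number <= 1.0  # NaN and infinities fail this comparison


def is_prob_cert(value):
    """True if value is a certainty keyword or a probability."""
    return value in CERTAINTY or is_probability(value)


for candidate in ["high", "0.75", "1.5", "certain"]:
    print(candidate, is_prob_cert(candidate))
```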
teidata.temporal.w3c defines the range of attribute values expressing a temporal expression such as a date, a time, or a combination of them, that conform to the W3C XML Schema Part 2: Datatypes Second Edition specification. | |
Module | tei |
Used by | |
Content model | <content> <alternate minOccurs="1" maxOccurs="1"> <dataRef name="date"/> <dataRef name="gYear"/> <dataRef name="gMonth"/> <dataRef name="gDay"/> <dataRef name="gYearMonth"/> <dataRef name="gMonthDay"/> <dataRef name="time"/> <dataRef name="dateTime"/> </alternate> </content> |
Declaration | teidata.temporal.w3c = xsd:date | xsd:gYear | xsd:gMonth | xsd:gDay | xsd:gYearMonth | xsd:gMonthDay | xsd:time | xsd:dateTime |
Note | If it is likely that the value used is to be compared with another, then a time zone indicator should always be included, and only the dateTime representation should be used. |
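The eight alternatives of teidata.temporal.w3c differ only in their lexical form, so a value can be classified by pattern. The following sketch is a simplified approximation: it checks the shape of the string (with an optional time zone) but not calendar validity, so a month of 13 would pass. `FORMS` and `temporal_form` are hypothetical names.

```python
import re

TZ = r"(?:Z|[+-]\d{2}:\d{2})?"           # optional time zone indicator
YEAR = r"-?\d{4,}"                       # at least four digits, optional sign
TIME = r"\d{2}:\d{2}:\d{2}(?:\.\d+)?"    # hh:mm:ss with optional fraction

FORMS = {
    "date": rf"{YEAR}-\d{{2}}-\d{{2}}{TZ}",
    "gYear": rf"{YEAR}{TZ}",
    "gMonth": rf"--\d{{2}}{TZ}",
    "gDay": rf"---\d{{2}}{TZ}",
    "gYearMonth": rf"{YEAR}-\d{{2}}{TZ}",
    "gMonthDay": rf"--\d{{2}}-\d{{2}}{TZ}",
    "time": rf"{TIME}{TZ}",
    "dateTime": rf"{YEAR}-\d{{2}}-\d{{2}}T{TIME}{TZ}",
}


def temporal_form(value):
    """Return the name of the matching W3C form, or None if none matches."""
    for name, pattern in FORMS.items():
        if re.fullmatch(pattern, value, re.ASCII):
            return name
    return None


for sample in ["1660", "1660-05", "1660-05-12", "--05-12", "12 May 1660"]:
    print(sample, temporal_form(sample))
```

For a 17th century corpus, partial dates such as `1660` or `1660-05` are likely to be common, which is why the g-forms matter as much as the full date.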
teidata.text defines the range of attribute values used to express some kind of identifying string as a single sequence of Unicode characters possibly including whitespace. | |
Module | tei |
Used by | |
Content model | <content> <dataRef name="string"/> </content> |
Declaration | teidata.text = string |
Note | Attributes using this datatype must contain a single ‘token’ in which whitespace and other punctuation characters are permitted. |
teidata.truthValue defines the range of attribute values used to express a truth value. | |
Module | tei |
Used by | |
Content model | <content> <dataRef name="boolean"/> </content> |
Declaration | teidata.truthValue = xsd:boolean |
Note | The possible values of this datatype are 1 or true, or 0 or false. This datatype applies only for cases where uncertainty is inappropriate; if the attribute concerned may have a value other than true or false, e.g. unknown, or inapplicable, it should have the extended version of this datatype: teidata.xTruthValue. |
teidata.word defines the range of attribute values expressed as a single word or token. | |
Module | tei |
Used by | |
Content model | <content> <dataRef name="token" restriction="[^\p{C}\p{Z}]+"/> </content> |
Declaration | teidata.word = token { pattern = "[^\p{C}\p{Z}]+" } |
Note | Attributes using this datatype must contain a single ‘word’ which contains only letters, digits, punctuation characters, or symbols: thus it cannot include whitespace. |
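The pattern `[^\p{C}\p{Z}]+` excludes every character whose Unicode general category starts with C (control, format, and so on) or Z (separators, including the space). Python's `re` module does not support `\p{...}`, so this sketch uses `unicodedata` instead; `is_teidata_word` is an illustrative name, and results may differ slightly from a schema validator built on another Unicode version.

```python
import unicodedata


def is_teidata_word(value):
    """True if value is non-empty and has no character in categories C* or Z*."""
    return bool(value) and all(
        unicodedata.category(ch)[0] not in "CZ" for ch in value
    )


for candidate in ["EDI_0001-1-3", "two words", ""]:
    print(repr(candidate), is_teidata_word(candidate))
```

Identifiers built with underscores and dashes, such as `EDI_0001-1-3`, pass this check because punctuation is allowed; only whitespace and control characters are excluded.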
teidata.xTruthValue (extended truth value) defines the range of attribute values used to express a truth value which may be unknown. | |
Module | tei |
Used by | |
Content model | <content> <alternate minOccurs="1" maxOccurs="1"> <dataRef name="boolean"/> <valList> <valItem ident="unknown"/> <valItem ident="inapplicable"/> </valList> </alternate> </content> |
Declaration | teidata.xTruthValue = xsd:boolean | ( "unknown" | "inapplicable" ) |
Note | In cases where uncertainty is inappropriate, use the datatype teidata.truthValue. |
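The relation between the plain and the extended truth value can be sketched as two small parsers, the second delegating to the first. The function names are illustrative; the four boolean spellings are those stated in the note for teidata.truthValue.

```python
# The four lexical forms of xsd:boolean.
XSD_BOOLEAN = {"true": True, "1": True, "false": False, "0": False}


def parse_truth_value(value):
    """Parse a teidata.truthValue string; raise ValueError for anything else."""
    try:
        return XSD_BOOLEAN[value]
    except KeyError:
        raise ValueError(f"not a teidata.truthValue: {value!r}") from None


def parse_x_truth_value(value):
    """Parse teidata.xTruthValue: a bool, or 'unknown' / 'inapplicable'."""
    if value in ("unknown", "inapplicable"):
        return value
    return parse_truth_value(value)


print(parse_truth_value("1"))           # True
print(parse_x_truth_value("unknown"))   # unknown
print(parse_x_truth_value("false"))     # False
```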