This document is meant to serve as a reference for the encoding of ParlaMint corpora of parliamentary proceedings. In order for the ParlaMint corpora to be interoperable (i.e. so that the same scripts can be used to process them), their structure is fairly rigid, both in terms of file names and folder structure, as well as their TEI XML encoding. This is not to say that all the corpora have to contain exactly the same information because we distinguish obligatory information, which all the corpora should contain, from that which is optional, and present only in the corpora for which it has been possible to gather it from the corpus sources.
This document is a specialisation of Parla-CLARIN, itself a customisation of the TEI Guidelines. But while Parla-CLARIN gives fairly general recommendations for encoding corpora of parliamentary proceedings, ParlaMint, as mentioned, is much stricter. This document gives very specific encoding recommendations without necessarily stating the reasons for their choice. It covers the overall structure of ParlaMint corpora, the metadata they contain, the encoding of transcriptions, and, for the linguistically annotated version, the encoding of word-level linguistic annotations, syntactic dependencies and named entities.
The document is not meant as a tutorial on TEI or ParlaMint, but as a reference to elements, their nesting and attributes exemplified by snippets from the existing ParlaMint corpora. Other sources can help in understanding the encoding of ParlaMint corpora:
The rest of these recommendations are structured as follows:
The <teiHeader> of a corpus component (further detailed in the Section on Corpus metadata) contains the metadata specific to this component (along with some redundant metadata about the provenance). This metadata should be unique in the corpus, i.e. the corpus component metadata should distinguish it from all the other components of the corpus.
The fact that a corpus is one XML document does not mean that it is also stored in one file. In fact, ParlaMint requires that each corpus component is stored in a separate file, with the corpus root, i.e. the top-level <teiCorpus>, also stored as one file. Furthermore, some parts of the corpus root metadata are also stored in separate files.
Apart from corpus components, some parts of the overall corpus metadata (i.e. the <teiCorpus> <teiHeader> element) are also stored as separate files, and hence also included in the corpus root using the same XInclude mechanism as explained above.
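As an illustration of the XInclude mechanism, the following minimal Python sketch builds a skeletal corpus root that includes its component files. The function name build_corpus_root, the empty <teiHeader> and the relative href values are assumptions made for this sketch only; a real corpus root carries a complete header and also includes its metadata files.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XI_NS = "http://www.w3.org/2001/XInclude"
XML_NS = "http://www.w3.org/XML/1998/namespace"


def build_corpus_root(corpus_id, component_paths):
    """Return a skeletal <teiCorpus> that XIncludes each component file.

    Hypothetical helper: the header is left empty and the hrefs are
    assumed to be relative to the location of the corpus root file.
    """
    ET.register_namespace("", TEI_NS)
    ET.register_namespace("xi", XI_NS)
    root = ET.Element(f"{{{TEI_NS}}}teiCorpus")
    root.set(f"{{{XML_NS}}}id", corpus_id)
    ET.SubElement(root, f"{{{TEI_NS}}}teiHeader")
    for path in component_paths:
        # One xi:include per separately stored corpus component
        ET.SubElement(root, f"{{{XI_NS}}}include", href=path)
    return root


root = build_corpus_root(
    "ParlaMint-BE",
    ["2014/ParlaMint-BE_2014-06-19.xml", "2014/ParlaMint-BE_2014-06-30.xml"],
)
print(ET.tostring(root, encoding="unicode"))
```

An XInclude-aware parser then resolves each xi:include, so that the corpus can be processed as a single XML document.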
ParlaMint has strict rules on how to name the various files that constitute a corpus, and how to collect them in directories.
The file names have the following structure:
All file names start with ParlaMint-, followed by the ISO 3166 country (or autonomous region) code (cf. Section on Standard values), e.g. ParlaMint-NL.xml or ParlaMint-ES-CT.
The file names of a machine-translated corpus add a hyphen and the language code, e.g. ParlaMint-NL-en.xml.
The file names of corpus components add an underscore and the date, e.g. ParlaMint-IS_2015-01-21-54.xml. In case a corpus component is further distinguished, so that there are several components with the same date, the corpus compilers are free to extend the file name by a hyphen and any suffix containing only ASCII letters and numbers and the hyphen character, e.g. ParlaMint-NL_2018-10-30-eerstekamer-4.xml or ParlaMint-CZ_2016-04-13-ps2013-044-02-016-098.xml.
The file names of separately stored metadata add a hyphen and the element name, e.g. ParlaMint-BE-listPerson.xml. Where there are more files for instances of the same element name, as is the case for taxonomies, the filename should end with another hyphen, followed by the ID of the particular element, e.g. ParlaMint-BE-taxonomy-UD-SYN.xml. Finally, some of the taxonomies are not corpus-specific, i.e. identical files are used by all ParlaMint corpora. In this case, the country or region code is omitted, e.g. ParlaMint-taxonomy-parla.legislature.xml.
Derived formats use a different file extension, e.g. ParlaMint-IS_2015-01-21-54.txt; this is further explained in the Section on Conversions.
The linguistically annotated version has the suffix .ana on the corpus root and components, e.g. ParlaMint-ES-CT.ana.xml or ParlaMint-IS_2015-01-21-54.ana.xml.
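The naming rules above can be approximated with a regular expression. The following Python sketch is an assumption-laden illustration, not the official check: the pattern FILENAME_RE and the helper parse_filename are hypothetical names, the pattern covers only corpus root and component files (not metadata files or derived formats), and the exact shape allowed for region and language codes is a guess.

```python
import re

# Hypothetical pattern for corpus root and component file names only.
FILENAME_RE = re.compile(
    r"^ParlaMint-(?P<code>[A-Z]{2}(?:-[A-Z0-9]{1,3})?)"  # country or region code
    r"(?:-(?P<lang>[a-z]{2,3}))?"                         # machine translation language
    r"(?:_(?P<date>\d{4}-\d{2}-\d{2})"                    # component date
    r"(?:-(?P<suffix>[A-Za-z0-9-]+))?)?"                  # optional free suffix
    r"(?P<ana>\.ana)?\.xml$"                              # annotated version marker
)


def parse_filename(name):
    """Split a ParlaMint root or component file name into its parts.

    Returns None when the name does not follow the assumed pattern.
    """
    m = FILENAME_RE.match(name)
    if m is None:
        return None
    parts = m.groupdict()
    parts["annotated"] = parts.pop("ana") is not None
    return parts


for name in ("ParlaMint-NL.xml",
             "ParlaMint-IS_2015-01-21-54.ana.xml",
             "ParlaMint-CZ_2016-04-13-ps2013-044-02-016-098.xml"):
    print(name, parse_filename(name))
```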
For distribution the complete XML corpus should be stored in a directory that has the same name prefix as the corpus root file. The directory then contains the corpus root file and its metadata files, while the corpus components should be in subdirectories, one per year, for example:
ParlaMint-BE.TEI/ParlaMint-BE.xml
ParlaMint-BE.TEI/ParlaMint-BE-listPerson.xml
ParlaMint-BE.TEI/ParlaMint-BE-listOrg.xml
ParlaMint-BE.TEI/ParlaMint-taxonomy-parla.legislature.xml
ParlaMint-BE.TEI/ParlaMint-taxonomy-speaker_types.xml
...
ParlaMint-BE.TEI/2014/ParlaMint-BE_2014-06-19.xml
ParlaMint-BE.TEI/2014/ParlaMint-BE_2014-06-30.xml
ParlaMint-BE.TEI/2014/ParlaMint-BE_2014-07-17.xml
...
ParlaMint-BE.TEI/2015/ParlaMint-BE_2015-01-06-54.xml
ParlaMint-BE.TEI/2015/ParlaMint-BE_2015-01-07-54.xml
ParlaMint-BE.TEI/2015/ParlaMint-BE_2015-01-08-54.xml
...
The linguistically annotated version of the corpus is stored separately, with the main directory and, as mentioned, the corpus root and component filenames having the additional suffix .ana, e.g.
ParlaMint-BE.TEI.ana/ParlaMint-BE.ana.xml
ParlaMint-BE.TEI.ana/ParlaMint-BE-listPerson.xml
ParlaMint-BE.TEI.ana/ParlaMint-BE-listOrg.xml
ParlaMint-BE.TEI.ana/ParlaMint-taxonomy-parla.legislature.xml
ParlaMint-BE.TEI.ana/ParlaMint-taxonomy-speaker_types.xml
ParlaMint-taxonomy-NER.xml
ParlaMint-taxonomy-UD.xml
...
ParlaMint-BE.TEI.ana/2014/ParlaMint-BE_2014-06-19.ana.xml
ParlaMint-BE.TEI.ana/2014/ParlaMint-BE_2014-06-30.ana.xml
ParlaMint-BE.TEI.ana/2014/ParlaMint-BE_2014-07-17.ana.xml
...
ParlaMint-BE.TEI.ana/2015/ParlaMint-BE_2015-01-06-54.ana.xml
ParlaMint-BE.TEI.ana/2015/ParlaMint-BE_2015-01-07-54.ana.xml
ParlaMint-BE.TEI.ana/2015/ParlaMint-BE_2015-01-08-54.ana.xml
...
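The directory layout shown in the two listings can be sketched as a small path-building function. This is only an illustration under stated assumptions: the function name distribution_path and its parameters are hypothetical, and the year is simply read from the date that follows the underscore in a component file name.

```python
import re
from pathlib import PurePosixPath


def distribution_path(corpus, filename, annotated=False):
    """Return the assumed location of a file in the distribution directory.

    corpus    -- corpus root name without extension, e.g. "ParlaMint-BE"
    filename  -- name of the root, metadata or component file
    annotated -- True for the linguistically annotated version
    """
    top = PurePosixPath(f"{corpus}.TEI.ana" if annotated else f"{corpus}.TEI")
    # Component files carry a date; they go into a per-year subdirectory
    m = re.match(rf"^{re.escape(corpus)}_(\d{{4}})-\d{{2}}-\d{{2}}", filename)
    if m:
        return str(top / m.group(1) / filename)
    # Corpus root and metadata files stay in the top-level directory
    return str(top / filename)


print(distribution_path("ParlaMint-BE", "ParlaMint-BE_2014-06-19.xml"))
print(distribution_path("ParlaMint-BE", "ParlaMint-BE.ana.xml", annotated=True))
```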
This section gives some general requirements a ParlaMint corpus has to meet, in particular those relating to the characters in a corpus, and the use of standards. It also details the structure of the file names of the ParlaMint root and component files, as well as the attributes expected on the <teiCorpus> and <TEI> tags.
The corpus should be encoded in Unicode, using the UTF-8 character encoding, at least for European languages. In cases where the original contains characters from the Unicode Private Use Area, these should, if possible, be given their closest Unicode equivalents or substituted by the Unicode replacement character U+FFFD. End-of-line hyphens, if present in the source files, should be removed, and the split words joined in order to enhance searching the corpus and to simplify linguistic processing.
The following characters, esp. prevalent when the source documents were in Word or HTML, deserve special mention:
Text-bearing elements should also not start or end with space characters, and sequences of whitespace characters should be changed into a single space.
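The character-level requirements above can be sketched as a small clean-up function. This is a minimal illustration, assuming that Private Use Area characters are always replaced by U+FFFD (rather than mapped to a closest equivalent) and that every hyphen directly before a line break marks a split word; the function name normalise_text is hypothetical, and the naive hyphen rule would also join genuinely hyphenated compounds.

```python
import re
import unicodedata


def normalise_text(text):
    """Apply simplified ParlaMint-style character normalisation."""
    # Private Use Area characters have the Unicode general category "Co"
    text = "".join(
        "\ufffd" if unicodedata.category(ch) == "Co" else ch for ch in text
    )
    # Join words split by an end-of-line hyphen (naive assumption, see above)
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Collapse whitespace runs and remove leading and trailing space
    return re.sub(r"\s+", " ", text).strip()


print(normalise_text("  The parlia-\nment \t is in   session. "))
```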
Whenever possible, ParlaMint uses standards for information coding. In particular, the following information must be standardised:
The Chapter on Overall corpus structure introduced the top level elements of the corpus root file and of the component files (i.e. the <teiCorpus> and <TEI> elements), but did not elaborate on their attributes; these are presented in this section.
The ParlaMint encoding uses pointing attributes for a number of purposes, e.g. for references to taxonomy categories, to speaker metadata, or to linguistic categories.
While a few elements have dedicated pointing attributes, there are three generally used ones. They share the characteristics that they are all used by a large number of different elements and that their value is a series of pointers, i.e. a white-space delimited sequence of references to the values of some xml:id attribute in the corpus or, in general, to a URI. The three attributes are:
It is often difficult to decide which of the attributes to use for a particular pointer, so the usage examples given with the relevant element should always be consulted.
ParlaMint makes a lot of use of temporal information, e.g. to determine when a session took place or the period when a certain person was an MP. As mentioned in the Section on Standard values, the ISO 8601 format should be used to specify the dates or times.
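A quick way to check temporal values during corpus development is to try parsing them. The sketch below assumes that only complete dates (YYYY-MM-DD) and date-times are to be accepted; the helper name is_iso_temporal is hypothetical, and reduced forms such as a bare year are deliberately not covered.

```python
from datetime import date, datetime


def is_iso_temporal(value):
    """Return True if value parses as an ISO 8601 date or date-time.

    Simplified check: bare years or year-month values are rejected here.
    """
    for parser in (date.fromisoformat, datetime.fromisoformat):
        try:
            parser(value)
            return True
        except ValueError:
            continue
    return False


print(is_iso_temporal("2015-01-21"))
print(is_iso_temporal("21.1.2015"))
```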
The following attributes are used to specify temporal information:
As mentioned, <teiCorpus> and <TEI> elements contain the obligatory <teiHeader> element, which stores the metadata of the corpus root or component. In this section we explain and give examples of the required and optional metadata contained in the <teiHeader>, proceeding through its various elements and distinguishing which parts and what content are appropriate for the corpus root, and which for a corpus component.
As a general remark, most metadata contains free text. ParlaMint requires that this text is given in English, to help researchers from other countries understand it. It is also recommended to give it in the local language in which the (main portion of the) parliamentary transcripts is written, so that local researchers can use it in their native tongue.
The title statement, <titleStmt>, gives the title of the corpus root or component, along with the specification of the particular session(s) of the parliament contained, the persons responsible for compiling the corpus, and the funder(s) of the project.
The main title has a formulaic structure ‘<Country name> parliamentary corpus ParlaMint-<Country code> [ParlaMint]’, with an equivalent structure for the local language. Note that the corpus ‘stamp’ in square brackets can also be ‘[ParlaMint.ana]’ for the linguistically annotated version of the corpus (as explained in the Chapter on Linguistic annotation) or ‘[ParlaMint SAMPLE]’ for corpus data samples, as available on the ParlaMint GitHub repository.
The subordinate title, in contrast to the main one, is free text, and usually formed on the basis of the source of the corpus. As with the main one, it should be given in both languages.
After the titles comes the specification of the particular sessions that the corpus contains, encoded as <meeting> elements: the two meeting elements in the above example state that the ParlaMint-SI corpus contains the meetings of the 7th and 8th terms of the lower house of the National Assembly of the Republic of Slovenia. The <meeting> elements can give, as the value of their n attribute, the numbers of the meetings that the corpus covers, and their text content can give a free-text description of the meetings in the local language.
The formal information on the meetings is given in the values of the corresp and ana attributes, which are pointing attributes, as already explained in the Section on Attributes of top-level elements. Here they refer to the definitions of organisations, further explained in the Section on Organisations, and to the categories of taxonomy elements, further explained in the Section on the Class declaration. The value of the corresp attribute points to the governmental body that held the particular meeting (in this case the National Assembly of the Republic of Slovenia), while the ana attribute contains a space-delimited sequence of pointers: #parla.lower points to the definition of the lower house, #parla.term to the definition of a parliamentary term, and #DZ.7 to the definition of the seventh mandate.
Next come one or more responsibility statements, <respStmt>, each one containing one or more person names, <persName>, with an optional ref attribute giving the URL where more information about the person can be found, and the responsibility element, <resp>, which specifies what responsibility the statement is about.
In a similar manner, the <funder> elements give information on the organisations which have financially contributed to the compilation of the corpus, with the names of the organisations given in the <orgName> elements.
The other difference is in the <meeting> elements, which here specify a particular meeting of the corpus component transcription. In the example above, this is an extraordinary meeting of the lower house in the seventh term of the National Assembly of the Republic of Slovenia.
It should be noted that both sizes are somewhat complex to compute and are inserted into the TEI headers in the finalisation of a corpus (cf. the Section on Finalisation of corpora) by a common script, so it is not necessary to insert the extent in the process of developing a ParlaMint corpus.
As the example shows, the recording statement contains a <recording> element, which specifies the type of the recording (audio or video), and then contains a <media> element giving the ID of the file, its mimeType, the URL of the source of the recording (typically the official governmental site for parliamentary proceedings) and the local (possibly processed) copy of the file; this can be a local file, even though it won't be distributed together with the ParlaMint corpus or, better, a Web-based file on a stable location.
In contrast, the encoding description of a corpus component contains only two elements, namely (and redundantly) the <projectDesc> and the <tagsDecl>.
The class declaration, <classDecl>, is used only in the corpus root and contains only definitions of some controlled vocabularies used in ParlaMint corpora. These vocabularies, possibly hierarchically organised, are encoded using the <taxonomy> element.
CZ (followed by hyphen) into the filename.
ParlaMint requires several taxonomies to be defined in the class declaration of the corpus root (as well as additional ones for the linguistically annotated corpus, as further described in the Section on Linguistic metadata). As mentioned, these taxonomies are defined globally and available as part of the data on the ParlaMint GitHub repository, and there is a special procedure for modifying them, in particular for inserting translations for a new language.
The five obligatory taxonomies are:
Furthermore, there are two obligatory taxonomies which pertain to the linguistically analysed version of the corpus only, cf. the Section on Linguistic taxonomies.
Given the importance that ParlaMint gives to the information on speakers and their affiliations to (political) organisations, as well as the richness of this information, the content of <listOrg> and <listPerson> is further explained in a separate Chapter on Speakers and their organisations.
ParlaMint places considerable emphasis on including in the corpora significant information about the persons giving the speeches contained in the transcriptions. This is why, even though this information is encoded in the <particDesc> element of the <teiHeader> of the corpus root (cf. the Chapter on Participant description), we treat it here in a separate Chapter. Below we first discuss the information on persons, including how they are affiliated with (political) organisations, and then explain the encoding of these organisations.
The person must also have the <sex> element, with the value attribute being one of the controlled values: M for male, F for female, O for other, N for none or U for unknown.
The example above also shows the use of the (optional) classification attribute ana, which points to the specification of the legislative period in which the person was affiliated with the specified organisation. Such legislative periods are typically given as <event> elements inside the government or parliament organisations, as further explained in the Sections on the government and parliament organisations.
The affiliation element can also have the usual from and to attributes, i.e. from and to when the person was affiliated with the organisation.
The role attribute can have as its value one of the values given by the ParlaMint schema. For backward compatibility with ParlaMint I corpora, there are some roles that are only used by one corpus (cf. the definition of role for <affiliation>), but the main ones that should be used are:
It should be noted that ParlaMint makes no assumptions on the connection between various roles, e.g. we do not assume that if somebody has a minister role in the government that they are also a member of the government. Therefore it is necessary to specify all the desired affiliations with their particular roles, e.g. both as minister and as member.
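Because roles do not imply one another, it can be useful to check that a person with a specific role in an organisation is also listed as its member. The following Python sketch is one possible check under stated assumptions: affiliations are represented as (role, organisation reference) pairs, the organisation IDs are invented for illustration, the set of roles that need an accompanying member affiliation is a parameter chosen by the corpus compiler, and the from and to dates are ignored.

```python
def missing_member_affiliations(affiliations, roles_needing_member=("minister",)):
    """Return organisations where a specific role lacks a 'member' affiliation.

    affiliations -- iterable of (role, org_ref) pairs for one person
    """
    member_of = {org for role, org in affiliations if role == "member"}
    return sorted(
        {org for role, org in affiliations
         if role in roles_needing_member and org not in member_of}
    )


# Hypothetical person: minister in the government, member of a party only
affs = [("minister", "#GOV"), ("member", "#party.X")]
print(missing_member_affiliations(affs))
```

Adding the pair ("member", "#GOV") to the list makes the check return an empty list.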
It is important to give correct roles to the affiliations that associate a person with organisations. We list the most common roles and how they should be encoded, emphasising the ones that are obligatory in ParlaMint:
affiliation/@role="head" → org/@role="country"
affiliation/@role="head" → org/@role="republic"
affiliation/@role="head" & affiliation/@role="member" → org/@role="government" (cf. Section on The government organisation)
affiliation/@role="deputyHead" & affiliation/@role="member" → org/@role="government"
affiliation/@role="minister" & affiliation/@role="member" → org/@role="government"
affiliation/@role="head" & affiliation/@role="member" → org/@role="ministry"
affiliation/@role="deputyMinister" & affiliation/@role="member" → org/@role="government"
affiliation/@role="deputyHead" & affiliation/@role="member" → org/@role="ministry"
affiliation/@role="head" & affiliation/@role="member" → org/@role="parliamentaryGroup" (cf. Section on Political parties and parliamentary groups)
affiliation/@role="member" → org/@role="parliamentaryGroup"
affiliation/@role="head" & affiliation/@role="member" → org/@role="politicalParty"
affiliation/@role="member" → org/@role="politicalParty"
affiliation/@role="member" → org/@role="parliament" (cf. Section on The parliament organisations)
Organisations are also created and dissolved, and this information is encoded in the <event> element, which has as its <label> existence, and where the start of its existence is given in the from attribute; as there is no to attribute, this also means that the party still exists.
Returning to the role attribute, it is the ParlaMint schema (cf. the Section on Validating ParlaMint corpora) that gives its set of allowed values. Currently, the list is quite long, as we left it up to the partners of ParlaMint I to determine the values. However, there are some that are common to all corpora, and, for some, it is obligatory to have organisations with these roles in the organisation list of a corpus. Furthermore, it is recommended that the organisations are listed in the order given below, with the obligatory roles emphasised:
We discuss the obligatory types of organisations and political parties in the following sections.
#parla.national #parla.uni: national parliament, unicameral system
#parla.national #parla.lower: national parliament, lower house
#parla.national #parla.upper: national parliament, upper house
#parla.regional #parla.uni: regional parliament, unicameral system
#parla.regional #parla.lower: regional parliament, lower house
#parla.regional #parla.upper: regional parliament, upper house
In the scope of ParlaMint, very important organisations are political parties (distinguished by the role politicalParty) and, even more so, parliamentary groups that represent political parties in the parliament (distinguished by the role parliamentaryGroup). These organisations are linked to <person> elements (i.e. speakers) so that it is known which political party or parliamentary group the speaker belongs to or represents at a certain moment in time, as further explained in the Section on Speaker affiliations.
ParlaMint requires that a corpus must use parliamentary groups, while the use of political parties is optional. Note that if political parties are used, it is also expected to encode which political parties constitute a parliamentary group; this is encoded via the <relation> element, as further explained in the Section on Relations between organisations.
The introduction to this chapter already gave examples of how organisations are encoded in general, so we here only give examples of the encoding of the additional metadata that can also be associated with political parties or parliamentary groups, i.e. their political orientation on the left-to-right scale and the variables of the Chapel Hill Expert Surveys for Europe, CHES for short. This additional metadata is encoded in the <state> element(s), which should be the last element(s) in the <org>.
The second type of metadata on organisations, in particular on political parties and parliamentary groups, comes from the Chapel Hill Expert Surveys for Europe (CHES), either from the 1999-2019 edition, or from the 2019 edition. Here the top-level <state> element gives the type of the state, i.e. CHES, and the URL of the CSV source for the information, as well as the name of the political party in CHES (which typically differs from its name or ID in ParlaMint) in the key attribute and the year span that the CHES information covers in the from and to attributes.
A text body contains a series of divisions, <div> in cases when the source document can be reliably split into sections, which is typically done on the basis of headings identified in the source. When this is not possible, the complete body will be just one division.
In ParlaMint we have two types of divisions, which are distinguished by the value of their (required) type attribute. If its value is debateSection, then the division must contain at least one speech, while a division with the value commentSection must not contain any speeches, i.e. it contains transcriber (or other) comments only, e.g. the table of contents, references to laws etc.
The <u> element should also have the ana attribute giving a pointer to the typology of types of speakers, which is especially important for distinguishing the speeches of a session chair (who mostly speaks on procedural matters) from those of regular and, possibly, guest speakers. Note that we use the #regular value not only for MPs but for all other speakers that can regularly speak in a parliament, e.g. ministers, the MP, members of parliamentary commissions etc. There is also a special type of speaker, called #interrupting, which we discuss further in the Section on Interrupted utterances.
The utterances are then segmented using the <seg> element, which encodes the paragraphs of the source transcription. Even if the source files do not contain paragraph markings, each speech should contain at least one segment.
Finally, an utterance (just as a division) can also contain transcriber comments (notes), as further detailed in the next section.
Transcriber comments give information on who spoke, what the time was, interruptions and the reason for them, what is happening in the chamber, results of voting, etc. While section headings can also be taken as a kind of transcriber comment, these serve to structure the transcription and are encoded as <head> elements, as explained at the start of this chapter, cf. the Example there. Another type of transcriber comment treated separately is the presence of gaps in the transcript; these are treated in the Section on Gaps.
Apart from heads and gaps, transcriber comments are encoded using the <note> element or one of several so called ‘incident’ elements, as explained below. These elements can be placed directly inside <div>, <u>, <seg> or even <s> in the linguistically annotated version. They should be placed as far up the hierarchy as possible, i.e. if they would appear at the start or end of a segment or utterance, they should be encoded before the start or, respectively, after the end of this segment or utterance. If possible, it is especially convenient not to have them inside <seg> (which contains text), as placing these elements there leads to mixed content, which is more difficult to process further, in particular when linguistically annotating the corpus. Similarly, it is also better to move them outside <s> elements. However, if a transcriber comment was placed in the middle of the text for good reasons, then it can be encoded inside the segment or sentence. Note, however, that utterances can also be split on transcriber comments, as is explained in the Section on Interrupted utterances.
Boris Johnson: I propose a no-deal Brexit. /Jeremy Corbyn: Traitor!/ Because England does not want any dealings with the European Union.
The standard manner in which such interruptions are encoded is using the default <note> element, or, much better, the <vocal> element, as explained in the Section on Incidents, as below:
In the example, the speaker of the interrupting speech has also been identified and marked in the who attribute; in cases where this is not possible, this attribute can be omitted. As mentioned, this speaker should also have the value #interrupting in their ana attribute; this value comes from the appropriate category of the ParlaMint speaker type taxonomy. In case the speaker is identified, and their status can be determined, ana should also contain the type proper of the speaker, i.e. whether they are a #chair, #regular or #guest speaker.
This section introduces the ParlaMint linguistic annotation. An important note is that a linguistically annotated ParlaMint corpus is stored separately from its base (or plain-text) version, i.e. the version that has been discussed in the preceding sections. The encoding of the linguistically annotated version differs from the plain-text one in the following:
The file names end in .ana.xml. For example, if the plain-text root has the file name ParlaMint-CZ.xml, the linguistically annotated one should be ParlaMint-CZ.ana.xml, or ParlaMint-CZ_2016-04-13.xml and ParlaMint-CZ_2016-04-13.ana.xml.
The identifier of the root element ends in .ana, e.g. <teiCorpus xml:id="ParlaMint-CZ.ana">
The title stamp, [ParlaMint] in the plain-text version, should be [ParlaMint.ana] for the linguistically annotated version.
Linguistic annotation is added only to the text content of <seg> elements inside the speeches, i.e. <u> elements. For this text, ParlaMint requires the following additional markup to be present:
Below, we explain the encoding of each of these levels.
The base form or lemma of a word is given as the value of the lemma attribute, while punctuation characters, <pc>, do not have this attribute.
The UD part-of-speech and morphological features are both packed in the msd attribute, with the part-of-speech having the UPosTag linguistic attribute, and the features separated by the vertical bar.
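To show how such a packed value can be read back, here is a minimal Python sketch that splits an msd value into the part-of-speech and a dictionary of features. The function name parse_msd and the example value are illustrative assumptions; the only things taken from the description above are the UPosTag attribute name and the vertical bar separator, while the use of = between attribute and value follows the usual UD feature notation.

```python
def parse_msd(msd):
    """Split an msd value into (UPosTag, features dictionary)."""
    features = {}
    for pair in msd.split("|"):
        # Each item is assumed to have the form Attribute=Value
        name, _, value = pair.partition("=")
        features[name] = value
    upos = features.pop("UPosTag", None)
    return upos, features


print(parse_msd("UPosTag=VERB|Mood=Ind|Number=Plur|Person=1"))
```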
ParlaMint also allows (but does not require) part-of-speech tags from some other tagset to be added to the linguistic annotation. Where this information is encoded depends on the type of tagset.
mte: is a prefix that is, via the TEI extended pointer syntax as defined in the TEI header (cf. the Section on Prefix definitions), expanded so that the value of such an ana attribute points to the expansion of the given tag into a feature structure. For example, the value mte:Vmpr1p would be expanded to https://nl.ijs.si/ME/V6/msd/tables/msd-fslib2-sl.xml#Vmpr1p, which then resolves to the corresponding feature structure.
Certain frameworks, in particular the UD one (cf. their information on Tokenization and Word Segmentation and on Words, Tokens and Empty Nodes), allow for tokens to be decomposed into several words, and it is these syntactic words, and not tokens, that are further annotated.
join="right" should be added to the top level word.
join="right" should be added to the last token, i.e. ‘lepši’.
ParlaMint also requires annotation of Named Entities (NE), which should be categorised into the following four types:
These types are also specified in a specialised taxonomy, as further explained in the Section on Linguistic taxonomies.
The link elements then give, via the value of their target attribute, references to the head and argument tokens of the syntactic relation, which is specified in the ana attribute. By convention, the links are ordered so that the argument references follow the ordering of the tokens in the sentence, i.e. all the tokens in the sentence should appear in order in the second position. Note that for the top level root relation (of which there should be only one in the sentence), the head is the reference to the sentence ID.
The relations themselves are pointers which use the ud-syn: prefix that is, via the TEI extended pointer syntax as defined in the TEI header (cf. the Section on Prefix definitions), expanded so that the value of such an ana attribute points to the categories of the special UD syntactic taxonomy, which must be a part of the linguistically annotated version of the corpus; how to insert this taxonomy is specified in the Section on Linguistic taxonomies. There is one more detail to watch out for, namely, that UD allows the colon symbol : to appear in extended relations, e.g. acl:relcl for relative clause modifier. As we already use the colon for the extended pointer prefix, the colons in the relations should be changed to underscores, e.g. to ud-syn:acl_relcl. Note, however, that the relations specified in a <link> ana attribute are just pointers, and could have any value; it is the UD taxonomy that actually determines the correct value of the relation.
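The conversion of a UD relation into a pointer, and the ordering of head and argument in the target attribute, can be sketched as follows. The function names and the token IDs are hypothetical; only the ud-syn prefix, the colon-to-underscore rule and the head-then-argument order come from the description above.

```python
def ud_relation_pointer(relation, prefix="ud-syn"):
    """Turn a UD relation such as 'acl:relcl' into a prefixed pointer."""
    # The colon is reserved for the pointer prefix, so it becomes an underscore
    return f"{prefix}:{relation.replace(':', '_')}"


def link_attributes(relation, head_id, argument_id):
    """Return the ana and target values for one dependency link.

    The head reference comes first and the argument reference second.
    """
    return {
        "ana": ud_relation_pointer(relation),
        "target": f"#{head_id} #{argument_id}",
    }


# Token IDs below are invented for illustration only
print(link_attributes("acl:relcl", "s1.t3", "s1.t7"))
```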
G1.1/S2mf is transformed into sem:G1.1 sem:S2, i.e. the mf (for male and female) qualifiers are removed from S2mf. Note also that the prefix sem (cf. the Section on Prefix definitions) is used for pointing into the USAS taxonomy. With this set-up it is possible to encode the exact USAS tags as well as their ParlaMint categories, which give not only the tag, but also its gloss.
What kind of metadata a plain-text ParlaMint corpus should contain was explained in the Section on Corpus metadata and in this section we detail what additions must be made to the metadata for the linguistically annotated version. Note that the changes for this version have been already explained at the start of this Chapter. In short, there are three additional parts that should be added to the <teiHeader> of the corpus root, namely a description of the tool(s) used to linguistically annotate the corpus, two additional taxonomies (one for named entities, and one for UD syntactic relations) and the definition of the prefix expansions for UD syntactic relations. These descriptions should also serve as the point of departure for those that want to introduce their own prefixes and taxonomies for defining additional and corpus-specific part-of-speech tagging schemes or named entity classes.
Some linguistic annotations have fixed vocabularies and these should be encoded as taxonomies in the TEI header of the linguistically analysed corpus root, similarly to other taxonomies, as discussed in the Section on the Class declaration.
The colon, :, in the official name of the relation must be substituted by the underscore, _, to enable correct referencing of these IDs, as discussed in the Section on Syntactic parses.
Pointing attributes, such as ana, take as their value a series of references to the values of xml:id attributes in an XML document. If this is the same document, then the reference to the ID is the hash character, #, prefixed to the particular ID, e.g. #parla.uni, and if they are in another XML document, then the hash is prefixed with the URL of the document, e.g. https://nl.ijs.si/ME/V6/msd/tables/msd-fslib2-sl.xml#Vmpr1p.
Because the complete URL tends to be long, which is especially inconvenient when such references are given to every token in a corpus, TEI introduces the so called Extended pointer syntax, whereby the reference to an ID can be given in the form of a prefix, which is separated by a colon from the local part of the ID reference, and the value of this prefix is determined via the <prefixDef> element in the <encodingDesc> of the TEI header.
The first example defines the ud-syn prefix, so for any ID reference with this prefix, e.g. ud-syn:acl_relcl, the part after the prefix (acl_relcl) should be matched against (.+) and the result obtained by substituting the matched part (here the entire relation acl_relcl) into #$1, i.e. the hash character followed by the original value, so that ud-syn:acl_relcl gives #acl_relcl. This substitution is of course trivial, and hardly necessary, but was implemented so that all fixed-vocabulary linguistic analyses have the same treatment.
More to the point is the second example, where very short ID references, such as mte:Vmpr1p, are transformed to https://nl.ijs.si/ME/V6/msd/tables/msd-fslib2-sl.xml#Vmpr1p, as already explained in the Section on Word-level annotation.
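The prefix expansion described above can be sketched in Python as a lookup of a match pattern and a replacement pattern per prefix. The table PREFIX_DEFS and the function expand_pointer are hypothetical names; the ud-syn entry follows the description above, while the match pattern (.+) for mte is an assumption made by analogy. A real implementation would read these pairs from the <prefixDef> elements of the corpus root.

```python
import re

# prefix -> (match pattern, replacement pattern using $1 for the first group)
PREFIX_DEFS = {
    "ud-syn": ("(.+)", "#$1"),
    # The match pattern for mte is assumed, not taken from the guidelines
    "mte": ("(.+)", "https://nl.ijs.si/ME/V6/msd/tables/msd-fslib2-sl.xml#$1"),
}


def expand_pointer(pointer, prefix_defs=PREFIX_DEFS):
    """Expand a prefixed pointer; other pointers are returned unchanged."""
    prefix, sep, local = pointer.partition(":")
    if not sep or prefix not in prefix_defs:
        return pointer
    match_pattern, replacement = prefix_defs[prefix]
    m = re.fullmatch(match_pattern, local)
    if m is None:
        return pointer
    # Convert $1-style group references to Python's \g<1> syntax
    template = re.sub(r"\$(\d)", r"\\g<\1>", replacement)
    return m.expand(template)


# A pointing attribute holds a whitespace-delimited series of pointers
value = "ud-syn:acl_relcl mte:Vmpr1p #parla.uni"
print([expand_pointer(p) for p in value.split()])
```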
Finally, each prefix definition also contains a paragraph, possibly bilingual, explaining the definition.
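The prefix expansion described above can be sketched in a few lines of Python. This is only an illustration, not part of the ParlaMint tooling: the function name expand_pointer and the hand-written PREFIX_DEFS table are assumptions made for the example, and a real implementation would read the match and replacement patterns from the <prefixDef> elements of the corpus root.

```python
import re

# Hypothetical table mirroring the two prefix definitions discussed above:
# prefix -> (match pattern, replacement pattern in TEI "$1" notation).
PREFIX_DEFS = {
    "ud-syn": (r"(.+)", "#$1"),
    "mte": (r"(.+)", "https://nl.ijs.si/ME/V6/msd/tables/msd-fslib2-sl.xml#$1"),
}


def expand_pointer(pointer, prefix_defs=PREFIX_DEFS):
    """Expand a prefixed reference such as 'mte:Vmpr1p' to a full reference.

    References without a known prefix (e.g. '#parla.uni' or a full URL)
    are returned unchanged.
    """
    prefix, sep, local = pointer.partition(":")
    if not sep or prefix not in prefix_defs:
        return pointer
    match_pattern, replacement = prefix_defs[prefix]
    match = re.fullmatch(match_pattern, local)
    if match is None:
        return pointer
    # Replace each $n in the replacement pattern with the n-th matched group.
    return re.sub(r"\$(\d)", lambda g: match.group(int(g.group(1))), replacement)
```

With this table, expand_pointer("ud-syn:acl_relcl") yields the local reference and expand_pointer("mte:Vmpr1p") yields the full URL with the fragment identifier.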
The ParlaMint machine-translated corpora are encoded similarly to corpora in their source language, i.e. they have an identically structured corpus root and components. The most obvious differences are the following:
ParlaMint-LV-en.ana.xml and a component filename could be ParlaMint-LV_2014-11-04.ana.xml.
ParlaMint-LV-en.ana or ParlaMint-LV_2014-11-04.ana.
[ParlaMint-en.ana].
xml:id="en".
ParlaMint-LV-en_2014-11-04.ana.xml
should be as in the following example: ./ParlaMint-LV.TEI.ana/ and ./ParlaMint-LV-en.TEI.ana/), so that corresp values of the file ParlaMint-LV-en.TEI.ana/2014/ParlaMint-LV-en_2014-11-04.ana.xml with the mt-src prefix will point to ParlaMint-LV-en.TEI.ana/2014/../../ParlaMint-LV.TEI.ana/2014/ParlaMint-LV_2014.ana.xml
i.e. to ./ParlaMint-LV.TEI.ana/2014/ParlaMint-LV_2014-11-04.xml.
This chapter explains how to validate and finalise a ParlaMint corpus, and introduces scripts for converting a ParlaMint corpus to other, derived formats.
The XML structure of ParlaMint corpora can be validated via RelaxNG schemas, which exist in two versions, one that was produced as a customisation of the TEI Guidelines, and a set of schemas that were made from scratch for ParlaMint.
The TEI customisation is written as a TEI ODD document, which is, in fact, the XML version of this document, and is available in the TEI/ directory of the ParlaMint GitHub repository. The XML contains not only the prose guidelines but also the formal specification of the TEI schema; in the on-line version, this specification is converted to a reference to all the elements, attributes and classes used in ParlaMint corpora, given in Appendix A. The ODD document cannot be used directly for XML validation: it first has to be converted with the TEI XSLT stylesheets into a RelaxNG schema. This schema is also available in the same directory, as ParlaMint.rng (RelaxNG XML syntax) and ParlaMint.rnc (RelaxNG compact syntax), and should be used to check that ParlaMint component files validate against TEI.
However, it is difficult to constrain a TEI ODD-derived XML schema to allow only the kinds of nestings and attributes that should appear in a ParlaMint corpus, so this schema allows (and Appendix A lists) nestings of elements, as well as attributes, that are in fact forbidden in ParlaMint corpora.
For this reason, we have also developed a set of RelaxNG schemas from scratch, which allow only those elements, attributes and content models that are in fact valid for a ParlaMint corpus. There are altogether four such schemas: one for a "plain-text" corpus root, one for its corpus components, one for the linguistically annotated corpus root, and one for its components. These schemas can be found in the Schema/ directory of the ParlaMint GitHub repository, with the README file giving instructions on how to use them.
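As an illustration of how the four schemas partition the files, the following Python sketch decides which kind of schema a given file would need. It is not taken from the ParlaMint repository; it assumes that linguistically annotated files end in .ana.xml and that the corpus root has <teiCorpus> as its root element while components have <TEI>. The function name schema_kind and the returned labels are hypothetical and do not correspond to actual schema file names.

```python
import xml.etree.ElementTree as ET


def schema_kind(filename, xml_text):
    """Return (variant, level), e.g. ("annotated", "component").

    variant: "plain" or "annotated", decided from the file name (assumption).
    level:   "root" or "component", decided from the root element.
    """
    root = ET.fromstring(xml_text)
    local_name = root.tag.rsplit("}", 1)[-1]  # drop the "{namespace}" part
    if local_name == "teiCorpus":
        level = "root"
    elif local_name == "TEI":
        level = "component"
    else:
        raise ValueError(f"unexpected root element: {local_name}")
    variant = "annotated" if filename.endswith(".ana.xml") else "plain"
    return variant, level
```

A validation driver could use the returned pair to select the matching schema before invoking a RelaxNG validator.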
Validating with XML schemas checks the formal structure of XML files but is less successful in validating other aspects of conformance, such as the textual content or the linking of pointer attributes. For this reason, we have also developed an XSLT script that assumes a schema-validated ParlaMint file as its input and checks various other aspects of conformance. The validation scripts can be found in the Scripts/ directory of the ParlaMint GitHub repository, with the README file listing them.
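To make concrete what checking the linking of pointer attributes involves, here is a minimal Python sketch of one such check: every local reference (a value starting with #) must match some xml:id in the same document. This is a stand-in written for illustration, not the project's XSLT script; the function name dangling_local_refs and the selection of attributes in POINTER_ATTRS are assumptions, and prefixed or external references are simply ignored.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
# Assumed selection of pointing attributes; a real check may cover more.
POINTER_ATTRS = ("ana", "ref", "target", "scheme", "corresp")


def dangling_local_refs(xml_text):
    """Return (attribute, reference) pairs whose '#id' has no matching xml:id."""
    root = ET.fromstring(xml_text)
    ids = {el.get(XML_ID) for el in root.iter() if el.get(XML_ID)}
    missing = []
    for el in root.iter():
        for attr in POINTER_ATTRS:
            # Pointing attributes hold a whitespace-separated list of references.
            for ref in el.get(attr, "").split():
                if ref.startswith("#") and ref[1:] not in ids:
                    missing.append((attr, ref))
    return missing
```

An empty result means that all local references in the document resolve.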
It should be noted that it is not necessary to run the validation scripts directly, as the validation can be performed by the main Makefile of the project. The Makefile is self-documenting, i.e. to see how to use it, please run make help in the top-level directory of the ParlaMint project.
While each contributor of a corpus should validate their files with the ParlaMint schemas and validation script, there also exist further stages of validation, which are also applied to ParlaMint corpora:
While the vast majority of the work of converting source encodings into the ParlaMint corpus format is left to the compilers of a corpus, a few metadata elements can be produced by a common script on the basis of nearly finished corpora, which then results in the final version of the corpus for a particular release. This includes setting the date, edition and handle under which the corpus will be distributed, and also calculating the size of the corpus (cf. the Sections on Extents and on Tags declaration). The script for finalisation can be found in the Scripts/ directory of the ParlaMint GitHub repository; the README file briefly explains its function, and more comments can be found in the script itself.
A TEI-encoded document is, in general, not meant to be used directly by software programs; rather, it serves as an interchange and storage format. The ParlaMint project has produced various scripts to down-convert the XML-encoded corpora to other formats; they can be found in the Scripts/ directory of the ParlaMint GitHub repository, with the README file listing them and explaining their function. In short, the scripts convert the ParlaMint XML to plain text, to CoNLL-U, and to vertical format. There is also a script that takes a ParlaMint corpus and makes from it a sample for inclusion in the ParlaMint GitHub repository.
The ParlaMint GitHub repository contains these guidelines, the ParlaMint XML schemas, the scripts used to validate, finalise and convert the ParlaMint TEI XML corpora to derived formats, and samples of the ParlaMint corpora. There are four main branches in the repository:
The validation procedure for corpora is explained in the Section on Validating ParlaMint corpora, while the technical aspects of contributing corpora are further explained in the CONTRIBUTING file of the repository.
The work on these recommendations was funded by the CLARIN Research Infrastructure for Language Resources and Tools.
<TEI> (TEI document) contains a single TEI-conformant document, combining a single TEI header with one or more members of the model.resource class. Multiple <TEI> elements may be combined within a <TEI> (or <teiCorpus>) element. [4. Default Text Structure 15.1. Varieties of Composite Text] | |||||||||||||||||||
Module | textstructure — Formal specification | ||||||||||||||||||
Attributes | att.global.linking (synch, next, prev, @corresp)
| ||||||||||||||||||
Contained by | core: teiCorpus | ||||||||||||||||||
May contain | |||||||||||||||||||
Note | This element is required. It is customary to specify the TEI namespace http://www.tei-c.org/ns/1.0 on it, for example: <TEI version="4.4.0" xml:lang="it" xmlns="http://www.tei-c.org/ns/1.0">. | ||||||||||||||||||
Example | Example of ParlaMint corpus component: <TEI xml:id="ParlaMint-GB_2015-01-06-commons"
xml:lang="en" ana="#parla.sitting #reference" xmlns="http://www.tei-c.org/ns/1.0">
<teiHeader>...</teiHeader>
<text ana="#reference">
<body>...</body>
</text>
</TEI> | ||||||||||||||||||
Schematron |
<sch:ns prefix="tei"
uri="http://www.tei-c.org/ns/1.0"/>
<sch:ns prefix="xs"
uri="http://www.w3.org/2001/XMLSchema"/> | ||||||||||||||||||
Schematron |
<sch:ns prefix="rng"
uri="http://relaxng.org/ns/structure/1.0"/> | ||||||||||||||||||
Content model | <content> <elementRef key="teiHeader"/> <elementRef key="text"/> </content> ⚓ | ||||||||||||||||||
Schema Declaration | element TEI { tei_att.global.linking.attribute.corresp, attribute xml:id { text }, attribute xml:lang { text }, attribute ana { list { + } }, tei_teiHeader, tei_text }⚓ |
<addName> (additional name) contains an additional name component, such as a nickname, epithet, or alias, or any other descriptive phrase used within a personal name. [13.2.1. Personal Names] | |
Module | namesdates — Formal specification |
Attributes | att.global (xml:id, n, xml:base, xml:space, @xml:lang) |
Member of | |
Contained by | namesdates: persName |
May contain | Character data only |
Example | <persName>
<surname>Möderndorfer</surname>
<forename>Jani</forename>
<addName>Janko</addName>
</persName> |
Content model | <content> <textNode/> </content> ⚓ |
Schema Declaration | element addName { tei_att.global.attribute.xmllang, text }⚓ |
<affiliation> (affiliation) optionally contains the role name corresponding to the affiliation, the name of the organisation that the person is affiliated with, and notes giving further informal information about the affiliation. [15.2.2. The Participant Description] | |||||||
Module | namesdates — Formal specification | ||||||
Attributes | att.global.analytic (@ana) att.global.source (@source) att.datable.w3c (notBefore, notAfter, @when, @from, @to) att.canonical (key, @ref)
| ||||||
Member of | |||||||
Contained by | namesdates: person | ||||||
May contain | |||||||
Note | If included, the name of an organization may be tagged using either the <name> element as above, or the more specific <orgName> element. | ||||||
Example | <person xml:id="AdamKalous.1979">
<persName>
<surname>Kalous</surname>
<forename>Adam</forename>
</persName>
<sex value="M"/>
<birth when="1979-10-06"/>
<idno type="URI">https://www.psp.cz/sqw/detail.sqw?id=6497</idno>
<affiliation ref="#subcommittee.PEFPS.1414"
role="head" from="2018-03-14T00:00:00"
to="2021-10-21T00:00:00">
<roleName xml:lang="en">Chair Person</roleName>
</affiliation>
<affiliation ref="#subcommittee.PEFPS.1414"
role="member" from="2018-03-14T00:00:00"
to="2021-10-21T00:00:00">
<roleName xml:lang="en">Member</roleName>
</affiliation>
<affiliation ref="#committee.VSR.1315"
role="deputyHead" from="2017-12-06T16:00:00"
to="2021-10-21T00:00:00">
<roleName xml:lang="en">Vice Chairman</roleName>
</affiliation>
<affiliation ref="#committee.VSR.1315"
role="member" from="2017-11-28T16:00:00"
to="2021-10-21T00:00:00">
<roleName xml:lang="en">Member</roleName>
</affiliation>
<affiliation ref="#parliamentaryGroup.ANO.1292"
role="member" from="2017-10-24T00:00:00"
to="2021-10-21T00:00:00">
<roleName xml:lang="en">Member</roleName>
</affiliation>
<affiliation ref="#politicalParty.ANO2011.1104"
role="representative" from="2017-10-21" to="2021-10-21">
<roleName xml:lang="en">Candidate MP</roleName>
</affiliation>
<affiliation ref="#parliament"
ana="#parliament.PSP8" role="member" from="2017-10-21T14:00:00"
to="2021-10-21T00:00:00">
<roleName xml:lang="en">MP</roleName>
</affiliation>
</person> | ||||||
Example | The <affiliation> element can also include an ana attribute, which points to the appropriate legislative period when the person was affiliated with the specified organisation:
<person xml:id="BahŽibertAnja">
<persName>
<surname>Bah</surname>
<surname>Žibert</surname>
<forename>Anja</forename>
</persName>
<sex value="F"/>
<affiliation role="member" ref="#DZ"
from="2014-08-01" to="2018-06-21" ana="#DZ.7">
<roleName xml:lang="en">MP</roleName>
</affiliation>
<affiliation role="member"
ref="#party.SDS.2" from="2014-08-01" to="2018-06-21"
ana="#DZ.7">
<roleName xml:lang="en">Member</roleName>
</affiliation>
<affiliation role="member" ref="#DZ"
from="2018-06-22" ana="#DZ.8">
<roleName xml:lang="en">MP</roleName>
</affiliation>
</person> | ||||||
Content model | <content> <elementRef key="roleName" minOccurs="0" maxOccurs="unbounded"/> <elementRef key="orgName" minOccurs="0" maxOccurs="unbounded"/> <elementRef key="note" minOccurs="0" maxOccurs="unbounded"/> </content> ⚓ | ||||||
Schema Declaration | element affiliation { tei_att.global.analytic.attribute.ana, tei_att.global.source.attribute.source, tei_att.datable.w3c.attribute.when, tei_att.datable.w3c.attribute.from, tei_att.datable.w3c.attribute.to, tei_att.canonical.attribute.ref, attribute role { "head" | "minister" | "member" | "academician" | "alternateOfDelegation" | "associateMember" | "candidateChairman" | "constitutionalJudge" | "deputyHead" | "deputyMinister" | "ministerDelegate" | "nonAttachedMember" | "observer" | "ombudsman" | "prosecutorGeneral" | "publicDefenderOfRights" | "replacement" | "representative" | "secretary" | "secretaryGeneral" | "secretaryOfState" | "verifier" | "vicePublicDefenderOfRights" }, tei_roleName*, tei_orgName*, tei_note* }⚓ |
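The from and to attributes make it possible to determine which affiliations hold on a given day. The Python sketch below illustrates this under a few assumptions: the helper name affiliations_on is hypothetical, any time part of a value (as in from="2018-03-14T00:00:00") is ignored, and a missing from or to is treated as an open-ended interval.

```python
import xml.etree.ElementTree as ET


def affiliations_on(person_xml, day):
    """Return (ref, role) pairs of affiliations covering `day` (YYYY-MM-DD)."""
    person = ET.fromstring(person_xml)
    active = []
    for el in person.iter():
        if el.tag.rsplit("}", 1)[-1] != "affiliation":
            continue
        start = el.get("from", "")[:10]  # keep the date, drop any time part
        end = el.get("to", "")[:10]
        # ISO 8601 dates compare correctly as plain strings.
        if (not start or start <= day) and (not end or day <= end):
            active.append((el.get("ref"), el.get("role")))
    return active
```

Applied to the second example above, a day in 2015 selects both the parliament and the party affiliation, while a day in 2019 selects only the open-ended parliament affiliation.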
<appInfo> (application information) records information about an application which has edited the TEI file. [2.3.11. The Application Information Element] | |
Module | header — Formal specification |
Contained by | header: encodingDesc |
May contain | header: application |
Example | <appInfo>
<application version="4.0"
ident="stanford-corenlp">
<label>Stanford CoreNLP</label>
<desc>Tokenisation, POS tagging, NER and dependency parsed using Stanford CoreNLP <ref target="https://stanfordnlp.github.io/CoreNLP/">https://stanfordnlp.github.io/CoreNLP/</ref>.</desc>
</application>
</appInfo> |
Example | <appInfo>
<application version="1.0"
ident="reldi-tokeniser">
<label>ReLDI tokeniser</label>
</application>
<application version="1.0"
ident="classla-stanfordnlp">
<label>CLASSLA-StanfordNLP</label>
</application>
<application version="1.0"
ident="janes-ner">
<label>NER system for South Slavic languages</label>
</application>
</appInfo> |
Content model | <content> <elementRef key="application" minOccurs="1" maxOccurs="unbounded"/> </content> ⚓ |
Schema Declaration | element appInfo { tei_application+ }⚓ |
<application> provides information about an application which has acted upon the document. [2.3.11. The Application Information Element] | |||||||||||||
Module | header — Formal specification | ||||||||||||
Attributes |
| ||||||||||||
Contained by | header: appInfo | ||||||||||||
May contain | |||||||||||||
Example | <appInfo>
<application version="1"
ident="app-stanza">
<label>Stanza</label>
<desc xml:lang="en">
<ref target="https://stanfordnlp.github.io/stanza/index.html">Stanza</ref>: a jointly trained neural tagger, lemmatizer and dependency parser. Pretrained model based on the italian-isdt-ud-2.5 treebank</desc>
</application>
<application version="1" ident="app-t2k">
<label>T2K</label>
<desc xml:lang="en">
<ref target="http://www.italianlp.it/demo/t2k-text-to-knowledge/">T2K</ref>: contains a named entity recognition module for Italian.</desc>
</application>
<application version="1"
ident="conll-U2TEIXML">
<label>CoNLL-U 2 TEI XML</label>
<desc xml:lang="en">
<ref target="http://conllu2teixml">CoNLL-U 2 TEI XML</ref>: converter from CoNLL-U format to (ParlaClarin/ParlaMint) Tei XML Format</desc>
</application>
</appInfo> | ||||||||||||
Example | <appInfo>
<application version="4.0"
ident="stanford-corenlp">
<label>Stanford CoreNLP</label>
<desc>Tokenisation, POS tagging, NER and dependency parsed using Stanford CoreNLP <ref target="https://stanfordnlp.github.io/CoreNLP/">https://stanfordnlp.github.io/CoreNLP/</ref>.</desc>
</application>
</appInfo> | ||||||||||||
Content model | <content> <elementRef key="label"/> <elementRef key="desc" minOccurs="1" maxOccurs="unbounded"/> </content> ⚓ | ||||||||||||
Schema Declaration | element application { attribute ident { text }, attribute version { text }, tei_label, tei_desc+ }⚓ |
<availability> (availability) supplies information about the availability of a text, for example any restrictions on its use or distribution, its copyright status, any licence applying to it, etc. [2.2.4. Publication, Distribution, Licensing, etc.] | |||||||
Module | header — Formal specification | ||||||
Attributes |
| ||||||
Contained by | header: publicationStmt | ||||||
May contain | |||||||
Note | A consistent format should be adopted | ||||||
Example | <availability status="free">
<licence>http://creativecommons.org/licenses/by/4.0/</licence>
<p xml:lang="hr">Ovaj rad je dostupan pod <ref target="http://creativecommons.org/licenses/by/4.0/">međunarodnom licencom Creative Commons Imenovanje 4.0</ref>
</p>
<p xml:lang="en">This work is licensed under the <ref target="http://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</ref>
</p>
</availability> | ||||||
Content model | <content> <elementRef key="licence"/> <elementRef key="p" minOccurs="1" maxOccurs="unbounded"/> </content> ⚓ | ||||||
Schema Declaration | element availability { attribute status { "free" }, tei_licence, tei_p+ }⚓ |
<bibl> (bibliographic citation) contains a loosely-structured bibliographic citation of which the sub-components may or may not be explicitly tagged. [3.12.1. Methods of Encoding Bibliographic References and Lists of References 2.2.7. The Source Description 15.3.2. Declarable Elements] | |
Module | core — Formal specification |
Member of | |
Contained by | header: sourceDesc |
May contain | |
Note | Contains phrase-level elements, together with any combination of elements from the model.biblPart class |
Example | <bibl>
<title type="main">Minutes of the National Assembly of the Republic of Bulgaria</title>
<date when="2020-03-11">2020-03-11</date>
</bibl> |
Example | <bibl>
<title type="main" xml:lang="en">https://www.tbmm.gov.tr/tutanak/donem24/yil2/bas/b013m.htm</title>
<edition xml:lang="en">Official session record</edition>
<publisher xml:lang="en">The Turkish Parliament</publisher>
<idno type="URI">https://www.tbmm.gov.tr/</idno>
<date when="2011-10-27">2011-10-27</date>
</bibl> |
Content model | <content> <elementRef key="title" minOccurs="1" maxOccurs="unbounded"/> <alternate minOccurs="1" maxOccurs="unbounded"> <elementRef key="edition" minOccurs="0" maxOccurs="1"/> <elementRef key="publisher" minOccurs="0" maxOccurs="1"/> <elementRef key="idno" minOccurs="0" maxOccurs="unbounded"/> <elementRef key="date" minOccurs="1" maxOccurs="1"/> </alternate> </content> ⚓ |
Schema Declaration | element bibl { tei_title+, ( tei_edition? | tei_publisher? | tei_idno* | tei_date )+ }⚓ |
<birth> (birth) contains information about a person's birth, obligatorily its date and optionally the place. Note that there can be several placeNames, all referring to the same place, but written in different languages or scripts. [15.2.2. The Participant Description] | |||||||||
Module | namesdates — Formal specification | ||||||||
Attributes |
| ||||||||
Contained by | namesdates: person | ||||||||
May contain | namesdates: placeName | ||||||||
Example | <person xml:id="ReinerŽeljko" n="1291"> ...
<birth when="1953-05-28"/>
</person> | ||||||||
Content model | <content> <alternate minOccurs="1" maxOccurs="1"> <elementRef key="placeName" minOccurs="0" maxOccurs="unbounded"/> </alternate> </content> ⚓ | ||||||||
Schema Declaration | element birth { attribute when { text }, ( tei_placeName* ) }⚓ |
<body> (text body) contains the whole body of a single unitary text, excluding any front or back matter. [4. Default Text Structure] | |
Module | textstructure — Formal specification |
Contained by | textstructure: text |
May contain | textstructure: div |
Example | <body>
<div type="debateSection">...</div>
<div type="debateSection">...</div>
...
</body> |
Content model | <content> <elementRef key="div" minOccurs="1" maxOccurs="unbounded"/> </content> ⚓ |
Schema Declaration | element body { tei_div+ }⚓ |
<catDesc> (category description) describes some category within a taxonomy or text typology, either in the form of a brief prose description or in terms of the situational parameters used by the TEI formal <textDesc>. [2.3.7. The Classification Declaration] | |
Module | header — Formal specification |
Attributes | att.global (xml:id, n, xml:base, xml:space, @xml:lang) |
Contained by | header: category |
May contain | |
Example | <category xml:id="parla.organisation">
<catDesc xml:lang="en">
<term>Organisation</term>
</catDesc>
<catDesc xml:lang="bg">
<term>Организация</term>
</catDesc>
</category> |
Content model | <content> <sequence minOccurs="1" maxOccurs="1"> <elementRef key="term"/> <alternate minOccurs="1" maxOccurs="unbounded"> <textNode/> <elementRef key="ref"/> </alternate> </sequence> </content> ⚓ |
Schema Declaration | element catDesc { tei_att.global.attribute.xmllang, ( tei_term, ( text | tei_ref )+ ) }⚓ |
<catRef> (category reference) specifies one or more defined categories within some taxonomy or text typology. [2.4.3. The Text Classification] | |||||||||||||||
Module | header — Formal specification | ||||||||||||||
Attributes |
| ||||||||||||||
Contained by | header: textClass | ||||||||||||||
May contain | Empty element | ||||||||||||||
Note | The scheme attribute needs to be supplied only if more than one taxonomy has been declared. | ||||||||||||||
Example | <textClass>
<catRef scheme="#parla.legislature"
target="#parla.uni"/>
</textClass>
... elsewhere ...
<taxonomy xml:id="parla.legislature"> ...
<category xml:id="parla.uni">
<catDesc xml:lang="lt">
<term>Vienų rūmų parlamentas</term>
</catDesc>
<catDesc xml:lang="en">
<term>Unicameralism</term>
</catDesc>
</category>
</taxonomy> | ||||||||||||||
Content model | <content> <empty/> </content> ⚓ | ||||||||||||||
Schema Declaration | element catRef { attribute target { list { + } }, attribute scheme { text }, empty }⚓ |
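The link between a <catRef> and the category it points to can be followed programmatically. The Python sketch below is only an illustration: the function name category_term is hypothetical, element names are compared without their namespace so that the snippet works with and without the TEI namespace declaration, and only local references of the form #id are handled.

```python
import xml.etree.ElementTree as ET

XML_NS = "{http://www.w3.org/XML/1998/namespace}"


def local(tag):
    """Strip the '{namespace}' part from an element name."""
    return tag.rsplit("}", 1)[-1]


def category_term(header_xml, target, lang="en"):
    """Return the <term> text of the category that `target` (e.g. '#parla.uni')
    points to, in the requested language, or None if it cannot be found."""
    root = ET.fromstring(header_xml)
    wanted = target.lstrip("#")
    for cat in root.iter():
        if local(cat.tag) != "category" or cat.get(XML_NS + "id") != wanted:
            continue
        for desc in cat:
            if local(desc.tag) == "catDesc" and desc.get(XML_NS + "lang") == lang:
                for child in desc:
                    if local(child.tag) == "term":
                        return child.text
    return None
```

For the taxonomy in the example above, looking up #parla.uni gives the English term by default and the Lithuanian term when lang="lt" is passed.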
<category> (category) contains an individual descriptive category, possibly nested within a superordinate category, within a user-defined taxonomy. [2.3.7. The Classification Declaration] | |||||||||||||||||
Module | header — Formal specification | ||||||||||||||||
Attributes | att.global (xml:id, xml:lang, xml:base, xml:space, @n)
| ||||||||||||||||
Contained by | |||||||||||||||||
May contain | |||||||||||||||||
Example | <category xml:id="parla.session">
<catDesc xml:lang="en">
<term>Session</term>: A parliamentary year, which always begins on the first Tuesday in October at 12.00 o’clock noon and ends on the same date at the same time the following year. However, parliamentary work at Christiansborg is organised in such a way that it primarily takes place from October to June.</catDesc>
</category> | ||||||||||||||||
Example | <category xml:id="parla.term">
<catDesc xml:lang="nl">
<term>Zittingsperiode</term>
</catDesc>
<catDesc xml:lang="en">
<term>Legislative period</term>
</catDesc>
</category> | ||||||||||||||||
Content model | <content> <elementRef key="catDesc" minOccurs="1" maxOccurs="unbounded"/> <elementRef key="category" minOccurs="0" maxOccurs="unbounded"/> </content> ⚓ | ||||||||||||||||
Schema Declaration | element category { tei_att.global.attribute.n, attribute xml:id { text }, attribute ana { list { + } }?, tei_catDesc+, tei_category* }⚓ |
<change> (change) documents a change or set of changes made during the production of a source document, or during the revision of an electronic file. [2.6. The Revision Description 2.4.1. Creation 11.7. Identifying Changes and Revisions] | |
Module | header — Formal specification |
Attributes | att.datable.w3c (notBefore, notAfter, from, to, @when) |
Contained by | header: revisionDesc |
May contain | core: name character data |
Note | The who attribute may be used to point to any other element, but will typically specify a <respStmt> or <person> element elsewhere in the header, identifying the person responsible for the change and their role in making it. It is recommended that changes be recorded with the most recent first. The status attribute may be used to indicate the status of a document following the change documented. |
Example | <revisionDesc>
<change when="2021-01-28">
<name>Tommaso Agnoloni</name>: Generated corpus in ParlaMint.</change>
<change when="2021-02-26">
<name>Tommaso Agnoloni</name>, <name>Francesca Frontini</name>: Corpus revision, fixing</change>
</revisionDesc> |
Content model | <content> <alternate minOccurs="1" maxOccurs="unbounded"> <elementRef key="name"/> <textNode/> </alternate> </content> ⚓ |
Schema Declaration | element change { tei_att.datable.w3c.attribute.when, ( tei_name | text )+ }⚓ |
<classDecl> (classification declarations) contains taxonomies defining classificatory codes used elsewhere in the text. Note that the taxonomies are in ParlaMint typically stored in separate files. [2.3.7. The Classification Declaration 2.3. The Encoding Description] | |
Module | header — Formal specification |
Contained by | header: encodingDesc |
May contain | |
Example | <classDecl> <xi:include xmlns:xi="http://www.w3.org/2001/XInclude"
href="ParlaMint-SI-taxonomy-parla.legislature.xml"/>
<xi:include xmlns:xi="http://www.w3.org/2001/XInclude"
href="ParlaMint-SI-taxonomy.xml-speaker_types"/>
...
</classDecl> |
Content model | <content> <alternate minOccurs="1" maxOccurs="unbounded"> <elementRef key="taxonomy"/> <elementRef key="include"/> </alternate> </content> ⚓ |
Schema Declaration | element classDecl { ( tei_taxonomy | tei_include )+ }⚓ |
<correction> (correction principles) states how and under what circumstances corrections have been made in the text. [2.3.3. The Editorial Practices Declaration 15.3.2. Declarable Elements] | |
Module | header — Formal specification |
Contained by | header: editorialDecl |
May contain | core: p |
Note | May be used to note the results of proof reading the text against its original, indicating (for example) whether discrepancies have been silently rectified, or recorded using the editorial tags described in section 3.5. Simple Editorial Changes. |
Example | <editorialDecl>
<correction>
<p>No correction of source texts was performed.</p>
</correction>
</editorialDecl> |
Content model | <content> <elementRef key="p" minOccurs="1" maxOccurs="unbounded"/> </content> ⚓ |
Schema Declaration | element correction { tei_p+ }⚓ |
<date> (date) contains a date in any format. [3.6.4. Dates and Times 2.2.4. Publication, Distribution, Licensing, etc. 2.6. The Revision Description 3.12.2.4. Imprint, Size of a Document, and Reprint Information 15.2.3. The Setting Description 13.4. Dates] | |
Module | core — Formal specification |
Attributes | att.typed (@type, @subtype) att.global (n, xml:base, xml:space, @xml:id, @xml:lang) att.global.analytic (@ana) att.datable.w3c (notBefore, notAfter, @when, @from, @to) |
Member of | |
Contained by | |
May contain | |
Example | The element <date> gives the date in the when attribute in the ISO 8601 format, while the textual content is not constrained: <date when="2021-06-08">2021-06-08</date> |
Example | The textual content can be given according to the conventions used in the local language: <date when="2018-04-13" xml:lang="sl">13.4.2018</date> |
Content model | <content> <alternate minOccurs="1" maxOccurs="unbounded"> <elementRef key="w"/> <elementRef key="pc"/> <elementRef key="date"/> <textNode/> </alternate> </content> ⚓ |
Schema Declaration | element date { tei_att.global.attribute.xmlid, tei_att.global.attribute.xmllang, tei_att.global.analytic.attribute.ana, tei_att.datable.w3c.attribute.when, tei_att.datable.w3c.attribute.from, tei_att.datable.w3c.attribute.to, tei_att.typed.attributes, ( tei_w | tei_pc | tei_date | text )+ }⚓ |
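The split between the normalised when value and the free textual content can be illustrated with a short Python sketch. It is not taken from any ParlaMint script: the helper name make_date is hypothetical, and the day.month.year rendering used for Slovenian is simply modelled on the second example above.

```python
import datetime
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def make_date(day, lang=None):
    """Build a <date> element: ISO 8601 in @when, free-form text as content."""
    el = ET.Element("date", {"when": day.isoformat()})
    if lang == "sl":
        # Local convention as in the example above: day.month.year
        el.set(XML_LANG, "sl")
        el.text = f"{day.day}.{day.month}.{day.year}"
    else:
        # Default: repeat the ISO date as the textual content.
        el.text = day.isoformat()
    return el
```

Whatever form the content takes, processing scripts can rely on the when attribute alone.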
<death> (death) contains information about a person's death, obligatorily its date and optionally the place. Note that there can be several placeNames, all referring to the same place, but written in different languages or scripts. [15.2.2. The Participant Description] | |||||||||
Module | namesdates — Formal specification | ||||||||
Attributes |
| ||||||||
Contained by | namesdates: person | ||||||||
May contain | namesdates: placeName | ||||||||
Example | <death when="2020-12-29"/> | ||||||||
Content model | <content> <alternate minOccurs="1" maxOccurs="1"> <elementRef key="placeName" minOccurs="0" maxOccurs="unbounded"/> </alternate> </content> ⚓ | ||||||||
Schema Declaration | element death { attribute when { text }, ( tei_placeName* ) }⚓ |
<desc> (description) contains a short description of the purpose, function, or use of its parent element, or when the parent is a documentation element, describes or defines the object being documented. [22.4.1. Description of Components] | |
Module | core — Formal specification |
Attributes | att.global (xml:id, n, xml:base, xml:space, @xml:lang) |
Member of | |
Contained by | |
May contain | |
Note | When used in a specification element such as <elementSpec>, TEI convention requires that this be expressed as a finite clause, beginning with an active verb. |
Example | Example of <desc> elements for transcriber comments:
<gap reason="inaudible">
<desc>speaker spoke too quietly, not understood</desc>
</gap>
<kinesic type="applause">
<desc xml:lang="sl">ploskanje</desc>
</kinesic>
<vocal type="interruption">
<desc>sounds from the chamber</desc>
</vocal>
...
<kinesic type="signal">
<desc>signal for end of debate</desc>
</kinesic>
...
<incident type="action">
<desc>minute of silence</desc>
</incident> |
Example | Example of <desc> elements used as part of a taxonomy: <taxonomy xml:id="parla.legislature">
<desc xml:lang="sl">
<term>Zakonodajna oblast</term>
</desc>
<desc>
<term>Legislature</term>
</desc>
...
</taxonomy> |
Example | Element <desc> can also be used to describe tool(s) used to linguistically annotate the corpus: <application version="1.0"
ident="reldi-tokeniser">
<label>ReLDI tokeniser</label>
<desc xml:lang="en">Tokenisation and sentence segmentation with ReLDI tokeniser, available from <ref target="https://github.com/clarinsi/reldi-tokeniser">https://github.com/clarinsi/reldi-tokeniser</ref>.</desc>
</application> |
Schematron | A <desc> with a type of deprecationInfo should only occur when its parent element is being deprecated. Furthermore, it should always occur in an element that is being deprecated when <desc> is a valid child of that element.
<sch:rule context="tei:desc[ @type eq 'deprecationInfo']">
<sch:assert test="../@validUntil">Information about a
deprecation should only be present in a specification element
that is being deprecated: that is, only an element that has a
@validUntil attribute should have a child <desc
type="deprecationInfo">.</sch:assert>
</sch:rule> |
Content model | <content> <sequence minOccurs="1" maxOccurs="1"> <elementRef minOccurs="0" key="term"/> <alternate minOccurs="1" maxOccurs="unbounded"> <textNode/> <elementRef key="ref"/> </alternate> </sequence> </content> ⚓ |
Schema Declaration | element desc { tei_att.global.attribute.xmllang, ( tei_term?, ( text | tei_ref )+ ) }⚓ |
<div> (text division) contains a division of the body of a corpus component. [4.1. Divisions of the Body] | |||||||
Module | textstructure — Formal specification | ||||||
Attributes | att.global (xml:base, xml:space, @xml:id, @n, @xml:lang) att.global.linking (synch, next, prev, @corresp) att.typed (type, @subtype)
| ||||||
Contained by | textstructure: body | ||||||
May contain | |||||||
Example | <div type="debateSection">
<head>Devolution of Power (Cities)</head>
<u xml:id="ParlaMint-GB_2015-01-06-commons.u1">...</u>
<u xml:id="ParlaMint-GB_2015-01-06-commons.u2">...</u>
...
<note>House adjourned.</note>
</div> | ||||||
Schematron |
<sch:report test="(ancestor::tei:l or ancestor::tei:lg) and not(ancestor::tei:floatingText)"> Abstract model violation: Lines may not contain higher-level structural elements such as div, unless div is a descendant of floatingText.
</sch:report> | ||||||
Schematron |
<sch:report test="(ancestor::tei:p or ancestor::tei:ab) and not(ancestor::tei:floatingText)"> Abstract model violation: p and ab may not contain higher-level structural elements such as div, unless div is a descendant of floatingText.
</sch:report> | ||||||
Content model | <content> <elementRef key="head" minOccurs="0" maxOccurs="unbounded"/> <alternate minOccurs="1" maxOccurs="unbounded"> <elementRef key="note"/> <elementRef key="vocal"/> <elementRef key="kinesic"/> <elementRef key="incident"/> <elementRef key="gap"/> <elementRef key="pb"/> <elementRef key="u"/> </alternate> </content> ⚓ | ||||||
Schema Declaration | element div { tei_att.global.attribute.xmlid, tei_att.global.attribute.n, tei_att.global.attribute.xmllang, tei_att.global.linking.attribute.corresp, tei_att.typed.attribute.subtype, attribute type { "debateSection" | "commentSection" }, tei_head*, ( tei_note | tei_vocal | tei_kinesic | tei_incident | tei_gap | tei_pb | tei_u )+ }⚓ |
<edition> (edition) describes the particularities of one edition of a text. [2.2.2. The Edition Statement] | |
Module | header — Formal specification |
Attributes | att.global (xml:id, n, xml:base, xml:space, @xml:lang) |
Contained by | core: bibl header: editionStmt |
May contain | Character data only |
Example | <edition>2.1</edition> |
Content model | <content> <textNode/> </content> ⚓ |
Schema Declaration | element edition { tei_att.global.attribute.xmllang, text }⚓ |
<editionStmt> (edition statement) groups information relating to one edition of a text. [2.2.2. The Edition Statement 2.2. The File Description] | |
Module | header — Formal specification |
Contained by | header: fileDesc |
May contain | header: edition |
Example | <editionStmt>
<edition>2.1</edition>
</editionStmt> |
Content model | <content> <elementRef key="edition" minOccurs="1" maxOccurs="1"/> </content> ⚓ |
Schema Declaration | element editionStmt { tei_edition }⚓ |
<editorialDecl> (editorial practice declaration) provides details of editorial principles and practices applied during the encoding of a text. [2.3.3. The Editorial Practices Declaration 2.3. The Encoding Description 15.3.2. Declarable Elements] | |
Module | header — Formal specification |
Contained by | header: encodingDesc |
May contain | |
Example | <editorialDecl>
<correction>
<p>No correction of source texts was performed.</p>
</correction>
<normalization>
<p>Text has not been normalised, except for spacing.</p>
</normalization>
<hyphenation>
<p>Hyphenation has not been altered with respect to the source files.</p>
</hyphenation>
<quotation>
<p>Quotation marks have been left in the text and are not explicitly marked up.</p>
</quotation>
<segmentation>
<p>The texts are segmented into utterances (contributions) and segments (corresponding to paragraphs in the source transcription).</p>
</segmentation>
</editorialDecl> |
Content model | <content> <alternate minOccurs="1" maxOccurs="unbounded"> <elementRef key="correction"/> <elementRef key="normalization"/> <elementRef key="hyphenation"/> <elementRef key="quotation"/> <elementRef key="segmentation"/> </alternate> </content> ⚓ |
Schema Declaration | element editorialDecl { ( tei_correction | tei_normalization | tei_hyphenation | tei_quotation | tei_segmentation )+ }⚓ |
<education> (education) contains a description of the educational experience of a person. [15.2.2. The Participant Description] | |
Module | namesdates — Formal specification |
Attributes | att.global (xml:id, xml:base, xml:space, @n, @xml:lang) att.datable.w3c (notBefore, notAfter, @when, @from, @to) |
Contained by | namesdates: person |
May contain | Character data only |
Example | <education>Bachelor of Science, Electrical and Information Technology Engineer</education> |
Content model | <content> <textNode/> </content> ⚓ |
Schema Declaration | element education { tei_att.global.attribute.n, tei_att.global.attribute.xmllang, tei_att.datable.w3c.attribute.when, tei_att.datable.w3c.attribute.from, tei_att.datable.w3c.attribute.to, text }⚓ |
<email> (electronic mail address) contains an email address identifying a location to which email messages can be delivered. [3.6.2. Addresses] | |
Module | core — Formal specification |
Attributes | att.global (n, xml:base, xml:space, @xml:id, @xml:lang) att.global.analytic (@ana) |
Member of | |
Contained by | core: unit |
May contain | |
Note | The format of a modern Internet email address is defined in RFC 2822 |
Example | The element can be used for fine-grained Named Entities which include e-mail addresses: <email ana="ne:me"
xml:id="ParlaMint-CZ_2014-12-09-ps2013-023-05-003-133.ne87">
<w xml:id="ParlaMint-CZ_2014-12-09-ps2013-023-05-003-133.u4.p9.s3.w13"
lemma="namraza@cd.cz"
msd="UPosTag=NOUN|Case=Gen|Gender=Fem|Number=Plur|Polarity=Pos">namraza@cd.cz</w>
</email> |
Content model | <content> <alternate minOccurs="1" maxOccurs="unbounded"> <elementRef key="w"/> <elementRef key="pc"/> <textNode/> </alternate> </content> ⚓ |
Schema Declaration | element email { tei_att.global.attribute.xmlid, tei_att.global.attribute.xmllang, tei_att.global.analytic.attribute.ana, ( tei_w | tei_pc | text )+ }⚓ |
<encodingDesc> (encoding description) documents the relationship between an electronic text and the source or sources from which it was derived. [2.3. The Encoding Description 2.1.1. The TEI Header and Its Components] | |
Module | header — Formal specification |
Contained by | header: teiHeader |
May contain | |
Example | General structure of an encoding description: <encodingDesc>
<projectDesc>...</projectDesc>
<editorialDecl>...</editorialDecl>
<tagsDecl>...</tagsDecl>
<classDecl>...</classDecl>
</encodingDesc> |
Example | Structure of an encoding description for unannotated corpus root: <encodingDesc>
<projectDesc>
<p xml:lang="sl">
<ref target="https://www.clarin.eu/content/parlamint">ParlaMint</ref>
</p>
<p xml:lang="en">
<ref target="https://www.clarin.eu/content/parlamint">ParlaMint</ref> is a project that aims to (1) create a multilingual set of comparable corpora of parliamentary proceedings uniformly encoded...</p>
</projectDesc>
<editorialDecl>
<correction>...</correction>
<normalization>...</normalization>
<hyphenation>...</hyphenation>
<quotation>...</quotation>
<segmentation>...</segmentation>
</editorialDecl>
<tagsDecl>
<namespace name="http://www.tei-c.org/ns/1.0">
<tagUsage gi="body" occurs="414"/>
<tagUsage gi="desc" occurs="10234"/>
<tagUsage gi="div" occurs="414"/>
</namespace>
</tagsDecl>
<classDecl>...</classDecl>
</encodingDesc> |
Example | Example of encoding description of an annotated corpus root. The structure includes two additional elements, <listPrefixDef> and <appInfo>. <encodingDesc>
<projectDesc>... </projectDesc>
<editorialDecl>...</editorialDecl>
<tagsDecl>...</tagsDecl>
<classDecl>...</classDecl>
<listPrefixDef>
<prefixDef ident="mte"
matchPattern="(.+)"
replacementPattern="http://nl.ijs.si/ME/V6/msd/tables/msd-fslib-sl.xml#$1">
<p xml:lang="en">Private URIs with this prefix point to feature-structure elements defining the Slovenian MULTEXT-East Version 6 MSDs.</p>
</prefixDef>
</listPrefixDef>
<appInfo>
<application>...</application>
</appInfo>
</encodingDesc> |
Example | Example of encoding description of a corpus component (annotated or unannotated). In contrast to the corpus root, the encoding description of a corpus component contains only two elements, namely, the <projectDesc> and the <tagsDecl>. <encodingDesc>
<projectDesc>...</projectDesc>
<tagsDecl>...</tagsDecl>
</encodingDesc> |
Content model | <content> <elementRef key="projectDesc"/> <elementRef key="editorialDecl" minOccurs="0" maxOccurs="1"/> <elementRef key="tagsDecl"/> <elementRef key="classDecl" minOccurs="0" maxOccurs="1"/> <elementRef key="listPrefixDef" minOccurs="0" maxOccurs="1"/> <elementRef key="appInfo" minOccurs="0" maxOccurs="1"/> </content> ⚓ |
Schema Declaration | element encodingDesc { tei_projectDesc, tei_editorialDecl?, tei_tagsDecl, tei_classDecl?, tei_listPrefixDef?, tei_appInfo? }⚓ |
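The <prefixDef> in the annotated corpus root example describes how a private URI such as mte:... is rewritten into a full URL. The following is a minimal sketch of that expansion, assuming Python's standard re and xml.etree.ElementTree modules. The function name expand and the value Ncmsn are illustrative only, and the sketch assumes that matchPattern is also a valid Python regular expression, which holds for "(.+)" but not necessarily for every pattern.

```python
import re
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# The <prefixDef> from the example above, without its description.
SAMPLE = f"""<listPrefixDef xmlns="{TEI_NS}">
  <prefixDef ident="mte" matchPattern="(.+)"
             replacementPattern="http://nl.ijs.si/ME/V6/msd/tables/msd-fslib-sl.xml#$1"/>
</listPrefixDef>"""


def expand(pointer, list_prefix_def):
    """Expand a prefixed pointer; return it unchanged if no prefixDef applies."""
    prefix, sep, local = pointer.partition(":")
    if not sep:
        return pointer
    for pdef in list_prefix_def.iter(f"{{{TEI_NS}}}prefixDef"):
        if pdef.get("ident") != prefix:
            continue
        match = re.fullmatch(pdef.get("matchPattern"), local)
        if match is None:
            continue
        # TEI writes back-references as $1, $2, ...; substitute them by hand.
        return re.sub(r"\$(\d)", lambda m: match.group(int(m.group(1))),
                      pdef.get("replacementPattern"))
    return pointer


root = ET.fromstring(SAMPLE)
print(expand("mte:Ncmsn", root))
```

Pointers without a declared prefix, such as #parla.lower, are returned as they are.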
<equipment> (equipment) provides technical details of the equipment and media used for an audio or video recording used as the source for a spoken text. [8.2. Documenting the Source of Transcribed Speech 15.3.2. Declarable Elements] | |
Module | spoken — Formal specification |
Attributes | att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @next, @prev)) (att.global.analytic (@ana)) (att.global.responsibility (@resp)) (att.global.source (@source)) att.declarable (@default) |
Contained by | — |
May contain | core: p |
Example | <equipment>
<p>"Hi-8" 8 mm NTSC camcorder with integral directional
microphone and windshield and stereo digital sound
recording channel.
</p>
</equipment> |
Example | <equipment>
<p>8-track analogue transfer mixed down to 19 cm/sec audio
tape for cassette mastering</p>
</equipment> |
Content model | <content> <classRef key="model.pLike" minOccurs="1" maxOccurs="unbounded"/> </content> ⚓ |
Schema Declaration | element equipment { tei_att.global.attributes, tei_att.declarable.attributes, tei_model.pLike+ }⚓ |
<event> (event) contains data relating to any kind of significant event associated with a person, place, or organisation. [13.3.1. Basic Principles] | |
Module | namesdates — Formal specification |
Attributes | att.global (n, xml:lang, xml:base, xml:space, @xml:id) att.datable.w3c (notBefore, notAfter, @when, @from, @to) |
Contained by | |
May contain | core: label |
Example | <event xml:id="PoGB.55" from="2010-05-18"
to="2015-03-30">
<label>Fifty-fifth Parliament of the United Kingdom</label>
</event> |
Example | <org xml:id="government.HR"
role="government">
<orgName xml:lang="hr" full="yes">Vlada Republike Hrvatske</orgName>
<orgName xml:lang="en" full="yes">Government of the Republic of Croatia</orgName>
<event from="1990-05-30">
<label xml:lang="en">existence</label>
</event>
</org> |
Content model | <content> <elementRef key="label" minOccurs="1" maxOccurs="unbounded"/> </content> ⚓ |
Schema Declaration | element event { tei_att.global.attribute.xmlid, tei_att.datable.w3c.attribute.when, tei_att.datable.w3c.attribute.from, tei_att.datable.w3c.attribute.to, tei_label+ }⚓ |
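The @from and @to attributes make it possible to look up which event, for example which legislative period, covers a given day. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree module; the function name covering_events is illustrative. It also assumes that all dates are given in full YYYY-MM-DD form, so that they can be compared as strings; shorter W3C date forms would need proper parsing.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Two legislative periods, one of them still open (no @to).
SAMPLE = f"""<listEvent xmlns="{TEI_NS}">
  <event xml:id="DZ.7" from="2014-08-01" to="2018-06-21">
    <label xml:lang="en">Term 7</label>
  </event>
  <event xml:id="DZ.8" from="2018-06-22">
    <label xml:lang="en">Term 8</label>
  </event>
</listEvent>"""


def covering_events(list_event, day):
    """Return the xml:id of every event whose date span contains day."""
    hits = []
    for event in list_event.iter(f"{{{TEI_NS}}}event"):
        # @when marks a single day; @from and @to mark a span that may be open.
        start = event.get("from") or event.get("when")
        end = event.get("to") or event.get("when")
        if start and day < start:
            continue
        if end and day > end:
            continue
        hits.append(event.get(XML_ID))
    return hits


root = ET.fromstring(SAMPLE)
print(covering_events(root, "2016-03-01"))  # prints ['DZ.7']
```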
<extent> (extent) describes the approximate size of a text stored on some carrier medium or of some other object, digital or non-digital, specified in any convenient units. [2.2.3. Type and Extent of File 2.2. The File Description 3.12.2.4. Imprint, Size of a Document, and Reprint Information 10.7.1. Object Description] | |
Module | header — Formal specification |
Contained by | header: fileDesc |
May contain | core: measure |
Example | <extent>
<measure unit="speeches" quantity="75122"
xml:lang="sl">75.122 govorov</measure>
<measure unit="speeches" quantity="75122"
xml:lang="en">75,122 speeches</measure>
<measure unit="words" quantity="20190034"
xml:lang="sl">20.190.034 besed</measure>
<measure unit="words" quantity="20190034"
xml:lang="en">20,190,034 words</measure>
</extent> |
Content model | <content> <elementRef key="measure" minOccurs="1" maxOccurs="unbounded"/> </content> ⚓ |
Schema Declaration | element extent { tei_measure+ }⚓ |
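Because each count is repeated once per language, a script that reads <extent> only needs the @unit and @quantity attributes, not the localised text. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree module; the function name read_extent and the consistency check are illustrative, not something the schema prescribes.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# The example above, with the TEI namespace declared on the root.
SAMPLE = f"""<extent xmlns="{TEI_NS}">
  <measure unit="speeches" quantity="75122" xml:lang="sl">75.122 govorov</measure>
  <measure unit="speeches" quantity="75122" xml:lang="en">75,122 speeches</measure>
  <measure unit="words" quantity="20190034" xml:lang="sl">20.190.034 besed</measure>
  <measure unit="words" quantity="20190034" xml:lang="en">20,190,034 words</measure>
</extent>"""


def read_extent(extent):
    """Map each unit to its quantity, checking that the language variants agree."""
    totals = {}
    for measure in extent.iter(f"{{{TEI_NS}}}measure"):
        unit = measure.get("unit")
        quantity = int(measure.get("quantity"))
        # setdefault stores the first value seen and returns it on later calls.
        if totals.setdefault(unit, quantity) != quantity:
            raise ValueError(f"conflicting quantities for unit {unit!r}")
    return totals


print(read_extent(ET.fromstring(SAMPLE)))
# prints {'speeches': 75122, 'words': 20190034}
```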
<figure> (figure) groups elements representing or containing graphic information such as an illustration, formula, or figure. [14.4. Specific Elements for Graphic Images] | |
Module | figures — Formal specification |
Member of | |
Contained by | namesdates: person |
May contain | |
Example | <figure>
<graphic url="https://www.psp.cz/eknih/cdrom/2017ps/eknih/2017ps/poslanci/i6497.jpg"/>
</figure> |
Content model | <content> <elementRef key="head" minOccurs="0" maxOccurs="1"/> <elementRef key="graphic" minOccurs="1" maxOccurs="1"/> </content> ⚓ |
Schema Declaration | element figure { tei_head?, tei_graphic }⚓ |
<fileDesc> (file description) contains a full bibliographic description of an electronic file. [2.2. The File Description 2.1.1. The TEI Header and Its Components] | |
Module | header — Formal specification |
Contained by | header: teiHeader |
May contain | |
Note | The major source of information for those seeking to create a catalogue entry or bibliographic citation for an electronic file. As such, it provides a title and statements of responsibility together with details of the publication or distribution of the file, of any series to which it belongs, and detailed bibliographic notes for matters not addressed elsewhere in the header. It also contains a full bibliographic description for the source or sources from which the electronic text was derived. |
Example | Basic structure of the <fileDesc> element: <fileDesc>
<titleStmt>...</titleStmt>
<editionStmt>...</editionStmt>
<extent>...</extent>
<publicationStmt>...</publicationStmt>
<sourceDesc>...</sourceDesc>
</fileDesc> |
Example | Example of the <fileDesc> element in a corpus root: <fileDesc>
<titleStmt>
<title type="main" xml:lang="en">Dutch parliamentary corpus ParlaMint-NL [ParlaMint]</title>
<title type="main" xml:lang="nl">Corpus van het Nederlandse Parlement ParlaMint-NL [ParlaMint]</title>
<title type="sub" xml:lang="en">Minutes of the Eerste Kamer and Tweede Kamer of The Netherlands (2015-2020)</title>
<title type="sub" xml:lang="nl">Minuten van de Eerste en Tweede Kamer van Nederland (2015-2020)</title>
<meeting n="28-lower"
ana="#parla.lower #parla.term">28ste Tweede Kamer</meeting>
<meeting n="29-lower"
ana="#parla.lower #parla.term">29ste Tweede Kamer</meeting>
<meeting n="34-upper"
ana="#parla.upper #parla.term">34ste Eerste Kamer</meeting>
<meeting n="35-upper"
ana="#parla.upper #parla.term">35ste Eerste Kamer</meeting>
<meeting n="36-upper"
ana="#parla.upper #parla.term">36ste Eerste Kamer</meeting>
<respStmt>
<persName xml:id="RubenvanHeusden"
xml:lang="nl">Ruben van Heusden</persName>
<resp xml:lang="en">Downloading and converting the corpus to TEI format</resp>
</respStmt>
<funder>
<orgName xml:lang="en">The CLARIN research infrastructure</orgName>
</funder>
</titleStmt>
<editionStmt>
<edition>2.1</edition>
</editionStmt>
<extent>
<measure unit="speeches" xml:lang="nl"
quantity="474964">474,964 toespraken</measure>
<measure unit="speeches" xml:lang="en"
quantity="474964">474,964 speeches</measure>
<measure unit="words" xml:lang="nl"
quantity="51451191">51,451,191 woorden</measure>
<measure unit="words" xml:lang="en"
quantity="51451191">51,451,191 words</measure>
</extent>
<publicationStmt>
<publisher>
<orgName xml:lang="en">CLARIN research infrastructure</orgName>
<ref target="https://www.clarin.eu/">www.clarin.eu</ref>
</publisher>
<idno subtype="handle" type="URI">http://hdl.handle.net/11356/1432</idno>
<availability status="free">
<licence>http://creativecommons.org/licenses/by/4.0/</licence>
<p xml:lang="en">This work is licensed under the <ref target="http://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</ref>.
</p>
</availability>
<date when="2021-06-10">June 10, 2021</date>
</publicationStmt>
<sourceDesc>
<bibl>
<title type="main">Minutes of the Eerste Kamer of The Netherlands</title>
<idno type="URI">https://www.eerstekamer.nl/</idno>
<date from="2014-12-15" to="2020-11-03">2014-12-15 - 2020-11-03</date>
</bibl>
<bibl>
<title type="main">Minutes of the Tweede Kamer of The Netherlands</title>
<idno type="URI">https://www.tweedekamer.nl/</idno>
<date from="2014-04-16" to="2020-10-14">2014-04-16 - 2020-10-14</date>
</bibl>
</sourceDesc>
</fileDesc> |
Example | Example of the <fileDesc> element in a corpus component: <fileDesc>
<titleStmt>
<title type="main" xml:lang="en">Dutch parliamentary corpus ParlaMint-NL, Lower House 2014-04-16 [ParlaMint]</title>
<title type="main" xml:lang="nl">Corpus van het Nederlandse parlement ParlaMint-NL, Tweede Kamer 2014-04-16 [ParlaMint]</title>
<title type="sub" xml:lang="en">Report of the meeting of the Dutch Lower House, Meeting 76, Session 2 (2014-04-16)</title>
<title type="sub" xml:lang="nl">Verslag van de vergadering van de Tweede Kamer, Meeting 76, Session 2 (2014-04-16)</title>
<meeting ana="#parla.lower #parla.meeting.regular"
corresp="#TK" n="76">Meeting 76</meeting>
<meeting ana="#parla.lower #parla.session"
corresp="#TK" n="2">Session 2</meeting>
<meeting ana="#parla.lower #parla.term #TK.28"
corresp="#TK" n="28-lower">Meeting of the 28th Tweede Kamer</meeting>
<respStmt>
<persName xml:id="RubenvanHeusden"
xml:lang="nl">Ruben van Heusden</persName>
<resp xml:lang="en">Downloading and converting the corpus to TEI format</resp>
</respStmt>
<funder>
<orgName xml:lang="en">The CLARIN research infrastructure</orgName>
</funder>
</titleStmt>
<editionStmt>
<edition>2.1</edition>
</editionStmt>
<extent>
<measure unit="speeches" xml:lang="nl"
quantity="18">18 toespraken</measure>
<measure unit="speeches" xml:lang="en"
quantity="18">18 speeches</measure>
<measure unit="words" xml:lang="nl"
quantity="1094">1,094 woorden</measure>
<measure unit="words" xml:lang="en"
quantity="1094">1,094 words</measure>
</extent>
<publicationStmt>
<publisher>
<orgName xml:lang="en">CLARIN research infrastructure</orgName>
<ref target="https://www.clarin.eu/">www.clarin.eu</ref>
</publisher>
<idno subtype="handle" type="URI">http://hdl.handle.net/11356/1432</idno>
<availability status="free">
<licence>http://creativecommons.org/licenses/by/4.0/</licence>
<p xml:lang="en">This work is licensed under the <ref target="http://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</ref>.</p>
</availability>
<date when="2021-06-10">June 10, 2021</date>
</publicationStmt>
<sourceDesc>
<bibl>
<title type="main">Minutes of the Tweede Kamer of The Netherlands</title>
<idno type="URI">https://www.tweedekamer.nl/</idno>
<date when="2014-04-16">2014-04-16</date>
</bibl>
</sourceDesc>
</fileDesc> |
Content model | <content> <elementRef key="titleStmt"/> <elementRef key="editionStmt"/> <elementRef key="extent"/> <elementRef key="publicationStmt"/> <elementRef key="sourceDesc"/> </content> ⚓ |
Schema Declaration | element fileDesc { tei_titleStmt, tei_editionStmt, tei_extent, tei_publicationStmt, tei_sourceDesc }⚓ |
<forename> (forename) contains a forename, given or baptismal name. [13.2.1. Personal Names] | |
Module | namesdates — Formal specification |
Attributes | att.global (xml:id, n, xml:base, xml:space, @xml:lang) |
Member of | |
Contained by | namesdates: persName |
May contain | Character data only |
Example | <persName>
<surname>Bongiorno</surname>
<forename>Giulia</forename>
</persName> |
Content model | <content> <textNode/> </content> ⚓ |
Schema Declaration | element forename { tei_att.global.attribute.xmllang, text }⚓ |
<funder> (funding body) specifies the name of an individual, institution, or organisation responsible for the funding of a project or text. [2.2.1. The Title Statement] | |
Module | header — Formal specification |
Contained by | header: titleStmt |
May contain | |
Note | Funders provide financial support for a project; they are distinct from sponsors (see element <sponsor>), who provide intellectual support and authority. |
Example | <funder>
<orgName xml:lang="es">CLARIN infraestructura de investigación científica</orgName>
<orgName xml:lang="en">The CLARIN research infrastructure</orgName>
</funder> |
Content model | <content> <elementRef key="orgName" minOccurs="1" maxOccurs="unbounded"/> <elementRef key="ref" minOccurs="0" maxOccurs="1"/> </content> ⚓ |
Schema Declaration | element funder { tei_orgName+, tei_ref? }⚓ |
<gap> (gap) indicates a point where material has been omitted in a transcription, whether for editorial reasons described in the TEI header, as part of sampling practice, or because the material is illegible, invisible, or inaudible. [3.5.3. Additions, Deletions, and Omissions] | |||||||
Module | core — Formal specification | ||||||
Attributes | att.global (xml:base, xml:space, @xml:id, @n, @xml:lang) att.global.linking (synch, next, prev, @corresp)
| ||||||
Member of | |||||||
Contained by | |||||||
May contain | core: desc | ||||||
Note | The <gap>, <unclear>, and <del> core tag elements may be closely allied in use with the <damage> and <supplied> elements, available when using the additional tagset for transcription of primary sources. See section 11.3.3.2. Use of the gap, del, damage, unclear, and supplied Elements in Combination for discussion of which element is appropriate for which circumstance. The <gap> tag simply signals the editor's decision to omit or inability to transcribe a span of text. Other information, such as the interpretation that text was deliberately erased or covered, should be indicated using the relevant tags, such as <del> in the case of deliberate deletion. | ||||||
Example | <gap reason="inaudible">
<desc>microphone muted</desc>
</gap> | ||||||
Example | <gap reason="editorial">
<desc xml:lang="de">Zitierte Druckfassung entfernt</desc>
<desc xml:lang="en">Quoted printed matter omitted</desc>
</gap> | ||||||
Example | <gap reason="foreign">
<desc xml:lang="und">Huliniahuanngittunga</desc>
</gap> | ||||||
Content model | <content> <elementRef key="desc" minOccurs="1" maxOccurs="unbounded"/> </content> ⚓ | ||||||
Schema Declaration | element gap { tei_att.global.attribute.xmlid, tei_att.global.attribute.n, tei_att.global.attribute.xmllang, tei_att.global.linking.attribute.corresp, attribute reason { "inaudible" | "editorial" | "foreign" }?, tei_desc+ }⚓ |
<graphic> (graphic) indicates the location of a graphic or illustration, either forming part of a text, or providing an image of it. [3.10. Graphics and Other Non-textual Components 11.1. Digital Facsimiles] | |
Module | core — Formal specification |
Attributes | att.resourced (@url) att.media (width, height, @scale) |
Member of | |
Contained by | figures: figure |
May contain | Empty element |
Note | The mimeType attribute should be used to supply the MIME media type of the image specified by the url attribute. Within the body of a text, a <graphic> element indicates the presence of a graphic component in the source itself. Within the context of a <facsimile> or <sourceDoc> element, however, a <graphic> element provides an additional digital representation of some part of the source being encoded. |
Example | <figure>
<graphic url="https://www.dekamer.be//site/wwwroot/images/cv/06595.gif"/>
</figure> |
Content model | <content> <empty/> </content> ⚓ |
Schema Declaration | element graphic { tei_att.media.attribute.scale, tei_att.resourced.attributes, empty }⚓ |
<head> (heading) contains any type of heading, for example the title of a section, or the heading of a list, glossary, manuscript description, etc. [4.2.1. Headings and Trailers] | |
Module | core — Formal specification |
Attributes | att.global (n, xml:base, xml:space, @xml:id, @xml:lang) att.global.linking (synch, next, prev, @corresp) att.typed (subtype, @type) |
Contained by | |
May contain | Character data only |
Note | The <head> element is used for headings at all levels; software which treats (e.g.) chapter headings, section headings, and list titles differently must determine the proper processing of a <head> element based on its structural position. A <head> occurring as the first element of a list is the title of that list; one occurring as the first element of a <div1> is the title of that chapter or section. |
Example | The most common use for the <head> element is to mark the headings of sections: <div type="debateSection">
<head>Regulation of Health and Social Care Professions Etc. Bill [HL]</head>
...
</div> |
Example | The <head> element may also be used to give the title to specialised lists: <listEvent>
<head xml:lang="nl">Zittingsperiode</head>
<head xml:lang="en">Legislative period</head>
<event to="2007-05-02" from="2003-06-05"
xml:id="period_51">
<label xml:lang="nl">Zittingsperiode 51</label>
<label xml:lang="en">Legislative period 51</label>
</event>
...
</listEvent> |
Content model | <content> <textNode/> </content> ⚓ |
Schema Declaration | element head { tei_att.global.attribute.xmlid, tei_att.global.attribute.xmllang, tei_att.global.linking.attribute.corresp, tei_att.typed.attribute.type, text }⚓ |
<hyphenation> (hyphenation) summarizes the way in which hyphenation in a source text has been treated in an encoded version of it. [2.3.3. The Editorial Practices Declaration 15.3.2. Declarable Elements] | |
Module | header — Formal specification |
Contained by | header: editorialDecl |
May contain | core: p |
Example | <editorialDecl> ...
<hyphenation>
<p xml:lang="en">No end-of-line hyphens were present in the source.</p>
</hyphenation>
...
</editorialDecl> |
Content model | <content> <elementRef key="p" minOccurs="1" maxOccurs="unbounded"/> </content> ⚓ |
Schema Declaration | element hyphenation { tei_p+ }⚓ |
<idno> (identifier) supplies an identifier used to identify some object, such as a person or organisation. If it is a URL, it should have @type="URI". [13.3.1. Basic Principles 2.2.4. Publication, Distribution, Licensing, etc. 2.2.5. The Series Statement 3.12.2.4. Imprint, Size of a Document, and Reprint Information] | |||||||||||||||
Module | header — Formal specification | ||||||||||||||
Attributes | att.global (xml:id, n, xml:base, xml:space, @xml:lang)
| ||||||||||||||
Member of | |||||||||||||||
Contained by | |||||||||||||||
May contain | Character data only | ||||||||||||||
Note | <idno> should be used for labels which identify an object or concept in a formal cataloguing system such as a database or an RDF store, or in a distributed system such as the World Wide Web. Some suggested values for type on <idno> are ISBN, ISSN, DOI, and URI. | ||||||||||||||
Example | <publicationStmt> ...
<idno type="URI" subtype="handle">http://hdl.handle.net/11356/1432</idno>
...
</publicationStmt> | ||||||||||||||
Example | <sourceDesc>
<bibl>
<title type="main" xml:lang="sl">Zapisi sej Državnega zbora Republike Slovenije</title>
...
<idno type="URI">https://www.dz-rs.si</idno>
...
</bibl>
</sourceDesc> | ||||||||||||||
Example | <idno type="URI" subtype="wikimedia"
xml:lang="sl">https://sl.wikipedia.org/wiki/Pozitivna_Slovenija</idno>
<idno type="URI" subtype="wikimedia"
xml:lang="en">https://en.wikipedia.org/wiki/Positive_Slovenia</idno> | ||||||||||||||
Content model | <content> <textNode/> </content> ⚓ | ||||||||||||||
Schema Declaration | element idno { tei_att.global.attribute.xmllang, attribute type { "URI" | "VIAF" }, attribute subtype { "handle" | "government" | "politicalParty" | "parliament" | "ministry" | "personal" | "business" | "publicService" | "wikimedia" | "facebook" | "twitter" | "tiktok" | "instagram" }?, text }⚓ |
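The closed value lists for @type and @subtype lend themselves to a simple check. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree and urllib.parse modules; the function name check_idno is illustrative. The requirement that a URI identifier be an http or https address is an extra assumption of this sketch, not a rule stated by the schema.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Values copied from the schema declaration of <idno>.
ALLOWED_TYPES = {"URI", "VIAF"}
ALLOWED_SUBTYPES = {
    "handle", "government", "politicalParty", "parliament", "ministry",
    "personal", "business", "publicService", "wikimedia", "facebook",
    "twitter", "tiktok", "instagram",
}


def check_idno(idno):
    """Return a list of problems found in one <idno> element (hypothetical helper)."""
    problems = []
    if idno.get("type") not in ALLOWED_TYPES:
        problems.append(f"unexpected type: {idno.get('type')!r}")
    subtype = idno.get("subtype")
    if subtype is not None and subtype not in ALLOWED_SUBTYPES:
        problems.append(f"unexpected subtype: {subtype!r}")
    if idno.get("type") == "URI":
        # Assumption of this sketch: URI identifiers are web addresses.
        parsed = urlparse((idno.text or "").strip())
        if parsed.scheme not in {"http", "https"} or not parsed.netloc:
            problems.append("content does not look like an http(s) URL")
    return problems


handle = ET.fromstring(
    '<idno type="URI" subtype="handle">http://hdl.handle.net/11356/1432</idno>'
)
print(check_idno(handle))  # prints []
```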
<incident> (incident) marks any phenomenon or occurrence, not necessarily vocalized or communicative, for example incidental noises or other events affecting communication. [8.3.3. Vocal, Kinesic, Incident] | |||||||
Module | spoken — Formal specification | ||||||
Attributes | att.ascribed (@who) att.global (xml:base, xml:space, @xml:id, @n, @xml:lang) att.global.linking (synch, next, prev, @corresp) att.typed (type, @subtype)
| ||||||
Member of | |||||||
Contained by | |||||||
May contain | core: desc | ||||||
Example | <incident type="action">
<desc>He stands and with him the whole Assembly</desc>
</incident> | ||||||
Example | <incident type="sound">
<desc>The Assembly observed a minute of silence. Applause.</desc>
</incident> | ||||||
Example | <incident type="entering">
<desc>Arrival of the President of the Republic of Poland</desc>
</incident> | ||||||
Content model | <content> <elementRef key="desc" minOccurs="1" maxOccurs="unbounded"/> </content> ⚓ | ||||||
Schema Declaration | element incident { tei_att.global.attribute.xmlid, tei_att.global.attribute.n, tei_att.global.attribute.xmllang, tei_att.global.linking.attribute.corresp, tei_att.typed.attribute.subtype, tei_att.ascribed.attributes, attribute type { "action" | "incident" | "leaving" | "entering" | "break" | "pause" | "sound" | "editorial" }?, tei_desc+ }⚓ |
<include> is an element from the namespace of the XML Inclusions (XInclude) W3C Recommendation. It is used to include in a ParlaMint <teiCorpus> root file those elements of the corpus that are stored in separate files. These are the <TEI> corpus components and parts of the corpus root <teiHeader>, namely <listPerson> and <listOrg> inside <particDesc>, and <taxonomy> inside <classDecl>. | |||||||
Namespace | http://www.w3.org/2001/XInclude | ||||||
Module | derived-module-parlamint | ||||||
Attributes |
| ||||||
Contained by | |||||||
May contain | Empty element | ||||||
Example | Using XInclude in ParlaMint to include corpus components into the corpus root: <teiCorpus xml:lang="en"
xml:id="ParlaMint-GB" xmlns="http://www.tei-c.org/ns/1.0">
<teiHeader> ...TEI header of the corpus...
</teiHeader>
<xi:include xmlns:xi="http://www.w3.org/2001/XInclude"
href="2015/ParlaMint-GB_2015-01-05-commons.xml"/>
<xi:include xmlns:xi="http://www.w3.org/2001/XInclude"
href="2015/ParlaMint-GB_2015-01-06-commons.xml"/>
...
</teiCorpus> |
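A script that processes a corpus component by component can read the list of component files directly from the corpus root without resolving the inclusions. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree module; the function name component_hrefs is illustrative. It looks only at direct children of <teiCorpus>, so the inclusions nested inside the <teiHeader> are deliberately left out.

```python
import xml.etree.ElementTree as ET

XI_NS = "http://www.w3.org/2001/XInclude"

# Shortened version of the example above, with an empty header.
SAMPLE = """<teiCorpus xmlns="http://www.tei-c.org/ns/1.0"
    xmlns:xi="http://www.w3.org/2001/XInclude"
    xml:id="ParlaMint-GB" xml:lang="en">
  <teiHeader/>
  <xi:include href="2015/ParlaMint-GB_2015-01-05-commons.xml"/>
  <xi:include href="2015/ParlaMint-GB_2015-01-06-commons.xml"/>
</teiCorpus>"""


def component_hrefs(corpus_root):
    """Return the href of every xi:include that is a direct child of the root."""
    return [inc.get("href") for inc in corpus_root.findall(f"{{{XI_NS}}}include")]


for href in component_hrefs(ET.fromstring(SAMPLE)):
    print(href)
```

The returned paths are relative to the location of the corpus root file.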
<kinesic> (kinesic) marks any communicative phenomenon, not necessarily vocalized, for example a gesture, frown, etc. [8.3.3. Vocal, Kinesic, Incident] | |||||||
Module | spoken — Formal specification | ||||||
Attributes | att.global (xml:base, xml:space, @xml:id, @n, @xml:lang) att.global.linking (synch, next, prev, @corresp) att.typed (type, @subtype) att.ascribed (@who)
| ||||||
Member of | |||||||
Contained by | |||||||
May contain | core: desc | ||||||
Example | <kinesic type="signal">
<desc>sign for the end of discussion</desc>
</kinesic> | ||||||
Example | <kinesic type="laughter">
<desc xml:lang="hr">smijeh.</desc>
</kinesic> | ||||||
Example | <kinesic type="applause">
<desc xml:lang="sl">ploskanje</desc>
</kinesic> | ||||||
Content model | <content> <elementRef key="desc" minOccurs="1" maxOccurs="unbounded"/> </content> ⚓ | ||||||
Schema Declaration | element kinesic { tei_att.global.attribute.xmlid, tei_att.global.attribute.n, tei_att.global.attribute.xmllang, tei_att.global.linking.attribute.corresp, tei_att.typed.attribute.subtype, tei_att.ascribed.attribute.who, attribute type { "kinesic" | "applause" | "ringing" | "signal" | "playback" | "gesture" | "smiling" | "laughter" | "snapping" | "noise" }?, tei_desc+ }⚓ |
<label> (label) contains any label or heading used to identify part of a text, typically but not exclusively in a list or glossary. [3.8. Lists] | |
Module | core — Formal specification |
Attributes | att.global (xml:id, n, xml:base, xml:space, @xml:lang) |
Member of | |
Contained by | header: application namesdates: event |
May contain | namesdates: orgName character data |
Example | Labels denote the existence of organisations and connected events: <org xml:id="DZ" role="parliament"
ana="#parla.national #parla.lower">
<orgName xml:lang="sl" full="yes">Državni zbor Republike Slovenije</orgName>
<orgName xml:lang="en" full="yes">National Assembly of the Republic of Slovenia</orgName>
<event from="1992-12-23">
<label xml:lang="en">existence</label>
</event>
...
<listEvent>
<head xml:lang="sl">Mandatno obdobje</head>
<head xml:lang="en">Legislative period</head>
<event xml:id="DZ.7" from="2014-08-01"
to="2018-06-21">
<label xml:lang="sl">7. mandat</label>
<label xml:lang="en">Term 7</label>
</event>
<event xml:id="DZ.8" from="2018-06-22">
<label xml:lang="sl">8. mandat</label>
<label xml:lang="en">Term 8</label>
</event>
</listEvent>
</org> |
Example | Labels may also be used to give a name to the tools used in compiling the corpus: <application ident="int-tagger"
version="1.0">
<label>INT Tagger, lemmatizer and Tokenizer</label>
<desc xml:lang="en">INT Tagger, lemmatizer and Tokenizer for modern Dutch, based on old-school machine learning (SVM). It provides the legacy PoS tags (encoded in w/@ana) and the lemmata for Dutch. Not publicly available.</desc>
</application> |
Example | Labels may also be used for other structured list items: <listEvent>
<head xml:lang="lv">Saeimas sasaukumi</head>
<head xml:lang="en">Legislative period</head>
<event xml:id="PT.12" from="2014-11-04"
to="2018-11-05">
<label xml:lang="lv">12. Saeima</label>
<label xml:lang="en">Term 12</label>
</event>
<event xml:id="PT.13" from="2018-11-06">
<label xml:lang="lv">13. Saeima</label>
<label xml:lang="en">Term 13</label>
</event>
</listEvent> |
Content model | <content> <alternate minOccurs="1" maxOccurs="1"> <textNode/> <elementRef key="orgName"/> </alternate> </content> ⚓ |
Schema Declaration | element label { tei_att.global.attribute.xmllang, ( text | tei_orgName ) }⚓ |
<langUsage> (language usage) describes the languages, sublanguages, registers, dialects, etc. represented within a text. [2.4.2. Language Usage 2.4. The Profile Description 15.3.2. Declarable Elements] | |
Module | header — Formal specification |
Contained by | header: profileDesc |
May contain | header: language |
Example | <langUsage>
<language ident="sl" xml:lang="sl">slovenski</language>
<language ident="en" xml:lang="sl">angleški</language>
<language ident="sl" xml:lang="en">Slovenian</language>
<language ident="en" xml:lang="en">English</language>
</langUsage> |
Content model | <content> <elementRef key="language" minOccurs="1" maxOccurs="unbounded"/> </content> ⚓ |
Schema Declaration | element langUsage { tei_language+ }⚓ |
<language> (language) characterizes a single language or sublanguage used within a text. [2.4.2. Language Usage] | |||||||||||||
Module | header — Formal specification | ||||||||||||
Attributes | att.global (xml:id, n, xml:base, xml:space, @xml:lang)
| ||||||||||||
Contained by | header: langUsage | ||||||||||||
May contain | Character data only | ||||||||||||
Note | Particularly for sublanguages, an informal prose characterization should be supplied as content for the element. | ||||||||||||
Example | <langUsage>
<language ident="es" xml:lang="es">Español</language>
<language ident="es" xml:lang="en">Spanish</language>
</langUsage> | ||||||||||||
Example | <langUsage>
<language ident="bg-Latn" xml:lang="en">Bulgarian in Latin script</language>
<language ident="bg" xml:lang="bg">български</language>
<language ident="bg" xml:lang="en">Bulgarian</language>
<language ident="en" xml:lang="bg">английски</language>
<language ident="en" xml:lang="en">English</language>
<language ident="fr" xml:lang="bg">френски</language>
<language ident="fr" xml:lang="en">French</language>
</langUsage> | ||||||||||||
Content model | <content> <textNode/> </content> ⚓ | ||||||||||||
Schema Declaration | element language { tei_att.global.attribute.xmllang, attribute ident { text }, attribute usage { text }?, text }⚓ |
<licence> contains information about a licence or other legal agreement applicable to the text. [2.2.4. Publication, Distribution, Licensing, etc.] | |
Module | header — Formal specification |
Contained by | header: availability |
May contain | XSD anyURI |
Note | A <licence> element should be supplied for each licence agreement applicable to the text in question. The target attribute may be used to reference a full version of the licence. The when, notBefore, notAfter, from or to attributes may be used in combination to indicate the date or dates of applicability of the licence. |
Example | The <licence> specifies the fixed-value CC BY 4.0 URL, and the following paragraph gives a prose description of the licence: <licence>http://creativecommons.org/licenses/by/4.0/</licence>
<p>This work is licensed under the <ref target="http://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</ref>
</p> |
Example | The textual information on the licence can be given in more than one language: <licence>http://creativecommons.org/licenses/by/4.0/</licence>
<p xml:lang="hr">Ovaj rad je dostupan pod <ref target="http://creativecommons.org/licenses/by/4.0/">međunarodnom licencom Creative Commons Imenovanje 4.0</ref>
</p>
<p xml:lang="en">This work is licensed under the <ref target="http://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</ref>
</p> |
Content model | <content> <dataRef name="anyURI"/> </content> ⚓ |
Schema Declaration | element licence { xsd:anyURI }⚓ |
<link> (link) defines an association or hypertextual link among elements or passages, of some type not more precisely specifiable by other elements. [16.1. Links] | |||||||||||||
Module | linking — Formal specification | ||||||||||||
Attributes |
| ||||||||||||
Member of | |||||||||||||
Contained by | linking: linkGrp | ||||||||||||
May contain | Empty element | ||||||||||||
Note | This element should only be used to encode associations not otherwise provided for by more specific elements. The location of this element within a document has no significance, unless it is included within a <linkGrp>, in which case it may inherit the value of the type attribute from the value given on the <linkGrp>. | ||||||||||||
Example | The <link> element, given inside a <linkGrp>, joins two tokens according to their syntactic dependency. For readability, the example below omits the word-level linguistic attributes and uses shortened IDs: <s xml:id="ParlaMint-GB_2021-01-06.seg393.8">
<w xml:id="ParlaMint-GB_2021-01-06.seg393.8.1">I</w>
<w xml:id="ParlaMint-GB_2021-01-06.seg393.8.2">support</w>
<w xml:id="ParlaMint-GB_2021-01-06.seg393.8.3">the</w>
<w join="right"
xml:id="ParlaMint-GB_2021-01-06.seg393.8.4">amendment</w>
<pc xml:id="ParlaMint-GB_2021-01-06.seg393.8.5">.</pc>
<linkGrp targFunc="head argument"
type="UD-SYN">
<link ana="ud-syn:nsubj"
target="#ParlaMint-GB_2021-01-06.seg393.8.2 #ParlaMint-GB_2021-01-06.seg393.8.1"/>
<link ana="ud-syn:root"
target="#ParlaMint-GB_2021-01-06.seg393.8 #ParlaMint-GB_2021-01-06.seg393.8.2"/>
<link ana="ud-syn:det"
target="#ParlaMint-GB_2021-01-06.seg393.8.4 #ParlaMint-GB_2021-01-06.seg393.8.3"/>
<link ana="ud-syn:obj"
target="#ParlaMint-GB_2021-01-06.seg393.8.2 #ParlaMint-GB_2021-01-06.seg393.8.4"/>
<link ana="ud-syn:punct"
target="#ParlaMint-GB_2021-01-06.seg393.8.2 #ParlaMint-GB_2021-01-06.seg393.8.5"/>
</linkGrp>
</s> | ||||||||||||
Schematron |
<sch:assert test="contains(normalize-space(@target),' ')">You must supply at least two values for @target or on <sch:name/>
</sch:assert> | ||||||||||||
Content model | <content> <empty/> </content> ⚓ | ||||||||||||
Schema Declaration | element link { attribute ana { text }, attribute target { list { ? } }, empty }⚓ |
<linkGrp> (link group) defines a collection of associations or hypertextual links. [16.1. Links] | |||||||||||||
Module | linking — Formal specification | ||||||||||||
Attributes |
| ||||||||||||
Member of | |||||||||||||
Contained by | analysis: s | ||||||||||||
May contain | linking: link | ||||||||||||
Note | May contain one or more <link> or <ptr> elements. A web or link group is an administrative convenience, which should be used to collect a set of links together for any purpose, not simply to supply a default value for the type attribute. | ||||||||||||
Example | Syntactic analysis is stored in the <linkGrp> (link group) element, which is composed of <link> elements. For readability, the example below omits the word-level linguistic attributes and uses shortened IDs: <s xml:id="ParlaMint-GB_2021-01-06.seg393.8">
<w xml:id="ParlaMint-GB_2021-01-06.seg393.8.1">I</w>
<w xml:id="ParlaMint-GB_2021-01-06.seg393.8.2">support</w>
<w xml:id="ParlaMint-GB_2021-01-06.seg393.8.3">the</w>
<w join="right"
xml:id="ParlaMint-GB_2021-01-06.seg393.8.4">amendment</w>
<pc xml:id="ParlaMint-GB_2021-01-06.seg393.8.5">.</pc>
<linkGrp targFunc="head argument"
type="UD-SYN">
<link ana="ud-syn:nsubj"
target="#ParlaMint-GB_2021-01-06.seg393.8.2 #ParlaMint-GB_2021-01-06.seg393.8.1"/>
<link ana="ud-syn:root"
target="#ParlaMint-GB_2021-01-06.seg393.8 #ParlaMint-GB_2021-01-06.seg393.8.2"/>
<link ana="ud-syn:det"
target="#ParlaMint-GB_2021-01-06.seg393.8.4 #ParlaMint-GB_2021-01-06.seg393.8.3"/>
<link ana="ud-syn:obj"
target="#ParlaMint-GB_2021-01-06.seg393.8.2 #ParlaMint-GB_2021-01-06.seg393.8.4"/>
<link ana="ud-syn:punct"
target="#ParlaMint-GB_2021-01-06.seg393.8.2 #ParlaMint-GB_2021-01-06.seg393.8.5"/>
</linkGrp>
</s> | ||||||||||||
Content model | <content> <elementRef maxOccurs="unbounded" key="link"/> </content> ⚓ | ||||||||||||
Schema Declaration | element linkGrp { attribute targFunc { "head argument" }, attribute type { "UD-SYN" }, tei_link+ }⚓ |
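The head/argument convention of <linkGrp> and <link> can be illustrated with a short script. The following is only a minimal sketch, assuming Python's standard library: the function name read_dependencies, the abbreviated IDs (s8, s8.1, ...) and the "ROOT" placeholder for the sentence itself are illustrative choices, not part of ParlaMint.

```python
import xml.etree.ElementTree as ET

# Sentence modelled on the example above; the IDs are abbreviated (assumption).
SENTENCE = """
<s xmlns="http://www.tei-c.org/ns/1.0" xml:id="s8">
  <w xml:id="s8.1">I</w>
  <w xml:id="s8.2">support</w>
  <w xml:id="s8.3">the</w>
  <w xml:id="s8.4" join="right">amendment</w>
  <pc xml:id="s8.5">.</pc>
  <linkGrp targFunc="head argument" type="UD-SYN">
    <link ana="ud-syn:nsubj" target="#s8.2 #s8.1"/>
    <link ana="ud-syn:root" target="#s8 #s8.2"/>
    <link ana="ud-syn:det" target="#s8.4 #s8.3"/>
    <link ana="ud-syn:obj" target="#s8.2 #s8.4"/>
    <link ana="ud-syn:punct" target="#s8.2 #s8.5"/>
  </linkGrp>
</s>
"""

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def read_dependencies(sentence_xml):
    """Return (relation, head_form, dependent_form) triples for one <s>."""
    s = ET.fromstring(sentence_xml)
    # Map token IDs to surface forms; the sentence ID stands for the root.
    forms = {s.get(XML_ID): "ROOT"}
    for tok in s.iter():
        if tok.tag in (TEI + "w", TEI + "pc"):
            forms[tok.get(XML_ID)] = tok.text
    triples = []
    for link in s.iter(TEI + "link"):
        # targFunc="head argument": first pointer is the head, second the dependent.
        head, dep = (ref.lstrip("#") for ref in link.get("target").split())
        # Drop the "ud-syn:" prefix to get the bare relation name.
        relation = link.get("ana").split(":", 1)[1]
        triples.append((relation, forms[head], forms[dep]))
    return triples


dependencies = read_dependencies(SENTENCE)
```

Note that the root relation points from the sentence element itself to the root token, which is why the sentence ID is included in the lookup table.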
<listEvent> (list of events) contains a list of descriptions, each of which provides information about an identifiable event. [13.3.1. Basic Principles] | |
Module | namesdates — Formal specification |
Member of | |
Contained by | namesdates: org |
May contain | |
Example | <listEvent>
<event xml:id="GOV.11" from="2013-03-20"
to="2014-09-18">
<label xml:lang="sl">11. vlada Republike Slovenije (20. marec 2013 - 18. september 2014)</label>
<label xml:lang="en">11th Government of the Republic of Slovenia (20 March 2013 - 18 September 2014)</label>
</event>
...
<event xml:id="GOV.14" from="2020-03-13">
<label xml:lang="sl">14. vlada Republike Slovenije (13. marec 2020 - danes)</label>
<label xml:lang="en">14th Government of the Republic of Slovenia (March 13, 2020 - today)</label>
</event>
</listEvent> |
Example | <org ana="#parla.national #parla.upper"
role="parliament" xml:id="LEG">
<orgName full="yes" xml:lang="it">Senato della Repubblica Italiana</orgName>
<orgName full="yes" xml:lang="en">Senate of the Republic of Italy</orgName>
...
<listEvent>
<event from="2013-03-15" to="2018-03-22"
xml:id="LEG.17">
<label xml:lang="it">XVII Legislatura</label>
<label xml:lang="en">XVII Legislative Term</label>
</event>
<event from="2018-03-23" xml:id="LEG.18">
<label xml:lang="it">XVIII Legislatura</label>
<label xml:lang="en">XVIII Legislative Term</label>
</event>
</listEvent>
</org> |
Content model | <content> <sequence minOccurs="1" maxOccurs="1"> <elementRef key="head" minOccurs="0" maxOccurs="unbounded"/> <elementRef key="event" minOccurs="0" maxOccurs="unbounded"/> </sequence> </content> ⚓ |
Schema Declaration | element listEvent { tei_head*, tei_event* }⚓ |
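As an illustration of how the from and to attributes on <event> can be used, the sketch below looks up the legislative term in force on a given day. It is a hedged example only: the function name event_on is hypothetical, and the code assumes that from and to always hold full YYYY-MM-DD dates and that a missing to means the event is still ongoing.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Fragment modelled on the legislative period examples in this document.
LIST_EVENT = """
<listEvent xmlns="http://www.tei-c.org/ns/1.0">
  <event xml:id="DZ.7" from="2014-08-01" to="2018-06-21">
    <label xml:lang="sl">7. mandat</label>
    <label xml:lang="en">Term 7</label>
  </event>
  <event xml:id="DZ.8" from="2018-06-22">
    <label xml:lang="sl">8. mandat</label>
    <label xml:lang="en">Term 8</label>
  </event>
</listEvent>
"""

TEI = "{http://www.tei-c.org/ns/1.0}"
XML = "{http://www.w3.org/XML/1998/namespace}"


def event_on(list_event_xml, day, lang="en"):
    """Return (event ID, label in `lang`) for the event covering `day`, or None."""
    root = ET.fromstring(list_event_xml)
    for event in root.iter(TEI + "event"):
        start = date.fromisoformat(event.get("from"))
        # Assumption: an event without @to is open-ended.
        end = date.fromisoformat(event.get("to")) if event.get("to") else date.max
        if start <= day <= end:
            for label in event.iter(TEI + "label"):
                if label.get(XML + "lang") == lang:
                    return event.get(XML + "id"), label.text
            return event.get(XML + "id"), None
    return None
```

For instance, event_on(LIST_EVENT, date(2016, 4, 13)) selects the event DZ.7, while a date before 2014-08-01 yields None.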
<listOrg> (list of organizations) contains a list of elements, each of which provides information about an identifiable organisation. [13.2.2. Organizational Names] | |
Module | namesdates — Formal specification |
Attributes | att.global (n, xml:base, xml:space, @xml:id, @xml:lang) |
Member of | |
Contained by | corpus: particDesc |
May contain | core: head namesdates: listRelation org |
Note | The type attribute may be used to distinguish lists of organizations of a particular type if convenient. |
Example | <listOrg>
<org xml:id="government.GB"
role="government"> ...
</org>
<org xml:id="PoGB" role="parliament"> ...
</org>
<org role="parliamentaryGroup"
xml:id="party.LI"> ...
</org>
...
<listRelation> ...
</listRelation>
</listOrg> |
Content model | <content> <sequence minOccurs="1" maxOccurs="1"> <elementRef key="head" minOccurs="0" maxOccurs="unbounded"/> <elementRef key="org" minOccurs="1" maxOccurs="unbounded"/> <elementRef key="listRelation" minOccurs="0" maxOccurs="1"/> </sequence> </content> ⚓ |
Schema Declaration | element listOrg { tei_att.global.attribute.xmlid, tei_att.global.attribute.xmllang, ( tei_head*, tei_org+, tei_listRelation? ) }⚓ |
<listPerson> (list of persons) contains a list of descriptions, each of which provides information about an identifiable person or a group of people, for example the participants in a language interaction, or the people referred to in a historical source. [13.3.2. The Person Element 15.2. Contextual Information 2.4. The Profile Description 15.3.2. Declarable Elements] | |
Module | namesdates — Formal specification |
Attributes | att.global (n, xml:base, xml:space, @xml:id, @xml:lang) |
Member of | |
Contained by | corpus: particDesc |
May contain | |
Note | The type attribute may be used to distinguish lists of people of a particular type if convenient. |
Example | <listPerson>
<head>List of speakers</head>
<person xml:id="SayeedaWarsi"> ...
</person>
<person xml:id="DavidHamilton"> ...
</person>
...
</listPerson> |
Content model | <content> <sequence minOccurs="1" maxOccurs="1"> <elementRef key="head" minOccurs="0" maxOccurs="unbounded"/> <elementRef key="person" minOccurs="1" maxOccurs="unbounded"/> </sequence> </content> ⚓ |
Schema Declaration | element listPerson { tei_att.global.attribute.xmlid, tei_att.global.attribute.xmllang, ( tei_head*, tei_person+ ) }⚓ |
<listPrefixDef> (list of prefix definitions) contains a list of definitions of prefixing schemes used in teidata.pointer values, showing how abbreviated URIs using each scheme may be expanded into full URIs. [16.2.3. Using Abbreviated Pointers] | |
Module | header — Formal specification |
Contained by | header: encodingDesc |
May contain | header: prefixDef |
Example | In this example, two private URI scheme prefixes are defined and patterns are provided for dereferencing them. Each prefix is also supplied with a human-readable explanation in a <p> element. <listPrefixDef>
<prefixDef ident="ud-syn"
matchPattern="(.+)" replacementPattern="#$1">
<p>Private URIs with this prefix point to elements giving their name. In this document they are simply local references into the UD-SYN taxonomy categories in the corpus root TEI header.</p>
</prefixDef>
<prefixDef ident="ne" matchPattern="(.+)"
replacementPattern="#NER.cnec2.0.$1">
<p>Taxonomy for named entities (cnec2.0)</p>
</prefixDef>
</listPrefixDef> |
Content model | <content> <elementRef key="prefixDef" minOccurs="1" maxOccurs="unbounded"/> </content> ⚓ |
Schema Declaration | element listPrefixDef { tei_prefixDef+ }⚓ |
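The expansion of prefixed pointers described by <prefixDef> can be sketched as follows. This is an illustrative example rather than a reference implementation: the function names load_prefixes and expand are hypothetical, the local part "io" is made up for the demonstration, and the code assumes that replacementPattern only uses $1-style back-references.

```python
import re
import xml.etree.ElementTree as ET

# The two prefix definitions from the example above, without their <p> content.
PREFIX_DEFS = """
<listPrefixDef xmlns="http://www.tei-c.org/ns/1.0">
  <prefixDef ident="ud-syn" matchPattern="(.+)" replacementPattern="#$1"/>
  <prefixDef ident="ne" matchPattern="(.+)" replacementPattern="#NER.cnec2.0.$1"/>
</listPrefixDef>
"""

TEI = "{http://www.tei-c.org/ns/1.0}"


def load_prefixes(xml_text):
    """Map each prefix to its (matchPattern, replacementPattern) pair."""
    root = ET.fromstring(xml_text)
    return {
        d.get("ident"): (d.get("matchPattern"), d.get("replacementPattern"))
        for d in root.iter(TEI + "prefixDef")
    }


def expand(pointer, prefixes):
    """Expand a prefixed pointer such as 'ud-syn:nsubj'; leave others unchanged."""
    prefix, sep, local = pointer.partition(":")
    if not sep or prefix not in prefixes:
        return pointer
    match_pattern, replacement = prefixes[prefix]
    match = re.fullmatch(match_pattern, local)
    if match is None:
        return pointer
    # TEI writes back-references as $1, $2, ...; Python's re expects \g<1>.
    py_replacement = re.sub(r"\$(\d+)", r"\\g<\1>", replacement)
    return match.expand(py_replacement)


prefixes = load_prefixes(PREFIX_DEFS)
```

With these definitions, expand("ud-syn:nsubj", prefixes) gives the local reference "#nsubj", which can then be resolved against the taxonomy in the corpus root TEI header.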
<listRelation> provides information about relationships identified amongst people, places, and organisations, either informally as prose or as formally expressed relation links. [13.3.2.3. Personal Relationships] | |
Module | namesdates — Formal specification |
Member of | |
Contained by | namesdates: listOrg |
May contain | namesdates: relation |
Note | May contain a prose description organized as paragraphs, or a sequence of <relation> elements. |
Example | <listOrg>
<org role="parliamentaryGroup"
xml:id="party.LD">
<orgName full="yes">Liberal Democrat</orgName>
<orgName full="abb">LD</orgName>
</org>
<org role="parliamentaryGroup"
xml:id="party.I">
<orgName full="yes">Independent</orgName>
<orgName full="abb">I</orgName>
</org>
<org role="parliamentaryGroup"
xml:id="party.0UBS">
<orgName full="yes">Independent Conservative</orgName>
<orgName full="abb">0UBS</orgName>
</org>
<org>... </org>
<listRelation>
<relation name="coalition"
mutual="#party.CON #party.LD" from="2010-05-06" to="2015-05-07"/>
<relation name="opposition"
active="#party.LAB #party.SO0T #party.64RT #party.SDLP #party.L1QU #party.0UBS
#party.BI #party.LI #party.LB #party.LJ95 #party.IGC #party.NPBE #party.CB
#party.QMZZ #party.IL #party.UUP #party.FZPG #party.A #party.GP #party.SNP
#party.I #party.L8TA #party.CON #party.NA #party.DUP #party.UUSL #party.ZKPW
#party.UKIP #party.PC" passive="#government.GB"
from="2010-05-06" to="2015-05-07"/>
<relation>...</relation>
...
</listRelation>
</listOrg> |
Content model | <content> <elementRef key="relation" minOccurs="1" maxOccurs="unbounded"/> </content> ⚓ |
Schema Declaration | element listRelation { tei_relation+ }⚓ |
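To show how the temporal and pointer attributes on <relation> might be queried, the sketch below lists the members of any coalition in force on a given day. It is a minimal example under stated assumptions: the function name coalition_on is hypothetical, the opposition list is shortened for the demonstration, and from and to are assumed to be full YYYY-MM-DD dates.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Fragment modelled on the example above; the opposition list is shortened.
RELATIONS = """
<listRelation xmlns="http://www.tei-c.org/ns/1.0">
  <relation name="coalition" mutual="#party.CON #party.LD"
            from="2010-05-06" to="2015-05-07"/>
  <relation name="opposition" active="#party.LAB #party.SNP"
            passive="#government.GB" from="2010-05-06" to="2015-05-07"/>
</listRelation>
"""

TEI = "{http://www.tei-c.org/ns/1.0}"


def coalition_on(list_relation_xml, day):
    """Return the sorted IDs of organisations in a coalition on `day`."""
    root = ET.fromstring(list_relation_xml)
    members = set()
    for rel in root.iter(TEI + "relation"):
        if rel.get("name") != "coalition":
            continue
        # Assumption: a missing @from or @to leaves that side unbounded.
        start = date.fromisoformat(rel.get("from")) if rel.get("from") else date.min
        end = date.fromisoformat(rel.get("to")) if rel.get("to") else date.max
        if start <= day <= end:
            # @mutual holds space-separated local pointers to <org> elements.
            members.update(ref.lstrip("#") for ref in rel.get("mutual", "").split())
    return sorted(members)
```

The same pattern would apply to the active and passive attributes of an opposition relation.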
<measure> (measure) contains a word or phrase referring to some quantity of an object or commodity, usually comprising a number, a unit, and a commodity name. [3.6.3. Numbers and Measures] | |||||||||||||||
Module | core — Formal specification | ||||||||||||||
Attributes | att.global (xml:id, n, xml:base, xml:space, @xml:lang)
| ||||||||||||||
Member of | |||||||||||||||
Contained by | header: extent | ||||||||||||||
May contain | Character data only | ||||||||||||||
Example | <measure unit="speeches" quantity="75122"
xml:lang="sl">75.122 govorov</measure>
<measure unit="speeches" quantity="75122"
xml:lang="en">75,122 speeches</measure>
<measure unit="words" quantity="20190034"
xml:lang="sl">20.190.034 besed</measure>
<measure unit="words" quantity="20190034"
xml:lang="en">20,190,034 words</measure> | ||||||||||||||
Content model | <content> <textNode/> </content> ⚓ | ||||||||||||||
Schema Declaration | element measure { tei_att.global.attribute.xmllang, attribute unit { "speeches" | "words" | "tokens" }, attribute quantity { text }, text }⚓ |
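Because the machine-readable size is carried by the unit and quantity attributes, while the element content is only a localised rendering, the extent can be read without parsing the text. The following is a small sketch assuming Python's standard library; the function name read_extent and the wrapping <extent> fragment are illustrative.

```python
import xml.etree.ElementTree as ET

# The <measure> elements from the example above, wrapped in <extent>.
EXTENT = """
<extent xmlns="http://www.tei-c.org/ns/1.0">
  <measure unit="speeches" quantity="75122" xml:lang="sl">75.122 govorov</measure>
  <measure unit="speeches" quantity="75122" xml:lang="en">75,122 speeches</measure>
  <measure unit="words" quantity="20190034" xml:lang="sl">20.190.034 besed</measure>
  <measure unit="words" quantity="20190034" xml:lang="en">20,190,034 words</measure>
</extent>
"""

TEI = "{http://www.tei-c.org/ns/1.0}"


def read_extent(extent_xml):
    """Return {unit: quantity}, reading only the attributes, not the text."""
    root = ET.fromstring(extent_xml)
    counts = {}
    for measure in root.iter(TEI + "measure"):
        # Language variants repeat the same unit/quantity pair, so they collapse.
        counts[measure.get("unit")] = int(measure.get("quantity"))
    return counts


extent = read_extent(EXTENT)
```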
<media> indicates the location of any form of external media such as an audio or video clip etc. [3.10. Graphics and Other Non-textual Components] | |||||||||
Module | core — Formal specification | ||||||||
Attributes | att.resourced (@url) att.global (n, xml:lang, xml:base, xml:space, @xml:id) att.global.source (@source)
| ||||||||
Member of | |||||||||
Contained by | spoken: recording | ||||||||
May contain | Empty element | ||||||||
Note | The attributes available for this element are not appropriate in all cases. For example, it makes no sense to specify the temporal duration of a graphic. Such errors are not currently detected. The mimeType attribute must be used to specify the MIME media type of the resource specified by the url attribute. | ||||||||
Example | <recording type="audio">
<media xml:id="ps2013-009-01-001-001.audio1"
mimeType="audio/mp3"
source="https://www.psp.cz/eknih/2013ps/audio/2014/05/07/2014050713581412.mp3"
url="2013ps/audio/2014/05/07/2014050713581412.mp3"/>
<media xml:id="ps2013-009-01-001-001.audio2"
mimeType="audio/mp3"
source="https://www.psp.cz/eknih/2013ps/audio/2014/05/07/2014050714081422.mp3"
url="2013ps/audio/2014/05/07/2014050714081422.mp3"/>
<media xml:id="ps2013-009-01-001-001.audio3"
mimeType="audio/mp3"
source="https://www.psp.cz/eknih/2013ps/audio/2014/05/07/2014050714181432.mp3"
url="2013ps/audio/2014/05/07/2014050714181432.mp3"/>
...
</recording> | ||||||||
Content model | <content> <empty/> </content> ⚓ | ||||||||
Schema Declaration | element media { tei_att.global.attribute.xmlid, tei_att.global.source.attribute.source, tei_att.resourced.attributes, attribute mimeType { list { + } }, empty }⚓ |
<meeting> contains the formalized descriptive title for a meeting or conference, for use in a bibliographic description for an item derived from such a meeting, or as a heading or preamble to publications emanating from it. [3.12.2.2. Titles, Authors, and Editors] | |
Module | core — Formal specification |
Attributes | att.global (xml:id, xml:base, xml:space, @n, @xml:lang) att.global.linking (synch, next, prev, @corresp) att.global.analytic (@ana) |
Contained by | header: titleStmt |
May contain | Character data only |
Example | The specification of the particular sessions that the corpus or corpus component contains is encoded with <meeting>: <meeting n="7" corresp="#DZ"
ana="#parla.lower #parla.term #DZ.7">7. mandat</meeting>
<meeting n="8" corresp="#DZ"
ana="#parla.lower #parla.term #DZ.8">8. mandat</meeting> |
Content model | <content> <textNode/> </content> ⚓ |
Schema Declaration | element meeting { tei_att.global.attribute.n, tei_att.global.attribute.xmllang, tei_att.global.linking.attribute.corresp, tei_att.global.analytic.attribute.ana, text }⚓ |
<name> (name, proper noun) contains a proper noun or noun phrase. [3.6.1. Referring Strings] | |||||||
Module | core — Formal specification | ||||||
Attributes | att.global (n, xml:base, xml:space, @xml:id, @xml:lang) att.global.analytic (@ana) att.personal (@full) att.canonical (@key, @ref) att.typed (type, @subtype)
| ||||||
Member of | |||||||
Contained by | |||||||
May contain | |||||||
Note | Proper nouns referring to people, places, and organizations may be tagged instead with <persName>, <placeName>, or <orgName>, when the TEI module for names and dates is included. | ||||||
Example | The element is used to mark up Named Entities in the linguistically analysed corpus, in which case it should have the type attribute with one of the allowed values. It can also have a ref attribute to link it to a definition: ...
<w lemma="and" msd="UPosTag=CCONJ">and</w>
<name type="ORG"
ref="https://en.wikipedia.org/wiki/Westminster">
<w join="right" lemma="Westminster"
msd="UPosTag=PROPN|Number=Sing">Westminster</w>
</name>
<w lemma="," msd="UPosTag=PUNCT">,</w>
...
| ||||||
Example | Element <name> is used in the TEI header to specify the location of the parliament: <name type="place">Westminster</name>
<name type="city">London</name>
<name type="country" key="GB">U.K.</name> | ||||||
Example | The element is used in the TEI header to denote a person's responsibility for changes: <revisionDesc>
<change when="2021-06-11">
<name>Tomaž Erjavec</name>: Finalized encoding.</change>
<change when="2021-05-28">
<name>Tomaž Erjavec</name>: Built corpus.</change>
</revisionDesc> | ||||||
Content model | <content> <alternate minOccurs="1" maxOccurs="unbounded"> <elementRef key="w"/> <elementRef key="pc"/> <elementRef key="name"/> <elementRef key="date"/> <elementRef key="num"/> <elementRef key="time"/> <elementRef key="note"/> <elementRef key="vocal"/> <elementRef key="kinesic"/> <elementRef key="incident"/> <elementRef key="gap"/> <elementRef key="pb"/> <textNode/> </alternate> </content> ⚓ | ||||||
Schema Declaration | element name { tei_att.global.attribute.xmlid, tei_att.global.attribute.xmllang, tei_att.global.analytic.attribute.ana, tei_att.personal.attribute.full, tei_att.canonical.attribute.key, tei_att.canonical.attribute.ref, tei_att.typed.attribute.subtype, attribute type { "PER" | "LOC" | "ORG" | "MISC" | "city" | "country" | "address" | "org" | "place" }?, ( tei_w | tei_pc | tei_name | tei_date | tei_num | tei_time | tei_note | tei_vocal | tei_kinesic | tei_incident | tei_gap | tei_pb | text )+ }⚓ |
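Named entities marked up with <name> in the linguistically annotated corpus can be collected by walking the tokens inside each element. The sketch below is illustrative only: the function name entities is hypothetical, the second entity ("House of Commons") is invented for the demonstration, and join="right" is taken to mean that no space follows the token.

```python
import xml.etree.ElementTree as ET

# Fragment modelled on the first example above; the second <name> is invented.
ANNOTATED = """
<s xmlns="http://www.tei-c.org/ns/1.0">
  <w lemma="and" msd="UPosTag=CCONJ">and</w>
  <name type="ORG">
    <w join="right" lemma="Westminster"
       msd="UPosTag=PROPN|Number=Sing">Westminster</w>
  </name>
  <pc msd="UPosTag=PUNCT">,</pc>
  <name type="ORG">
    <w>House</w>
    <w>of</w>
    <w>Commons</w>
  </name>
</s>
"""

TEI = "{http://www.tei-c.org/ns/1.0}"


def entities(annotated_xml):
    """Return (type, surface text) for every <name> in document order."""
    root = ET.fromstring(annotated_xml)
    found = []
    for name in root.iter(TEI + "name"):
        parts = []
        for tok in name.iter():
            if tok.tag in (TEI + "w", TEI + "pc"):
                parts.append(tok.text)
                # Assumption: join="right" suppresses the following space.
                parts.append("" if tok.get("join") == "right" else " ")
        found.append((name.get("type"), "".join(parts).strip()))
    return found


named_entities = entities(ANNOTATED)
```

Nested <name> elements, which the content model allows, would be reported both as part of the outer entity and as entities of their own.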
<nameLink> (name link) contains a connecting phrase or link used within a name but not regarded as part of it, such as van der or of. [13.2.1. Personal Names] | |
Module | namesdates — Formal specification |
Attributes | att.global (xml:id, n, xml:base, xml:space, @xml:lang) |
Member of | |
Contained by | namesdates: persName |
May contain | Character data only |
Example | <person xml:id="PicóAntoni">
<persName>
<forename>Antoni</forename>
<surname>Picó</surname>
<nameLink>i</nameLink>
<surname>Azanza</surname>
</persName>
...
</person> |
Content model | <content> <textNode/> </content> ⚓ |
Schema Declaration | element nameLink { tei_att.global.attribute.xmllang, text }⚓ |
<namespace> (namespace) supplies the formal name of the namespace to which the elements documented by its children belong. [2.3.4. The Tagging Declaration] | |||||||
Module | header — Formal specification | ||||||
Attributes |
| ||||||
Contained by | header: tagsDecl | ||||||
May contain | header: tagUsage | ||||||
Example | To distinguish the TEI elements from the possible use of elements from other namespaces, a <namespace> element giving the TEI namespace is introduced first: <tagsDecl>
<namespace name="http://www.tei-c.org/ns/1.0">
<tagUsage gi="text" occurs="414"/>
<tagUsage gi="body" occurs="414"/>
<tagUsage gi="div" occurs="414"/>
<tagUsage gi="head" occurs="826"/>
<tagUsage gi="u" occurs="75122"/>
<tagUsage gi="seg" occurs="280971"/>
<tagUsage gi="note" occurs="85525"/>
<tagUsage gi="gap" occurs="7897"/>
<tagUsage gi="vocal" occurs="1740"/>
<tagUsage gi="incident" occurs="37"/>
<tagUsage gi="kinesic" occurs="560"/>
<tagUsage gi="desc" occurs="10234"/>
</namespace>
</tagsDecl> | ||||||
Content model | <content> <elementRef key="tagUsage" minOccurs="1" maxOccurs="unbounded"/> </content> ⚓ | ||||||
Schema Declaration | element namespace { attribute name { "http://www.tei-c.org/ns/1.0" }, tei_tagUsage+ }⚓ |
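The tagUsage counts inside <namespace> can be derived mechanically from the text of a corpus component. The following sketch shows one possible way to do so with Python's standard library; the function name tag_usage and the tiny sample text are assumptions made for the illustration, and the sketch counts every element in the fragment, including the <text> root.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# A tiny stand-in for the <text> of a corpus component (assumption).
SAMPLE_TEXT = """
<text xmlns="http://www.tei-c.org/ns/1.0">
  <body>
    <div type="debateSection">
      <u><seg>First segment.</seg><seg>Second segment.</seg></u>
      <note>End of debateSection.</note>
    </div>
  </body>
</text>
"""

TEI_NS = "http://www.tei-c.org/ns/1.0"


def tag_usage(text_xml):
    """Build a <namespace> element with one <tagUsage> per element name."""
    root = ET.fromstring(text_xml)
    # Strip the "{namespace}" part of each tag to get the generic identifier.
    counts = Counter(el.tag.split("}", 1)[1] for el in root.iter())
    namespace = ET.Element("namespace", name=TEI_NS)
    for gi, occurs in counts.items():
        ET.SubElement(namespace, "tagUsage", gi=gi, occurs=str(occurs))
    return namespace


usage = tag_usage(SAMPLE_TEXT)
```

The resulting element lists the generic identifiers in order of first appearance, which matches the ordering used in the example above.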
<normalization> (normalization) indicates the extent of normalization or regularization of the original source carried out in converting it to electronic form. [2.3.3. The Editorial Practices Declaration 15.3.2. Declarable Elements] | |
Module | header — Formal specification |
Contained by | header: editorialDecl |
May contain | core: p |
Example | <editorialDecl> ...
<normalization>
<p xml:lang="en">Text has not been normalised, except for spacing.</p>
</normalization>
...
</editorialDecl> |
Content model | <content> <elementRef key="p" minOccurs="1" maxOccurs="unbounded"/> </content> ⚓ |
Schema Declaration | element normalization { tei_p+ }⚓ |
<note> (note) contains a note or annotation. [3.9.1. Notes and Simple Annotation 2.2.6. The Notes Statement 3.12.2.8. Notes and Statement of Language 9.3.5.4. Notes within Entries] | |||||||
Module | core — Formal specification | ||||||
Attributes | att.global (xml:base, xml:space, @xml:id, @n, @xml:lang) att.global.linking (synch, next, prev, @corresp) att.typed (type, @subtype)
| ||||||
Member of | |||||||
Contained by | |||||||
May contain | |||||||
Example | The <note> element is used to encode transcriber comments such as who spoke, what the time was, interruptions, notes on what is happening in the chamber, results of voting, etc.: <note type="speaker">The president, Dr. Milan Brglez:</note>
...
<note type="time">The session began at 10 o'clock.</note>
...
<note type="vote-ayes">84 voted for the adoption of the measure.</note>
...
<note type="vote-noes">2 voted against the adoption of the measure.</note>
...
| ||||||
Example | The <note> element can be further qualified by the <time> element to specify the date and time recorded in the note, and can also contain a page break, <pb>: <note type="time">The session began <pb/> at <time when="2016-04-13T10:00:00">10 o'clock</time>.</note> | ||||||
Example | The <note> element may also be used to mark any additional information on debate sections: <div type="debateSection">
<head>Business Before Questions</head>
<note>Death of a Member</note>
<u xml:id="ParlaMint-GB_2019-02-18-commons.u1">...</u>
...
<note>End of debateSection.</note>
</div> | ||||||
Content model | <content> <alternate minOccurs="0" maxOccurs="unbounded"> <textNode/> <elementRef key="pb"/> <elementRef key="time"/> </alternate> </content> ⚓ | ||||||
Schema Declaration | element note { tei_att.global.attribute.xmlid, tei_att.global.attribute.n, tei_att.global.attribute.xmllang, tei_att.global.linking.attribute.corresp, tei_att.typed.attribute.subtype, attribute type { text }?, ( text | tei_pb | tei_time )* }⚓ |
<num> (number) contains a number, written in any form. [3.6.3. Numbers and Measures] | |||||||||||||
Module | core — Formal specification | ||||||||||||
Attributes | att.global (n, xml:base, xml:space, @xml:id, @xml:lang) att.global.analytic (@ana) att.typed (type, @subtype)
| ||||||||||||
Member of | |||||||||||||
Contained by | |||||||||||||
May contain | |||||||||||||
Note | Detailed analyses of quantities and units of measure in historical documents may also use the feature structure mechanism described in chapter 18. Feature Structures. The <num> element is intended for use in simple applications. | ||||||||||||
Example | The element can be used for fine-grained Named Entities which include numbers: <num ana="ne:n_"
xml:id="ParlaMint-CZ_2018-11-13-ps2017-020-09-004-010.ne138">
<w xml:id="ParlaMint-CZ_2018-11-13-ps2017-020-09-004-010.u6.p17.s3.w12"
lemma="428"
msd="UPosTag=NUM|NumForm=Digit|NumType=Card" join="right">428</w>
</num> | ||||||||||||
Content model | <content> <alternate minOccurs="1" maxOccurs="unbounded"> <elementRef key="w"/> <elementRef key="pc"/> <textNode/> </alternate> </content> ⚓ | ||||||||||||
Schema Declaration | element num { tei_att.global.attribute.xmlid, tei_att.global.attribute.xmllang, tei_att.global.analytic.attribute.ana, tei_att.typed.attribute.subtype, attribute type { "cardinal" | "ordinal" | "fraction" | "percentage" }?, ( tei_w | tei_pc | text )+ }⚓ |
<occupation> (occupation) contains an informal description of a person's trade, profession or occupation. [15.2.2. The Participant Description] | |
Module | namesdates — Formal specification |
Attributes | att.global (xml:id, n, xml:base, xml:space, @xml:lang) att.datable.w3c (notBefore, notAfter, @when, @from, @to) |
Contained by | namesdates: person |
May contain | Character data only |
Note | The content of this element may be used as an alternative to the more formal specification made possible by its attributes; it may also be used to supplement the formal specification with commentary or clarification. |
Example | <person n="2678" xml:id="SimeonovValeri">
<persName xml:lang="bg">
<forename>Валери</forename>
<surname>Симеонов</surname>
</persName>
<sex value="M"/>
<birth when="1955-03-14">
<placeName>Долни Чифлик, България</placeName>
</birth>
<education>инженер</education>
<occupation>политик</occupation>
...
</person> |
Content model | <content> <textNode/> </content> ⚓ |
Schema Declaration | element occupation { tei_att.global.attribute.xmllang, tei_att.datable.w3c.attribute.when, tei_att.datable.w3c.attribute.from, tei_att.datable.w3c.attribute.to, text }⚓ |
<org> (organization) provides information about an identifiable organisation such as the government, political party, ministry etc. [13.3.3. Organizational Data] | |||||||||||||||
Module | namesdates — Formal specification | ||||||||||||||
Attributes | att.global (xml:id, n, xml:base, xml:space, @xml:lang) att.global.analytic (@ana)
| ||||||||||||||
Contained by | namesdates: listOrg | ||||||||||||||
May contain | |||||||||||||||
Example | <org xml:id="government.BE"
role="government">
<orgName xml:lang="en" full="yes">Federal Government of Belgium</orgName>
<orgName xml:lang="nl" full="yes">Federale regering</orgName>
<orgName xml:lang="fr" full="yes">Gouvernement fédéral</orgName>
</org>
<org ana="#parla.federal #parla.lower"
role="parliament" xml:id="be_federal_parliament">
<orgName full="yes" xml:lang="nl">Federaal Parlement van België</orgName>
<orgName full="yes" xml:lang="en">Belgian Federal Parliament</orgName>
<event from="1831-02-07">
<label xml:lang="en">existence</label>
</event>
...
</org> | ||||||||||||||
Example | <org xml:id="party.PS2"
role="parliamentaryGroup">
<orgName full="yes" xml:lang="sl">Pozitivna Slovenija</orgName>
<orgName full="yes" xml:lang="en">Positive Slovenia</orgName>
<orgName full="abb">PS</orgName>
<event from="2011-10-22">
<label xml:lang="en">existence</label>
</event>
<idno type="URI" xml:lang="sl"
subtype="wikimedia">https://sl.wikipedia.org/wiki/Pozitivna_Slovenija</idno>
<idno type="URI" xml:lang="en"
subtype="wikimedia">https://en.wikipedia.org/wiki/Positive_Slovenia</idno>
</org> | ||||||||||||||
Content model | <content> <sequence minOccurs="1" maxOccurs="1"> <elementRef key="head" minOccurs="0" maxOccurs="unbounded"/> <elementRef key="orgName" minOccurs="1" maxOccurs="unbounded"/> <elementRef key="event" minOccurs="0" maxOccurs="unbounded"/> <elementRef key="idno" minOccurs="0" maxOccurs="unbounded"/> <elementRef key="desc" minOccurs="0" maxOccurs="1"/> <elementRef key="listEvent" minOccurs="0" maxOccurs="1"/> <elementRef key="state" minOccurs="0" maxOccurs="unbounded"/> </sequence> </content> ⚓ | ||||||||||||||
Schema Declaration | element org { tei_att.global.attribute.xmllang, tei_att.global.analytic.attribute.ana, attribute xml:id { text }, attribute role { "country" | "federatedState" | "republic" | "government" | "ministry" | "parliament" | "politicalParty" | "parliamentaryGroup" | "conferenceOfChairs" | "boardOfParliament" | "ngo" | "institution" | "senate" | "committee" | "subcommittee" | "commission" | "delegation" | "supervisoryBoard" | "workingGroup" | "interparliamentaryFriendshipGroup" | "nationalCouncil" | "chamberOfThePeople" | "chamberOfTheNations" | "europeanCommission" | "europeanParliament" | "europeanInstitution" | "internationalOrganisation" | "boardOfDirectors" | "ethnicCommunity" }, ( tei_head*, tei_orgName+, tei_event*, tei_idno*, tei_desc?, tei_listEvent?, tei_state* ) }⚓ |
<orgName> (organization name) contains an organisational name. [13.2.2. Organizational Names] | |||||||||||||||||||||||||||
Module | namesdates — Formal specification | ||||||||||||||||||||||||||
Attributes | att.global (xml:id, n, xml:base, xml:space, @xml:lang) att.canonical (key, @ref)
| ||||||||||||||||||||||||||
Member of | |||||||||||||||||||||||||||
Contained by | |||||||||||||||||||||||||||
May contain | Character data only | ||||||||||||||||||||||||||
Example | <funder>
<orgName xml:lang="en">The CLARIN research infrastructure</orgName>
<orgName xml:lang="sl">Raziskovalna infrastruktura CLARIN</orgName>
</funder> | ||||||||||||||||||||||||||
Example | <org xml:id="party.PS1"
role="parliamentaryGroup">
<orgName full="yes" xml:lang="en">Positive Slovenia</orgName>
<orgName full="yes" xml:lang="sl">Pozitivna Slovenija</orgName>
<orgName full="abb" xml:lang="sl">PS</orgName>
</org> | ||||||||||||||||||||||||||
Content model | <content> <textNode/> </content> ⚓ | ||||||||||||||||||||||||||
Schema Declaration | element orgName { tei_att.global.attribute.xmllang, tei_att.canonical.attribute.ref, attribute from { text }?, attribute to { text }?, attribute full { "yes" | "abb" }?, text }⚓ |
<p> (paragraph) marks paragraphs in prose. [3.1. Paragraphs 7.2.5. Speech Contents] | |
Module | core — Formal specification |
Attributes | att.global (xml:id, n, xml:base, xml:space, @xml:lang) |
Member of | |
Contained by | |
May contain | core: ref character data |
Example | <projectDesc>
<p>
<ref target="https://www.clarin.eu/content/parlamint">ParlaMint</ref>
</p>
</projectDesc> |
Example | <availability status="free">
<licence>http://creativecommons.org/licenses/by/4.0/</licence>
<p>This work is licensed under the <ref target="http://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</ref>.</p>
<p>This work is also licensed under the <ref target="https://www.parliament.uk/site-information/copyright-parliament/open-parliament-licence/">Open Parliament Licence v3.0</ref>.</p>
</availability> |
Schematron |
<sch:report test="(ancestor::tei:ab or ancestor::tei:p) and not( ancestor::tei:floatingText
|parent::tei:exemplum |parent::tei:item |parent::tei:note |parent::tei:q
|parent::tei:quote |parent::tei:remarks |parent::tei:said |parent::tei:sp
|parent::tei:stage |parent::tei:cell |parent::tei:figure )"> Abstract model violation: Paragraphs may not occur inside other paragraphs or ab elements.
</sch:report> |
Schematron |
<sch:report test="(ancestor::tei:l or ancestor::tei:lg) and not( ancestor::tei:floatingText
|parent::tei:figure |parent::tei:note )"> Abstract model violation: Lines may not contain higher-level structural elements such as div, p, or ab, unless p is a child of figure or note, or is a descendant of floatingText.
</sch:report> |
Content model | <content> <alternate minOccurs="1" maxOccurs="unbounded"> <elementRef key="ref"/> <textNode/> </alternate> </content> ⚓ |
Schema Declaration | element p { tei_att.global.attribute.xmllang, ( tei_ref | text )+ }⚓ |
<particDesc> (participation description) describes the identifiable speakers and organisations in a ParlaMint corpus. This information is given in the corpus root teiHeader. Note that the listPerson and listOrg elements are typically stored in separate files. [15.2. Contextual Information] | |
Module | corpus — Formal specification |
Contained by | header: profileDesc |
May contain | derived-module-parlamint: include namesdates: listOrg listPerson |
Note | May contain a prose description organized as paragraphs, or a structured list of persons and person groups, with an optional formal specification of any relationships amongst them. |
Example | <particDesc> <xi:include xmlns:xi="http://www.w3.org/2001/XInclude"
href="ParlaMint-SI-listOrg.xml"/>
<xi:include xmlns:xi="http://www.w3.org/2001/XInclude"
href="ParlaMint-SI-listPerson.xml"/>
</particDesc> |
Content model | <content> <sequence minOccurs="1" maxOccurs="1"> <alternate minOccurs="1" maxOccurs="1"> <elementRef key="listOrg"/> <elementRef key="include"/> </alternate> <alternate minOccurs="1" maxOccurs="1"> <elementRef key="listPerson"/> <elementRef key="include"/> </alternate> </sequence> </content> ⚓ |
Schema Declaration | element particDesc { ( tei_listOrg | tei_include ), ( tei_listPerson | tei_include ) }⚓ |
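As an illustration of how the two XInclude directives in <particDesc> can be resolved when processing a corpus root, here is a minimal Python sketch using the standard library's xml.etree.ElementInclude. The file contents in FILES, the loader function load_from_memory and the person entry are hypothetical stand-ins, not taken from any ParlaMint corpus. A real script would read the files from disk instead.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical, heavily shortened stand-ins for the two separately stored files.
FILES = {
    "ParlaMint-SI-listOrg.xml": (
        f'<listOrg xmlns="{TEI_NS}">'
        '<org xml:id="party.PS1" role="parliamentaryGroup">'
        '<orgName full="abb" xml:lang="sl">PS</orgName></org></listOrg>'
    ),
    "ParlaMint-SI-listPerson.xml": (
        f'<listPerson xmlns="{TEI_NS}">'
        '<person xml:id="ExamplePerson"><persName><forename>Example</forename>'
        '<surname>Person</surname></persName><sex value="F"/></person>'
        '</listPerson>'
    ),
}

PARTIC_DESC = f"""
<particDesc xmlns="{TEI_NS}" xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:include href="ParlaMint-SI-listOrg.xml"/>
  <xi:include href="ParlaMint-SI-listPerson.xml"/>
</particDesc>
""".strip()


def load_from_memory(href, parse, encoding=None):
    """Loader called by ElementInclude for each xi:include element."""
    if parse != "xml":
        raise ValueError("only parse='xml' is supported in this sketch")
    return ET.fromstring(FILES[href])


root = ET.fromstring(PARTIC_DESC)
# Replaces every xi:include in place with the root element of the loaded file.
ElementInclude.include(root, loader=load_from_memory)

# Local names of the children after inclusion.
children = [child.tag.split("}", 1)[1] for child in root]
print(children)
```

After inclusion, <particDesc> contains <listOrg> followed by <listPerson>, which is the order the content model requires.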
<pb> (page beginning) marks the beginning of a new page in a paginated document. [3.11.3. Milestone Elements] | |
Module | core — Formal specification |
Attributes | att.global (xml:lang, xml:base, xml:space, @xml:id, @n) att.global.linking (synch, next, prev, @corresp) att.global.source (@source) |
Member of | |
Contained by | |
May contain | Empty element |
Note | A <pb> element should appear at the start of the page which it identifies. The global n attribute indicates the number or other value associated with this page. This will normally be the page number or signature printed on it, since the physical sequence number is implicit in the presence of the <pb> element itself. The type attribute may be used to characterize the page break in any respect. The more specialized attributes break, ed, or edRef should be preferred when the intent is to indicate whether or not the page break is word-breaking, or to note the source from which it derives. |
Example | <body>
<div type="debateSection">
<pb source="https://www.psp.cz/eknih/2013ps/stenprot/017schuz/s017357.htm"
n="1"
xml:id="ParlaMint-CZ_2014-10-01-ps2013-017-09-003-036.pb1" corresp="#ps2013-017-09-003-036.audio1"/>
...
</div>
</body> |
Content model | <content> <empty/> </content> ⚓ |
Schema Declaration | element pb { tei_att.global.attribute.xmlid, tei_att.global.attribute.n, tei_att.global.linking.attribute.corresp, tei_att.global.source.attribute.source, empty }⚓ |
<pc> (punctuation character) contains a character or string of characters regarded as constituting a single punctuation mark. [17.1.2. Below the Word Level 17.4.2. Lightweight Linguistic Annotation] | |||||||||||||
Module | analysis — Formal specification | ||||||||||||
Attributes | att.global (xml:id, n, xml:base, xml:space, @xml:lang) att.global.analytic (@ana) att.segLike (@function) att.linguistic (lemma, msd, @pos, @join) att.lexicographic.normalized (@norm)
| ||||||||||||
Member of | |||||||||||||
Contained by | |||||||||||||
May contain | Character data only | ||||||||||||
Example | <s>
<w lemma="I"
msd="UPosTag=PRON|Case=Nom|Number=Sing|Person=1|PronType=Prs" pos="PRP">I</w>
<w lemma="support"
msd="UPosTag=VERB|Mood=Ind|Tense=Pres|VerbForm=Fin" pos="VBP">support</w>
<w lemma="the"
msd="UPosTag=DET|Definite=Def|PronType=Art" pos="DT">the</w>
<w lemma="amendment"
msd="UPosTag=NOUN|Number=Sing" pos="NN" join="right">amendment</w>
<pc msd="UPosTag=PUNCT" pos=".">.</pc>
</s> | ||||||||||||
Content model | <content> <textNode/> </content> ⚓ | ||||||||||||
Schema Declaration | element pc { tei_att.global.attribute.xmllang, tei_att.global.analytic.attribute.ana, tei_att.segLike.attribute.function, tei_att.linguistic.attribute.pos, tei_att.linguistic.attribute.join, tei_att.lexicographic.normalized.attribute.norm, attribute xml:id { text }, attribute msd { text }, text }⚓ |
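The join attribute in the example above records that no whitespace follows a token (join="right"), which makes it possible to reconstruct the running text from <w> and <pc> elements. A minimal Python sketch of this follows. The function name detokenize is hypothetical, the msd attributes are omitted for brevity, and the sketch assumes that tokens are not nested inside other tokens.

```python
import xml.etree.ElementTree as ET

# The sentence from the example above, without the msd attributes.
SENTENCE = """
<s>
  <w lemma="I" pos="PRP">I</w>
  <w lemma="support" pos="VBP">support</w>
  <w lemma="the" pos="DT">the</w>
  <w lemma="amendment" pos="NN" join="right">amendment</w>
  <pc pos=".">.</pc>
</s>
""".strip()


def detokenize(s_elem):
    """Rebuild the sentence text from w and pc tokens, honouring join="right"."""
    parts = []
    for tok in s_elem.iter():
        # Strip a namespace, if any, so the sketch also works on TEI-namespaced input.
        local_name = tok.tag.rsplit("}", 1)[-1]
        if local_name not in ("w", "pc"):
            continue
        parts.append(tok.text or "")
        # A space follows the token unless it is glued to its right neighbour.
        if tok.get("join") != "right":
            parts.append(" ")
    return "".join(parts).strip()


text = detokenize(ET.fromstring(SENTENCE))
print(text)
```

Because the traversal uses iter(), tokens wrapped in other elements inside the sentence, such as <phr>, are visited as well.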
<persName> (personal name) contains a proper noun or proper-noun phrase referring to a person, possibly including one or more of the person's forenames, surnames, honorifics, added names, etc. [13.2.1. Personal Names] | |
Module | namesdates — Formal specification |
Attributes | att.global (n, xml:base, xml:space, @xml:id, @xml:lang) att.datable.w3c (when, notBefore, notAfter, @from, @to) att.canonical (key, @ref) |
Member of | |
Contained by | |
May contain | |
Note | Special persons (like 'anonymous', 'group' etc.) have their name in <term>. |
Example | <persName>
<surname>Broekers-Knol</surname>
<forename>Ankie</forename>
</persName> |
Example | <respStmt>
<persName>Matthew Coole</persName>
<resp>TEI corpus encoding</resp>
</respStmt> |
Content model | <content> <alternate minOccurs="1" maxOccurs="1"> <alternate minOccurs="1" maxOccurs="unbounded"> <elementRef key="forename" minOccurs="1" maxOccurs="unbounded"/> <elementRef key="addName" minOccurs="0" maxOccurs="unbounded"/> <elementRef key="nameLink" minOccurs="0" maxOccurs="1"/> <elementRef key="roleName" minOccurs="0" maxOccurs="unbounded"/> <elementRef key="surname" minOccurs="1" maxOccurs="unbounded"/> </alternate> <alternate minOccurs="1" maxOccurs="unbounded"> <elementRef key="term"/> </alternate> <alternate minOccurs="1" maxOccurs="1"> <textNode/> </alternate> </alternate> </content> ⚓ |
Schema Declaration | element persName { tei_att.global.attribute.xmlid, tei_att.global.attribute.xmllang, tei_att.datable.w3c.attribute.from, tei_att.datable.w3c.attribute.to, tei_att.canonical.attribute.ref, ( ( tei_forename+ | tei_addName* | tei_nameLink? | tei_roleName* | tei_surname+ )+ | tei_term+ | ( text ) ) }⚓ |
<person> (person) provides information about a speaker in the corpus, at the very least their name and sex. [13.3.2. The Person Element 15.2.2. The Participant Description] | |
Module | namesdates — Formal specification |
Attributes | att.global (xml:base, xml:space, @xml:id, @n, @xml:lang) |
Contained by | namesdates: listPerson |
May contain | |
Note | May contain either a prose description organized as paragraphs, or a sequence of more specific demographic elements drawn from the model.personPart class. |
Example | <person xml:id="AliciaKearns">
<persName>
<forename>Alicia</forename>
<forename>Alexandra Martha</forename>
<surname>Kearns</surname>
</persName>
<sex value="F"/>
<affiliation from="2019-12-12"
ref="#parla.lower" role="member"/>
<affiliation from="2019-12-12"
ref="#party.CON" role="member"/>
<idno subtype="contact" type="URI">https://members.parliament.uk/member/4805/contact</idno>
</person> |
Example | <person xml:id="AdamowiczPiotr">
<persName>
<forename>Piotr</forename>
<surname>Adamowicz</surname>
</persName>
<birth when="1961-06-26">26.06.1961</birth>
<sex value="M"/>
<affiliation role="member" ref="#party.KO"/>
</person> |
Content model | <content> <sequence minOccurs="1" maxOccurs="1"> <elementRef key="persName" minOccurs="1" maxOccurs="unbounded"/> <alternate minOccurs="1" maxOccurs="unbounded"> <elementRef key="sex" minOccurs="1" maxOccurs="1"/> <elementRef key="birth" minOccurs="0" maxOccurs="1"/> <elementRef key="death" minOccurs="0" maxOccurs="1"/> <elementRef key="affiliation" minOccurs="0" maxOccurs="unbounded"/> <elementRef key="occupation" minOccurs="0" maxOccurs="unbounded"/> <elementRef key="education" minOccurs="0" maxOccurs="unbounded"/> <elementRef key="idno" minOccurs="0" maxOccurs="unbounded"/> <elementRef key="figure" minOccurs="0" maxOccurs="unbounded"/> </alternate> </sequence> </content> ⚓ |
Schema Declaration | element person { tei_att.global.attribute.xmlid, tei_att.global.attribute.n, tei_att.global.attribute.xmllang, ( tei_persName+, ( tei_sex | tei_birth? | tei_death? | tei_affiliation* | tei_occupation* | tei_education* | tei_idno* | tei_figure* )+ ) }⚓ |
<phr> (phrase) contains a semantic multi-word unit. [17.1. Linguistic Segment Categories] | |||||||||||||||||||||||
Module | analysis — Formal specification | ||||||||||||||||||||||
Attributes | att.global (n, xml:base, xml:space, @xml:id, @xml:lang)
| ||||||||||||||||||||||
Member of | |||||||||||||||||||||||
Contained by | analysis: s | ||||||||||||||||||||||
May contain | |||||||||||||||||||||||
Note | The type attribute may be used to indicate the type of phrase, taking values such as noun, verb, preposition, etc. as appropriate. | ||||||||||||||||||||||
Example | The element is used to mark multi-word units (MWEs) which have a semantic interpretation. The type should be set to sem. The MWE should be marked with the function (all semantic tags) and ana (semantic categories) attributes: ...
...
<phr type="sem" function="Z4" ana="sem:Z4">
<w pos="IN" msd="UPosTag=ADP" lemma="on"
function="Z4" ana="sem:Z4">On</w>
<w pos="DT"
msd="UPosTag=DET|Definite=Def|PronType=Art" lemma="the" function="Z4" ana="sem:Z4">the</w>
<w pos="JJ" msd="UPosTag=ADJ|Degree=Pos"
lemma="other" function="Z4" ana="sem:Z4">other</w>
<w pos="NN" msd="UPosTag=NOUN|Number=Sing"
lemma="hand" function="Z4" ana="sem:Z4" join="right">hand</w>
</phr>
...
| ||||||||||||||||||||||
Content model | <content> <alternate minOccurs="1" maxOccurs="unbounded"> <elementRef key="w"/> <elementRef key="pc"/> <elementRef key="pb"/> <textNode/> </alternate> </content> ⚓ | ||||||||||||||||||||||
Schema Declaration | element phr { tei_att.global.attribute.xmlid, tei_att.global.attribute.xmllang, attribute ana { list { + } }, attribute function { text }, attribute type { "sem" }?, ( tei_w | tei_pc | tei_pb | text )+ }⚓ |
<placeName> (place name) contains a place name. [13.2.3. Place Names] | |
Module | namesdates — Formal specification |
Attributes | att.global (xml:id, n, xml:base, xml:space, @xml:lang) att.canonical (key, @ref) |
Member of | |
Contained by | |
May contain | core: name character data |
Example | <placeName ref="https://www.geonames.org/2523918">Palermo</placeName> |
Example | <placeName>Tours-Saint-Symphorien, Indre-et-Loire</placeName> |
Content model | <content> <alternate minOccurs="1" maxOccurs="1"> <elementRef key="name" minOccurs="0" maxOccurs="1"/> <textNode/> </alternate> </content> ⚓ |
Schema Declaration | element placeName { tei_att.global.attribute.xmllang, tei_att.canonical.attribute.ref, ( tei_name? | text ) }⚓ |
<prefixDef> (prefix definition) defines a prefixing scheme used in teidata.pointer values, showing how abbreviated URIs using the scheme may be expanded into full URIs. [16.2.3. Using Abbreviated Pointers] | |||||||||||||||||||||||||||
Module | header — Formal specification | ||||||||||||||||||||||||||
Attributes |
| ||||||||||||||||||||||||||
Contained by | header: listPrefixDef | ||||||||||||||||||||||||||
May contain | core: p | ||||||||||||||||||||||||||
Note | The abbreviated pointer may be dereferenced to produce either an absolute or a relative URI reference. In the latter case it is combined with the value of xml:base in force at the place where the pointing attribute occurs to form an absolute URI in the usual manner as prescribed by XML Base. | ||||||||||||||||||||||||||
Example | <prefixDef ident="mte" matchPattern="(.+)"
replacementPattern="http://nl.ijs.si/ME/V6/msd/tables/msd-fslib-hbs.xml#$1">
<p xml:lang="en">Private URIs with this prefix point to feature-structure elements defining the Serbocroatian MULTEXT-East Version 6 MSDs.</p>
</prefixDef> | ||||||||||||||||||||||||||
Content model | <content> <elementRef key="p" minOccurs="1" maxOccurs="unbounded"/> </content> ⚓ | ||||||||||||||||||||||||||
Schema Declaration | element prefixDef { attribute matchPattern { text }, attribute replacementPattern { text }, attribute ident { text }, tei_p+ }⚓ |
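The expansion that a <prefixDef> describes can be sketched in Python as follows. This is an illustration under stated assumptions rather than a reference implementation: the function expand_pointer and the dictionary layout are hypothetical, the MSD value Ncmsn is only an example, and the sketch assumes that matchPattern is compatible with Python's regular expression syntax and must match the whole part after the prefix.

```python
import re

# ident -> (matchPattern, replacementPattern), as in the prefixDef example above.
PREFIX_DEFS = {
    "mte": ("(.+)", "http://nl.ijs.si/ME/V6/msd/tables/msd-fslib-hbs.xml#$1"),
}


def expand_pointer(pointer, prefix_defs):
    """Expand an abbreviated pointer such as 'mte:Ncmsn' into a full URI.

    Pointers without a known prefix are returned unchanged.
    """
    prefix, sep, local = pointer.partition(":")
    if not sep or prefix not in prefix_defs:
        return pointer
    match_pattern, replacement = prefix_defs[prefix]
    match = re.fullmatch(match_pattern, local)
    if match is None:
        return pointer
    # replacementPattern refers to captured groups as $1, $2, ...
    return re.sub(r"\$(\d+)", lambda g: match.group(int(g.group(1))), replacement)


expanded = expand_pointer("mte:Ncmsn", PREFIX_DEFS)
print(expanded)
```

Local references such as #parla.lower and ordinary URLs pass through unchanged, because their prefix is not defined in PREFIX_DEFS.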
<profileDesc> (text-profile description) provides a detailed description of non-bibliographic aspects of a text, specifically the languages and sublanguages used, the situation in which it was produced, the participants and their setting. [2.4. The Profile Description 2.1.1. The TEI Header and Its Components] | |
Module | header — Formal specification |
Contained by | header: teiHeader |
May contain | corpus: particDesc settingDesc |
Note | Although the content model permits it, it is rarely meaningful to supply multiple occurrences for any of the child elements of <profileDesc> unless these are documenting multiple texts. |
Example | General structure of the element <profileDesc>: <profileDesc>
<settingDesc>...</settingDesc>
<textClass>...</textClass>
<particDesc>...</particDesc>
<langUsage>...</langUsage>
</profileDesc> |
Example | Profile description of a corpus root: <profileDesc>
<settingDesc>
<setting>
<name type="address">Šubičeva ulica 4</name>
<name type="city">Ljubljana</name>
<name type="country" key="SI">Slovenia</name>
<date from="2014-08-01" to="2020-07-16">1.8.2014 - 16.7.2020</date>
</setting>
</settingDesc>
<textClass>
<catRef scheme="#parla.legislature"
target="#parla.bi #parla.lower"/>
</textClass>
<particDesc>
<xi:include xmlns:xi="http://www.w3.org/2001/XInclude"
href="ParlaMint-SI-listOrg.xml"/>
<xi:include xmlns:xi="http://www.w3.org/2001/XInclude"
href="ParlaMint-SI-listPerson.xml"/>
</particDesc>
<langUsage>
<language ident="sl" xml:lang="sl">slovenski</language>
<language ident="en" xml:lang="sl">angleški</language>
<language ident="sl" xml:lang="en">Slovenian</language>
<language ident="en" xml:lang="en">English</language>
</langUsage>
</profileDesc> |
Example | Profile description for a corpus component. In contrast to the corpus root, a corpus component uses only the first of these elements, <settingDesc>. <profileDesc>
<settingDesc>
<setting>
<name type="city">Ljubljana</name>
<name type="country" key="SI">Slovenija</name>
<date when="2014-08-28"
ana="#parla.sitting">28.8.2014</date>
</setting>
</settingDesc>
</profileDesc> |
Content model | <content> <elementRef key="settingDesc"/> <elementRef key="textClass" minOccurs="0" maxOccurs="1"/> <elementRef key="particDesc" minOccurs="0" maxOccurs="1"/> <elementRef key="langUsage" minOccurs="0" maxOccurs="1"/> </content> ⚓ |
Schema Declaration | element profileDesc { tei_settingDesc, tei_textClass?, tei_particDesc?, tei_langUsage? }⚓ |
<projectDesc> (project description) describes in detail the aim or purpose for which an electronic file was encoded, together with any other relevant information concerning the process by which it was assembled or collected. [2.3.1. The Project Description 2.3. The Encoding Description 15.3.2. Declarable Elements] | |
Module | header — Formal specification |
Contained by | header: encodingDesc |
May contain | core: p |
Example | <projectDesc>
<p xml:lang="sl">Glavni cilji projekta <ref target="https://www.clarin.eu/content/parlamint">ParlaMint</ref> so
(1) izdelati večjezično množico na enak način kodiranih korpusov
zapiskov parlamentarnih sej, ...</p>
<p xml:lang="en">The <ref target="https://www.clarin.eu/content/parlamint">ParlaMint</ref>
project aims to (1) create a multilingual set of uniformly encoded
comparable corpora of parliamentary proceedings, ...</p>
</projectDesc> |
Content model | <content> <elementRef key="p" minOccurs="1" maxOccurs="unbounded"/> </content> ⚓ |
Schema Declaration | element projectDesc { tei_p+ }⚓ |
<pubPlace> (publication place) contains the name of the place where a bibliographic item was published. [3.12.2.4. Imprint, Size of a Document, and Reprint Information] | |
Module | core — Formal specification |
Contained by | header: publicationStmt |
May contain | core: ref character data |
Example | <pubPlace>
<ref target="https://github.com/clarin-eric/ParlaMint">https://github.com/clarin-eric/ParlaMint</ref>
</pubPlace> |
Content model | <content> <alternate minOccurs="1" maxOccurs="1"> <elementRef key="ref"/> <textNode/> </alternate> </content> ⚓ |
Schema Declaration | element pubPlace { tei_ref | text }⚓ |
<publicationStmt> (publication statement) groups information concerning the publication or distribution of an electronic or other text. [2.2.4. Publication, Distribution, Licensing, etc. 2.2. The File Description] | |
Module | header — Formal specification |
Contained by | header: fileDesc |
May contain | header: availability idno |
Note | Where a publication statement contains several members of the model.publicationStmtPart.agency or model.publicationStmtPart.detail classes rather than one or more paragraphs or anonymous blocks, care should be taken to ensure that the repeated elements are presented in a meaningful order. It is a conformance requirement that elements supplying information about publication place, address, identifier, availability, and date be given following the name of the publisher, distributor, or authority concerned, and preferably in that order. |
Example | <publicationStmt>
<publisher>
<orgName xml:lang="sl">Raziskovalna infrastruktura CLARIN</orgName>
<orgName xml:lang="en">CLARIN research infrastructure</orgName>
<ref target="https://www.clarin.eu/">www.clarin.eu</ref>
</publisher>
<idno type="URI" subtype="handle">http://hdl.handle.net/11356/1432</idno>
<availability status="free">
<licence>http://creativecommons.org/licenses/by/4.0/</licence>
<p xml:lang="sl">To delo je ponujeno pod <ref target="http://creativecommons.org/licenses/by/4.0/">Creative Commons Priznanje avtorstva 4.0 mednarodna licenca</ref>.</p>
<p xml:lang="en">This work is licensed under the <ref target="http://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</ref>.</p>
</availability>
<date when="2021-06-11">11. 6. 2021</date>
</publicationStmt> |
Content model | <content> <elementRef key="publisher"/> <elementRef key="idno"/> <elementRef key="pubPlace" minOccurs="0" maxOccurs="1"/> <elementRef key="availability"/> <elementRef key="date"/> </content> ⚓ |
Schema Declaration | element publicationStmt { tei_publisher, tei_idno, tei_pubPlace?, tei_availability, tei_date }⚓ |
<publisher> (publisher) provides the name of the organisation responsible for the publication or distribution of a bibliographic item. [3.12.2.4. Imprint, Size of a Document, and Reprint Information 2.2.4. Publication, Distribution, Licensing, etc.] | |
Module | core — Formal specification |
Attributes | att.global (xml:id, n, xml:base, xml:space, @xml:lang) |
Contained by | core: bibl header: publicationStmt |
May contain | |
Note | Use the full form of the name by which a company is usually referred to, rather than any abbreviation of it which may appear on a title page. |
Example | <publisher>
<orgName>CLARIN research infrastructure</orgName>
<ref target="https://www.clarin.eu/">www.clarin.eu</ref>
</publisher> |
Content model | <content> <alternate minOccurs="1" maxOccurs="1"> <sequence minOccurs="1" maxOccurs="1"> <elementRef key="orgName" minOccurs="1" maxOccurs="unbounded"/> <elementRef key="ref" minOccurs="0" maxOccurs="1"/> </sequence> <textNode/> </alternate> </content> ⚓ |
Schema Declaration | element publisher { tei_att.global.attribute.xmllang, ( ( tei_orgName+, tei_ref? ) | text ) }⚓ |
<quotation> (quotation) specifies editorial practice adopted with respect to quotation marks in the original. [2.3.3. The Editorial Practices Declaration 15.3.2. Declarable Elements] | |
Module | header — Formal specification |
Contained by | header: editorialDecl |
May contain | core: p |
Example | <editorialDecl> ...
<quotation>
<p xml:lang="en">Quotation marks have been left in the text and are not explicitly marked up.</p>
</quotation>
</editorialDecl> |
Schematron |
<sch:report test="not(@marks) and not (tei:p)">On <sch:name/>, either the @marks attribute should be used, or a paragraph of description provided</sch:report> |
Content model | <content> <elementRef key="p" minOccurs="1" maxOccurs="unbounded"/> </content> ⚓ |
Schema Declaration | element quotation { tei_p+ }⚓ |
<recording> (recording event) provides details of an audio or video recording event used as the source of a spoken text, either directly or from a public broadcast. [8.2. Documenting the Source of Transcribed Speech 15.3.2. Declarable Elements] | |||||||||||
Module | spoken — Formal specification | ||||||||||
Attributes |
| ||||||||||
Contained by | spoken: recordingStmt | ||||||||||
May contain | core: media | ||||||||||
Note | The dur attribute is used to indicate the original duration of the recording. | ||||||||||
Example | <recording type="audio">
<media xml:id="ps2013-044-02-000-000.audio1"
mimeType="audio/mp3"
source="https://www.psp.cz/eknih/2013ps/audio/2016/04/13/2016041308580912.mp3"
url="2013ps/audio/2016/04/13/2016041308580912.mp3"/>
</recording> | ||||||||||
Content model | <content> <elementRef key="media" minOccurs="1" maxOccurs="unbounded"/> </content> ⚓ | ||||||||||
Schema Declaration | element recording { attribute type { "audio" | "video" }?, tei_media+ }⚓ |
<recordingStmt> (recording statement) describes a set of recordings used as the basis for transcription of a spoken text. [8.2. Documenting the Source of Transcribed Speech 2.2.7. The Source Description] | |
Module | spoken — Formal specification |
Contained by | header: sourceDesc |
May contain | spoken: recording |
Example | <recordingStmt>
<recording type="audio">
<media xml:id="ps2017-020-09-004-010.audio1"
mimeType="audio/mp3"
source="https://www.psp.cz/eknih/2017ps/audio/2018/11/13/2018111318081822.mp3"
url="2017ps/audio/2018/11/13/2018111318081822.mp3"/>
<media xml:id="ps2017-020-09-004-010.audio2"
mimeType="audio/mp3"
source="https://www.psp.cz/eknih/2017ps/audio/2018/11/13/2018111318181832.mp3"
url="2017ps/audio/2018/11/13/2018111318181832.mp3"/>
<media xml:id="ps2017-020-09-004-010.audio3"
mimeType="audio/mp3"
source="https://www.psp.cz/eknih/2017ps/audio/2018/11/13/2018111318281842.mp3"
url="2017ps/audio/2018/11/13/2018111318281842.mp3"/>
...
</recording>
</recordingStmt> |
Content model | <content> <elementRef key="recording" minOccurs="1" maxOccurs="unbounded"/> </content> ⚓ |
Schema Declaration | element recordingStmt { tei_recording+ }⚓ |
<ref> (reference) defines a reference to another location, possibly modified by additional text or comment. [3.7. Simple Links and Cross-References 16.1. Links] | |||||||||
Module | core — Formal specification | ||||||||
Attributes |
| ||||||||
Member of | |||||||||
Contained by | |||||||||
May contain | Character data only | ||||||||
Note | The target and cRef attributes are mutually exclusive. | ||||||||
Example | <projectDesc>
<p>
<ref target="https://www.clarin.eu/content/parlamint">ParlaMint</ref> is a
project that aims to create a multilingual set of comparable corpora of
parliamentary proceedings uniformly encoded according to the <ref target="https://github.com/clarin-eric/parla-clarin">Parla-CLARIN
recommendations</ref> and ...</p>
</projectDesc> | ||||||||
Schematron |
<sch:report test="@target and @cRef">Only one of the
attributes @target and @cRef may be supplied on <sch:name/>
</sch:report> | ||||||||
Content model | <content> <textNode/> </content> ⚓ | ||||||||
Schema Declaration | element ref { attribute target { list { + } }?, text }⚓ |
<relation> (relationship) describes a relationship between two organisations. [13.3.2.3. Personal Relationships] | |||||||||||||||||||||||||
Module | namesdates — Formal specification | ||||||||||||||||||||||||
Attributes | att.global.analytic (@ana) att.datable.w3c (notBefore, notAfter, @when, @from, @to)
| ||||||||||||||||||||||||
Contained by | namesdates: listRelation | ||||||||||||||||||||||||
May contain | Empty element | ||||||||||||||||||||||||
Note | Only one of the attributes active and mutual may be supplied; the attribute passive may be supplied only if the attribute active is supplied. Not all of these constraints can be enforced in all schema languages. | ||||||||||||||||||||||||
Example | Specification of coalition and opposition political parties (or parliamentary groups) in a given time period and legislative period: <relation name="coalition"
mutual="#MR #OpenVld #N-VA #CD_en_V" from="2014-10-11" to="2018-12-09"
ana="#period_54"/>
<relation name="opposition"
active="#Ecolo #cdH #DéFi #Vuye_Wouters #sp.a #PP #PS #PTB #FDF" passive="#government.BE"
from="2014-10-11" to="2018-12-09" ana="#period_54"/> | ||||||||||||||||||||||||
Example | Specification of parliamentary group representing political parties in the parliament: <relation name="representing"
active="#parliamentaryGroup.CSSD.1107"
passive="#politicalParty.CSSD.153 #politicalParty.ENO.1" from="2013-10-29" to="2017-10-26"/> | ||||||||||||||||||||||||
Schematron |
<sch:assert test="@ref or @key or @name">One of the attributes 'name', 'ref' or 'key' must be supplied</sch:assert> | ||||||||||||||||||||||||
Schematron |
<sch:report test="@active and @mutual">Only one of the attributes @active and @mutual may be supplied</sch:report> | ||||||||||||||||||||||||
Schematron |
<sch:report test="@passive and not(@active)">the attribute 'passive' may be supplied only if the attribute 'active' is supplied</sch:report> | ||||||||||||||||||||||||
Content model | <content> <empty/> </content> ⚓ | ||||||||||||||||||||||||
Schema Declaration | element relation { tei_att.global.analytic.attribute.ana, tei_att.datable.w3c.attribute.when, tei_att.datable.w3c.attribute.from, tei_att.datable.w3c.attribute.to, attribute name { "coalition" | "opposition" | "renaming" | "successor" | "representing" }, ( attribute active { list { + } }? | attribute mutual { list { + } }? ), attribute passive { list { + } }?, empty }⚓ |
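The attribute constraints on <relation> stated in the note, the Schematron rules and the schema declaration above can be checked with a few lines of Python. The sketch below is illustrative only: the function check_relation and its messages are hypothetical, and the second and third test elements are deliberately invalid, made-up inputs.

```python
import xml.etree.ElementTree as ET

# Values permitted for the name attribute by the schema declaration above.
ALLOWED_NAMES = {"coalition", "opposition", "renaming", "successor", "representing"}


def check_relation(rel):
    """Return a list of constraint violations for one relation element."""
    problems = []
    if rel.get("name") not in ALLOWED_NAMES:
        problems.append("name is missing or not an allowed value")
    if rel.get("active") is not None and rel.get("mutual") is not None:
        problems.append("only one of active and mutual may be supplied")
    if rel.get("passive") is not None and rel.get("active") is None:
        problems.append("passive may be supplied only together with active")
    return problems


# Taken from the coalition example above.
valid = ET.fromstring(
    '<relation name="coalition" mutual="#MR #OpenVld #N-VA #CD_en_V" '
    'from="2014-10-11" to="2018-12-09" ana="#period_54"/>'
)
# Hypothetical invalid inputs, for illustration only.
passive_without_active = ET.fromstring(
    '<relation name="opposition" mutual="#A #B" passive="#government.BE"/>'
)
active_and_mutual = ET.fromstring(
    '<relation name="coalition" active="#A" mutual="#B #C"/>'
)

for rel in (valid, passive_without_active, active_and_mutual):
    print(rel.get("name"), check_relation(rel))
```

An empty list means that none of the three constraints is violated. The check does not replace validation against the ParlaMint schemas.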
<resp> (responsibility) contains a phrase describing the nature of a person's intellectual responsibility, or an organisation's role in the production or distribution of a work. [3.12.2.2. Titles, Authors, and Editors 2.2.1. The Title Statement 2.2.2. The Edition Statement 2.2.5. The Series Statement] | |
Module | core — Formal specification |
Attributes | att.global (xml:id, n, xml:base, xml:space, @xml:lang) |
Contained by | core: respStmt |
May contain | Character data only |
Note | The attribute ref, inherited from the class att.canonical may be used to indicate the kind of responsibility in a normalized form by referring directly to a standardized list of responsibility types, such as that maintained by a naming authority, for example the list maintained at http://www.loc.gov/marc/relators/relacode.html for bibliographic usage. |
Example | <respStmt>
<persName>Andrej Pančur</persName>
<resp>Kodiranje TEI</resp>
<resp xml:lang="en">TEI corpus encoding</resp>
</respStmt> |
Content model | <content> <textNode/> </content> ⚓ |
Schema Declaration | element resp { tei_att.global.attribute.xmllang, text }⚓ |
<respStmt> (statement of responsibility) supplies a statement of responsibility for the intellectual content of a text, edition, recording, or series, where the specialized elements for authors, editors, etc. do not suffice or do not apply. May also be used to encode information about individuals or organisations which have played a role in the production or distribution of a bibliographic work. [3.12.2.2. Titles, Authors, and Editors 2.2.1. The Title Statement 2.2.2. The Edition Statement 2.2.5. The Series Statement] | |
Module | core — Formal specification |
Contained by | header: titleStmt |
May contain | |
Example | <respStmt>
<persName>Matthew Coole</persName>
<resp>Data retrieval, Parla-CLARIN TEI XML corpus encoding and linguistic annotation.</resp>
</respStmt> |
Example | <respStmt>
<persName ref="https://orcid.org/0000-0003-3063-2239">Tommaso Agnoloni</persName>
<persName ref="https://orcid.org/0000-0002-8126-6294">Francesca Frontini</persName>
<persName ref="https://orcid.org/0000-0002-2953-8619">Simonetta Montemagni</persName>
<persName ref="https://orcid.org/0000-0002-1321-5444">Valeria Quochi</persName>
<persName ref="https://orcid.org/0000-0001-5849-0979">Giulia Venturi</persName>
<resp xml:lang="it">Definizione del progetto e metodologia</resp>
<resp xml:lang="en">Project set-up and methodology</resp>
</respStmt>
<respStmt>
<persName>Manuela Ruisi</persName>
<persName>Carlo Marchetti</persName>
<persName>Roberto Battistoni</persName>
<resp xml:lang="it">Recupero dei dati</resp>
<resp xml:lang="en">Data retrieval</resp>
</respStmt>
<respStmt>
<persName>Tommaso Agnoloni</persName>
<resp xml:lang="it">Codifica corpus in ParlaMint TEI XML</resp>
<resp xml:lang="en">ParlaMint TEI XML corpus encoding</resp>
<resp xml:lang="it">Pulizia, normalizzazione e conversione in ParlaMint TEI XML</resp>
<resp xml:lang="en">Cleaning, normalisation and conversion to ParlaMint TEI XML</resp>
</respStmt>
...
|
Content model | <content> <elementRef key="persName" minOccurs="1" maxOccurs="unbounded"/> <elementRef key="resp" minOccurs="1" maxOccurs="unbounded"/> </content> ⚓ |
Schema Declaration | element respStmt { tei_persName+, tei_resp+ }⚓ |
<revisionDesc> (revision description) summarizes the revision history for a file. [2.6. The Revision Description 2.1.1. The TEI Header and Its Components] | |
Module | header — Formal specification |
Attributes | att.global (xml:id, n, xml:base, xml:space, @xml:lang) |
Contained by | header: teiHeader |
May contain | header: change |
Note | If present on this element, the status attribute should indicate the current status of the document. The same attribute may appear on any <change> to record the status at the time of that change. Conventionally <change> elements should be given in reverse date order, with the most recent change at the start of the list. |
Example | <revisionDesc>
<change when="2021-06-11">
<name>Tomaž Erjavec</name>: Finalized encoding.</change>
<change when="2021-05-28">
<name>Tomaž Erjavec</name>: Built corpus.</change>
</revisionDesc> |
Content model | <content> <elementRef key="change" minOccurs="1" maxOccurs="unbounded"/> </content> ⚓ |
Schema Declaration | element revisionDesc { tei_att.global.attribute.xmllang, tei_change+ }⚓ |
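The note above says that <change> elements are conventionally listed with the most recent first. A minimal Python sketch of how that convention could be checked is given below; the function name `changes_in_reverse_date_order` is hypothetical, and the sketch assumes every <change> carries a @when value of the same ISO 8601 precision.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# The example from this section, with the TEI namespace declared explicitly.
REVISION_DESC = """
<revisionDesc xmlns="http://www.tei-c.org/ns/1.0">
  <change when="2021-06-11"><name>Tomaž Erjavec</name>: Finalized encoding.</change>
  <change when="2021-05-28"><name>Tomaž Erjavec</name>: Built corpus.</change>
</revisionDesc>
"""

def changes_in_reverse_date_order(revision_desc):
    """Return True if the @when values of the <change> children never increase."""
    dates = [c.get("when") for c in revision_desc.findall("tei:change", TEI_NS)]
    # ISO 8601 dates of equal precision compare correctly as plain strings.
    return all(newer >= older for newer, older in zip(dates, dates[1:]))

root = ET.fromstring(REVISION_DESC)
print(changes_in_reverse_date_order(root))
```

This only checks the ordering convention; it is not a substitute for validating against the ParlaMint schema.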
<roleName> (role name) contains a name component which indicates that the referent has a particular role or position in society, such as an official title or rank. [13.2.1. Personal Names] | |
Module | namesdates — Formal specification |
Attributes | att.global (xml:id, n, xml:base, xml:space, @xml:lang) |
Member of | |
Contained by | namesdates: affiliation persName |
May contain | Character data only |
Note | A <roleName> may be distinguished from an <addName> by virtue of the fact that, like a title, it typically exists independently of its holder. |
Example | <persName>
<surname>Murgel</surname>
<forename>Jasna</forename>
<roleName>dr.</roleName>
</persName> |
Example | <affiliation role="minister" ref="#GOV"
from="2020-08-01">
<roleName xml:lang="sl">Minister za obrambo</roleName>
<roleName xml:lang="en">Minister of Defence</roleName>
</affiliation> |
Content model | <content> <textNode/> </content> ⚓ |
Schema Declaration | element roleName { tei_att.global.attribute.xmllang, text }⚓ |
<s> (s-unit) contains a sentence-like division of a text. [17.1. Linguistic Segment Categories 8.4.1. Segmentation] | |
Module | analysis — Formal specification |
Attributes | att.global (xml:base, xml:space, @xml:id, @n, @xml:lang) att.global.linking (synch, next, prev, @corresp) att.global.analytic (@ana) |
Member of | |
Contained by | linking: seg |
May contain | |
Note | The <s> element may be used to mark orthographic sentences, or any other segmentation of a text, provided that the segmentation is end-to-end, complete, and non-nesting. For segmentation which is partial or recursive, the <seg> should be used instead. The type attribute may be used to indicate the type of segmentation intended, according to any convenient typology. |
Example | <s xml:id="ParlaMint-GB_2017-10-30-lords.seg4.1">
<w lemma="I"
msd="UPosTag=PRON|Case=Nom|Number=Sing|Person=1|PronType=Prs" pos="PRP">I</w>
<w lemma="support"
msd="UPosTag=VERB|Mood=Ind|Tense=Pres|VerbForm=Fin" pos="VBP">support</w>
<w lemma="the"
msd="UPosTag=DET|Definite=Def|PronType=Art" pos="DT">the</w>
<w lemma="amendment"
msd="UPosTag=NOUN|Number=Sing" pos="NN" join="right">amendment</w>
<pc msd="UPosTag=PUNCT" pos=".">.</pc>
</s> |
Schematron |
<sch:report test="tei:s">You may not nest one s element within
another: use seg instead</sch:report> |
Content model | <content> <alternate minOccurs="1" maxOccurs="unbounded"> <elementRef key="w"/> <elementRef key="pc"/> <elementRef key="name"/> <elementRef key="phr"/> <elementRef key="num"/> <elementRef key="date"/> <elementRef key="time"/> <elementRef key="note"/> <elementRef key="vocal"/> <elementRef key="kinesic"/> <elementRef key="incident"/> <elementRef key="gap"/> <elementRef key="pb"/> </alternate> <elementRef key="linkGrp" minOccurs="0" maxOccurs="1"/> </content> ⚓ |
Schema Declaration | element s { tei_att.global.attribute.xmlid, tei_att.global.attribute.n, tei_att.global.attribute.xmllang, tei_att.global.linking.attribute.corresp, tei_att.global.analytic.attribute.ana, ( tei_w | tei_pc | tei_name | tei_phr | tei_num | tei_date | tei_time | tei_note | tei_vocal | tei_kinesic | tei_incident | tei_gap | tei_pb )+, tei_linkGrp? }⚓ |
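As an illustration of how the tokens of an <s> can be turned back into running text, the following Python sketch walks the <w> and <pc> elements of the example above. It assumes that join="right" means that no whitespace follows the token; the function name `sentence_text` is hypothetical, and the @msd attributes are omitted from the sample for brevity.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"

# The example sentence from this section, without the @msd attributes.
SENTENCE = """
<s xmlns="http://www.tei-c.org/ns/1.0" xml:id="ParlaMint-GB_2017-10-30-lords.seg4.1">
  <w lemma="I" pos="PRP">I</w>
  <w lemma="support" pos="VBP">support</w>
  <w lemma="the" pos="DT">the</w>
  <w lemma="amendment" pos="NN" join="right">amendment</w>
  <pc pos=".">.</pc>
</s>
"""

def sentence_text(s):
    """Concatenate the token texts of <s>, honouring join="right"."""
    parts = []
    for tok in s.iter():
        if tok.tag not in (TEI + "w", TEI + "pc"):
            continue  # skip <s> itself, <name>, <linkGrp>, etc.
        parts.append(tok.text or "")
        # Assumption: join="right" glues the token to the one that follows it.
        if tok.get("join") != "right":
            parts.append(" ")
    return "".join(parts).strip()

print(sentence_text(ET.fromstring(SENTENCE)))
```

Because `iter()` descends into child elements, tokens wrapped in <name> or similar elements are picked up as well.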
<seg> (arbitrary segment) represents any segmentation of text below the ‘chunk’ level. [16.3. Blocks, Segments, and Anchors 6.2. Components of the Verse Line 7.2.5. Speech Contents] | |
Module | linking — Formal specification |
Attributes | att.global (xml:base, xml:space, @xml:id, @n, @xml:lang) att.global.linking (synch, next, prev, @corresp) |
Member of | |
Contained by | spoken: u |
May contain | |
Note | The <seg> element may be used at the encoder's discretion to mark any segments of the text of interest for processing. One use of the element is to mark text features for which no appropriate markup is otherwise defined. Another use is to provide an identifier for some segment which is to be pointed at by some other element—i.e. to provide a target, or a part of a target, for a <ptr> or other similar element. |
Example | <u who="#DavidPrior" ana="#regular">
<seg>I ask that the draft Regulations laid before the House on 5 December be approved.</seg>
<seg>The relevant document is the 20th Report from the Legislation Committee.</seg>
</u> |
Content model | <content> <alternate minOccurs="1" maxOccurs="unbounded"> <elementRef key="note"/> <elementRef key="vocal"/> <elementRef key="kinesic"/> <elementRef key="incident"/> <elementRef key="gap"/> <elementRef key="pb"/> <alternate minOccurs="0" maxOccurs="unbounded"> <textNode/> <elementRef key="s"/> </alternate> </alternate> </content> ⚓ |
Schema Declaration | element seg { tei_att.global.attribute.xmlid, tei_att.global.attribute.n, tei_att.global.attribute.xmllang, tei_att.global.linking.attribute.corresp, ( tei_note | tei_vocal | tei_kinesic | tei_incident | tei_gap | tei_pb | ( text | tei_s )* )+ }⚓ |
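For the plain-text (not linguistically annotated) version, a speech can be read out segment by segment. The Python sketch below is one possible way to do this: it keeps only the character data that sits directly inside each <seg>, so the content of embedded elements such as <note> is left out. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# The example utterance from this section.
UTTERANCE = """
<u xmlns="http://www.tei-c.org/ns/1.0" who="#DavidPrior" ana="#regular">
  <seg>I ask that the draft Regulations laid before the House on 5 December be approved.</seg>
  <seg>The relevant document is the 20th Report from the Legislation Committee.</seg>
</u>
"""

def segment_text(seg):
    """Text directly inside <seg>, skipping the content of child elements."""
    pieces = [seg.text or ""]
    for child in seg:
        pieces.append(child.tail or "")
    return " ".join("".join(pieces).split())  # normalise whitespace

def utterance_segments(u):
    """Return the speaker id (without '#') and the list of segment texts."""
    speaker = (u.get("who") or "").lstrip("#")
    return speaker, [segment_text(s) for s in u.findall("tei:seg", TEI_NS)]

speaker, paragraphs = utterance_segments(ET.fromstring(UTTERANCE))
print(speaker, len(paragraphs))
```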
<segmentation> (segmentation) describes the principles according to which the text has been segmented, for example into sentences, tone-units, graphemic strata, etc. [2.3.3. The Editorial Practices Declaration 15.3.2. Declarable Elements] | |
Module | header — Formal specification |
Contained by | header: editorialDecl |
May contain | core: p |
Example | <editorialDecl>
<segmentation>
<p xml:lang="en">The texts are segmented into utterances (speeches) and segments (corresponding to paragraphs in the source transcription).</p>
</segmentation>
</editorialDecl> |
Content model | <content> <elementRef key="p" minOccurs="1" maxOccurs="unbounded"/> </content> ⚓ |
Schema Declaration | element segmentation { tei_p+ }⚓ |
<setting> describes one particular setting in which a language interaction takes place. [15.2.3. The Setting Description] | |
Module | corpus — Formal specification |
Contained by | corpus: settingDesc |
May contain | |
Note | If the who attribute is not supplied, the setting is assumed to be that of all participants in the language interaction. |
Example | <setting>
<name type="place">Commons Chamber</name>
<name type="place">Westminster</name>
<name type="city">London</name>
<name type="country" key="GB">U.K.</name>
<date when="2019-02-18">February 18th, 2019</date>
</setting> |
Content model | <content> <elementRef key="name" minOccurs="1" maxOccurs="unbounded"/> <elementRef key="date"/> </content> ⚓ |
Schema Declaration | element setting { tei_name+, tei_date }⚓ |
<settingDesc> (setting description) describes the setting or settings within which a language interaction takes place, or other places otherwise referred to in a text, edition, or metadata. [15.2. Contextual Information 2.4. The Profile Description] | |
Module | corpus — Formal specification |
Contained by | header: profileDesc |
May contain | corpus: setting |
Note | May contain a prose description organized as paragraphs, or a series of <setting> elements. If used to record not settings of language interactions, but other places mentioned in the text, then <place> optionally grouped by <listPlace> inside <standOff> should be preferred. |
Example | <settingDesc>
<setting>
<name type="address">Trg sv. Marka 6</name>
<name type="city">Zagreb</name>
<name type="country" key="HR">Croatia</name>
<date from="2016-11-15" to="2020-05-18">15.11.2016 - 18.5.2020</date>
</setting>
</settingDesc> |
Content model | <content> <elementRef key="setting" minOccurs="1" maxOccurs="unbounded"/> </content> ⚓ |
Schema Declaration | element settingDesc { tei_setting+ }⚓ |
<sex> (sex) specifies the sex of a person. [13.3.2.1. Personal Characteristics] | |||||||
Module | namesdates — Formal specification | ||||||
Attributes |
| ||||||
Contained by | namesdates: person | ||||||
May contain | Empty element | ||||||
Note | As with other culturally-constructed traits such as age and gender, the way in which this concept is described in different cultural contexts varies. The normalizing attributes are provided only as an optional means of simplifying that variety for purposes of interoperability or project-internal taxonomies for consistency, and should not be used where that is inappropriate or unhelpful. The content of the element may be used to describe the intended concept in more detail. | ||||||
Example | <sex value="M"/> | ||||||
Content model | <content> <empty/> </content> ⚓ | ||||||
Schema Declaration | element sex { attribute value { "M" | "F" | "U" | "O" | "N" }, empty }⚓ |
<sourceDesc> (source description) describes the source(s) from which an electronic text was derived or generated, typically a bibliographic description in the case of a digitized text, or a phrase such as "born digital" for a text which has no previous existence. [2.2.7. The Source Description] | |
Module | header — Formal specification |
Contained by | header: fileDesc |
May contain | core: bibl spoken: recordingStmt |
Example | The source description <sourceDesc> of the corpus root encodes the original digital source of the ParlaMint corpus: <sourceDesc>
<bibl>
<title type="main" xml:lang="sl">Zapisi sej Državnega zbora Republike Slovenije</title>
<title type="main" xml:lang="en">Minutes of the National Assembly of the Republic of Slovenia</title>
<idno type="URI">https://www.dz-rs.si</idno>
<date from="2014-08-01" to="2020-07-16">1.8.2014 - 16.7.2020</date>
</bibl>
</sourceDesc> |
Example | For corpus components the source description is very similar to that of the corpus root, except that it describes the specific meeting. If audio or video of the meeting is available, this can also be recorded: <sourceDesc>
<bibl>
<title type="main" xml:lang="cs">Parlament České republiky, Poslanecká sněmovna</title>
<title type="main" xml:lang="en">Parliament of the Czech Republic, Chamber of Deputies</title>
<idno type="URI">https://www.psp.cz/eknih/2013ps/stenprot/044schuz/s044033.htm</idno>
<date when="2016-04-13">13.04.2016</date>
</bibl>
<recordingStmt>
<recording type="audio">
<media xml:id="ps2013-044-02-000-000.audio1"
mimeType="audio/mp3"
source="https://www.psp.cz/eknih/2013ps/audio/2016/04/13/2016041308580912.mp3"
url="2013ps/audio/2016/04/13/2016041308580912.mp3"/>
</recording>
</recordingStmt>
</sourceDesc> |
Content model | <content> <elementRef key="bibl" minOccurs="1" maxOccurs="unbounded"/> <elementRef key="recordingStmt" minOccurs="0" maxOccurs="1"/> </content> ⚓ |
Schema Declaration | element sourceDesc { tei_bibl+, tei_recordingStmt? }⚓ |
<state> (state) defines additional metadata on a political party or parliamentary group, e.g. its political orientation. [13.3.1. Basic Principles 13.3.2.1. Personal Characteristics] | |||||||||||||||
Module | namesdates — Formal specification | ||||||||||||||
Attributes | att.global (xml:id, xml:lang, xml:base, xml:space, @n) att.global.source (@source) att.datable.w3c (when, notBefore, notAfter, @from, @to) att.canonical (ref, @key)
| ||||||||||||||
Member of | |||||||||||||||
Contained by | |||||||||||||||
May contain | |||||||||||||||
Note | Where there is confusion between <trait> and <state> the more general purpose element <state> should be used even for unchanging characteristics. If you wish to distinguish between characteristics that are generally perceived to be time-bound states and those assumed to be fixed traits, then <trait> is available for the more static of these. The <state> element encodes characteristics which are sometimes assumed to change, often at specific times or over a date range, whereas the <trait> elements are used to record characteristics, such as eye-colour, which are less subject to change. Traits are typically, but not necessarily, independent of the volition or action of the holder. | ||||||||||||||
Example | Encoding political orientation as entered by an encoder: <state type="politicalOrientation">
<state type="encoder"
source="#GrietDepoorter" ana="#orientation.CRR">
<note xml:lang="en">Orientation determined by encoder, using own knowledge of the parliamentary group.</note>
</state>
</state> | ||||||||||||||
Example | Encoding Wikipedia-sourced political orientation: <state type="politicalOrientation">
<state type="Wikipedia"
source="https://lv.wikipedia.org/wiki/Attīstībai/Par!" ana="#orientation.C"/>
</state> | ||||||||||||||
Example | Encoding CHES-sourced variables and their values, with @key containing the CHES name for the political party: <state type="CHES" key="AP!" from="2019"
to="2019"
source="https://www.chesdata.eu/s/1999-2019_CHES_dataset_meansv3.csv">
<state type="variable" ana="#ches.lrgen">
<state type="value" from="2019" to="2019"
n="5.82"/>
</state>
<state type="variable" ana="#ches.lrecon">
<state type="value" from="2019" to="2019"
n="5.90"/>
</state>
</state> | ||||||||||||||
Content model | <content> <sequence minOccurs="1" maxOccurs="1"> <elementRef key="note" minOccurs="0" maxOccurs="unbounded"/> <elementRef key="state" minOccurs="0" maxOccurs="unbounded"/> </sequence> </content> ⚓ | ||||||||||||||
Schema Declaration | element state { tei_att.global.attribute.n, tei_att.global.source.attribute.source, tei_att.datable.w3c.attribute.from, tei_att.datable.w3c.attribute.to, tei_att.canonical.attribute.key, attribute ana { text }?, attribute type { "politicalOrientation" | "encoder" | "Wikipedia" | "CHES" | "variable" | "value" }, ( tei_note*, tei_state* ) }⚓ |
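The nested <state> encoding of CHES variables can be flattened into simple rows for further processing. The following Python sketch shows one way to do so for the CHES example above; the function name `ches_values` and the row layout are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# The CHES example from this section.
CHES_STATE = """
<state xmlns="http://www.tei-c.org/ns/1.0" type="CHES" key="AP!" from="2019" to="2019"
       source="https://www.chesdata.eu/s/1999-2019_CHES_dataset_meansv3.csv">
  <state type="variable" ana="#ches.lrgen">
    <state type="value" from="2019" to="2019" n="5.82"/>
  </state>
  <state type="variable" ana="#ches.lrecon">
    <state type="value" from="2019" to="2019" n="5.90"/>
  </state>
</state>
"""

def ches_values(ches):
    """Return (variable, from, to, value) rows for one CHES <state>."""
    rows = []
    for var in ches.findall("tei:state[@type='variable']", TEI_NS):
        name = var.get("ana", "").lstrip("#")
        for val in var.findall("tei:state[@type='value']", TEI_NS):
            rows.append((name, val.get("from"), val.get("to"), float(val.get("n"))))
    return rows

for row in ches_values(ET.fromstring(CHES_STATE)):
    print(row)
```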
<surname> (surname) contains a family (inherited) name, as opposed to a given, baptismal, or nick name. [13.2.1. Personal Names] | |||||||
Module | namesdates — Formal specification | ||||||
Attributes | att.global (xml:id, n, xml:base, xml:space, @xml:lang)
| ||||||
Member of | |||||||
Contained by | namesdates: persName | ||||||
May contain | Character data only | ||||||
Example | <persName>
<surname>Accetto</surname>
<forename>Matej</forename>
</persName> | ||||||
Example | <persName>
<forename>Ірина</forename>
<surname type="patronym">Борисівна</surname>
<surname>Щеняєва</surname>
</persName>
<persName xml:lang="uk-Latn">
<forename>Iryna</forename>
<surname type="patronym">Borysivna</surname>
<surname>Ščenjajeva</surname>
</persName> | ||||||
Content model | <content> <textNode/> </content> ⚓ | ||||||
Schema Declaration | element surname { tei_att.global.attribute.xmllang, attribute type { "birth" | "patronym" | "married" }?, text }⚓ |
<tagUsage> (element usage) documents the usage of a specific element within a specified document. [2.3.4. The Tagging Declaration] | |||||||||||||
Module | header — Formal specification | ||||||||||||
Attributes |
| ||||||||||||
Contained by | header: namespace | ||||||||||||
May contain | Empty element | ||||||||||||
Example | <tagsDecl>
<namespace name="http://www.tei-c.org/ns/1.0">
<tagUsage gi="text" occurs="414"/>
<tagUsage gi="body" occurs="414"/>
<tagUsage gi="div" occurs="414"/>
<tagUsage gi="head" occurs="826"/>
<tagUsage gi="u" occurs="75122"/>
<tagUsage gi="seg" occurs="280971"/>
<tagUsage gi="note" occurs="85525"/>
<tagUsage gi="gap" occurs="7897"/>
<tagUsage gi="vocal" occurs="1740"/>
<tagUsage gi="incident" occurs="37"/>
<tagUsage gi="kinesic" occurs="560"/>
<tagUsage gi="desc" occurs="10234"/>
</namespace>
</tagsDecl> | ||||||||||||
Content model | <content> <empty/> </content> ⚓ | ||||||||||||
Schema Declaration | element tagUsage { attribute gi { text }, attribute occurs { text }, empty }⚓ |
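The @occurs values are plain element counts, so they can be derived from the data part of a file. The Python sketch below counts the TEI elements under a <text> element and prints the corresponding <tagUsage> lines. The sample <text> is a hypothetical, shortened component body, and the function name `tag_usage` is not part of any ParlaMint tooling.

```python
import xml.etree.ElementTree as ET
from collections import Counter

TEI_URI = "http://www.tei-c.org/ns/1.0"

# Hypothetical, minimal data part of a corpus component.
TEXT = """
<text xmlns="http://www.tei-c.org/ns/1.0" ana="#reference">
  <body>
    <div type="debateSection">
      <note>Hypothetical chair note</note>
      <u who="#DavidPrior" ana="#regular">
        <seg>I ask that the draft Regulations laid before the House on 5 December be approved.</seg>
        <seg>The relevant document is the 20th Report from the Legislation Committee.</seg>
      </u>
    </div>
  </body>
</text>
"""

def tag_usage(text):
    """Count TEI-namespace elements in <text>, including <text> itself."""
    counts = Counter()
    prefix = "{" + TEI_URI + "}"
    for el in text.iter():
        if isinstance(el.tag, str) and el.tag.startswith(prefix):
            counts[el.tag[len(prefix):]] += 1
    return counts

counts = tag_usage(ET.fromstring(TEXT))
for gi, occurs in counts.items():
    print(f'<tagUsage gi="{gi}" occurs="{occurs}"/>')
```

For the corpus root, the same counts would have to be summed over all components.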
<tagsDecl> (tagging declaration) provides detailed information about the tagging applied to a document. [2.3.4. The Tagging Declaration 2.3. The Encoding Description] | |
Module | header — Formal specification |
Contained by | header: encodingDesc |
May contain | header: namespace |
Example | The tags declaration <tagsDecl> gives the count of each XML tag used in the data part (i.e. not in the TEI header), either of the whole corpus (in the corpus root) or of an individual corpus component (in that component's header). <encodingDesc> ...
<tagsDecl>
<namespace name="http://www.tei-c.org/ns/1.0">
<tagUsage gi="text" occurs="414"/>
<tagUsage gi="body" occurs="414"/>
<tagUsage gi="div" occurs="414"/>
...
</namespace>
</tagsDecl>
</encodingDesc> |
Content model | <content> <elementRef key="namespace"/> </content> ⚓ |
Schema Declaration | element tagsDecl { tei_namespace }⚓ |
<taxonomy> (taxonomy) defines a typology explicitly by a structured taxonomy. [2.3.7. The Classification Declaration] | |||||||||
Module | header — Formal specification | ||||||||
Attributes | att.global (xml:id, n, xml:base, xml:space, @xml:lang)
| ||||||||
Contained by | header: classDecl | ||||||||
May contain | |||||||||
Note | Nested taxonomies are common in many fields, so the <taxonomy> element can be nested. | ||||||||
Example | <taxonomy xml:id="subcorpus">
<desc xml:lang="sl">
<term>Podkorpusi</term>
</desc>
<desc xml:lang="en">
<term>Subcorpora</term>
</desc>
<category xml:id="reference">
<catDesc xml:lang="sl">
<term>Referenca</term>: referenčni podkorpus, do 2020-01-30</catDesc>
<catDesc xml:lang="en">
<term>Reference</term>: reference subcorpus, until 2020-01-30</catDesc>
</category>
<category xml:id="covid">
<catDesc xml:lang="sl">
<term>COVID</term>: COVID podkorpus, od 2020-01-31 dalje</catDesc>
<catDesc xml:lang="en">
<term>COVID</term>: COVID subcorpus, from 2020-01-31 onwards</catDesc>
</category>
</taxonomy> | ||||||||
Example | <taxonomy xml:id="parla.legislature">
<desc xml:lang="it">
<term>Legislatura</term>
</desc>
<desc xml:lang="en">
<term>Legislature</term>
</desc>
<category xml:id="parla.geo-political">
<catDesc xml:lang="it">
<term>Unità geo-politica o amministrativa</term>
</catDesc>
<catDesc xml:lang="en">
<term>Geo-political or administrative units</term>
</catDesc>
<category xml:id="parla.supranational">
<catDesc xml:lang="it">
<term>Legislatura sovranazionale</term>
</catDesc>
<catDesc xml:lang="en">
<term>Supranational legislature</term>
</catDesc>
</category>
<category xml:id="parla.national">
<catDesc xml:lang="it">
<term>Legislatura nazionale</term>
</catDesc>
<catDesc xml:lang="en">
<term>National legislature</term>
</catDesc>
</category>
...
</category>
</taxonomy>
...
<org ana="#parla.national #parla.upper"
role="parliament" xml:id="LEG">
<orgName full="yes" xml:lang="it">Senato della Repubblica Italiana</orgName>
<orgName full="yes" xml:lang="en">Senate of the Republic of Italy</orgName>
</org> | ||||||||
Content model | <content> <elementRef key="desc" minOccurs="1" maxOccurs="unbounded"/> <elementRef key="category" minOccurs="1" maxOccurs="unbounded"/> </content> ⚓ | ||||||||
Schema Declaration | element taxonomy { tei_att.global.attribute.xmllang, attribute xml:id { text }, tei_desc+, tei_category+ }⚓ |
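Category identifiers defined in a taxonomy are referenced from elsewhere via @ana, as in the <org> example above. The following Python sketch collects the terms of all (possibly nested) categories and resolves a list of @ana pointers against them. It uses only the English part of the shortened example, so #parla.upper, which lies in the elided portion, is reported as unresolved; the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

TEI_URI = "http://www.tei-c.org/ns/1.0"
TEI_NS = {"tei": TEI_URI}
XML_NS = "{http://www.w3.org/XML/1998/namespace}"

# English-only excerpt of the parla.legislature example from this section.
TAXONOMY = """
<taxonomy xmlns="http://www.tei-c.org/ns/1.0" xml:id="parla.legislature">
  <desc xml:lang="en"><term>Legislature</term></desc>
  <category xml:id="parla.geo-political">
    <catDesc xml:lang="en"><term>Geo-political or administrative units</term></catDesc>
    <category xml:id="parla.supranational">
      <catDesc xml:lang="en"><term>Supranational legislature</term></catDesc>
    </category>
    <category xml:id="parla.national">
      <catDesc xml:lang="en"><term>National legislature</term></catDesc>
    </category>
  </category>
</taxonomy>
"""

def category_terms(taxonomy, lang="en"):
    """Map each category xml:id to its <term> in the given language."""
    terms = {}
    for cat in taxonomy.iter("{" + TEI_URI + "}category"):
        for desc in cat.findall("tei:catDesc", TEI_NS):
            if desc.get(XML_NS + "lang") == lang:
                terms[cat.get(XML_NS + "id")] = desc.findtext("tei:term", namespaces=TEI_NS)
    return terms

def resolve(ana, terms):
    """Resolve the space-separated pointers of an @ana value; None if unknown."""
    return {ref: terms.get(ref.lstrip("#")) for ref in ana.split()}

terms = category_terms(ET.fromstring(TAXONOMY))
print(resolve("#parla.national #parla.upper", terms))
```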
<teiCorpus> (TEI corpus) contains one whole corpus, stored in the corpus root file comprising the corpus header and XInclude references to corpus component files, each containing a <TEI> element. [4. Default Text Structure 15.1. Varieties of Composite Text] | |||||||||||||
Module | core — Formal specification | ||||||||||||
Attributes | att.global.linking (synch, next, prev, @corresp)
| ||||||||||||
Contained by | — | ||||||||||||
May contain | |||||||||||||
Note | Should contain one TEI header for the corpus, and a series of <TEI> elements, one for each text. | ||||||||||||
Example | General structure of a ParlaMint corpus root: <teiCorpus xml:lang="en"
xml:id="ParlaMint-GB" xmlns="http://www.tei-c.org/ns/1.0">
<teiHeader> ...TEI header of the corpus...
</teiHeader>
<xi:include xmlns:xi="http://www.w3.org/2001/XInclude"
href="2015/ParlaMint-GB_2015-01-05-commons.xml"/>
<xi:include xmlns:xi="http://www.w3.org/2001/XInclude"
href="2015/ParlaMint-GB_2015-01-06-commons.xml"/>
...
</teiCorpus> | ||||||||||||
Content model | <content> <elementRef key="teiHeader"/> <alternate minOccurs="1" maxOccurs="unbounded"> <elementRef key="TEI"/> <elementRef key="include"/> </alternate> </content> ⚓ | ||||||||||||
Schema Declaration | element teiCorpus { tei_att.global.linking.attribute.corresp, attribute xml:id { text }, attribute xml:lang { text }, tei_teiHeader, ( tei_TEI | tei_include )+ }⚓ |
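Since the corpus root points to its components through XInclude, the list of component files can be read off the root file without expanding the includes. A minimal Python sketch, assuming the includes of interest are direct children of <teiCorpus> (the empty <teiHeader/> is only a placeholder, and `component_hrefs` is a hypothetical name):

```python
import xml.etree.ElementTree as ET

XI_INCLUDE = "{http://www.w3.org/2001/XInclude}include"

# Shortened version of the corpus root example from this section.
CORPUS_ROOT = """
<teiCorpus xmlns="http://www.tei-c.org/ns/1.0" xml:lang="en" xml:id="ParlaMint-GB">
  <teiHeader/>
  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude"
              href="2015/ParlaMint-GB_2015-01-05-commons.xml"/>
  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude"
              href="2015/ParlaMint-GB_2015-01-06-commons.xml"/>
</teiCorpus>
"""

def component_hrefs(corpus_root):
    """Return the @href of every xi:include that is a direct child of the root."""
    return [inc.get("href") for inc in corpus_root.findall(XI_INCLUDE)]

hrefs = component_hrefs(ET.fromstring(CORPUS_ROOT))
print(hrefs)
```

Restricting the search to direct children leaves out any includes nested inside the <teiHeader>, which refer to metadata files rather than corpus components.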
<teiHeader> (TEI header) supplies descriptive and declarative metadata associated with a digital resource or set of resources. [2.1.1. The TEI Header and Its Components 15.1. Varieties of Composite Text] | |
Module | header — Formal specification |
Attributes | att.global (xml:id, n, xml:base, xml:space, @xml:lang) |
Contained by | |
May contain | header: encodingDesc fileDesc profileDesc revisionDesc |
Note | One of the few elements unconditionally required in any TEI document. |
Example | Basic structure of the <teiHeader>: <teiHeader>
<fileDesc>...</fileDesc>
<encodingDesc>...</encodingDesc>
<profileDesc>...</profileDesc>
<revisionDesc>...</revisionDesc>
</teiHeader> |
Example | Example of a ParlaMint corpus component <teiHeader>: <teiHeader>
<fileDesc>
<titleStmt>
<title type="main" xml:lang="lv">Latvijas parlamenta corpus ParlaMint-LV, 12. Saeima, 2014-11-04 [ParlaMint]</title>
<title type="main" xml:lang="en">Latvian parliamentary corpus ParlaMint-LV, 12th Term, 2014-11-04 [ParlaMint]</title>
<meeting corresp="#PT"
ana="#parla.meeting.regular">Regulārā</meeting>
<meeting n="13" corresp="#PT"
ana="#parla.term #PT.13">13. sasaukums</meeting>
</titleStmt>
<editionStmt>
<edition>2.1</edition>
</editionStmt>
<extent>
<measure unit="speeches" quantity="257"
xml:lang="en">257 speeches</measure>
<measure unit="words" quantity="11847"
xml:lang="en">11,847 words</measure>
<measure unit="tokens" quantity="14628"
xml:lang="en">14628 tokens</measure>
</extent>
<publicationStmt>
<publisher>
<orgName xml:lang="en">CLARIN research infrastructure</orgName>
<ref target="https://www.clarin.eu/">www.clarin.eu</ref>
</publisher>
<idno subtype="handle" type="URI">http://hdl.handle.net/11356/1432</idno>
<availability status="free">
<licence>http://creativecommons.org/licenses/by/4.0/</licence>
<p xml:lang="en">This work is licensed under the <ref target="http://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</ref>.</p>
</availability>
<date when="2021-06-10">June 10, 2021</date>
</publicationStmt>
<sourceDesc>
<bibl>
<title type="main" xml:lang="lv">Saeimas sēžu stenogrammas</title>
<idno type="URI">https://www.saeima.lv/lv/transcripts/view/264</idno>
</bibl>
</sourceDesc>
</fileDesc>
<encodingDesc>
<projectDesc>
<p xml:lang="en">
<ref target="https://www.clarin.eu/content/parlamint">ParlaMint</ref>
</p>
</projectDesc>
<tagsDecl>
<namespace name="http://www.tei-c.org/ns/1.0">
<tagUsage gi="text" occurs="1"/>
<tagUsage gi="body" occurs="1"/>
<tagUsage gi="div" occurs="1"/>
<tagUsage gi="head" occurs="2"/>
<tagUsage gi="note" occurs="257"/>
<tagUsage gi="u" occurs="257"/>
<tagUsage gi="seg" occurs="647"/>
</namespace>
</tagsDecl>
</encodingDesc>
<profileDesc>
<settingDesc>
<setting>
<name type="city">Rīga</name>
<name type="country" key="LV">Latvija</name>
<date when="2014-11-04"
ana="#parla.sitting">2014-11-04</date>
</setting>
</settingDesc>
</profileDesc>
</teiHeader> |
Content model | <content> <elementRef key="fileDesc"/> <elementRef key="encodingDesc"/> <elementRef key="profileDesc"/> <elementRef key="revisionDesc" minOccurs="0" maxOccurs="1"/> </content> ⚓ |
Schema Declaration | element teiHeader { tei_att.global.attribute.xmllang, tei_fileDesc, tei_encodingDesc, tei_profileDesc, tei_revisionDesc? }⚓ |
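The content model above fixes both the presence and the order of the header parts. The Python sketch below mirrors that rule for illustration only; proper validation should be done against the ParlaMint schema. The sample header with empty children and the function name `header_problems` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Order required by the content model; only <revisionDesc> is optional.
EXPECTED_ORDER = ["fileDesc", "encodingDesc", "profileDesc", "revisionDesc"]
OPTIONAL = {"revisionDesc"}

# Hypothetical skeleton header (children left empty for brevity).
HEADER = """
<teiHeader xmlns="http://www.tei-c.org/ns/1.0">
  <fileDesc/>
  <encodingDesc/>
  <profileDesc/>
</teiHeader>
"""

def header_problems(header):
    """Return a list of human-readable problems; empty if the structure fits."""
    names = [child.tag.split("}", 1)[-1] for child in header]
    problems = [f"missing <{n}>" for n in EXPECTED_ORDER
                if n not in OPTIONAL and n not in names]
    # The children actually present must equal the expected order filtered
    # down to those names; anything else is extra, repeated or misordered.
    if names != [n for n in EXPECTED_ORDER if n in names]:
        problems.append("unexpected, repeated or misordered children: " + ", ".join(names))
    return problems

print(header_problems(ET.fromstring(HEADER)))
```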
<term> (term) contains a single-word, multi-word, or symbolic designation which is regarded as a technical term. [3.4.1. Terms and Glosses] | |
Module | core — Formal specification |
Member of | |
Contained by | |
May contain | Character data only |
Note | When this element appears within an <index> element, it is understood to supply the form under which an index entry is to be made for that location. Elsewhere, it is understood simply to indicate that its content is to be regarded as a technical or specialised term. It may be associated with a <gloss> element by means of its ref attribute; alternatively a <gloss> element may point to a <term> element by means of its target attribute. In formal terminological work, there is frequently discussion over whether terms must be atomic or may include multi-word lexical items, symbolic designations, or phraseological units. The <term> element may be used to mark any of these. No position is taken on the philosophical issue of what a term can be; the looser definition simply allows the <term> element to be used by practitioners of any persuasion. As with other members of the att.canonical class, instances of this element occurring in a text may be associated with a canonical definition, either by means of a URI (using the ref attribute), or by means of some system-specific code value (using the key attribute). Because the mutually exclusive target and cRef attributes overlap with the function of the ref attribute, they are deprecated and may be removed at a subsequent release. |
Example | <term> is used inside taxonomies to name the taxonomy and its categories: <taxonomy xml:id="subcorpus">
<desc xml:lang="sl">
<term>Podkorpusi</term>
</desc>
<desc xml:lang="en">
<term>Subcorpora</term>
</desc>
<category xml:id="reference">
<catDesc xml:lang="sl">
<term>Referenca</term>: referenčni podkorpus, do 2020-01-30</catDesc>
<catDesc xml:lang="en">
<term>Reference</term>: reference subcorpus, until 2020-01-30</catDesc>
</category>
...
</taxonomy> |
Example | <catDesc xml:lang="en">
<term>acl</term>: Clausal modifier of noun (adjectival clause)
</catDesc>
<catDesc xml:lang="en">
<term>dep</term>: Unspecified dependency
</catDesc>
<catDesc xml:lang="en">
<term>punct</term>: Punctuation
</catDesc> |
Content model | <content> <textNode/> </content> ⚓ |
Schema Declaration | element term { text }⚓ |
<text> (text) contains a single text of any kind, whether unitary or composite, for example a poem or drama, a collection of essays, a novel, a dictionary, or a corpus sample. [4. Default Text Structure 15.1. Varieties of Composite Text] | |
Module | textstructure — Formal specification |
Attributes | att.global (xml:base, xml:space, @xml:id, @n, @xml:lang) att.global.analytic (@ana) att.global.source (@source) |
Contained by | textstructure: TEI |
May contain | textstructure: body |
Note | This element should not be used to represent a text which is inserted at an arbitrary point within the structure of another, for example as in an embedded or quoted narrative; the <floatingText> is provided for this purpose. |
Example | <text ana="#reference">
<body>
<div type="debateSection">...</div>
<div type="debateSection">...</div>
...
</body>
</text> |
Content model | <content> <elementRef key="body"/> </content> ⚓ |
Schema Declaration | element text { tei_att.global.attribute.xmlid, tei_att.global.attribute.n, tei_att.global.attribute.xmllang, tei_att.global.analytic.attribute.ana, tei_att.global.source.attribute.source, tei_body }⚓ |
<textClass> (text classification) groups information which describes the nature or topic of a text in terms of a standard classification scheme, thesaurus, etc. [2.4.3. The Text Classification] | |
Module | header — Formal specification |
Contained by | header: profileDesc |
May contain | header: catRef |
Example | <textClass>
<catRef scheme="#parla.legislature"
target="#parla.bi #parla.lower #parla.upper"/>
</textClass> |
Content model | <content> <elementRef key="catRef"/> </content> ⚓ |
Schema Declaration | element textClass { tei_catRef }⚓ |
<time> (time) contains a phrase defining a time of day in any format. [3.6.4. Dates and Times] | |
Module | core — Formal specification |
Attributes | att.typed (@type, @subtype) att.global (n, xml:base, xml:space, @xml:id, @xml:lang) att.global.analytic (@ana) att.datable.w3c (notBefore, notAfter, @when, @from, @to) |
Member of | |
Contained by | |
May contain | |
Example | A note giving a time, e.g. when the session started: <note type="time">
<time when="2016-04-13T09:10:00">(9.10 hodin)</time>
</note> |
Content model | <content> <alternate minOccurs="1" maxOccurs="unbounded"> <elementRef key="w"/> <elementRef key="pc"/> <textNode/> </alternate> </content> ⚓ |
Schema Declaration | element time { tei_att.global.attribute.xmlid, tei_att.global.attribute.xmllang, tei_att.global.analytic.attribute.ana, tei_att.datable.w3c.attribute.when, tei_att.datable.w3c.attribute.from, tei_att.datable.w3c.attribute.to, tei_att.typed.attributes, ( tei_w | tei_pc | text )+ }⚓ |
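The @when attribute holds the normalised value, while the element content keeps the wording of the transcript. A small Python sketch of reading the normalised value from the example above follows; the function name `note_time` is hypothetical, and the sketch assumes @when is a full date and time as in the example.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

TEI = "{http://www.tei-c.org/ns/1.0}"

# The example note from this section.
NOTE = """
<note xmlns="http://www.tei-c.org/ns/1.0" type="time">
  <time when="2016-04-13T09:10:00">(9.10 hodin)</time>
</note>
"""

def note_time(note):
    """Return the normalised @when of the <time> child as a datetime."""
    time_el = note.find(TEI + "time")
    return datetime.fromisoformat(time_el.get("when"))

started = note_time(ET.fromstring(NOTE))
print(started.isoformat())
```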
<title> (title) contains a title for any kind of work. [3.12.2.2. Titles, Authors, and Editors 2.2.1. The Title Statement 2.2.5. The Series Statement] | |||||||||
Module | core — Formal specification | ||||||||
Attributes | att.global (xml:id, n, xml:base, xml:space, @xml:lang)
| ||||||||
Member of | |||||||||
Contained by | |||||||||
May contain | Character data only | ||||||||
Note | The attributes key and ref, inherited from the class att.canonical may be used to indicate the canonical form for the title; the former, by supplying (for example) the identifier of a record in some external library system; the latter by pointing to an XML element somewhere containing the canonical form of the title. | ||||||||
Example | The <title> element as used in the <titleStmt> of the corpus root <teiHeader>: <title type="main" xml:lang="cs">Český parlamentní korpus ParlaMint-CZ [ParlaMint]</title>
<title type="main" xml:lang="en">Czech parliamentary corpus ParlaMint-CZ [ParlaMint]</title>
<title type="sub" xml:lang="cs">Parlament České republiky, Poslanecká sněmovna</title>
<title type="sub" xml:lang="en">Parliament of the Czech Republic, Chamber of Deputies</title> | ||||||||
Example | The <title> element as used in the <titleStmt> of the corpus component <teiHeader>: <title type="main" xml:lang="cs">Český parlamentní korpus ParlaMint-CZ, 2013-11-25 ps2013-001-01-000-000 [ParlaMint]</title>
<title type="main" xml:lang="en">Czech parliamentary corpus ParlaMint-CZ, 2013-11-25 ps2013-001-01-000-000 [ParlaMint]</title>
<title type="sub" xml:lang="cs">Parlament České republiky, Poslanecká sněmovna, 2013-11-25, Začátek schůze Poslanecké sněmovny 25. listopadu 2013 ve 14.05 hodin Přítomno: 199 poslanců</title>
<title type="sub" xml:lang="en">Parliament of the Czech Republic, Chamber of Deputies, 2013-11-25</title> | ||||||||
Content model | <content> <textNode/> </content> ⚓ | ||||||||
Schema Declaration | element title { tei_att.global.attribute.xmllang, attribute type { "main" | "sub" }?, text }⚓ |
<titleStmt> (title statement) groups information about the title of a work and those responsible for its content. [2.2.1. The Title Statement 2.2. The File Description] | |
Module | header — Formal specification |
Contained by | header: fileDesc |
May contain | |
Example | The <titleStmt> element gives the title of the corpus root or component, along with the specification of the particular session(s) of the parliament contained, the persons responsible for compiling the corpus and the funder(s) of the project: <titleStmt>
<title type="main">Slovenski parlamentarni korpus ParlaMint-SI [ParlaMint]</title>
<title type="main" xml:lang="en">Slovenian parliamentary corpus ParlaMint-SI [ParlaMint]</title>
<title type="sub">Zapisi sej Državnega zbora Republike Slovenije, 7. in 8. mandat (2014 - 2020)</title>
<title type="sub" xml:lang="en">Minutes of the National Assembly of the Republic of Slovenia, Term 7 and 8 (2014 - 2020)</title>
<meeting n="7" corresp="#DZ"
ana="#parla.lower #parla.term #DZ.7">7. mandat</meeting>
<meeting n="8" corresp="#DZ"
ana="#parla.lower #parla.term #DZ.8">8. mandat</meeting>
<respStmt>
<persName ref="https://orcid.org/0000-0001-6143-6877">Andrej Pančur</persName>
<persName ref="https://orcid.org/0000-0002-1560-4099">Tomaž Erjavec</persName>
<resp>Kodiranje ParlaMint TEI XML</resp>
<resp xml:lang="en">ParlaMint TEI XML corpus encoding</resp>
</respStmt>
<funder>
<orgName>Raziskovalna infrastruktura CLARIN</orgName>
<orgName xml:lang="en">The CLARIN research infrastructure</orgName>
</funder>
<funder>
<orgName>Slovenska raziskovalna infrastruktura CLARIN.SI</orgName>
<orgName xml:lang="en">The Slovenian research infrastructure CLARIN.SI</orgName>
</funder>
</titleStmt> |
Content model | <content> <elementRef key="title" minOccurs="1" maxOccurs="unbounded"/> <elementRef key="meeting" minOccurs="1" maxOccurs="unbounded"/> <elementRef key="respStmt" minOccurs="0" maxOccurs="unbounded"/> <elementRef key="funder" minOccurs="0" maxOccurs="unbounded"/> </content> |
Schema Declaration | element titleStmt { tei_title+, tei_meeting+, tei_respStmt*, tei_funder* } |
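The ordering and cardinality constraints in the schema declaration above (`title+, meeting+, respStmt*, funder*`) can be checked with a few lines of Python. The following is a minimal sketch using only the standard library; the function name `check_title_stmt` and the shortened sample are illustrative assumptions, not part of any ParlaMint tooling, and a real corpus should still be validated against the ParlaMint schema itself.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Shortened sample modelled on the ParlaMint-SI example (illustrative only).
SAMPLE = f"""
<titleStmt xmlns="{TEI_NS}">
  <title type="main" xml:lang="en">Slovenian parliamentary corpus ParlaMint-SI [ParlaMint]</title>
  <title type="sub" xml:lang="en">Minutes of the National Assembly of the Republic of Slovenia, Term 7 and 8 (2014 - 2020)</title>
  <meeting n="7" corresp="#DZ" ana="#parla.lower #parla.term #DZ.7">7. mandat</meeting>
  <respStmt>
    <persName>Tomaž Erjavec</persName>
    <resp xml:lang="en">ParlaMint TEI XML corpus encoding</resp>
  </respStmt>
</titleStmt>
"""

# Order and minimum counts taken from the schema declaration:
# title+, meeting+, respStmt*, funder*
EXPECTED = [("title", 1), ("meeting", 1), ("respStmt", 0), ("funder", 0)]


def check_title_stmt(title_stmt):
    """Return a list of problems; an empty list means the structure conforms."""
    names = [child.tag.split("}", 1)[-1] for child in title_stmt]
    problems = []
    pos = 0
    for name, min_count in EXPECTED:
        count = 0
        # Consume the run of consecutive children with the expected name.
        while pos < len(names) and names[pos] == name:
            pos += 1
            count += 1
        if count < min_count:
            problems.append(f"expected at least {min_count} <{name}>")
    if pos < len(names):
        problems.append(f"unexpected or misplaced <{names[pos]}>")
    return problems


problems = check_title_stmt(ET.fromstring(SAMPLE))
print(problems)
```

The check walks the children once in document order, so a `<funder>` placed before a `<meeting>` is reported as misplaced rather than silently accepted.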
<u> (utterance) contains a stretch of speech usually preceded and followed by silence or by a change of speaker. [8.3.1. Utterances] | |
Module | spoken — Formal specification |
Attributes | att.global (xml:base, xml:space, @xml:id, @n, @xml:lang) att.global.linking (synch, @corresp, @next, @prev) att.global.analytic (@ana) att.global.source (@source) att.ascribed (@who) |
Member of | |
Contained by | textstructure: div |
May contain | |
Note | Prose and a mixture of speech elements. Although individual transcriptions may consistently use <u> elements for turns or other units, and although in most cases a <u> will be delimited by pause or change of speaker, <u> is not required to represent a turn or any communicative event, nor to be bounded by pauses or change of speaker. At a minimum, a <u> is some phonetic production by a given speaker. |
Example | The element <u> marks up a speech, as illustrated below: <u who="#DavidPrior" ana="#regular">
<seg>I ask that the draft Regulations laid before the House on 5 December be approved.</seg>
<seg>The relevant document is the 20th Report from the Legislation Committee.</seg>
</u> |
Content model | <content> <alternate minOccurs="1" maxOccurs="unbounded"> <elementRef key="note"/> <elementRef key="vocal"/> <elementRef key="kinesic"/> <elementRef key="incident"/> <elementRef key="gap"/> <elementRef key="pb"/> <elementRef key="seg"/> </alternate> </content> |
Schema Declaration | element u { tei_att.global.attribute.xmlid, tei_att.global.attribute.n, tei_att.global.attribute.xmllang, tei_att.global.linking.attribute.corresp, tei_att.global.linking.attribute.next, tei_att.global.linking.attribute.prev, tei_att.global.analytic.attribute.ana, tei_att.global.source.attribute.source, tei_att.ascribed.attribute.who, ( tei_note | tei_vocal | tei_kinesic | tei_incident | tei_gap | tei_pb | tei_seg )+ } |
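As an illustration of how the <u> structure might be consumed by a script, the sketch below pulls the speaker reference (who), the analytic categories (ana) and the text of each <seg> out of every utterance. It relies only on Python's standard library; the wrapping <div> and the helper name `speeches` are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"

# The utterance from the example above, wrapped in a bare <div> (assumed).
SAMPLE = """
<div xmlns="http://www.tei-c.org/ns/1.0">
  <u who="#DavidPrior" ana="#regular">
    <seg>I ask that the draft Regulations laid before the House on 5 December be approved.</seg>
    <seg>The relevant document is the 20th Report from the Legislation Committee.</seg>
  </u>
</div>
"""


def speeches(root):
    """Yield (speaker id, analytic categories, segment texts) for each <u>."""
    for u in root.iter(TEI + "u"):
        speaker = (u.get("who") or "").lstrip("#")
        # @ana may hold several space-separated pointers.
        categories = [ref.lstrip("#") for ref in (u.get("ana") or "").split()]
        # Only <seg> children carry transcribed speech; notes, <vocal>,
        # <kinesic>, <incident>, <gap> and <pb> are skipped here.
        segments = [" ".join("".join(seg.itertext()).split())
                    for seg in u.findall(TEI + "seg")]
        yield speaker, categories, segments


result = list(speeches(ET.fromstring(SAMPLE)))
for speaker, categories, segments in result:
    print(speaker, categories, len(segments))
```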
<unit> contains a symbol, a word or a phrase referring to a unit of measurement in any kind of formal or informal system. [3.6.3. Numbers and Measures] | |
Module | core — Formal specification |
Attributes | att.global (xml:base, xml:space, @xml:id, @n, @xml:lang) att.global.analytic (@ana) |
Member of | |
Contained by | core: unit |
May contain | |
Example | The element can be used for fine-grained Named Entities which include units: <num ana="ne:nc"
xml:id="ParlaMint-CZ_2013-12-06-ps2013-003-01-001-001.ne53">
<w xml:id="ParlaMint-CZ_2013-12-06-ps2013-003-01-001-001.u2.p10.s1.w9"
lemma="3"
msd="UPosTag=NUM|NumForm=Digit|NumType=Card">3</w>
<w xml:id="ParlaMint-CZ_2013-12-06-ps2013-003-01-001-001.u2.p10.s1.w10"
lemma="miliarda"
msd="UPosTag=NOUN|Case=Gen|Gender=Fem|Number=Sing|Polarity=Pos">miliardy</w>
</num>
<unit ana="ne:om"
xml:id="ParlaMint-CZ_2013-12-06-ps2013-003-01-001-001.ne54">
<w xml:id="ParlaMint-CZ_2013-12-06-ps2013-003-01-001-001.u2.p10.s1.w11"
lemma="Kč"
msd="UPosTag=NOUN|Gender=Fem|Polarity=Pos" join="right">Kč</w>
</unit> |
Content model | <content> <alternate minOccurs="1" maxOccurs="unbounded"> <elementRef key="w"/> <elementRef key="pc"/> <elementRef key="name"/> <elementRef key="date"/> <elementRef key="time"/> <elementRef key="num"/> <elementRef key="unit"/> <elementRef key="email"/> <elementRef key="ref"/> <elementRef key="note"/> <elementRef key="gap"/> <elementRef key="kinesic"/> <elementRef key="incident"/> <elementRef key="vocal"/> </alternate> </content> |
Schema Declaration | element unit { tei_att.global.attribute.xmlid, tei_att.global.attribute.n, tei_att.global.attribute.xmllang, tei_att.global.analytic.attribute.ana, ( tei_w | tei_pc | tei_name | tei_date | tei_time | tei_num | tei_unit | tei_email | tei_ref | tei_note | tei_gap | tei_kinesic | tei_incident | tei_vocal )+ } |
<vocal> (vocal) marks any vocalized but not necessarily lexical phenomenon, for example voiced pauses, non-lexical backchannels, etc. [8.3.3. Vocal, Kinesic, Incident] | |||||||
Module | spoken — Formal specification | ||||||
Attributes | att.global (xml:base, xml:space, @xml:id, @n, @xml:lang) att.global.linking (synch, next, prev, @corresp) att.ascribed (@who) att.typed (type, @subtype)
| ||||||
Member of | |||||||
Contained by | |||||||
May contain | core: desc | ||||||
Example | <vocal type="interruption">
<desc>Interruption from the chair: Your time is up.</desc>
</vocal> | ||||||
Content model | <content> <elementRef key="desc" minOccurs="1" maxOccurs="unbounded"/> </content> | ||||||
Schema Declaration | element vocal { tei_att.global.attribute.xmlid, tei_att.global.attribute.n, tei_att.global.attribute.xmllang, tei_att.global.linking.attribute.corresp, tei_att.ascribed.attribute.who, tei_att.typed.attribute.subtype, attribute type { "greeting" | "question" | "clarification" | "speaking" | "interruption" | "exclamat" | "laughter" | "shouting" | "murmuring" | "noise" | "signal" }?, tei_desc+ } |
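The closed list of type values and the requirement of one or more <desc> children can be spot-checked before running full schema validation. The following sketch is one possible way to do so in Python; `check_vocal` is a hypothetical helper, and the set of values is copied from the schema declaration above.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
DESC = f"{{{TEI_NS}}}desc"

# Allowed values of @type, copied from the schema declaration.
VOCAL_TYPES = {
    "greeting", "question", "clarification", "speaking", "interruption",
    "exclamat", "laughter", "shouting", "murmuring", "noise", "signal",
}


def check_vocal(vocal):
    """Return a list of problems found in one <vocal> element."""
    problems = []
    vocal_type = vocal.get("type")
    # @type is optional, but if present it must come from the closed list.
    if vocal_type is not None and vocal_type not in VOCAL_TYPES:
        problems.append(f"unknown type value: {vocal_type!r}")
    children = list(vocal)
    if not children or any(child.tag != DESC for child in children):
        problems.append("content must be one or more <desc> elements")
    return problems


good = ET.fromstring(
    f'<vocal xmlns="{TEI_NS}" type="interruption">'
    "<desc>Interruption from the chair: Your time is up.</desc></vocal>"
)
bad = ET.fromstring(f'<vocal xmlns="{TEI_NS}" type="applause"/>')

print(check_vocal(good))
print(check_vocal(bad))
```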
<w> (word) represents a grammatical (not necessarily orthographic) word. [17.1. Linguistic Segment Categories 17.4.2. Lightweight Linguistic Annotation] | |
Module | analysis — Formal specification |
Attributes | att.linguistic (@lemma, @pos, @msd, @join) (att.lexicographic.normalized (@norm)) att.global (n, xml:base, xml:space, @xml:id, @xml:lang) att.global.analytic (@ana) att.segLike (@function) |
Member of | |
Contained by | |
May contain | analysis: w character data |
Example | <s xml:id="ParlaMint-GB_2017-10-30-lords.seg4.1">
<w lemma="I"
msd="UPosTag=PRON|Case=Nom|Number=Sing|Person=1|PronType=Prs" pos="PRP">I</w>
<w lemma="support"
msd="UPosTag=VERB|Mood=Ind|Tense=Pres|VerbForm=Fin" pos="VBP">support</w>
<w lemma="the"
msd="UPosTag=DET|Definite=Def|PronType=Art" pos="DT">the</w>
<w lemma="amendment"
msd="UPosTag=NOUN|Number=Sing" pos="NN" join="right">amendment</w>
<pc msd="UPosTag=PUNCT" pos=".">.</pc>
</s> |
Example | Certain frameworks, in particular the Universal Dependencies, allow for tokens to be decomposed into several words, and it is these syntactic words, and not tokens, that are further annotated. For example, Czech has the word ‘abyste’ which is in UD decomposed into two syntactic words, ‘aby’ and ‘byste’, which can be encoded in the <w> element: <w>abyste
<w norm="aby" lemma="aby"
msd="UPosTag=SCONJ"/>
<w norm="byste" lemma="být"
msd="UPosTag=AUX|Mood=Cnd|Number=Plur|Person=2|VerbForm=Fin"/>
</w> |
Content model | <content> <alternate minOccurs="1" maxOccurs="unbounded"> <textNode/> <elementRef key="w"/> </alternate> </content> |
Schema Declaration | element w { tei_att.global.attribute.xmlid, tei_att.global.attribute.xmllang, tei_att.global.analytic.attribute.ana, tei_att.segLike.attribute.function, tei_att.linguistic.attributes, ( text | tei_w )+ } |
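Two details of the <w> encoding matter when processing tokens: the join attribute, which in the example marks "amendment" as directly attached to the following full stop, and nested <w> elements, which carry the syntactic words of a decomposed token. The sketch below shows one way to handle both in Python. The attribute values are shortened copies of the examples above; the helper names and the treatment of join="left" and join="both" are assumptions based on the general TEI definition of the attribute.

```python
import xml.etree.ElementTree as ET

NS = "http://www.tei-c.org/ns/1.0"
W, PC = f"{{{NS}}}w", f"{{{NS}}}pc"

# Shortened copies of the two examples above (some attributes omitted).
SENTENCE = f"""
<s xmlns="{NS}">
  <w lemma="I" pos="PRP">I</w>
  <w lemma="support" pos="VBP">support</w>
  <w lemma="the" pos="DT">the</w>
  <w lemma="amendment" pos="NN" join="right">amendment</w>
  <pc msd="UPosTag=PUNCT" pos=".">.</pc>
</s>
"""
COMPOUND = f"""<w xmlns="{NS}">abyste
  <w norm="aby" lemma="aby" msd="UPosTag=SCONJ"/>
  <w norm="byste" lemma="být" msd="UPosTag=AUX|Mood=Cnd|Number=Plur|Person=2|VerbForm=Fin"/>
</w>"""


def tokens(elem):
    """Yield top-level <w>/<pc> tokens, descending into wrappers such as
    <name> or <num> but not into the syntactic words nested inside a <w>."""
    for child in elem:
        if child.tag in (W, PC):
            yield child
        else:
            yield from tokens(child)


def surface(token):
    """Surface form of a token: its own text, ignoring nested <w>."""
    return (token.text or "").strip()


def sentence_text(s):
    """Rebuild the running text, omitting the space where @join glues tokens."""
    toks = list(tokens(s))
    parts = []
    for i, tok in enumerate(toks):
        parts.append(surface(tok))
        if i + 1 == len(toks):
            break
        glued_right = tok.get("join") in ("right", "both")
        next_glued_left = toks[i + 1].get("join") in ("left", "both")
        if not (glued_right or next_glued_left):
            parts.append(" ")
    return "".join(parts)


def syntactic_words(token):
    """Return (form, lemma) pairs: nested words if present, else the token."""
    nested = token.findall(W)
    if nested:
        return [(w.get("norm"), w.get("lemma")) for w in nested]
    return [(surface(token), token.get("lemma"))]


text = sentence_text(ET.fromstring(SENTENCE))
words = syntactic_words(ET.fromstring(COMPOUND))
print(text)
print(words)
```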
model.addressLike groups elements used to represent a postal or email address. [1. The TEI Infrastructure] | |
Module | tei — Formal specification |
Used by | |
Members | affiliation email |
model.attributable groups elements that contain a word or phrase that can be attributed to a source. [3.3.3. Quotation 4.3.2. Floating Texts] | |
Module | tei — Formal specification |
Used by | |
Members | model.quoteLike |
model.biblLike groups elements containing a bibliographic description. [3.12. Bibliographic Citations and References] | |
Module | tei — Formal specification |
Used by | |
Members | bibl |
model.dateLike groups elements containing temporal expressions. [3.6.4. Dates and Times 13.4. Dates] | |
Module | tei — Formal specification |
Used by | |
Members | date time |
model.divPart groups paragraph-level elements appearing directly within divisions. [1.3. The TEI Class System] | |
Module | tei — Formal specification |
Used by | |
Members | model.divPart.spoken[u] model.lLike model.pLike[p] |
Note | Note that this element class does not include members of the model.inter class, which can appear either within or between paragraph-level items. |
model.divPart.spoken groups elements structurally analogous to paragraphs within spoken texts. [8.1. General Considerations and Overview] | |
Module | spoken — Formal specification |
Used by | |
Members | u |
Note | Spoken texts may be structured in many ways; elements in this class are typically larger units such as turns or utterances. |
model.emphLike groups phrase-level elements which are typographically distinct and to which a specific function can be attributed. [3.3. Highlighting and Quotation] | |
Module | tei — Formal specification |
Used by | |
Members | term title |
model.global groups elements which may appear at any point within a TEI text. [1.3. The TEI Class System] | |
Module | tei — Formal specification |
Used by | |
Members | model.global.edit[gap] model.global.meta[link linkGrp] model.global.spoken[incident kinesic vocal] model.milestoneLike[pb] model.noteLike[note] figure |
model.global.edit groups globally available elements which perform a specifically editorial function. [1.3. The TEI Class System] | |
Module | tei — Formal specification |
Used by | |
Members | gap |
model.global.meta groups globally available elements which describe the status of other elements. [1.3. The TEI Class System] | |
Module | tei — Formal specification |
Used by | |
Members | link linkGrp |
Note | Elements in this class are typically used to hold groups of links or of abstract interpretations, or to provide indications of certainty etc. It may be convenient to localize all metadata elements, for example to contain them within the same division as the elements that they relate to; or to locate them all to a division of their own. They may however appear at any point in a TEI text. |
model.global.spoken groups elements which may appear globally within spoken texts. [8.1. General Considerations and Overview] | |
Module | spoken — Formal specification |
Used by | |
Members | incident kinesic vocal |
Note | This class groups elements which can appear anywhere within transcribed speech. |
model.graphicLike groups elements containing images, formulae, and similar objects. [3.10. Graphics and Other Non-textual Components] | |
Module | tei — Formal specification |
Used by | |
Members | graphic media |
model.highlighted groups phrase-level elements which are typographically distinct. [3.3. Highlighting and Quotation] | |
Module | tei — Formal specification |
Used by | |
Members | model.emphLike[term title] model.hiLike |
model.inter groups elements which can appear either within or between paragraph-like elements. [1.3. The TEI Class System] | |
Module | tei — Formal specification |
Used by | |
Members | model.attributable[model.quoteLike] model.biblLike[bibl] model.egLike model.labelLike[desc label] model.listLike[listEvent listOrg listPerson listRelation] model.oddDecl model.stageLike |
model.labelLike groups elements used to gloss or explain other parts of a document. | |
Module | tei — Formal specification |
Used by | |
Members | desc label |
model.limitedPhrase groups phrase-level elements excluding those elements primarily intended for transcription of existing sources. [1.3. The TEI Class System] | |
Module | tei — Formal specification |
Used by | |
Members | model.emphLike[term title] model.hiLike model.pPart.data[model.addressLike[affiliation email] model.dateLike[date time] model.measureLike[measure num unit] model.nameLike[model.nameLike.agent[name orgName persName] model.offsetLike model.persNamePart[addName forename nameLink roleName surname] model.placeStateLike[model.placeNamePart[placeName] state] idno]] model.pPart.editorial model.pPart.msdesc model.phrase.xml model.ptrLike[ref] |
model.listLike groups list-like elements. [3.8. Lists] | |
Module | tei — Formal specification |
Used by | |
Members | listEvent listOrg listPerson listRelation |
model.measureLike groups elements which denote a number, a quantity, a measurement, or similar piece of text that conveys some numerical meaning. [3.6.3. Numbers and Measures] | |
Module | tei — Formal specification |
Used by | |
Members | measure num unit |
model.milestoneLike groups milestone-style elements used to represent reference systems. [1.3. The TEI Class System 3.11.3. Milestone Elements] | |
Module | tei — Formal specification |
Used by | |
Members | pb |
model.nameLike groups elements which name or refer to a person, place, or organization. | |
Module | tei — Formal specification |
Used by | |
Members | model.nameLike.agent[name orgName persName] model.offsetLike model.persNamePart[addName forename nameLink roleName surname] model.placeStateLike[model.placeNamePart[placeName] state] idno |
Note | A superset of the naming elements that may appear in datelines, addresses, statements of responsibility, etc. |
model.nameLike.agent groups elements which contain names of individuals or corporate bodies. [3.6. Names, Numbers, Dates, Abbreviations, and Addresses] | |
Module | tei — Formal specification |
Used by | |
Members | name orgName persName |
Note | This class is used in the content model of elements which reference names of people or organizations. |
model.noteLike groups globally-available note-like elements. [3.9. Notes, Annotation, and Indexing] | |
Module | tei — Formal specification |
Used by | |
Members | note |
model.pLike groups paragraph-like elements. | |
Module | tei — Formal specification |
Used by | |
Members | p |
model.pPart.data groups phrase-level elements containing names, dates, numbers, measures, and similar data. [3.6. Names, Numbers, Dates, Abbreviations, and Addresses] | |
Module | tei — Formal specification |
Used by | |
Members | model.addressLike[affiliation email] model.dateLike[date time] model.measureLike[measure num unit] model.nameLike[model.nameLike.agent[name orgName persName] model.offsetLike model.persNamePart[addName forename nameLink roleName surname] model.placeStateLike[model.placeNamePart[placeName] state] idno] |
model.pPart.edit groups phrase-level elements for simple editorial correction and transcription. [3.5. Simple Editorial Changes] | |
Module | tei — Formal specification |
Used by | |
Members | model.pPart.editorial model.pPart.transcriptional |
model.persNamePart groups elements which form part of a personal name. [13.2.1. Personal Names] | |
Module | namesdates — Formal specification |
Used by | |
Members | addName forename nameLink roleName surname |
model.phrase groups elements which can occur at the level of individual words or phrases. [1.3. The TEI Class System] | |
Module | tei — Formal specification |
Used by | |
Members | model.graphicLike[graphic media] model.highlighted[model.emphLike[term title] model.hiLike] model.lPart model.pPart.data[model.addressLike[affiliation email] model.dateLike[date time] model.measureLike[measure num unit] model.nameLike[model.nameLike.agent[name orgName persName] model.offsetLike model.persNamePart[addName forename nameLink roleName surname] model.placeStateLike[model.placeNamePart[placeName] state] idno]] model.pPart.edit[model.pPart.editorial model.pPart.transcriptional] model.pPart.msdesc model.phrase.xml model.ptrLike[ref] model.segLike[pc phr s seg w] model.specDescLike |
Note | This class of elements can occur within paragraphs, list items, lines of verse, etc. |
model.placeNamePart groups elements which form part of a place name. [13.2.3. Place Names] | |
Module | tei — Formal specification |
Used by | |
Members | placeName |
model.placeStateLike groups elements which describe changing states of a place. | |
Module | tei — Formal specification |
Used by | |
Members | model.placeNamePart[placeName] state |
model.ptrLike groups elements used for purposes of location and reference. [3.7. Simple Links and Cross-References] | |
Module | tei — Formal specification |
Used by | |
Members | ref |
model.segLike groups elements used for arbitrary segmentation. [16.3. Blocks, Segments, and Anchors 17.1. Linguistic Segment Categories] | |
Module | tei — Formal specification |
Used by | |
Members | pc phr s seg w |
Note | The principles on which segmentation is carried out, and any special codes or attribute values used, should be defined explicitly in the <segmentation> element of the <encodingDesc> within the associated TEI header. |
att.ascribed provides attributes for elements representing speech or action that can be ascribed to a specific individual. [3.3.3. Quotation 8.3. Elements Unique to Spoken Texts] | |||||||||||
Module | tei — Formal specification | ||||||||||
Members | att.ascribed.directed[kinesic u vocal] change incident setting | ||||||||||
Attributes |
|
att.canonical provides attributes that can be used to associate a representation such as a name or title with canonical information about the object being named or referenced. [13.1.1. Linking Names and Their Referents] | |||||||||||||||||||||||
Module | tei — Formal specification | ||||||||||||||||||||||
Members | att.naming[att.personal[addName forename name orgName persName placeName roleName surname] affiliation birth death education event occupation pubPlace state] catDesc date funder meeting publisher relation resp respStmt term time title | ||||||||||||||||||||||
Attributes |
|
att.datable.custom provides attributes for normalization of elements that contain datable events to a custom dating system (i.e. other than the Gregorian used by W3 and ISO). [13.4. Dates] | |||||||||||||||||||||||||||||||||||||||||||||||||||||
Module | namesdates — Formal specification | ||||||||||||||||||||||||||||||||||||||||||||||||||||
Members | att.datable[affiliation application birth change date death education event funder idno licence meeting name occupation orgName persName placeName relation resp sex state time title] | ||||||||||||||||||||||||||||||||||||||||||||||||||||
Attributes |
|
att.datable.iso provides attributes for normalization of elements that contain datable events using the ISO 8601:2004 standard. [3.6.4. Dates and Times 13.4. Dates] | |||||||||||||||||||||||||||||||||||
Module | namesdates — Formal specification | ||||||||||||||||||||||||||||||||||
Members | att.datable[affiliation application birth change date death education event funder idno licence meeting name occupation orgName persName placeName relation resp sex state time title] | ||||||||||||||||||||||||||||||||||
Attributes |
| ||||||||||||||||||||||||||||||||||
Note | The value of these attributes should be a normalized representation of the date, time, or combined date & time intended, in any of the standard formats specified by ISO 8601:2004, using the Gregorian calendar. If both when-iso and dur-iso are specified, the values should be interpreted as indicating a span of time by its starting time (or date) and duration. That is, <date when-iso="2007-06-01" dur-iso="P8D"/> indicates the same time period as <date when-iso="2007-06-01/P8D"/>. In providing a ‘regularized’ form, no claim is made that the form in the source text is incorrect; the regularized form is simply that chosen as the main form for purposes of unifying variant forms under a single heading. |
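The equivalence of the two notations for a start plus a duration can be made concrete with a short Python sketch. It deliberately supports only calendar dates and whole-day durations of the form PnD, which is enough for the example in the note; the function name `iso_span` is an assumption, and full ISO 8601 parsing would need a dedicated library.

```python
import re
from datetime import date, timedelta


def iso_span(when_iso, dur_iso=None):
    """Return (start, start + duration) for a start date and a PnD duration.

    Accepts either separate when-iso / dur-iso values or the combined
    'start/duration' form in when-iso alone.
    """
    if dur_iso is None and "/" in when_iso:
        when_iso, dur_iso = when_iso.split("/", 1)
    match = re.fullmatch(r"P(\d+)D", dur_iso or "")
    if match is None:
        raise ValueError("this sketch only handles durations of the form PnD")
    start = date.fromisoformat(when_iso)
    return start, start + timedelta(days=int(match.group(1)))


separate = iso_span("2007-06-01", "P8D")
combined = iso_span("2007-06-01/P8D")
print(separate == combined, separate[1].isoformat())
```

Both calls yield the same pair of dates, with the second date lying eight days after the first.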
att.datable.w3c provides attributes for normalization of elements that contain datable events conforming to the W3C XML Schema Part 2: Datatypes Second Edition. [3.6.4. Dates and Times 13.4. Dates] | |||||||||||||||||||||||||||||||||||||
Module | tei — Formal specification | ||||||||||||||||||||||||||||||||||||
Members | att.datable[affiliation application birth change date death education event funder idno licence meeting name occupation orgName persName placeName relation resp sex state time title] | ||||||||||||||||||||||||||||||||||||
Attributes |
| ||||||||||||||||||||||||||||||||||||
Schematron |
<sch:rule context="tei:*[@when]">
<sch:report test="@notBefore|@notAfter|@from|@to"
role="nonfatal">The @when attribute cannot be used with any other att.datable.w3c attributes.</sch:report>
</sch:rule> | ||||||||||||||||||||||||||||||||||||
Schematron |
<sch:rule context="tei:*[@from]">
<sch:report test="@notBefore"
role="nonfatal">The @from and @notBefore attributes cannot be used together.</sch:report>
</sch:rule> | ||||||||||||||||||||||||||||||||||||
Schematron |
<sch:rule context="tei:*[@to]">
<sch:report test="@notAfter"
role="nonfatal">The @to and @notAfter attributes cannot be used together.</sch:report>
</sch:rule> | ||||||||||||||||||||||||||||||||||||
Example | <date from="1863-05-28" to="1863-06-01">28 May through 1 June 1863</date> | ||||||||||||||||||||||||||||||||||||
Note | The value of these attributes should be a normalized representation of the date, time, or combined date & time intended, in any of the standard formats specified by XML Schema Part 2: Datatypes Second Edition, using the Gregorian calendar. The most commonly-encountered format for the date portion of a temporal attribute is yyyy-mm-dd, but yyyy, --mm, ---dd, yyyy-mm, or --mm-dd may also be used. For the time part, the form is hh:mm:ss. Note that this format does not currently permit use of the value 0000 to represent the year 1 BCE; instead the value -0001 should be used. |
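The three Schematron rules above state which att.datable.w3c attributes may not be combined. For quick checks outside a Schematron processor, the same conditions can be expressed in Python, as in the sketch below. The function name is an assumption, and the sample elements are left without a namespace for brevity, which does not affect the attribute tests.

```python
import xml.etree.ElementTree as ET


def w3c_dating_problems(elem):
    """Mirror the three nonfatal Schematron reports for one element."""
    attrs = elem.attrib
    problems = []
    if "when" in attrs and any(
        name in attrs for name in ("notBefore", "notAfter", "from", "to")
    ):
        problems.append(
            "The @when attribute cannot be used with any other "
            "att.datable.w3c attributes."
        )
    if "from" in attrs and "notBefore" in attrs:
        problems.append(
            "The @from and @notBefore attributes cannot be used together."
        )
    if "to" in attrs and "notAfter" in attrs:
        problems.append(
            "The @to and @notAfter attributes cannot be used together."
        )
    return problems


ok = ET.fromstring(
    '<date from="1863-05-28" to="1863-06-01">28 May through 1 June 1863</date>'
)
bad = ET.fromstring(
    '<date when="1863-05-28" from="1863-05-28" notBefore="1863-05-01"/>'
)

print(w3c_dating_problems(ok))
print(len(w3c_dating_problems(bad)))
```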
att.datcat provides attributes that are used to align XML elements or attributes with the appropriate Data Categories (DCs) defined by an external taxonomy, in this way establishing the identity of information containers and values, and providing means of interpreting them. [9.5.2. Lexical View 18.3. Other Atomic Feature Values] | |||||||||||||||||||
Module | tei — Formal specification | ||||||||||||||||||
Members | att.segLike[pc phr s seg w] tagUsage | ||||||||||||||||||
Attributes |
| ||||||||||||||||||
Example | The example below presents the TEI encoding of the name-value pair <part of speech, common noun> , where the name (key) ‘part of speech’ is abbreviated as ‘POS’, and the value, ‘common noun’ is symbolized by ‘NN’. The entire name-value pair is encoded by means of the element <f>. In TEI XML, that element acts as the container, labeled with the name attribute. Its contents may be complex or simple. In the case at hand, the content is the symbol ‘NN’. The datcat attribute relates the feature name (i.e., the key) to the data category ‘part of speech’, while the attribute valueDatcat relates the feature value to the data category common noun. Both these data categories should be defined in an external and preferably open reference taxonomy or ontology. <fs>
<f name="POS"
datcat="http://hdl.handle.net/11459/CCR_C-396_5a972b93-2294-ab5c-a541-7c344c5f26c3">
<symbol valueDatcat="http://hdl.handle.net/11459/CCR_C-1256_7ec6083c-23d4-224d-6f94-eecbe6861545"
value="NN"/>
</f>
<!-- ... -->
</fs> ‘NN’ is the symbol for common noun used e.g. in the CLAWS-7 tagset defined by the University Centre for Computer Corpus Research on Language at the University of Lancaster. The very same data category used for tagging an early version of the British National Corpus, and coming from the BNC Basic (C5) tagset, uses the symbol ‘NN0’ (rather than ‘NN’). Making these values semantically interoperable would be extremely difficult without a human expert if they were not anchored in a single point of an established reference taxonomy of morphosyntactic data categories. In the case at hand, the string ‘http://hdl.handle.net/11459/CCR_C-1256_7ec6083c-23d4-224d-6f94-eecbe6861545’ is both a persistent identifier of the data category in question, as well as a pointer to a shared definition of common noun. While the symbols ‘NN’, ‘NN0’, and many others (often coming from languages other than English) are implicitly members of the container category ‘part of speech’, it is sometimes useful not to rely on such an implicit relationship but rather use an explicit identifier for that data category, to distinguish it from other morphosyntactic data categories, such as gender, tense, etc. For that purpose, the above example uses the datcat attribute to reference a definition of part of speech. The reference taxonomy in this example is the CLARIN Concept Registry. If the feature structure markup exemplified above is to be repeated many times in a single document, it is much more efficient to gather the persistent identifiers in a single place and to only reference them, implicitly or directly, from feature structure markup. The following example is much more concise than the one above and relies on the concepts of feature structure declaration and feature value library, discussed in chapter 18. Feature Structures. <fs>
<f name="POS" fVal="#commonNoun"/>
<!-- ... -->
</fs> The assumption here is that the relevant feature values are collected in a place that the annotation document in question has access to, preferably a single document per linguistic resource, for example an <fsdDecl> that is XIncluded as a sibling of <text> or a child of <encodingDesc>; a <taxonomy> available resource-wide (e.g., in a shared header) is also an option. The example below presents an <fvLib> element that collects the relevant feature values (most of them omitted). At the same time, this example shows one way of encoding a tagset, i.e., an established inventory of values of (in the case at hand) morphosyntactic categories. <fvLib n="POS values">
<symbol xml:id="commonNoun" value="NN"
datcat="http://hdl.handle.net/11459/CCR_C-1256_7ec6083c-23d4-224d-6f94-eecbe6861545"/>
<symbol xml:id="properNoun" value="NP"
datcat="http://hdl.handle.net/11459/CCR_C-1371_fbebd9ec-a7f4-9a36-d6e9-88ee16b944ae"/>
<!-- ... -->
</fvLib> Note that these Guidelines do not prescribe a specific choice between datcat and valueDatcat in such cases. The former is the generic way of referencing a data category, whereas the latter is more specific, in that it references a data category that represents a value. The choice between them comes into play where a single element (or a tight element complex, such as the <f>/<symbol> complex illustrated above) makes it necessary or useful to distinguish between the container data category and its value. | ||||||||||||||||||
Example | In the context of dictionaries designed with semantic interoperability in mind, the following example ensures that the <pos> element is interpreted as the same information container as in the case of the example of <f name="POS"> above. <gramGrp>
<pos datcat="http://hdl.handle.net/11459/CCR_C-396_5a972b93-2294-ab5c-a541-7c344c5f26c3"
valueDatcat="http://hdl.handle.net/11459/CCR_C-1256_7ec6083c-23d4-224d-6f94-eecbe6861545">NN</pos>
</gramGrp> Efficiency of this type of interoperable markup demands that the references to the particular data categories should best be provided in a single place within the dictionary (or a single place within the project), rather than being repeated inside every entry. For the container elements, this can be achieved at the level of <tagUsage>, although here, the targetDatcat attribute should be used, because it is not the <tagUsage> element that is associated with the relevant data category, but rather the element <pos> (or <case>, etc.) that is described by <tagUsage>: <tagsDecl partial="true">
<!-- ... -->
<namespace name="http://www.tei-c.org/ns/1.0">
<tagUsage gi="pos"
targetDatcat="http://hdl.handle.net/11459/CCR_C-396_5a972b93-2294-ab5c-a541-7c344c5f26c3">Contains the part of speech.</tagUsage>
<tagUsage gi="case"
targetDatcat="http://hdl.handle.net/11459/CCR_C-1840_9f4e319c-f233-6c90-9117-7270e215f039">Contains information about the grammatical case that the described form is inflected for.</tagUsage>
<!-- ... -->
</namespace>
</tagsDecl> Another possibility is to shorten the URIs by means of the <prefixDef> mechanism, as illustrated below: <listPrefixDef>
<prefixDef ident="ccr" matchPattern="pos"
replacementPattern="http://hdl.handle.net/11459/CCR_C-396_5a972b93-2294-ab5c-a541-7c344c5f26c3"/>
<prefixDef ident="ccr" matchPattern="adj"
replacementPattern="http://hdl.handle.net/11459/CCR_C-1230_23653c21-fca1-edf8-fd7c-3df2d6499157"/>
</listPrefixDef>
<!-- ... -->
<entry>
<!--...-->
<form>
<orth>isotope</orth>
</form>
<gramGrp>
<pos datcat="ccr:pos"
valueDatcat="ccr:adj">adj</pos>
</gramGrp>
<!--...-->
</entry> This mechanism creates implications that are not always wanted, among others, in the case at hand, suggesting that the identifiers ‘pos’ and ‘adj’ belong to a namespace associated with the CLARIN Concept Registry (CCR), whereas that is solely a shorthand mechanism whose scope is the current resource. Documenting this clearly in the header of the dictionary is therefore advised. Yet another possibility is to associate the information about the relationship between a TEI markup element and the data category that it is intended to model already at the level of modeling the dictionary resource, that is, at the level of the ODD, in an <equiv> element that is a child of <elementSpec> or <attDef>. | ||||||||||||||||||
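To show how the <prefixDef> shorthand is meant to be resolved, the following Python sketch expands a prefixed value such as ccr:pos into the full URI. It follows the general TEI convention that matchPattern is a regular expression applied to the part after the prefix and that replacementPattern may refer to captured groups as $1, $2 and so on. The helper names `load_prefix_defs` and `expand` are assumptions for illustration only.

```python
import re
import xml.etree.ElementTree as ET

NS = "http://www.tei-c.org/ns/1.0"

# The prefix definitions from the example above.
SAMPLE = f"""
<listPrefixDef xmlns="{NS}">
  <prefixDef ident="ccr" matchPattern="pos"
    replacementPattern="http://hdl.handle.net/11459/CCR_C-396_5a972b93-2294-ab5c-a541-7c344c5f26c3"/>
  <prefixDef ident="ccr" matchPattern="adj"
    replacementPattern="http://hdl.handle.net/11459/CCR_C-1230_23653c21-fca1-edf8-fd7c-3df2d6499157"/>
</listPrefixDef>
"""


def load_prefix_defs(list_prefix_def):
    """Collect (ident, matchPattern, replacementPattern) triples."""
    return [
        (d.get("ident"), d.get("matchPattern"), d.get("replacementPattern"))
        for d in list_prefix_def.iter(f"{{{NS}}}prefixDef")
    ]


def expand(value, prefix_defs):
    """Expand 'prefix:local' using the first matching definition.

    Values without a matching definition (including ordinary URIs)
    are returned unchanged.
    """
    prefix, sep, local = value.partition(":")
    if not sep:
        return value
    for ident, pattern, replacement in prefix_defs:
        match = re.fullmatch(pattern, local) if ident == prefix else None
        if match is not None:
            # Substitute $1, $2, ... with the corresponding captured groups.
            return re.sub(
                r"\$(\d)", lambda g: match.group(int(g.group(1))), replacement
            )
    return value


prefix_defs = load_prefix_defs(ET.fromstring(SAMPLE))
print(expand("ccr:pos", prefix_defs))
print(expand("ccr:adj", prefix_defs))
```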
Example | The targetDatcat attribute is designed to be used in, e.g., feature structure declarations, and is analogous to the targetLang attribute of the att.pointing class, in that it describes the object that is being referenced, rather than the referencing object. <fDecl name="POS"
targetDatcat="http://hdl.handle.net/11459/CCR_C-396_5a972b93-2294-ab5c-a541-7c344c5f26c3">
<fDescr>part of speech (morphosyntactic category)</fDescr>
<vRange>
<vAlt>
<symbol value="NN"
datcat="http://hdl.handle.net/11459/CCR_C-1256_7ec6083c-23d4-224d-6f94-eecbe6861545"/>
<symbol value="NP"
datcat="http://hdl.handle.net/11459/CCR_C-1371_fbebd9ec-a7f4-9a36-d6e9-88ee16b944ae"/>
<!-- ... -->
</vAlt>
</vRange>
</fDecl> Above, the <fDecl> uses targetDatcat, because if it were to use datcat, it would be asserting that it is an instance of the container data category part of speech, whereas it is not — it models a container (<f>) that encodes a part of speech. Note also that it is the <f> that is modeled above, not its values, which are used as direct references to data categories; hence the use of datcat in the <symbol> element. | ||||||||||||||||||
Note | The TEI Abstract Model can be expressed as a hierarchy of attribute-value matrices (AVMs) of various types and of various levels of complexity, nested or grouped in various ways. At the most abstract level, an AVM consists of an information container and the value (contents) of that container. A simple example of an XML serialization of such structures is, on the one hand, the opening and closing tags that delimit and name the container, and, on the other, the content enclosed by the two tags that constitutes the value. An analogous example is an attribute name and the value of that attribute. In a TEI XML example of two equivalent serializations expressing the name-value pair The att.datcat class provides means of addressing the containers and their values, while at the same time providing a way to interpret them in the context of external taxonomies or ontologies. Aligning e.g. both the <pos> element and the pos attribute with the same value of an external reference point (i.e., an entry in an agreed taxonomy) affirms the identity of the concept serialised by both the element container and the attribute container, and optionally provides a definition of that concept (in the case at hand, the concept part of speech). The value of the att.datcat attributes should be a PID (persistent identifier) that points to a specific (and, ideally, shared) taxonomy or ontology. Among the resources that can, to a lesser or greater extent, be used as inventories of (more or less) standardized linguistic categories are the GOLD ontology, CLARIN CCR, OLiA, or TermWeb's DatCatInfo, and also the Universal Dependencies inventory, on the assumption that its URIs are going to persist. It is imaginable that a project may choose to address a local taxonomy store instead, but this risks losing the advantage of interchangeability with other projects. 
Historically, datcat and valueDatcat originate from the (now obsolete) ISO 12620:2009 standard, describing the data model and procedures for a Data Category Registry (DCR). The current version of that standard, ISO 12620-1, does not standardize the serialization of pointers, merely mentioning the TEI att.datcat as an example. Note that no constraint prevents the occurrence of a combination of att.datcat attributes: the <fDecl> element, which is a natural bearer of the targetDatcat attribute, is an instance of a specific modeling element, and, in principle, could be semantically fixed by an appropriate reference taxonomy of modeling devices. |
att.declarable provides attributes for those elements in the TEI header which may be independently selected by means of the special purpose decls attribute. [15.3. Associating Contextual Information with a Text] | |||||||||
Module | tei — Formal specification | ||||||||
Members | availability bibl correction editorialDecl equipment hyphenation langUsage listEvent listOrg listPerson normalization particDesc projectDesc quotation recording segmentation settingDesc sourceDesc textClass | ||||||||
Attributes |
| ||||||||
Note | The rules governing the association of declarable elements with individual parts of a TEI text are fully defined in chapter 15.3. Associating Contextual Information with a Text. Only one element of a particular type may have a default attribute with a value of true. |
att.duration provides attributes for normalization of elements that contain datable events. | |
Module | spoken — Formal specification |
Members | att.timed[gap incident kinesic media u vocal] date recording time |
Attributes | att.duration.w3c (@dur) att.duration.iso (@dur-iso) |
Note | This ‘superclass’ provides attributes that can be used to provide normalized values of temporal information. By default, the attributes from the att.duration.w3c class are provided. If the module for names & dates is loaded, this class also provides attributes from the att.duration.iso class. In general, the possible values of attributes restricted to the W3C datatypes form a subset of those values available via the ISO 8601 standard. However, the greater expressiveness of the ISO datatypes is rarely needed, and there exists much greater software support for the W3C datatypes. |
att.duration.iso provides attributes for recording normalized temporal durations. [3.6.4. Dates and Times 13.4. Dates] | |||||||
Module | tei — Formal specification | ||||||
Members | att.duration[att.timed[gap incident kinesic media u vocal] date recording time] | ||||||
Attributes |
| ||||||
Note | If both when and dur or dur-iso are specified, the values should be interpreted as indicating a span of time by its starting time (or date) and duration. In order to represent a time range by a duration and its ending time the when-iso attribute must be used. In providing a ‘regularized’ form, no claim is made that the form in the source text is incorrect; the regularized form is simply that chosen as the main form for purposes of unifying variant forms under a single heading. |
att.duration.w3c provides attributes for recording normalized temporal durations. [3.6.4. Dates and Times 13.4. Dates] | |||||||
Module | tei — Formal specification | ||||||
Members | att.duration[att.timed[gap incident kinesic media u vocal] date recording time] | ||||||
Attributes |
| ||||||
Note | If both when and dur are specified, the values should be interpreted as indicating a span of time by its starting time (or date) and duration. In order to represent a time range by a duration and its ending time the when-iso attribute must be used. In providing a ‘regularized’ form, no claim is made that the form in the source text is incorrect; the regularized form is simply that chosen as the main form for purposes of unifying variant forms under a single heading. |
att.fragmentable provides attributes for representing fragmentation of a structural element, typically as a consequence of some overlapping hierarchy. | |||||||||||
Module | tei — Formal specification | ||||||||||
Members | att.divLike[div] att.segLike[pc phr s seg w] p | ||||||||||
Attributes |
|
att.global provides attributes common to all elements in the TEI encoding scheme. [1.3.1.1. Global Attributes] | |||||||||||||||||||||||||||||||||||||||||||||
Module | tei — Formal specification | ||||||||||||||||||||||||||||||||||||||||||||
Members | TEI addName affiliation appInfo application availability bibl birth body catDesc catRef category change classDecl correction date death desc div edition editionStmt editorialDecl education email encodingDesc equipment event extent figure fileDesc forename funder gap graphic head hyphenation idno incident kinesic label langUsage language licence link linkGrp listEvent listOrg listPerson listPrefixDef listRelation measure media meeting name nameLink namespace normalization note num occupation org orgName p particDesc pb pc persName person phr placeName prefixDef profileDesc projectDesc pubPlace publicationStmt publisher quotation recording recordingStmt ref relation resp respStmt revisionDesc roleName s seg segmentation setting settingDesc sex sourceDesc state surname tagUsage tagsDecl taxonomy teiCorpus teiHeader term text textClass time title titleStmt u unit vocal w | ||||||||||||||||||||||||||||||||||||||||||||
Attributes | att.global.rendition (@rend, @style, @rendition) att.global.linking (@corresp, @synch, @next, @prev) att.global.analytic (@ana) att.global.responsibility (@resp) att.global.source (@source)
|
att.global.linking provides a set of attributes for hypertextual linking. [16. Linking, Segmentation, and Alignment] | |||||||||||||||||||||||||||||||||
Module | linking — Formal specification | ||||||||||||||||||||||||||||||||
Members | att.global[TEI addName affiliation appInfo application availability bibl birth body catDesc catRef category change classDecl correction date death desc div edition editionStmt editorialDecl education email encodingDesc equipment event extent figure fileDesc forename funder gap graphic head hyphenation idno incident kinesic label langUsage language licence link linkGrp listEvent listOrg listPerson listPrefixDef listRelation measure media meeting name nameLink namespace normalization note num occupation org orgName p particDesc pb pc persName person phr placeName prefixDef profileDesc projectDesc pubPlace publicationStmt publisher quotation recording recordingStmt ref relation resp respStmt revisionDesc roleName s seg segmentation setting settingDesc sex sourceDesc state surname tagUsage tagsDecl taxonomy teiCorpus teiHeader term text textClass time title titleStmt u unit vocal w] | ||||||||||||||||||||||||||||||||
Attributes |
|
att.global.rendition provides rendering attributes common to all elements in the TEI encoding scheme. [1.3.1.1.3. Rendition Indicators] | |||||||||||||||||||||||||||||||
Module | tei — Formal specification | ||||||||||||||||||||||||||||||
Members | att.global[TEI addName affiliation appInfo application availability bibl birth body catDesc catRef category change classDecl correction date death desc div edition editionStmt editorialDecl education email encodingDesc equipment event extent figure fileDesc forename funder gap graphic head hyphenation idno incident kinesic label langUsage language licence link linkGrp listEvent listOrg listPerson listPrefixDef listRelation measure media meeting name nameLink namespace normalization note num occupation org orgName p particDesc pb pc persName person phr placeName prefixDef profileDesc projectDesc pubPlace publicationStmt publisher quotation recording recordingStmt ref relation resp respStmt revisionDesc roleName s seg segmentation setting settingDesc sex sourceDesc state surname tagUsage tagsDecl taxonomy teiCorpus teiHeader term text textClass time title titleStmt u unit vocal w] | ||||||||||||||||||||||||||||||
Attributes |
|
att.global.responsibility provides attributes indicating the agent responsible for some aspect of the text, the markup or something asserted by the markup, and the degree of certainty associated with it. [1.3.1.1.4. Sources, certainty, and responsibility 3.5. Simple Editorial Changes 11.3.2.2. Hand, Responsibility, and Certainty Attributes 17.3. Spans and Interpretations 13.1.1. Linking Names and Their Referents] | |||||||||
Module | tei — Formal specification | ||||||||
Members | att.global[TEI addName affiliation appInfo application availability bibl birth body catDesc catRef category change classDecl correction date death desc div edition editionStmt editorialDecl education email encodingDesc equipment event extent figure fileDesc forename funder gap graphic head hyphenation idno incident kinesic label langUsage language licence link linkGrp listEvent listOrg listPerson listPrefixDef listRelation measure media meeting name nameLink namespace normalization note num occupation org orgName p particDesc pb pc persName person phr placeName prefixDef profileDesc projectDesc pubPlace publicationStmt publisher quotation recording recordingStmt ref relation resp respStmt revisionDesc roleName s seg segmentation setting settingDesc sex sourceDesc state surname tagUsage tagsDecl taxonomy teiCorpus teiHeader term text textClass time title titleStmt u unit vocal w] | ||||||||
Attributes |
| ||||||||
Example | Blessed are the
<choice>
<sic>cheesemakers</sic>
<corr resp="#editor" cert="high">peacemakers</corr>
</choice>: for they shall be called the children of God. | ||||||||
Example |
<!-- in the <text> ... --><lg>
<!-- ... -->
<l>Punkes, Panders, baſe extortionizing
sla<choice>
<sic>n</sic>
<corr resp="#JENS1_transcriber">u</corr>
</choice>es,</l>
<!-- ... -->
</lg>
<!-- in the <teiHeader> ... -->
<!-- ... -->
<respStmt xml:id="JENS1_transcriber">
<resp when="2014">Transcriber</resp>
<name>Janelle Jenstad</name>
</respStmt> |
att.global.source provides attributes used by elements to point to an external source. [1.3.1.1.4. Sources, certainty, and responsibility 3.3.3. Quotation 8.3.4. Writing] | |||||||||||
Module | tei — Formal specification | ||||||||||
Members | att.global[TEI addName affiliation appInfo application availability bibl birth body catDesc catRef category change classDecl correction date death desc div edition editionStmt editorialDecl education email encodingDesc equipment event extent figure fileDesc forename funder gap graphic head hyphenation idno incident kinesic label langUsage language licence link linkGrp listEvent listOrg listPerson listPrefixDef listRelation measure media meeting name nameLink namespace normalization note num occupation org orgName p particDesc pb pc persName person phr placeName prefixDef profileDesc projectDesc pubPlace publicationStmt publisher quotation recording recordingStmt ref relation resp respStmt revisionDesc roleName s seg segmentation setting settingDesc sex sourceDesc state surname tagUsage tagsDecl taxonomy teiCorpus teiHeader term text textClass time title titleStmt u unit vocal w] | ||||||||||
Attributes |
| ||||||||||
Example | <p>
<!-- ... --> As Willard McCarty (<bibl xml:id="mcc_2012">2012, p.2</bibl>) tells us, <quote source="#mcc_2012">‘Collaboration’ is a problematic and should be a contested
term.</quote>
<!-- ... -->
</p> | ||||||||||
Example | <p>
<!-- ... -->
<quote source="#chicago_15_ed">Grammatical theories are in flux, and the more we learn, the
less we seem to know.</quote>
<!-- ... -->
</p>
<!-- ... -->
<bibl xml:id="chicago_15_ed">
<title level="m">The Chicago Manual of Style</title>,
<edition>15th edition</edition>. <pubPlace>Chicago</pubPlace>: <publisher>University of
Chicago Press</publisher> (<date>2003</date>), <biblScope unit="page">p.147</biblScope>.
</bibl> | ||||||||||
Example | <elementRef key="p" source="tei:2.0.1"/> Include in the schema an element named <p> available from the TEI P5 2.0.1 release. | ||||||||||
Example | <schemaSpec ident="myODD"
source="mycompiledODD.xml">
<!-- further declarations specifying the components required -->
</schemaSpec> Create a schema using components taken from the file mycompiledODD.xml. |
att.internetMedia provides attributes for specifying the type of a computer resource using a standard taxonomy. | |||||||
Module | tei — Formal specification | ||||||
Members | att.media[graphic media] ref | ||||||
Attributes |
| ||||||
Example | In this example mimeType is used to indicate that the URL points to a TEI XML file encoded in UTF-8. <ref mimeType="application/tei+xml; charset=UTF-8"
target="https://raw.githubusercontent.com/TEIC/TEI/dev/P5/Source/guidelines-en.xml"/> | ||||||
Note | This attribute class provides an attribute for describing a computer resource, typically available over the internet, using a value taken from a standard taxonomy. At present only a single taxonomy is supported, the Multipurpose Internet Mail Extensions (MIME) Media Type system. This typology of media types is defined by the Internet Engineering Task Force in RFC 2046. The list of types is maintained by the Internet Assigned Numbers Authority (IANA). The mimeType attribute must have a value taken from this list. |
att.lexicographic.normalized provides attributes for usage within word-level elements in the analysis module and within lexicographic microstructure in the dictionaries module. | |||||||||||||||||||
Module | analysis — Formal specification | ||||||||||||||||||
Members | att.linguistic[pc w] | ||||||||||||||||||
Attributes |
| ||||||||||||||||||
Note | It needs to be stressed that the two attributes in this class are meant for strictly lexicographic and linguistic uses, and not for editorial interventions. For the latter, the mechanism based on <choice>, <orig>, and <reg> needs to be employed. |
att.linguistic provides a set of attributes concerning linguistic features of tokens, for usage within token-level elements, specifically <w> and <pc> in the analysis module. [17.4.2. Lightweight Linguistic Annotation] | |||||||||||||||||||||||||||||||||||||||||||||
Module | analysis — Formal specification | ||||||||||||||||||||||||||||||||||||||||||||
Members | pc w | ||||||||||||||||||||||||||||||||||||||||||||
Attributes | att.lexicographic.normalized (@norm)
| ||||||||||||||||||||||||||||||||||||||||||||
Note | These attributes make it possible to encode simple language corpora and to add a layer of linguistic information to any tokenized resource. See section 17.4.2. Lightweight Linguistic Annotation for discussion. |
att.naming provides attributes common to elements which refer to named persons, places, organizations etc. [3.6.1. Referring Strings 13.3.6. Names and Nyms] | |||||||
Module | tei — Formal specification | ||||||
Members | att.personal[addName forename name orgName persName placeName roleName surname] affiliation birth death education event occupation pubPlace state | ||||||
Attributes | att.canonical (@key, @ref)
|
att.pointing provides a set of attributes used by all elements which point to other elements by means of one or more URI references. [1.3.1.1.2. Language Indicators 3.7. Simple Links and Cross-References] | |||||||||
Module | tei — Formal specification | ||||||||
Members | att.pointing.group[linkGrp] catRef licence link note ref term | ||||||||
Attributes |
|
att.ranging provides attributes for describing numerical ranges. | |||||||||||||||||||||||||||||||
Module | tei — Formal specification | ||||||||||||||||||||||||||||||
Members | att.dimensions[birth date death gap state time] measure num | ||||||||||||||||||||||||||||||
Attributes |
| ||||||||||||||||||||||||||||||
Example | The MS. was lost in transmission by mail from <del rend="overstrike">
<gap reason="illegible"
extent="one or two letters" atLeast="1" atMost="2" unit="chars"/>
</del> Philadelphia to the Graphic office, New York.
| ||||||||||||||||||||||||||||||
Example | Americares has been supporting the health sector in Eastern
Europe since 1986, and since 1992 has provided <measure atLeast="120000000" unit="USD"
commodity="currency">more than
$120m</measure> in aid to Ukrainians.
|
att.resourced provides attributes by which a resource (such as an externally held media file) may be located. | |||||||
Module | tei — Formal specification | ||||||
Members | graphic media | ||||||
Attributes |
|
att.typed provides attributes that can be used to classify or subclassify elements in any way. [1.3.1. Attribute Classes 17.1.1. Words and Above 3.6.1. Referring Strings 3.7. Simple Links and Cross-References 3.6.5. Abbreviations and Their Expansions 3.13.1. Core Tags for Verse 7.2.5. Speech Contents 4.1.1. Un-numbered Divisions 4.1.2. Numbered Divisions 4.2.1. Headings and Trailers 4.4. Virtual Divisions 13.3.2.3. Personal Relationships 11.3.1.1. Core Elements for Transcriptional Work 16.1.1. Pointers and Links 16.3. Blocks, Segments, and Anchors 12.2. Linking the Apparatus to the Text 22.5.1.2. Defining Content Models: RELAX NG 8.3. Elements Unique to Spoken Texts 23.3.1.3. Modification of Attribute and Attribute Value Lists] | |||||||||||||||||||
Module | tei — Formal specification | ||||||||||||||||||
Members | att.pointing.group[linkGrp] TEI addName affiliation application bibl birth change date death desc div education event figure forename graphic head idno incident kinesic label link listEvent listOrg listPerson listRelation measure media name nameLink note num occupation org orgName pb pc persName phr placeName recording ref relation roleName s seg sex state surname teiCorpus term text time title unit vocal w | ||||||||||||||||||
Attributes |
| ||||||||||||||||||
Schematron |
<sch:rule context="tei:*[@subtype]">
<sch:assert test="@type">The <sch:name/> element should not be categorized in detail with @subtype unless also categorized in general with @type</sch:assert>
</sch:rule> | ||||||||||||||||||
Note | When appropriate, values from an established typology should be used. Alternatively a typology may be defined in the associated TEI header. If values are to be taken from a project-specific list, this should be defined using the <valList> element in the project-specific schema description, as described in 23.3.1.3. Modification of Attribute and Attribute Value Lists. |
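The Schematron constraint on subtype can also be approximated outside a Schematron processor. The following is a minimal Python sketch, assuming the document is parsed with the standard-library ElementTree; the function name find_subtype_without_type and the sample fragment are hypothetical and not part of any ParlaMint tooling.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

def find_subtype_without_type(root):
    """Return local names of TEI elements that carry @subtype but no @type."""
    offenders = []
    for el in root.iter():
        # The rule's context is tei:*, so only TEI-namespace elements are checked.
        if not isinstance(el.tag, str) or not el.tag.startswith("{" + TEI_NS + "}"):
            continue
        if "subtype" in el.attrib and "type" not in el.attrib:
            offenders.append(el.tag.split("}", 1)[1])
    return offenders

# Hypothetical fragment: the <seg> violates the rule, the <note> does not.
SAMPLE = """<body xmlns="http://www.tei-c.org/ns/1.0">
  <note type="speaker" subtype="chair">ok</note>
  <seg subtype="fragment">missing type</seg>
</body>"""

print(find_subtype_without_type(ET.fromstring(SAMPLE)))
```

A real validation run would normally use the Schematron rule itself; the sketch is only meant to make the constraint concrete.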
teidata.certainty defines the range of attribute values expressing a degree of certainty. | |
Module | tei — Formal specification |
Used by | |
Content model | <content> <valList type="closed"> <valItem ident="high"/> <valItem ident="medium"/> <valItem ident="low"/> <valItem ident="unknown"/> </valList> </content> ⚓ |
Declaration | tei_teidata.certainty = "high" | "medium" | "low" | "unknown"⚓ |
Note | Certainty may be expressed by one of the predefined symbolic values high, medium, or low. The value unknown should be used in cases where the encoder does not wish to assert an opinion about the matter. |
teidata.count defines the range of attribute values used for a non-negative integer value used as a count. | |
Module | tei — Formal specification |
Used by | Element:
|
Content model | <content> <dataRef name="nonNegativeInteger"/> </content> ⚓ |
Declaration | tei_teidata.count = xsd:nonNegativeInteger⚓ |
Note | Any positive integer value or zero is permitted. |
teidata.duration.iso defines the range of attribute values available for representation of a duration in time using ISO 8601 standard formats | |
Module | tei — Formal specification |
Used by | |
Content model | <content> <dataRef name="token" restriction="[0-9.,DHMPRSTWYZ/:+\-]+"/> </content> ⚓ |
Declaration | tei_teidata.duration.iso = token { pattern = "[0-9.,DHMPRSTWYZ/:+\-]+" }⚓ |
Example | <time dur-iso="PT0,75H">three-quarters of an hour</time> |
Example | <date dur-iso="P1,5D">a day and a half</date> |
Example | <date dur-iso="P14D">a fortnight</date> |
Example | <time dur-iso="PT0.02S">20 ms</time> |
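Note | A duration is expressed as a sequence of number-letter pairs, preceded by the letter P; the letter gives the unit and may be Y (year), M (month), D (day), H (hour), M (minute), or S (second), in that order. The numbers are all unsigned integers, except for the last, which may have a decimal component (using either . or , as the decimal point; the latter is preferred). If any number is 0, then that number-letter pair may be omitted. If any of the H (hour), M (minute), or S (second) number-letter pairs are present, then the separator T must precede the first ‘time’ number-letter pair. For complete details, see ISO 8601 Data elements and interchange formats – Information interchange – Representation of dates and times. |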
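The declared pattern for teidata.duration.iso restricts only the characters that may occur, not their order. The sketch below, in Python, applies the pattern from the declaration to the example values; the function name is hypothetical, and fullmatch is used on the assumption that schema patterns must match the whole value.

```python
import re

# Pattern copied from the teidata.duration.iso declaration.
DURATION_ISO = re.compile(r"[0-9.,DHMPRSTWYZ/:+\-]+")

def is_duration_iso_token(value):
    """True if every character of value is allowed by the datatype pattern."""
    return DURATION_ISO.fullmatch(value) is not None

for value in ["PT0,75H", "P1,5D", "P14D", "PT0.02S", "three hours", "PPP"]:
    print(value, is_duration_iso_token(value))
```

Note that a string such as PPP passes this check although it is not a meaningful duration, so the pattern should be read as a coarse filter rather than a full ISO 8601 validator.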
teidata.duration.w3c defines the range of attribute values available for representation of a duration in time using W3C datatypes. | |
Module | tei — Formal specification |
Used by | |
Content model | <content> <dataRef name="duration"/> </content> ⚓ |
Declaration | tei_teidata.duration.w3c = xsd:duration⚓ |
Example | <time dur="PT45M">forty-five minutes</time> |
Example | <date dur="P1DT12H">a day and a half</date> |
Example | <date dur="P7D">a week</date> |
Example | <time dur="PT0.02S">20 ms</time> |
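Note | A duration is expressed as a sequence of number-letter pairs, preceded by the letter P; the letter gives the unit and may be Y (year), M (month), D (day), H (hour), M (minute), or S (second), in that order. The numbers are all unsigned integers, except for the S (second) number, which may have a decimal component (using . as the decimal point). If any number is 0, then that number-letter pair may be omitted. If any of the H (hour), M (minute), or S (second) number-letter pairs are present, then the separator T must precede the first ‘time’ number-letter pair. For complete details, see the W3C specification. |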
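To illustrate how dur values such as those in the examples can be turned into a length of time, here is a minimal Python sketch. It is an assumption-laden simplification: it handles only the day and time components (D, H, M, S), ignores years, months and negative durations, and the function name parse_day_time_duration is hypothetical.

```python
import re
from datetime import timedelta

# Day/time subset of xsd:duration. Years and months are left out on purpose,
# because their length in seconds depends on the calendar.
_DAY_TIME = re.compile(
    r"P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?"
    r"(?:(?P<seconds>\d+(?:\.\d+)?)S)?)?"
)

def parse_day_time_duration(value):
    """Convert a day/time duration such as 'P1DT12H' into a timedelta."""
    m = _DAY_TIME.fullmatch(value)
    # Reject 'P', 'PT', and a trailing 'T' with no time component after it.
    if m is None or value.endswith("T") or not any(m.groupdict().values()):
        raise ValueError(f"not a supported day/time duration: {value!r}")
    parts = m.groupdict(default="0")
    return timedelta(
        days=int(parts["days"]),
        hours=int(parts["hours"]),
        minutes=int(parts["minutes"]),
        seconds=float(parts["seconds"]),
    )

for value in ["PT45M", "P1DT12H", "P7D", "PT0.02S"]:
    print(value, parse_day_time_duration(value).total_seconds())
```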
teidata.enumerated defines the range of attribute values expressed as a single XML name taken from a list of documented possibilities. | |
Module | tei — Formal specification |
Used by | |
Content model | <content> <dataRef key="teidata.word"/> </content> ⚓ |
Declaration | tei_teidata.enumerated = teidata.word⚓ |
Note | Attributes using this datatype must contain a single ‘word’ which contains only letters, digits, punctuation characters, or symbols: thus it cannot include whitespace. Typically, the list of documented possibilities will be provided (or exemplified) by a value list in the associated attribute specification, expressed with a <valList> element. |
teidata.language defines the range of attribute values used to identify a particular combination of human language and writing system. [6.1. Language Identification] | |
Module | tei — Formal specification |
Used by | |
Content model | <content> <alternate> <dataRef name="language"/> <valList> <valItem ident=""/> </valList> </alternate> </content> ⚓ |
Declaration | tei_teidata.language = xsd:language | ( "" )⚓ |
Note | The values for this attribute are language ‘tags’ as defined in BCP 47. Currently BCP 47 comprises RFC 5646 and RFC 4647; over time, other IETF documents may succeed these as the best current practice. A ‘language tag’, per BCP 47, is assembled from a sequence of components or subtags separated by the hyphen character (-, U+002D). The tag is made of the following subtags, in the following order: language, script, region, variant, extension, and private use. Every subtag except the first is optional. If present, each occurs only once, except the fourth and fifth components (variant and extension), which are repeatable.
There are two exceptions to the above format. First, there are language tags in the IANA registry that do not match the above syntax, but are present because they have been ‘grandfathered’ from previous specifications. Second, an entire language tag can consist of only a private use subtag. These tags start with x- or X-.
The W3C Internationalization Activity has published a useful introduction to BCP 47, Language tags in HTML and XML. |
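The declaration of teidata.language accepts either an xsd:language value or the empty string. The Python sketch below checks only the lexical shape of a tag; the regular expression is the one commonly given for xsd:language in XML Schema 1.0 and is stated here as an assumption, the sample tags are illustrative, and no lookup in the IANA subtag registry is attempted.

```python
import re

# Assumed lexical pattern of xsd:language (XML Schema 1.0): a primary subtag of
# 1-8 letters followed by any number of hyphen-separated alphanumeric subtags.
XSD_LANGUAGE = re.compile(r"[a-zA-Z]{1,8}(-[a-zA-Z0-9]{1,8})*")

def is_teidata_language(value):
    """True for the empty string or a lexically valid xsd:language value."""
    return value == "" or XSD_LANGUAGE.fullmatch(value) is not None

for tag in ["sl", "en-GB", "sr-Latn-RS", "x-private", "", "en_GB", "-en"]:
    print(repr(tag), is_teidata_language(tag))
```

Passing this check does not mean that a tag is registered or meaningful; it only rules out values that cannot be language tags at all.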
teidata.name defines the range of attribute values expressed as an XML Name. | |
Module | tei — Formal specification |
Used by | Element:
|
Content model | <content> <dataRef name="Name"/> </content> ⚓ |
Declaration | tei_teidata.name = xsd:Name⚓ |
Note | Attributes using this datatype must contain a single word which follows the rules defining a legal XML name (see https://www.w3.org/TR/REC-xml/#dt-name): for example they cannot include whitespace or begin with digits. |
teidata.numeric defines the range of attribute values used for numeric values. | |
Module | tei — Formal specification |
Used by | Element:
|
Content model | <content> <alternate> <dataRef name="double"/> <dataRef name="token" restriction="(\-?[\d]+/\-?[\d]+)"/> <dataRef name="decimal"/> </alternate> </content> ⚓ |
Declaration | tei_teidata.numeric = xsd:double | token { pattern = "(\-?[\d]+/\-?[\d]+)" } | xsd:decimal⚓ |
Note | Any numeric value, represented as a decimal number, in floating point format, or as a ratio. To represent a floating point number, expressed in scientific notation, ‘E notation’, a variant of ‘exponential notation’, may be used. In this format, the value is expressed as two numbers separated by the letter E. The first number, the significand (sometimes called the mantissa) is given in decimal format, while the second is an integer. The value is obtained by multiplying the mantissa by 10 the number of times indicated by the integer. Thus the value represented in decimal notation as 1000.0 might be represented in scientific notation as 1E3. A value expressed as a ratio is represented by two integer values separated by a solidus (/) character. Thus, the value represented in decimal notation as 0.5 might be represented as a ratio by the string 1/2. |
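The three lexical forms allowed by teidata.numeric (decimal, E notation, ratio) can be normalized to a single floating point value. The following Python sketch shows one way to do so; the function name is hypothetical, and Python's float() is somewhat more lenient than the XSD datatypes (for instance, it accepts digit-group underscores).

```python
import re
from fractions import Fraction

# Ratio form from the datatype declaration: two optionally signed integers
# separated by a solidus.
RATIO = re.compile(r"(-?\d+)/(-?\d+)")

def parse_teidata_numeric(value):
    """Return the value of a decimal, E-notation, or ratio string as a float."""
    m = RATIO.fullmatch(value)
    if m:
        # Fraction raises ZeroDivisionError for a zero denominator.
        return float(Fraction(int(m.group(1)), int(m.group(2))))
    return float(value)

for value in ["1000.0", "1E3", "10E3", "1/2", "-3/4"]:
    print(value, parse_teidata_numeric(value))
```

The loop makes the scale of E notation explicit: 1E3 is one thousand, whereas 10E3 is ten thousand.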
teidata.outputMeasurement defines a range of values for use in specifying the size of an object that is intended for display. | |
Module | tei — Formal specification |
Used by | |
Content model | <content> <dataRef name="token" restriction="[\-+]?\d+(\.\d+)?(%|cm|mm|in|pt|pc|px|em|ex|ch|rem|vw|vh|vmin|vmax)"/> </content> ⚓ |
Declaration | tei_teidata.outputMeasurement = token { pattern = "[\-+]?\d+(\.\d+)?(%|cm|mm|in|pt|pc|px|em|ex|ch|rem|vw|vh|vmin|vmax)" }⚓ |
Example | <figure>
<head>The TEI Logo</head>
<figDesc>Stylized yellow angle brackets with the letters <mentioned>TEI</mentioned> in
between and <mentioned>text encoding initiative</mentioned> underneath, all on a white
background.</figDesc>
<graphic height="600px" width="600px"
url="http://www.tei-c.org/logos/TEI-600.jpg"/>
</figure> |
Note | These values map directly onto the values used by XSL-FO and CSS. For definitions of the units see those specifications; at the time of this writing the most complete list is in the CSS3 working draft. |
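As a sketch of how the teidata.outputMeasurement pattern can be applied in practice, the Python fragment below validates a value and splits it into a number and a unit. The pattern is copied from the declaration; the function name split_measurement is hypothetical.

```python
import re

# Pattern copied from the teidata.outputMeasurement declaration.
OUTPUT_MEASUREMENT = re.compile(
    r"[\-+]?\d+(\.\d+)?(%|cm|mm|in|pt|pc|px|em|ex|ch|rem|vw|vh|vmin|vmax)"
)

def split_measurement(value):
    """Return (number, unit) for a valid measurement, else raise ValueError."""
    m = OUTPUT_MEASUREMENT.fullmatch(value)
    if m is None:
        raise ValueError(f"not a valid output measurement: {value!r}")
    # Group 2 is the unit; everything before it is the signed number.
    return float(value[: m.start(2)]), m.group(2)

print(split_measurement("600px"))
print(split_measurement("+1.5em"))
```

Values without a unit (600) or with whitespace between number and unit (600 px) are rejected, which matches the pattern as declared.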
teidata.pattern defines attribute values which are expressed as a regular expression. | |
Module | tei — Formal specification |
Used by | Element:
|
Content model | <content> <dataRef name="token"/> </content> ⚓ |
Declaration | tei_teidata.pattern = token⚓ |
Note | A regular expression, often called a pattern, is an expression that describes a set of strings. They are usually used to give a concise description of a set, without having to list all elements. For example, the set containing the three strings Handel, Händel, and Haendel can be described by the pattern H(ä|ae?)ndel (or alternatively, it is said that the pattern H(ä|ae?)ndel matches each of the three strings) (source: Wikipedia). This TEI datatype is mapped to the XSD token datatype, and may therefore contain any string of characters. However, it is recommended that the value used conform to the particular flavour of regular expression syntax supported by XSD Schema. |
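The Handel example can be tried out directly. The sketch below uses Python's re module, whose syntax differs in places from the XSD regular expression flavour recommended above, but this particular pattern behaves the same in both; the extra test string Hendel is added here only for contrast.

```python
import re

# Pattern from the note; fullmatch mimics the implicit anchoring of XSD patterns.
pattern = re.compile(r"H(ä|ae?)ndel")

names = ["Handel", "Händel", "Haendel", "Hendel"]
matching = [name for name in names if pattern.fullmatch(name)]
print(matching)
```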
teidata.pointer defines the range of attribute values used to provide a single URI, absolute or relative, pointing to some other resource, either within the current document or elsewhere. | |
Module | tei — Formal specification |
Used by | |
Content model | <content> <dataRef restriction="\S+" name="anyURI"/> </content> ⚓ |
Declaration | tei_teidata.pointer = xsd:anyURI { pattern = "\S+" }⚓ |
Note | The range of syntactically valid values is defined by RFC 3986 Uniform Resource Identifier (URI): Generic Syntax. Note that the values themselves are encoded using RFC 3987 Internationalized Resource Identifiers (IRIs) mapping to URIs. |
teidata.prefix defines a range of values that may function as a URI scheme name. | |
Module | tei — Formal specification |
Used by | Element:
|
Content model | <content> <dataRef name="token" restriction="[a-z][a-z0-9\+\.\-]*"/> </content> ⚓ |
Declaration | tei_teidata.prefix = token { pattern = "[a-z][a-z0-9\+\.\-]*" }⚓ |
Note | This datatype is used to constrain a string of characters to one that can be used as a URI scheme name according to RFC 3986, section 3.1. Thus only the 26 lowercase letters a–z, the 10 digits 0–9, the plus sign, the period, and the hyphen are permitted, and the value must start with a letter. |
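A quick way to see which strings qualify as a prefix is to apply the declared pattern. The Python sketch below does this; the sample values are illustrative only and are not taken from a particular corpus.

```python
import re

# Pattern copied from the teidata.prefix declaration.
PREFIX = re.compile(r"[a-z][a-z0-9\+\.\-]*")

def is_prefix(value):
    """True if value could serve as a URI scheme name per the declared pattern."""
    return PREFIX.fullmatch(value) is not None

for value in ["mte", "ud-syn", "Ud", "9x", ""]:
    print(repr(value), is_prefix(value))
```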
teidata.probCert defines a range of attribute values which can be expressed either as a numeric probability or as a coded certainty value. | |
Module | tei — Formal specification |
Used by | |
Content model | <content> <alternate> <dataRef key="teidata.probability"/> <dataRef key="teidata.certainty"/> </alternate> </content> ⚓ |
Declaration | tei_teidata.probCert = teidata.probability | teidata.certainty⚓ |
teidata.probability defines the range of attribute values expressing a probability. | |
Module | tei — Formal specification |
Used by | |
Content model | <content> <dataRef name="double"/> </content> ⚓ |
Declaration | tei_teidata.probability = xsd:double⚓ |
Note | Probability is expressed as a real number between 0 and 1; 0 representing certainly false and 1 representing certainly true. |
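Since teidata.probCert admits either a coded certainty or a numeric probability, a consumer has to decide which of the two it has been given. The Python sketch below is one possible approach; the function name is hypothetical, and the range check from 0 to 1 follows the note above rather than the declaration, which as xsd:double does not itself enforce a range.

```python
# Closed value list of teidata.certainty.
CERTAINTY = {"high", "medium", "low", "unknown"}

def classify_prob_cert(value):
    """Return 'certainty', 'probability', or None for an invalid value."""
    if value in CERTAINTY:
        return "certainty"
    try:
        number = float(value)
    except ValueError:
        return None
    # The note restricts probabilities to the interval [0, 1]; NaN fails both tests.
    return "probability" if 0.0 <= number <= 1.0 else None

for value in ["high", "0.85", "1", "1.5", "probably"]:
    print(value, classify_prob_cert(value))
```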
teidata.replacement defines attribute values which contain a replacement template. | |
Module | tei — Formal specification |
Used by | Element:
|
Content model | <content> <textNode/> </content> ⚓ |
Declaration | tei_teidata.replacement = text⚓ |
teidata.temporal.iso defines the range of attribute values expressing a temporal expression such as a date, a time, or a combination of them, that conform to the international standard Data elements and interchange formats – Information interchange – Representation of dates and times. | |
Module | tei — Formal specification |
Used by | |
Content model | <content> <alternate> <dataRef name="date"/> <dataRef name="gYear"/> <dataRef name="gMonth"/> <dataRef name="gDay"/> <dataRef name="gYearMonth"/> <dataRef name="gMonthDay"/> <dataRef name="time"/> <dataRef name="dateTime"/> <dataRef name="token" restriction="[0-9.,DHMPRSTWYZ/:+\-]+"/> </alternate> </content> ⚓ |
Declaration | tei_teidata.temporal.iso = xsd:date | xsd:gYear | xsd:gMonth | xsd:gDay | xsd:gYearMonth | xsd:gMonthDay | xsd:time | xsd:dateTime | token { pattern = "[0-9.,DHMPRSTWYZ/:+\-]+" }⚓ |
Note | If it is likely that the value used is to be compared with another, then a time zone indicator should always be included, and only the dateTime representation should be used. For all representations for which ISO 8601:2004 describes both a basic and an extended format, these Guidelines recommend use of the extended format. |
teidata.temporal.w3c defines the range of attribute values expressing a temporal expression such as a date, a time, or a combination of them, that conform to the W3C XML Schema Part 2: Datatypes Second Edition specification. | |
Module | tei — Formal specification |
Used by | |
Content model | <content> <alternate> <dataRef name="date"/> <dataRef name="gYear"/> <dataRef name="gMonth"/> <dataRef name="gDay"/> <dataRef name="gYearMonth"/> <dataRef name="gMonthDay"/> <dataRef name="time"/> <dataRef name="dateTime"/> </alternate> </content> ⚓ |
Declaration | tei_teidata.temporal.w3c = xsd:date | xsd:gYear | xsd:gMonth | xsd:gDay | xsd:gYearMonth | xsd:gMonthDay | xsd:time | xsd:dateTime⚓ |
Note | If it is likely that the value used is to be compared with another, then a time zone indicator should always be included, and only the dateTime representation should be used. |
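The advice to include a time zone indicator when values are to be compared can be illustrated with Python's standard datetime module. This is only a sketch: the function name and the sample timestamps are hypothetical, and offsets are written as +hh:mm because older Python versions do not accept a trailing Z in fromisoformat.

```python
from datetime import datetime

def is_earlier(first, second):
    """Compare two dateTime strings chronologically."""
    return datetime.fromisoformat(first) < datetime.fromisoformat(second)

# 10:00 at +01:00 is 09:00 UTC, which precedes 09:30 UTC.
print(is_earlier("2021-03-01T10:00:00+01:00", "2021-03-01T09:30:00+00:00"))

# Without a time zone on one side, the comparison is not defined.
try:
    is_earlier("2021-03-01T10:00:00", "2021-03-01T09:30:00+00:00")
except TypeError as err:
    print("cannot compare:", err)
```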
teidata.text defines the range of attribute values used to express some kind of identifying string as a single sequence of Unicode characters possibly including whitespace. | |
Module | tei — Formal specification |
Used by | Element:
|
Content model | <content> <dataRef name="string"/> </content> ⚓ |
Declaration | tei_teidata.text = string⚓ |
Note | Attributes using this datatype must contain a single ‘token’ in which whitespace and other punctuation characters are permitted. |
teidata.truthValue defines the range of attribute values used to express a truth value. | |
Module | tei — Formal specification |
Used by | |
Content model | <content> <dataRef name="boolean"/> </content> |
Declaration | tei_teidata.truthValue = xsd:boolean |
Note | The possible values of this datatype are 1 or true, or 0 or false. This datatype applies only for cases where uncertainty is inappropriate; if the attribute concerned may have a value other than true or false, e.g. unknown, or inapplicable, it should have the extended version of this datatype: teidata.xTruthValue. |
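As a rough illustration of the four permitted values, the following Python sketch maps a teidata.truthValue attribute value to a Python `bool`. The function name `parse_truth_value` is hypothetical and not part of any ParlaMint tooling; the sketch assumes that surrounding XML whitespace should be ignored, as a schema validator would do for xsd:boolean.

```python
# Lexical forms allowed by xsd:boolean; matching is case-sensitive.
_XSD_BOOLEAN = {"true": True, "1": True, "false": False, "0": False}


def parse_truth_value(value: str) -> bool:
    """Map a teidata.truthValue string to a bool (hypothetical helper)."""
    # Strip XML whitespace only (space, tab, CR, LF).
    token = value.strip(" \t\r\n")
    if token not in _XSD_BOOLEAN:
        raise ValueError(f"not a teidata.truthValue: {value!r}")
    return _XSD_BOOLEAN[token]
```

Values such as `TRUE` or `yes` are rejected, since they are outside the lexical space of xsd:boolean.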
teidata.versionNumber defines the range of attribute values used for version numbers. | |
Module | tei — Formal specification |
Used by | Element:
|
Content model | <content> <dataRef name="token" restriction="[\d]+[a-z]*[\d]*(\.[\d]+[a-z]*[\d]*){0,3}"/> </content> |
Declaration | tei_teidata.versionNumber = token { pattern = "[\d]+[a-z]*[\d]*(\.[\d]+[a-z]*[\d]*){0,3}" } |
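To see what the pattern in the declaration accepts, it can be tried out in Python. This is only a sketch: the helper name `is_version_number` is hypothetical, and `re.fullmatch` is used because schema patterns must match the whole value.

```python
import re

# The pattern from the teidata.versionNumber declaration, copied verbatim.
VERSION_RE = re.compile(r"[\d]+[a-z]*[\d]*(\.[\d]+[a-z]*[\d]*){0,3}")


def is_version_number(value: str) -> bool:
    """Check a string against the versionNumber pattern (hypothetical helper)."""
    # fullmatch mirrors the implicit anchoring of schema patterns.
    return VERSION_RE.fullmatch(value) is not None
```

Under this pattern a version number has between one and four dot-separated components, each starting with a digit: `4.0`, `2a` and `3.1a2` are accepted, while `v1` and `1.2.3.4.5` are not.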
teidata.word defines the range of attribute values expressed as a single word or token. | |
Module | tei — Formal specification |
Used by | |
Content model | <content> <dataRef name="token" restriction="[^\p{C}\p{Z}]+"/> </content> |
Declaration | tei_teidata.word = token { pattern = "[^\p{C}\p{Z}]+" } |
Note | Attributes using this datatype must contain a single ‘word’ which contains only letters, digits, punctuation characters, or symbols: thus it cannot include whitespace. |
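The pattern excludes every character in the Unicode "Other" (C) and "Separator" (Z) categories. Python's `re` module does not support `\p{...}` classes, so the sketch below approximates the check with `unicodedata`; the function name `is_tei_word` is hypothetical. It tests the string as given and does not apply the whitespace normalisation that a schema validator performs on token values first.

```python
import unicodedata


def is_tei_word(value: str) -> bool:
    """Approximate the teidata.word pattern [^\\p{C}\\p{Z}]+ (hypothetical helper).

    True if the value is non-empty and none of its characters belongs to a
    Unicode general category starting with C (control, format, unassigned,
    ...) or Z (space, line and paragraph separators).
    """
    return bool(value) and all(
        unicodedata.category(ch)[0] not in ("C", "Z") for ch in value
    )
```

For instance, `ParlaMint-SI` passes, while `two words` fails because the space belongs to category Zs.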
teidata.xTruthValue (extended truth value) defines the range of attribute values used to express a truth value which may be unknown. | |
Module | tei — Formal specification |
Used by | |
Content model | <content> <alternate> <dataRef name="boolean"/> <valList> <valItem ident="unknown"/> <valItem ident="inapplicable"/> </valList> </alternate> </content> |
Declaration | tei_teidata.xTruthValue = xsd:boolean | ( "unknown" | "inapplicable" ) |
Note | In cases where uncertainty is inappropriate, use the datatype teidata.truthValue. |
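One possible way to handle the extended value space in a processing script is sketched below. The function name `parse_x_truth_value` is hypothetical, and collapsing both `unknown` and `inapplicable` to `None` is a simplifying assumption of this sketch; a script that needs to tell the two apart would have to keep the original string.

```python
from typing import Optional


def parse_x_truth_value(value: str) -> Optional[bool]:
    """Map a teidata.xTruthValue string to True, False or None (hypothetical helper)."""
    # Strip XML whitespace only (space, tab, CR, LF).
    token = value.strip(" \t\r\n")
    if token in ("true", "1"):
        return True
    if token in ("false", "0"):
        return False
    if token in ("unknown", "inapplicable"):
        # Simplification: both non-boolean values become None.
        return None
    raise ValueError(f"not a teidata.xTruthValue: {value!r}")
```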
teidata.xpath defines attribute values which contain an XPath expression. | |
Module | tei — Formal specification |
Used by | |
Content model | <content> <textNode/> </content> |
Declaration | tei_teidata.xpath = text |
Note | Any XPath expression using the syntax defined in 6.2. When writing programs that evaluate XPath expressions, programmers should be mindful of the possibility of malicious code injection attacks. For further information about XPath injection attacks, see the article at OWASP. |