1. Introduction

This document is meant to serve as a reference for the encoding of ParlaMint corpora of parliamentary proceedings. In order for the ParlaMint corpora to be interoperable (i.e. so that the same scripts can be used to process them), their structure is fairly rigid, both in terms of file names and folder structure and in terms of their TEI XML encoding. This is not to say that all the corpora have to contain exactly the same information: we distinguish obligatory information, which all the corpora should contain, from optional information, which is present only in the corpora for which it has been possible to gather it from the corpus sources.

This document is a specialisation of Parla-CLARIN, itself a customisation of the TEI Guidelines. But while Parla-CLARIN gives fairly general recommendations for encoding corpora of parliamentary proceedings, ParlaMint, as mentioned, is much stricter. This document gives very specific encoding recommendations without necessarily stating the reasons for their choice. It covers the overall structure of ParlaMint corpora, the metadata they contain, the encoding of transcriptions, and, for the linguistically annotated version, the encoding of word-level linguistic annotations, syntactic dependencies and named entities.

The document is not meant as a tutorial on TEI or ParlaMint, but as a reference to elements, their nesting and attributes. Other sources can help in understanding the encoding and content of ParlaMint corpora:

Samples of ParlaMint corpora, available in the Samples/ directory of the ParlaMint GitHub repository; note that the samples in the main branch are supposed to be publication-ready, while those in the data branch are work in progress.
Two openly available papers detailing the results of the ParlaMint I and ParlaMint II projects:
- The ParlaMint corpora of parliamentary proceedings. Language Resources & Evaluation (2022). DOI 10.1007/s10579-021-09574-0.
- ParlaMint II: Advancing Comparable Parliamentary Corpora Across Europe. Language Resources & Evaluation (2024). DOI 10.1007/s10579-024-09798-w.
The Parla-CLARIN guidelines, which provide general guidelines for encoding parliamentary corpora in TEI; they also give links to the relevant chapters of the TEI Guidelines.

The rest of these recommendations are structured as follows:

Chapter 2 explains the overall XML structure of a ParlaMint corpus, and introduces the distinction between the corpus root and corpus components;
Chapter 3 explains some general requirements and the file-naming conventions a ParlaMint corpus has to meet; it also introduces the top level elements and their attributes and the main pointing attributes;
Chapter 4 concentrates on the structure and encoding of the corpus metadata, such as the title information, documenting the source of the corpus, taxonomies used etc.;
Chapter 5 explains how and what information must be encoded about the persons giving the speeches and the (political) organisations they belong to;
Chapter 6 treats the encoding of the transcripts, including speeches and transcriber notes;
Chapter 7 details the addition of linguistic annotations to the corpus;
Chapter 8 introduces scripts to finalise, validate and convert a ParlaMint corpus to other formats;
Chapter 9 gives instructions on how to contribute samples of a ParlaMint corpus to GitHub;
Appendix A gives the formal specification of the Parla-CLARIN schema.

2. Overall corpus structure

2.1. XML structure

The parliamentary proceedings of one country or autonomous region constitute one ParlaMint corpus, which is stored as one XML document, with <teiCorpus> as its top-level element. It is composed of a <teiHeader>, giving the metadata for the corpus as a whole (further detailed in the Section on Corpus metadata), followed by a series of <TEI> elements that each contain one corpus component, as illustrated below:

<teiCorpus xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>...</teiHeader>
  <TEI>...</TEI>
  <TEI>...</TEI>
  ...
</teiCorpus>

Each corpus component should contain at most the transcripts for one day, although several components can contain the transcripts for the same day, e.g. for different (types of) meetings. Whether and how such further subdivisions into separate components are realised depends on the corpus, as the granularity of parliamentary proceedings corpora and the national rules structuring the work of parliaments differ substantially.

A corpus component will thus be rooted in the <TEI> element, which then contains its metadata in its own <teiHeader>, followed by the <text> element, which contains the transcription of the particular component, as illustrated below:

The <teiHeader> of a corpus component (further detailed in the Section on Corpus metadata) contains the metadata specific to this component (along with some redundant metadata about the provenance). This metadata should be unique in the corpus, i.e. the corpus component metadata should distinguish it from all the other components of the corpus.

2.2. Use of XInclude

The fact that a corpus is one XML document does not mean that it is also stored in one file. In fact, ParlaMint requires that each corpus component is stored in a separate file, with the corpus root, i.e. the top-level <teiCorpus>, also stored as one file. Furthermore, some parts of the corpus root metadata are also stored in separate files.

To enable one XML document to be composed of many files, we use the XInclude mechanism. The corpus root uses this mechanism (i.e. the <include> elements in the XInclude namespace) to include its corpus component files, so a corpus root will in fact be encoded similarly to the following example:

<teiCorpus xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>...</teiHeader>
  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="2014/ParlaMint-NL_2014-04-16.xml"/>
  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="2014/ParlaMint-NL_2014-04-17.xml"/>
  ...
</teiCorpus>

Apart from corpus components, some parts of the overall corpus metadata (i.e. the <teiCorpus> <teiHeader> element) are also stored as separate files, and hence also included in the corpus root using the same XInclude mechanism as explained above.
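As a minimal sketch of how a script might find the files that make up a corpus, the following Python function lists the href values of the XInclude elements in a corpus root. The function name included_files and the sample string are illustrative assumptions and not part of the ParlaMint tooling; the sketch only reads the references and does not itself resolve the inclusions.

```python
import xml.etree.ElementTree as ET

# Qualified name of the <include> element in the XInclude namespace
XI_INCLUDE = "{http://www.w3.org/2001/XInclude}include"


def included_files(root_xml: str) -> list[str]:
    """Return the href values of all xi:include elements, in document order."""
    root = ET.fromstring(root_xml)
    return [el.get("href") for el in root.iter(XI_INCLUDE)]


# Hypothetical, heavily shortened corpus root used only for illustration
SAMPLE_ROOT = """<teiCorpus xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader/>
  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="2014/ParlaMint-NL_2014-04-16.xml"/>
  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="2014/ParlaMint-NL_2014-04-17.xml"/>
</teiCorpus>"""

for href in included_files(SAMPLE_ROOT):
    print(href)
```

Because the metadata files of the corpus root are included with the same mechanism, the same function would also return their href values when applied to a complete corpus root.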

2.3. File names and directory structure

ParlaMint has strict rules on how to name the various files that constitute a corpus, and how to collect them in directories.

The file names have the following structure:

The corpus root file name should start with the string ParlaMint-, followed by the ISO 3166 country (or autonomous region) code (cf. Section on Standard values), e.g. ParlaMint-NL.xml or ParlaMint-ES-CT.xml.
For machine-translated corpora the ISO 639 code of the language (cf. Section on Standard values) should follow the country code, e.g. ParlaMint-NL-en.xml.
A corpus component filename should start with the name of the root, followed by an underscore and the ISO 8601 formatted date of the transcript, for example ParlaMint-IS_2015-01-21-54.xml. In case a corpus component is further distinguished, so that there are several components with the same date, the corpus compilers are free to extend the file name by a hyphen and any suffix containing only ASCII letters and numbers and the hyphen character, e.g. ParlaMint-NL_2018-10-30-eerstekamer-4.xml or ParlaMint-CZ_2016-04-13-ps2013-044-02-016-098.xml.
Certain metadata elements from the corpus root <teiHeader> are stored in separate files, in particular the list of speakers, <listPerson>, the list of political parties and other organisations, <listOrg>, and the ParlaMint structural and linguistic taxonomies, i.e. <taxonomy> elements. The file names for such metadata files start with the name of the corpus root, followed by a hyphen, and then the name of the element, e.g. ParlaMint-BE-listPerson.xml. Where there are several files for instances of the same element name, as is the case for taxonomies, the filename should end with another hyphen, followed by the ID of the particular element, e.g. ParlaMint-BE-taxonomy-UD-SYN.xml. Finally, some of the taxonomies are not corpus-specific, i.e. identical files are used by all ParlaMint corpora. In this case, the country or region code is omitted, e.g. ParlaMint-taxonomy-parla.legislature.xml.
The file names of the corpus as a whole or corpus components that have been automatically converted from the source XML into some other format should have the same name as the corpus root or components, respectively, but with appropriate file extensions, e.g. ParlaMint-IS_2015-01-21-54.txt; this is further explained in the Section on Conversions.
As discussed in the Chapter on Linguistic annotation, we distinguish the linguistically annotated version of the corpus from the ‘plain-text’ one, with the linguistically annotated version having the additional suffix .ana on the corpus root and components, e.g. ParlaMint-ES-CT.ana.xml or ParlaMint-IS_2015-01-21-54.ana.xml.
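The naming rules for corpus root and component files can be approximated with regular expressions. The sketch below is our own illustration, not the official ParlaMint validation: the names ROOT_NAME, ROOT_FILE, COMPONENT_FILE and classify are assumptions, the patterns only check the shape of the codes and of the date (not whether they are valid ISO values), and the metadata and taxonomy files are deliberately not covered.

```python
import re

# "ParlaMint-", a two-letter country code, an optional subdivision part
# (e.g. "-CT"), and an optional lower-case language code for
# machine-translated corpora (e.g. "-en").
ROOT_NAME = r"ParlaMint-[A-Z]{2}(?:-[A-Z0-9]{1,3})?(?:-[a-z]{2,3})?"

# Corpus root file, optionally with the .ana suffix
ROOT_FILE = re.compile(rf"{ROOT_NAME}(?:\.ana)?\.xml")

# Component file: root name, underscore, YYYY-MM-DD date, optional suffix of
# ASCII letters, digits and hyphens, optional .ana suffix
COMPONENT_FILE = re.compile(
    rf"{ROOT_NAME}_\d{{4}}-\d{{2}}-\d{{2}}(?:-[A-Za-z0-9-]+)?(?:\.ana)?\.xml"
)


def classify(filename: str):
    """Return 'root', 'component', or None if neither pattern matches."""
    if ROOT_FILE.fullmatch(filename):
        return "root"
    if COMPONENT_FILE.fullmatch(filename):
        return "component"
    return None


for name in ["ParlaMint-NL-en.xml",
             "ParlaMint-NL_2018-10-30-eerstekamer-4.xml",
             "ParlaMint-IS_2015-1-21.xml"]:
    print(name, "->", classify(name))
```

A stricter check would additionally verify the country, region and language codes against the ISO lists and parse the date, which this sketch leaves out for brevity.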

For distribution the complete XML corpus should be stored in a directory that has the same name prefix as the corpus root file. The directory then contains the corpus root file and its metadata files, while the corpus components should be in subdirectories, one per year, for example:

 ParlaMint-BE.TEI/ParlaMint-BE.xml
 ParlaMint-BE.TEI/ParlaMint-BE-listPerson.xml
 ParlaMint-BE.TEI/ParlaMint-BE-listOrg.xml
 ParlaMint-BE.TEI/ParlaMint-taxonomy-parla.legislature.xml
 ParlaMint-BE.TEI/ParlaMint-taxonomy-speaker_types.xml
 ...
 ParlaMint-BE.TEI/2014/ParlaMint-BE_2014-06-19.xml
 ParlaMint-BE.TEI/2014/ParlaMint-BE_2014-06-30.xml
 ParlaMint-BE.TEI/2014/ParlaMint-BE_2014-07-17.xml
 ...
 ParlaMint-BE.TEI/2015/ParlaMint-BE_2015-01-06-54.xml
 ParlaMint-BE.TEI/2015/ParlaMint-BE_2015-01-07-54.xml
 ParlaMint-BE.TEI/2015/ParlaMint-BE_2015-01-08-54.xml
 ...

The linguistically annotated version of the corpus is stored separately, with the main directory and, as mentioned, the corpus root and component filenames having the additional suffix .ana, e.g.

 ParlaMint-BE.TEI.ana/ParlaMint-BE.ana.xml
 ParlaMint-BE.TEI.ana/ParlaMint-BE-listPerson.xml
 ParlaMint-BE.TEI.ana/ParlaMint-BE-listOrg.xml
 ParlaMint-BE.TEI.ana/ParlaMint-taxonomy-parla.legislature.xml
 ParlaMint-BE.TEI.ana/ParlaMint-taxonomy-speaker_types.xml
 ParlaMint-BE.TEI.ana/ParlaMint-taxonomy-NER.xml
 ParlaMint-BE.TEI.ana/ParlaMint-taxonomy-UD.xml
 ...
 ParlaMint-BE.TEI.ana/2014/ParlaMint-BE_2014-06-19.ana.xml
 ParlaMint-BE.TEI.ana/2014/ParlaMint-BE_2014-06-30.ana.xml
 ParlaMint-BE.TEI.ana/2014/ParlaMint-BE_2014-07-17.ana.xml
 ...
 ParlaMint-BE.TEI.ana/2015/ParlaMint-BE_2015-01-06-54.ana.xml
 ParlaMint-BE.TEI.ana/2015/ParlaMint-BE_2015-01-07-54.ana.xml
 ParlaMint-BE.TEI.ana/2015/ParlaMint-BE_2015-01-08-54.ana.xml
 ...
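The directory layout described above can be summarised in a few lines of Python. This is a sketch under the assumption that the file names already follow the naming rules; the function name distribution_path and its parameters are hypothetical and not part of the ParlaMint scripts.

```python
from pathlib import PurePosixPath


def distribution_path(corpus: str, filename: str, ana: bool = False) -> str:
    """Return the relative path of a file inside the distribution directory.

    corpus   -- name of the corpus root without extension, e.g. "ParlaMint-BE"
    filename -- name of the root, metadata or component file
    ana      -- True for the linguistically annotated version
    """
    top = corpus + (".TEI.ana" if ana else ".TEI")
    prefix = corpus + "_"
    if filename.startswith(prefix):
        # Component file: the four characters after the underscore are the year
        year = filename[len(prefix):len(prefix) + 4]
        return str(PurePosixPath(top, year, filename))
    # Corpus root and metadata files sit directly in the top directory
    return str(PurePosixPath(top, filename))


print(distribution_path("ParlaMint-BE", "ParlaMint-BE_2014-06-19.xml"))
print(distribution_path("ParlaMint-BE", "ParlaMint-BE-listPerson.xml"))
print(distribution_path("ParlaMint-BE", "ParlaMint-BE_2015-01-06-54.ana.xml", ana=True))
```

Testing for the corpus name followed by an underscore, rather than for any underscore, keeps shared taxonomy files such as ParlaMint-taxonomy-speaker_types.xml in the top directory.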

3. General requirements

This section gives some general requirements a ParlaMint corpus has to meet, in particular those relating to the characters in a corpus, and the use of standards. It also details the structure of the file names of the ParlaMint root and component files, as well as the attributes expected on the <teiCorpus> and <TEI> tags.

3.1. Characters

The corpus should be encoded in Unicode, using the UTF-8 character encoding, at least for European languages. In cases where the original contains characters from the Unicode Private Use Area, these should, if possible, be given their closest Unicode equivalents or substituted by the Unicode replacement character U+FFFD. End-of-line hyphens, if present in the source files, should be removed, and the split words joined in order to enhance searching the corpus and to simplify linguistic processing.

The following characters, esp. prevalent when the source documents were in Word or HTML, deserve special mention:

TAB (U+0009) helps align strings on successive lines. As ParlaMint is not interested in preserving the layout, all TAB characters should be substituted by space characters (U+0020).
NO-BREAK SPACE (U+00A0) prevents, in some applications, an automatic line break at its position, as well as the collapsing of consecutive such characters into a single space. As the use of this character complicates (or breaks) further processing, esp. linguistic annotation, these characters should be substituted by the normal space character (U+0020). The same holds for other variants of spaces (U+2000 - U+200A), which are, however, used much less frequently.
NON-BREAKING HYPHEN (U+2011), similarly to NO-BREAK SPACE, prevents a line break, in this case following its position. With a similar reasoning as above, this character should be substituted by the normal hyphen character ('-', U+002D).
SOFT HYPHEN (U+00AD) indicates that a word can be hyphenated at that point. Occurrences of this character should be removed from the corpus.

Text-bearing elements should also not start or end with space characters, and sequences of whitespace characters should be changed into a single space.
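The substitutions listed above can be sketched as a small Python function. The name normalise_text is our own; the sketch covers only the character substitutions and whitespace rules of this section and, as an explicit limitation, neither joins words split by end-of-line hyphens nor handles Private Use Area characters, both of which depend on the source and the language.

```python
import re

# TAB, NO-BREAK SPACE and the space variants U+2000 - U+200A
_SPECIAL_SPACES = re.compile("[\u0009\u00A0\u2000-\u200A]")
# Any run of whitespace characters
_WHITESPACE_RUN = re.compile(r"\s+")


def normalise_text(text: str) -> str:
    """Apply the character substitutions described in this section."""
    text = text.replace("\u00AD", "")        # remove SOFT HYPHEN
    text = text.replace("\u2011", "-")       # NON-BREAKING HYPHEN -> '-' (U+002D)
    text = _SPECIAL_SPACES.sub(" ", text)    # special spaces -> U+0020
    text = _WHITESPACE_RUN.sub(" ", text)    # collapse whitespace sequences
    return text.strip()                      # no leading or trailing space


raw = "  co\u00ADoperation\tis\u00A0non\u2011trivial \u2003 here \n"
print(normalise_text(raw))
```

The function is meant to be applied to the text content of text-bearing elements, not to a complete XML file, where whitespace between elements is not significant.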

3.2. Standard values

Whenever possible, ParlaMint uses standards for information coding. In particular, the following information must be standardised:

As the identity of a ParlaMint corpus is determined by the country or region of the particular parliament, its code appears in many places. For specifying these codes, the ISO 3166 standard should be used, in particular ISO 3166-1 alpha-2 for the two-letter codes of the countries (for national parliaments) and ISO 3166-2 for the codes of country subdivisions (for parliaments of autonomous provinces). So, for example, the country code for Spain is "ES", while the code for the autonomous Basque community is "ES-PV". Note that we use the term regional parliaments for such cases.
The codes for the languages used in the corpora (i.e. the possible values of the xml:lang attribute) should follow BCP 47 (cf. also ‘xml:lang in XML document schemas’). Essentially, this means that the value for a language code should have two letters, following ISO 639-1, or, only if a two-letter code does not exist for a language, the three-letter ISO 639-2/T code. For example, the code for Basque is 'eu'. ParlaMint corpora will use at least two languages, i.e. the language that the transcriptions are written in, which we will call the local language, and English, as the meta-language, which is (also) used in the metadata.
Temporal, i.e. time-related information is typically stored in the when, from and to attributes of various elements. To specify a date or time as the value of these attributes, formatting according to the ISO 8601 standard should be used, e.g. 2022-04-01 for the 1st of April 2022. More information on temporal attributes is given in the Section on Temporal attributes.

3.3. Attributes of top-level elements

The Chapter on Overall corpus structure introduced the top level elements of the corpus root file and of the component files (i.e. the <teiCorpus> and <TEI> elements), but did not elaborate on their attributes; these are presented in this section.

The corpus root has three required attributes, as shown below:

All three attributes can also be used on any other element, and are thus of special importance:

xmlns determines the namespace of the element, and this should always be the TEI namespace, i.e. http://www.tei-c.org/ns/1.0. Note that all lower level elements in the same file inherit this namespace, so it is not necessary (although it is not an error) for other elements to also define their namespace.
xml:id is an attribute from the (implicitly assumed) XML namespace, and gives the identifier for the corpus root or component. The value of an ID should be unique in the corpus as a whole and should obey format requirements as defined by W3C. For the corpus root, as well as for the components, it is required that this top level identifier is identical to the file name (without the file extension). The xml:id is a global attribute, so any element can have it. While this is not required, it is necessary for any element that is then referred to (via this same ID) by some other element, such as many elements in the <teiHeader>, as is explained in the Section on Corpus metadata. The subordinate elements in the transcription that have an ID (such as utterances and segments) are recommended to have the top level xml:id as a prefix and to indicate the element name in the ID. For example, if the top level ID is ParlaMint-GB_2021-01-06-lords, the first utterance would have the ID ParlaMint-GB_2021-01-06-lords.u1 and the first segment ParlaMint-GB_2021-01-06-lords.seg1. The number of the element should not have leading zeros.
xml:lang is also a global attribute and gives the language code of the text content of the element; for the corpus root this does not (just) mean the content of its TEI header, but primarily the textual content of its XIncluded components. The convention is that the language of the text content of an element is determined by the value of the nearest xml:lang attribute, found on the element itself or on its closest ancestor that has one. In cases where the content is multilingual, the language code should be that of the majority language. When the proportion of the languages is about equal, then the mul code for multiple languages can also be used.

A corpus component also has the same three required attributes, but additionally also the ana attribute:

The same as for the corpus root, the component also sets the TEI namespace, and gives the language of its textual content, while its xml:id, of course, identifies the particular component. The ana attribute is a pointing attribute, and we introduce these attributes in the next section.
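The requirements on the top-level attributes can be checked mechanically. The following Python sketch is only an illustration: the function name check_top_level, the wording of the messages and the sample component (including its ana value) are assumptions, and the check of subordinate IDs reflects a recommendation rather than a requirement.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML = "{http://www.w3.org/XML/1998/namespace}"


def check_top_level(xml_text: str, filename: str) -> list[str]:
    """Return a list of problems found on the top-level element."""
    root = ET.fromstring(xml_text)
    problems = []
    if root.tag not in (TEI + "teiCorpus", TEI + "TEI"):
        problems.append("top-level element is not teiCorpus or TEI in the TEI namespace")
    top_id = root.get(XML + "id")
    if top_id != filename.removesuffix(".xml"):
        problems.append("xml:id differs from the file name without extension")
    if not root.get(XML + "lang"):
        problems.append("xml:lang is missing")
    if root.tag == TEI + "TEI" and not root.get("ana"):
        problems.append("ana is missing on the corpus component")
    # Recommended: IDs inside the transcription extend the top-level ID
    text = root.find(TEI + "text")
    if top_id and text is not None:
        for el in text.iter():
            el_id = el.get(XML + "id")
            if el_id and not el_id.startswith(top_id + "."):
                problems.append(f"{el_id} does not extend the top-level xml:id")
    return problems


# Hypothetical, heavily shortened component; the ana value is illustrative only
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0"
     xml:id="ParlaMint-GB_2021-01-06-lords" xml:lang="en" ana="#parla.upper">
  <teiHeader/>
  <text><body><div>
    <u xml:id="ParlaMint-GB_2021-01-06-lords.u1">
      <seg xml:id="ParlaMint-GB_2021-01-06-lords.seg1">Text.</seg>
    </u>
  </div></body></text>
</TEI>"""

print(check_top_level(SAMPLE, "ParlaMint-GB_2021-01-06-lords.xml"))
```

The sketch relies on str.removesuffix, which is available from Python 3.9; for the annotated version the file name without extension, and therefore the expected ID, keeps the .ana part.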

3.4. Pointing attributes

The ParlaMint encoding uses pointing attributes for a number of purposes, e.g. for references to taxonomy categories, to speaker metadata, or to linguistic categories.

While a few elements have dedicated pointing attributes, there are three generally used ones. They share the characteristics that they are all used by a large number of different elements and that their value is a series of pointers, i.e. a white-space delimited sequence of references to the values of some xml:id attribute in the corpus or, in general, to a URI. The three attributes are:

ana serves to provide an analysis or to classify an element according to some pre-determined vocabulary. In ParlaMint the target element will typically be a category in a taxonomy, an event or date, or an organisation.
corresp points to items that correspond to the current element in some way, e.g. from a page break to (the URL of) the corresponding media file.
ref provides an explicit reference to the full definition or identity for the entity being named. In ParlaMint it is used e.g. for connecting a person's affiliation with a particular organisation. The value of this attribute is often, but not always, a URL, e.g. for associating a place name with its GeoNames URL.

To illustrate, the example below gives some elements that contain one or more of these attributes:

<meeting ana="#parla.upper #parla.term #LEG.18">18 Legislatura</meeting>
...
<affiliation ref="#group.L-SP-PSd.Az" role="member" ana="#LEG.18" from="2018-03-27"/>
...
<placeName ref="https://www.geonames.org/2523918">Palermo</placeName>
...
<link ana="ud-syn:det" target="#ParlaMint-IT.seg1.2.6 #ParlaMint-IT.seg1.2.5"/>

The first example, with the <meeting> element, classifies it (the definitions are given in the relevant taxonomy) as a meeting of the upper house, in the scope of a parliamentary term, specifically in the XVIII Legislative Term. The example with <affiliation> (again, the definitions are given in the elements with the pointed-to ID) specifies that the (person that has this) affiliation is a member of the parliamentary group ‘Lega-Salvini Premier-Partito Sardo d'Azione’ in the scope of the XVIII Legislative Term. The <placeName> example gives the definition of Palermo in the GeoNames database via the used URL. Finally, the <link> example illustrates a Universal Dependencies determiner syntactic link between two tokens. The link uses the TEI extended pointer syntax, further explained in the Section on Prefix definitions.

It is often difficult to decide which of the attributes to use for a particular pointer, therefore the examples of usage given with the relevant element should always be consulted.
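Since the value of a pointing attribute is a white-space delimited sequence of pointers, a processing script first has to split it and decide what kind of reference each pointer is. The sketch below shows one possible way to do this; the function name split_pointers and the labels "local", "url", "prefixed" and "other" are our own and are not defined by ParlaMint or TEI.

```python
def split_pointers(value: str) -> list[tuple[str, str]]:
    """Split a pointing attribute value and label each pointer.

    Returns (kind, target) pairs, where kind is one of:
      "local"    -- reference to an xml:id, written as #ID (target is the bare ID)
      "url"      -- an absolute URL, e.g. https://...
      "prefixed" -- TEI extended pointer with a prefix, e.g. ud-syn:det
      "other"    -- anything else, e.g. a relative file reference
    """
    result = []
    for pointer in value.split():
        if pointer.startswith("#"):
            result.append(("local", pointer[1:]))
        elif "://" in pointer:
            result.append(("url", pointer))
        elif ":" in pointer:
            result.append(("prefixed", pointer))
        else:
            result.append(("other", pointer))
    return result


print(split_pointers("#parla.upper #parla.term #LEG.18"))
print(split_pointers("https://www.geonames.org/2523918"))
print(split_pointers("ud-syn:det"))
```

Resolving a "prefixed" pointer to an actual target requires the prefix definitions given in the corpus header, which this sketch does not attempt.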

3.5. Temporal attributes

ParlaMint makes a lot of use of temporal information, e.g. to determine when a session took place or the period when a certain person was an MP. As mentioned in the Section on Standard values, the ISO 8601 format should be used to specify the dates or times.

The following attributes are used to specify temporal information:

The when attribute is used when the temporal information refers to a point in time, typically a date, and is used e.g. to give the date when the corpus was published, or when a change in the corpus was made.
The from and to attributes give the starting and ending date or time of an interval, e.g. the time period the corpus covers, or the period when a person was an MP. If only one of the two attributes is present, then the assumption is that this interval extends at least from the start (if from is missing) or at least to the end (if to is missing) of the time period that the particular ParlaMint corpus covers. Similarly, if both attributes are missing, the assumption is that the interval covers the complete time period of the ParlaMint corpus.

It should be noted that, in ParlaMint, we do not support overlapping dates. For example, it is common for a term to end on a particular day, with the next term starting on the same day, and the same for a coalition and opposition. But encoding this overlap leads to problems of multivalued attributes (e.g. for a speaker to belong to a coalition and opposition on the same day), so the starting date of the next event or state should be, by convention, moved one day forward. ParlaMint thus introduces a small mistake in recording the facts but this is outweighed by the simplification in processing.
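The two conventions above, i.e. the interpretation of missing from or to attributes and the shifting of a start date that coincides with the previous end date, can be sketched in Python as follows. The function names are hypothetical, and the sketch assumes that all values are full dates in the YYYY-MM-DD form; values with a time part or reduced precision would need additional handling.

```python
from datetime import date, timedelta
from typing import Optional


def effective_interval(from_: Optional[str], to: Optional[str],
                       corpus_start: date, corpus_end: date) -> tuple[date, date]:
    """Fill in a missing from or to with the time span the corpus covers."""
    start = date.fromisoformat(from_) if from_ else corpus_start
    end = date.fromisoformat(to) if to else corpus_end
    return start, end


def shift_if_same_day(previous_to: str, next_from: str) -> str:
    """Move the next start one day forward if it equals the previous end."""
    if date.fromisoformat(next_from) == date.fromisoformat(previous_to):
        return (date.fromisoformat(previous_to) + timedelta(days=1)).isoformat()
    return next_from


# Illustrative corpus time span
corpus_start, corpus_end = date(2014, 8, 1), date(2020, 7, 16)

print(effective_interval(None, "2018-06-21", corpus_start, corpus_end))
print(shift_if_same_day("2018-06-21", "2018-06-21"))
```

Note that an interval with a missing attribute may in reality extend beyond the corpus time span; the filled-in value is only the bound that is relevant for processing the corpus.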

4. Corpus metadata

As mentioned, the <teiCorpus> and <TEI> elements contain the obligatory <teiHeader> element, which stores the metadata of the corpus root or component. In this section we explain and give examples of the required and optional metadata contained in the <teiHeader>, proceeding through its various elements and distinguishing which parts and what content are appropriate for the corpus root, and which for a corpus component.

As a general remark, most metadata contains free text. ParlaMint requires this text to be given in English, to help researchers from other countries understand it; it is also recommended to give it in the local language in which (the main portion of) the parliamentary transcripts are written, so that local researchers can use it in their native tongue.

A ParlaMint <teiHeader> contains three obligatory elements: the file description, <fileDesc>, the encoding description, <encodingDesc>, and the profile description, <profileDesc>, and an optional revision description, <revisionDesc>:

Below we explain each of these elements in turn.

4.1. File description

The file description, <fileDesc> is composed of five obligatory elements, namely the title statement, <titleStmt>, the edition statement, <editionStmt>, the extent, <extent>, the publication statement, <publicationStmt>, and the source description, <sourceDesc>:

4.1.1. Title statement

The title statement, <titleStmt> gives the title of the corpus root or component, along with the specification of the particular session(s) of the parliament contained, the persons responsible for compiling the corpus, and the funder(s) of the project.

This structure is exemplified by the following corpus root title statement:

<titleStmt>
  <title type="main">Slovenski parlamentarni korpus ParlaMint-SI [ParlaMint]</title>
  <title type="main" xml:lang="en">Slovenian parliamentary corpus ParlaMint-SI [ParlaMint]</title>
  <title type="sub">Zapisi sej Državnega zbora Republike Slovenije, 7. in 8. mandat (2014 - 2020)</title>
  <title type="sub" xml:lang="en">Minutes of the National Assembly of the Republic of Slovenia, Term 7 and 8 (2014 - 2020)</title>
  <meeting n="7" corresp="#DZ" ana="#parla.lower #parla.term #DZ.7">7. mandat</meeting>
  <meeting n="8" corresp="#DZ" ana="#parla.lower #parla.term #DZ.8">8. mandat</meeting>
  <respStmt>
    <persName ref="https://orcid.org/0000-0001-6143-6877">Andrej Pančur</persName>
    <persName ref="https://orcid.org/0000-0002-1560-4099">Tomaž Erjavec</persName>
    <resp>Kodiranje ParlaMint TEI XML</resp>
    <resp xml:lang="en">ParlaMint TEI XML corpus encoding</resp>
  </respStmt>
  <funder>
    <orgName>Raziskovalna infrastruktura CLARIN</orgName>
    <orgName xml:lang="en">The CLARIN research infrastructure</orgName>
  </funder>
  <funder>
    <orgName>Slovenska raziskovalna infrastruktura CLARIN.SI</orgName>
    <orgName xml:lang="en">The Slovenian research infrastructure CLARIN.SI</orgName>
  </funder>
</titleStmt>

The title statement starts with two titles (one main, the other subordinate), both in English and the local language, with the appropriate language code possibly inherited from a superordinate element. They are distinguished by the value main or sub of their type attribute and the value of their xml:lang attribute.

The main title has a formulaic structure ‘<Country name> parliamentary corpus ParlaMint-<Country code> [ParlaMint]’, with an equivalent structure for the local language. Note that the corpus ‘stamp’ in square brackets can also be ‘[ParlaMint.ana]’ for the linguistically annotated version of the corpus (as explained in the Chapter on Linguistic annotation) or ‘[ParlaMint SAMPLE]’ for corpus data samples, as available on the ParlaMint GitHub repository.

The subordinate title, in contrast to the main one, is free text, and usually formed on the basis of the source of the corpus. As with the main one, it should be given in both languages.

After the titles comes the specification of the particular sessions that the corpus contains, encoded as <meeting> elements: the two meeting elements in the above example state that the ParlaMint-SI corpus contains the meetings of the 7th and 8th terms of the lower house of the National Assembly of the Republic of Slovenia. The <meeting> elements can give, as the value of their n attribute, the numbers of the meetings that the corpus covers, and their text content can give a free-text description of the meetings in the local language.

The formal information on the meetings is given in the values of the corresp and ana attributes, which are pointing attributes, as already explained in the Section on Pointing attributes. Here they refer to the definitions of organisations, further explained in the Section on Organisations, and to the categories of taxonomy elements, further explained in the Section on the Class declaration. The value of the corresp attribute points to the governmental body that the meeting is a meeting of (in this case the National Assembly of the Republic of Slovenia), while the ana attribute contains a space-delimited sequence of pointers: #parla.lower points to the definition of the lower house, #parla.term to the definition of a parliamentary term, and #DZ.7 to the definition of the seventh mandate.

Next come one or more responsibility statements, <respStmt>, each one containing one or more person names, <persName>, with an optional ref attribute, giving the URL, where more information about the person can be found, and the responsibility element <resp>, which specifies what responsibility the statement is about.

In a similar manner, the <funder> elements give information on the organisations which have financially contributed to the compilation of the corpus, with the names of the organisations given in the <orgName> elements.

A corpus component has a very similar title statement to the corpus root, except that certain elements specify the metadata of the component, rather than of the complete corpus. They also contain some redundant metadata, in particular the responsibility statement and the funder, as illustrated in the example below:

<titleStmt>
  <title type="main">Slovenski parlamentarni korpus ParlaMint-SI, izredna seja 59 [ParlaMint]</title>
  <title type="main" xml:lang="en">Slovenian parliamentary corpus ParlaMint-SI, Extraordinary Session 59 [ParlaMint]</title>
  <title type="sub">Zapisi sej Državnega zbora Republike Slovenije, 7. mandat, 59. izredna seja, 13.4.2018</title>
  <title type="sub" xml:lang="en">Minutes of the National Assembly of the Republic of Slovenia, Term 7, Extraordinary Session 59, 13.4.2018</title>
  <meeting n="59" corresp="#DZ" ana="#parla.lower #parla.meeting.extraordinary">Izredna</meeting>
  <meeting n="7" corresp="#DZ" ana="#parla.lower #parla.term #DZ.7">7. mandat</meeting>
  <respStmt>
    <persName>Andrej Pančur</persName>
    <resp>Kodiranje TEI</resp>
    <resp xml:lang="en">TEI corpus encoding</resp>
  </respStmt>
  <funder>
    <orgName>Raziskovalna infrastruktura CLARIN</orgName>
    <orgName xml:lang="en">The CLARIN research infrastructure</orgName>
  </funder>
  <funder>
    <orgName>Slovenska raziskovalna infrastruktura CLARIN.SI</orgName>
    <orgName xml:lang="en">The Slovenian research infrastructure CLARIN.SI</orgName>
  </funder>
</titleStmt>

In the example it can be seen that the main title of a corpus component is simply an extension of the corpus root title, as it also gives the name of the particular meeting that the component contains, while the subordinate title is, again, free text. Both titles must be unique in the complete corpus.

The other difference is in the <meeting> elements, which here specify a particular meeting of the corpus component transcription. In the example above, this is an extraordinary meeting of the lower house in the seventh term of the National Assembly of the Republic of Slovenia.

4.1.2. Edition statement

ParlaMint corpora have their edition statement, <editionStmt> both in the corpus root and components. As illustrated below, the only element it contains is <edition>:

We use semantic versioning to specify the version of the corpus, i.e. giving the version number, where a new major version means substantial changes to the corpus, while the minor version is reserved for e.g. correcting errata or other minor changes. We do not use the patch number. It should be noted that - at least so far - all the ParlaMint corpora were released together, so that they are all of the same edition, i.e. have the same version number. At the time of writing, the latest version is 2.1, with the next one planned to be 3.0.

4.1.3. Extents

The <extent> element gives information on selected sizes of the complete corpus (in the corpus root) or of one corpus component, as illustrated below in the case of a corpus root extent:

<extent>
  <measure unit="speeches" quantity="75122" xml:lang="sl">75.122 govorov</measure>
  <measure unit="speeches" quantity="75122" xml:lang="en">75,122 speeches</measure>
  <measure unit="words" quantity="20190034" xml:lang="sl">20.190.034 besed</measure>
  <measure unit="words" quantity="20190034" xml:lang="en">20,190,034 words</measure>
</extent>

ParlaMint requires two sizes to be given, each in both languages: the number of speeches and the number of words, distinguished by the value of the unit attribute. The exact quantity is given in the quantity attribute, while the text content of <measure> gives the quantity together with the unit; if possible, the number here should contain the thousands separator appropriate for the language.

It should be noted that both sizes are somewhat complex to compute and are inserted into the TEI headers in the finalisation of a corpus (cf. the Section on Finalisation of corpora) by a common script, so it is not necessary to insert the extent in the process of developing a ParlaMint corpus.
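To make the relation between the quantity attribute and the text content of <measure> concrete, the following Python sketch formats one <measure> element. It is not the common finalisation script mentioned above: the function names, the parameters and the way the thousands separator is passed in are our own assumptions, and the sketch only formats a count that has already been computed.

```python
def format_quantity(quantity: int, thousands_sep: str) -> str:
    """Format an integer with the given thousands separator."""
    return f"{quantity:,}".replace(",", thousands_sep)


def measure_element(unit: str, quantity: int, lang: str,
                    label: str, thousands_sep: str) -> str:
    """Build a <measure> element as a string.

    label is the unit name in the given language, e.g. "speeches" or "govorov".
    The values are assumed not to contain characters that need XML escaping.
    """
    text = f"{format_quantity(quantity, thousands_sep)} {label}"
    return (f'<measure unit="{unit}" quantity="{quantity}" '
            f'xml:lang="{lang}">{text}</measure>')


print(measure_element("speeches", 75122, "sl", "govorov", "."))
print(measure_element("speeches", 75122, "en", "speeches", ","))
```

Keeping the exact number in quantity and the localised rendering in the text content means that scripts can read the size without having to parse language-specific number formats.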

4.1.4. Publication statement

The publication statement <publicationStmt> must appear in the corpus root as well as, in identical form, in the corpus components. As illustrated below, it contains information about the publisher of the corpus, the persistent identifier where the complete corpus can be found, under which licence it is distributed, and when it was released:

<publicationStmt> <publisher> <orgName xml:lang="sl">Raziskovalna infrastruktura CLARIN</orgName> <orgName xml:lang="en">CLARIN research infrastructure</orgName> <ref target="https://www.clarin.eu/">www.clarin.eu</ref> </publisher> <idno type="URI" subtype="handle">http://hdl.handle.net/11356/1432</idno> <availability status="free"> <licence>http://creativecommons.org/licenses/by/4.0/</licence> <p xml:lang="sl">To delo je ponujeno pod <ref target="http://creativecommons.org/licenses/by/4.0/">Creative Commons Priznanje avtorstva 4.0 mednarodna licenca</ref>.</p> <p xml:lang="en">This work is licensed under the <ref target="http://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</ref>.</p> </availability> <date when="2021-06-11">11. 6. 2021</date> </publicationStmt>

The <publisher> is, at least for the corpora produced in the scope of the CLARIN ParlaMint project, the CLARIN research infrastructure, and the element also gives the home page of the infrastructure. The ‘identifier number’ element, <idno>, specifies via its type and subtype attributes, with the fixed values URI and handle, that the identifier is a handle, and contains the handle where the complete corpus corresponding to the specified version can be found. The <availability> element specifies, via its <licence> element, the fixed-value CC BY 4.0 URL, and in the following paragraphs gives a prose description of the licence, including its URL via the target attribute of <ref>. As usual, the textual information is given in both languages. Finally, the <date> gives the date of the release, where the when attribute gives the date in the ISO 8601 format, while the textual content can give it according to the conventions used in the local language.

4.1.5. Source description

The source description <sourceDesc> of the corpus root encodes the original digital source of the ParlaMint corpus in the <bibl> element, as shown in the following example:

<sourceDesc> <bibl> <title type="main" xml:lang="sl">Zapisi sej Državnega zbora Republike Slovenije</title> <title type="main" xml:lang="en">Minutes of the National Assembly of the Republic of Slovenia</title> <idno type="URI">https://www.dz-rs.si</idno> <date from="2014-08-01" to="2020-07-16">1.8.2014 - 16.7.2020</date> </bibl> </sourceDesc>

Apart from the bilingual <title>s, the <bibl> should also give, in an <idno> with the fixed type URI, the government URL from which the transcripts were first harvested, while the dates of the earliest and latest transcript in the corpus are indicated by the from and to attributes of the <date> element. As usual, the values of these attributes should be according to ISO 8601, while the textual content can be formatted according to the local rules for writing dates.

For corpus components the source description is very similar to the one for the corpus root, except that the <title> can be modified to constrain the description to the exact meeting the component contains. The <date> element must, of course, specify the exact date when the meeting took place. If the transcription of the meeting is available on the Web, the <idno> should give this URL. Furthermore, if the audio or video of the meeting is available, this information can be given in the <recordingStmt>, as illustrated in the example below:

<sourceDesc> <bibl> <title type="main" xml:lang="cs">Parlament České republiky, Poslanecká sněmovna</title> <title type="main" xml:lang="en">Parliament of the Czech Republic, Chamber of Deputies</title> <idno type="URI">https://www.psp.cz/eknih/2013ps/stenprot/044schuz/s044033.htm</idno> <date when="2016-04-13">13.04.2016</date> </bibl> <recordingStmt> <recording type="audio"> <media xml:id="ps2013-044-02-000-000.audio1" mimeType="audio/mp3" source="https://www.psp.cz/eknih/2013ps/audio/2016/04/13/2016041308580912.mp3" url="2013ps/audio/2016/04/13/2016041308580912.mp3"/> </recording> </recordingStmt> </sourceDesc>

As the example shows, the recording statement contains a <recording> element, which specifies the type of the recording (audio or video), and then contains a <media> element giving the ID of the file, its mimeType, the URL of the source of the recording in the source attribute (typically the official governmental site for parliamentary proceedings) and, in the url attribute, the local (possibly processed) copy of the file; this can be a local file, even though it won't be distributed together with the ParlaMint corpus or, better, a Web-based file at a stable location.

4.2. Encoding description

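The encoding description <encodingDesc> of the corpus root contains several elements, among them the project description, editorial declaration, tags declaration and class declaration, which are described in the sections below.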

In contrast, the encoding description of a corpus component contains only two elements, namely (and redundantly) the <projectDesc> and the <tagsDecl>.

4.2.1. Project description

The project description <projectDesc> of the corpus root contains a short description of the project in the scope of which the corpus was compiled:

<projectDesc> <p xml:lang="sl">Glavni cilji projekta <ref target="https://www.clarin.eu/content/parlamint">ParlaMint</ref> so (1) izdelati večjezično množico na enak način kodiranih korpusov zapiskov parlamentarnih sej, (2) jezikoslovno označiti te korpuse; (3) narediti korpuse dostopne za prevzem in prek konkordančnikov; in (4) pripraviti primere uporabe korpusov v politologiji in digitalni humanistiki.</p> <p xml:lang="en">The <ref target="https://www.clarin.eu/content/parlamint">ParlaMint</ref> project aims to (1) create a multilingual set of uniformly encoded comparable corpora of parliamentary proceedings (2) process the corpora linguistically; (3) make the corpora available for download and through concordancers; and (4) build use cases in Political Sciences and Digital Humanities based on the corpus data.</p> </projectDesc>

The description above is written for the CLARIN ParlaMint project and the English language part can be used as is in the produced corpora for version 3.

4.2.2. Editorial declaration

The editorial declaration, <editorialDecl>, is used only in the corpus root and contains prose descriptions of the editorial decisions made in the process of compiling the corpus, along several dimensions, in particular what, if any, types of <correction>, <normalization>, <hyphenation>, <quotation>, and <segmentation> were performed on the source texts of the corpus. The example below illustrates the use of these elements:

<editorialDecl> <correction> <p xml:lang="en">No correction of source texts was performed.</p> </correction> <normalization> <p xml:lang="en">Text has not been normalised, except for spacing.</p> </normalization> <hyphenation> <p xml:lang="en">No end-of-line hyphens were present in the source.</p> </hyphenation> <quotation> <p xml:lang="en">Quotation marks have been left in the text and are not explicitly marked up.</p> </quotation> <segmentation> <p xml:lang="en">The texts are segmented into utterances (speeches) and segments (corresponding to paragraphs in the source transcription).</p> </segmentation> </editorialDecl>

4.2.3. Tags declaration

The tags declaration, <tagsDecl>, gives the count of all the XML tags used in the data part (so, not in the TEI header) of the complete corpus (in the corpus root) or of an individual component (in a corpus component). To distinguish the TEI elements from the possible use of elements from other namespaces, a <namespace> element giving the TEI namespace in its name attribute is introduced first. Inside it, each TEI tag is listed in its own <tagUsage> element, with the attribute gi giving the name of the tag and the attribute occurs giving the number of occurrences, as shown in the following example:

It should be noted that, similar to the extents (as explained in the Section on Extents), the tag usage is inserted into the TEI headers in the finalisation of a corpus (cf. the Section on Validation and conversion) by a common script, so it is not necessary to compute it in the process of developing a ParlaMint corpus.
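To make the structure of the tags declaration concrete, the following Python sketch counts the elements of a data part and lists them as <tagUsage> elements inside a <namespace>. It is only an illustration and not the common script: the function names and the shortened sample are hypothetical, and we assume that the <text> element itself is included in the count.

```python
import xml.etree.ElementTree as ET
from collections import Counter

TEI_NS = "http://www.tei-c.org/ns/1.0"

def local_name(tag):
    """Strip a '{namespace}' prefix, if any, from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]

def build_tags_decl(text_element):
    """Count every element under (and including) <text> and list the counts."""
    counts = Counter(local_name(el.tag) for el in text_element.iter())
    tags_decl = ET.Element("tagsDecl")
    namespace = ET.SubElement(tags_decl, "namespace", {"name": TEI_NS})
    for gi in sorted(counts):
        ET.SubElement(namespace, "tagUsage", {"gi": gi, "occurs": str(counts[gi])})
    return tags_decl

# Hypothetical, heavily shortened data part of a corpus component.
sample_text = ET.fromstring(
    "<text><body><div><note>Start</note>"
    "<u><seg>First.</seg><seg>Second.</seg></u></div></body></text>"
)
tags_decl = build_tags_decl(sample_text)
usage = {t.get("gi"): int(t.get("occurs")) for t in tags_decl.iter("tagUsage")}
print(ET.tostring(tags_decl, encoding="unicode"))
```

For the corpus root, the same counts would be summed over all the components.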

4.2.4. Class declaration and taxonomies

The class declaration, <classDecl>, is used only in the corpus root and contains only definitions of (most) controlled vocabularies used in ParlaMint corpora. These vocabularies, possibly hierarchically organised, are encoded using the <taxonomy> element.

The taxonomies themselves are stored in separate files, and are typically ParlaMint-wide, i.e. all corpora use the same taxonomies. The taxonomies are included in the document root with the XInclude directive, as illustrated below, for the case of the Czech corpus:

As can be seen, three of the taxonomies are general ParlaMint taxonomies, while two are corpus-specific, and are distinguished by including the country code CZ (followed by a hyphen) in the filename.

To illustrate the structure of a taxonomy element, we give below the simplest common taxonomy (and include only the descriptions in English), which contains the categories that define the three subcorpora of a ParlaMint corpus:

<classDecl> ... <taxonomy xml:id="ParlaMint-taxonomy-subcorpus" xml:lang="mul"> <desc xml:lang="en"> <term>Subcorpora</term> </desc> <category xml:id="reference"> <catDesc xml:lang="en"> <term>Reference</term>: reference subcorpus, until 2020-01-30</catDesc> </category> <category xml:id="covid"> <catDesc xml:lang="en"> <term>COVID</term>: COVID subcorpus, from 2020-01-31 onwards, when WHO made the formal declaration of PHEIC, i.e. the Public Health Emergency of International Concern for COVID-19</catDesc> </category> <category xml:id="war"> <catDesc xml:lang="en"> <term>War</term>: War in Ukraine subcorpus, from 2022-02-24 onwards, i.e. from Russia's full-scale invasion of Ukraine</catDesc> </category> </taxonomy> </classDecl>

A <taxonomy> thus first describes, via <desc>, what it is a taxonomy of, and then lists the (possibly nested) categories in <category> elements. Crucial here are the values of their xml:id attributes, by which a category is referred to, e.g. via the ana attribute of some other element, as was already explained in the Section on Attributes of top-level elements, in connection with classifying a corpus component via the ana attribute of its <TEI> element. The taxonomy category then bilingually glosses its meaning in its <catDesc> elements, which should always first contain the short name of the category, encoded in the <term> element.

Please note that the identifiers (i.e. values of the xml:id attributes) in the corpus-specific taxonomies must start with the country or region code of the corpus followed by a hyphen, in order not to conflict with present or future ParlaMint corpus-wide identifiers. For example, the category identifiers in the Czech-specific ParlaMint-CZ-taxonomy-meeting.parts.xml mentioned above have to be prefixed by CZ-, as shown in the following example:

<taxonomy xml:id="ParlaMint-CZ-taxonomy-meeting.parts" xml:lang="mul"> ... <category xml:id="CZ-parla.agenda"> <catDesc xml:lang="cs"> <term>Bod jednání</term> </catDesc> <catDesc xml:lang="en"> <term>Agenda</term>: topic discussed during sitting</catDesc> </category> </taxonomy>
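The prefix requirement can be checked mechanically. The Python sketch below is a minimal illustration under our own assumptions: the function name is hypothetical, the sample omits the TEI namespace, and the second category (parla.vote) is invented here only so that the check has something to report.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

def unprefixed_category_ids(taxonomy, code):
    """Return the category identifiers that do not start with '<code>-'."""
    prefix = f"{code}-"
    return [
        category.get(XML_ID)
        for category in taxonomy.iter("category")
        if not (category.get(XML_ID) or "").startswith(prefix)
    ]

sample_taxonomy = ET.fromstring(
    '<taxonomy xml:id="ParlaMint-CZ-taxonomy-meeting.parts" xml:lang="mul">'
    '<category xml:id="CZ-parla.agenda"><catDesc xml:lang="en">'
    "<term>Agenda</term>: topic discussed during sitting</catDesc></category>"
    # Hypothetical category, added here only to trigger the check.
    '<category xml:id="parla.vote"><catDesc xml:lang="en">'
    "<term>Vote</term></catDesc></category>"
    "</taxonomy>"
)
problems = unprefixed_category_ids(sample_taxonomy, "CZ")
print(problems)
```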

ParlaMint requires several taxonomies to be defined in the class declaration of the corpus root (as well as additional ones for the linguistically annotated corpus, as further described in the Section on Linguistic metadata). As mentioned, these taxonomies are defined globally and available as part of the data on the ParlaMint GitHub repository, and there is a special procedure for modifying them, in particular for inserting translations for a new language.

The six obligatory taxonomies are:

The subcorpus taxonomy, already given in the example above.
The taxonomy of speaker types, which distinguishes e.g. the chair of a meeting, ordinary speakers, and guest speakers.
The legislature taxonomy, which gives the possible organisations of a parliament, and is by far the most complex one.
The political orientation taxonomy, which gives the values of the (mostly) left-right political orientation of political parties and parliamentary groups.
The CHES variables taxonomy, which gives the variables of the Chapel Hill Survey on political parties (cf. the Section on Political parties and parliamentary groups).
The CAP topics taxonomy, which gives major topic labels of the Comparative Agendas Project.

Furthermore, there are several obligatory taxonomies which pertain to the linguistically analysed version of the corpus only, cf. the Section on Linguistic taxonomies.

4.3. Profile description

The profile description, <profileDesc>, is the third main division of the metadata provided by the TEI header. It contains a description of non-bibliographic aspects of the corpus, for example the list of speakers with their metadata. For the corpus root, it contains four elements, of which only the first, the <settingDesc>, is used in corpus components. The elements are listed below:

<settingDesc>, the setting description
<textClass>, the text class
<particDesc>, the participant description
<langUsage>, the language usage

We explain the contents of each element in the following sections.

4.3.1. Setting description

The setting description, <settingDesc>, is used by both the corpus root and corpus components, and contains only one element, <setting>, which then gives information on where and when the meetings included took place. The example below gives a typical corpus root setting description:

<settingDesc> <setting> <name type="place">Westminster</name> <name type="city">London</name> <name type="country" key="GB">U.K.</name> <date from="2015-01-01" to="2021-03-31"/> </setting> </settingDesc>

As can be seen, the location of the meeting is given in <name> elements with the type attribute further specifying what kind of location this is, with country (or region for regional parliaments) additionally having the key attribute, which gives its ISO 3166 code, as explained in the Section on Standard values. For the corpus root, the setting also contains the interval of the dates of the transcripts included in the corpus. Note that this date range is also given in the <sourceDesc>, as explained in the Section on the Source description.

The setting description is also present in corpus components, and is very similar to the one in the corpus root, except that it can additionally specify the location of the meeting and must give its exact date, as illustrated below:

<settingDesc> <setting> <name type="place">Commons Chamber</name> <name type="place">Westminster</name> <name type="city">London</name> <name type="country" key="GB">U.K.</name> <date when="2019-02-18">February 18th, 2019</date> </setting> </settingDesc>

4.3.2. Text class

The text class, <textClass>, groups information which describes the nature or topic of the corpus in terms of a standard classification scheme. It is used only in the ParlaMint corpus root, where it contains the category reference element, <catRef>:

The category reference specifies in the value of the scheme attribute which scheme it uses, and this will always be a pointer to the ParlaMint-wide taxonomy on legislature, as further explained in the Section on the Class declaration. The target attribute then gives pointers to the kind of legislature types the country or region has and what the corpus contains; in the case above, the country has a bicameral parliament, and the corpus contains the transcriptions of both the upper and lower house sittings. In general, the options are:

#parla.uni: Unicameral parliament
#parla.bi #parla.lower: Bicameral parliament, lower house only
#parla.bi #parla.upper: Bicameral parliament, upper house only
#parla.bi #parla.lower #parla.upper: Bicameral parliament, both houses
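The four options above can be read as unordered sets of pointers. As a minimal sketch, the Python function below maps the value of a target attribute to its description; the function and table names are hypothetical, and treating the four combinations as a closed set is our assumption rather than something the schema is known to enforce.

```python
# The four combinations listed above, stored as order-independent sets.
LEGISLATURE_OPTIONS = {
    frozenset({"#parla.uni"}): "Unicameral parliament",
    frozenset({"#parla.bi", "#parla.lower"}): "Bicameral parliament, lower house only",
    frozenset({"#parla.bi", "#parla.upper"}): "Bicameral parliament, upper house only",
    frozenset({"#parla.bi", "#parla.lower", "#parla.upper"}): "Bicameral parliament, both houses",
}

def describe_legislature(target):
    """Map the whitespace-separated pointers of catRef/@target to a description."""
    pointers = frozenset(target.split())
    try:
        return LEGISLATURE_OPTIONS[pointers]
    except KeyError:
        raise ValueError(f"unexpected catRef target: {target!r}") from None

print(describe_legislature("#parla.bi #parla.upper #parla.lower"))
```

Because the pointers are compared as a set, their order in the target attribute does not matter.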

4.3.3. Participant description

The participant description, <particDesc>, gives the information about the speakers whose speeches constitute the corpus transcripts, as well as information about the government, parliament, parliamentary groups of political parties and other ‘organisations’ relevant to the affiliations of the speaker or the corpus in general. The <particDesc> is a part of the TEI header of the corpus root and contains two dedicated types of lists, <listOrg> for organisations, and <listPerson> for the speakers, as shown below:

While the above gives the XML structure of the participant description, ParlaMint separates the organisation and person list into separate files (cf. the Section on File names and directory structure), so the actual encoding would be (for the CZ corpus) as follows:

Given the importance that ParlaMint gives to the information on speakers and their affiliations to (political) organisations, as well as the richness of this information, the content of <listOrg> and <listPerson> is further explained in a separate Chapter on Speakers and their organisations.

4.3.4. Language usage

The language usage, <langUsage>, is the fourth and last element of the profile description of a corpus root and defines the languages that are used in the corpus. Typically the language usage will define (bilingually) only two languages, the local language and English, as the language used in the metadata, for example:

<langUsage> <language ident="sl" xml:lang="sl">slovenski</language> <language ident="en" xml:lang="sl">angleški</language> <language ident="sl" xml:lang="en">Slovenian</language> <language ident="en" xml:lang="en">English</language> </langUsage>

In cases where the transcription contains more than one language, the percentage of their use can also be indicated in the usage attribute of the <language> elements, as illustrated in the example below:

<langUsage> <language ident="en" xml:lang="en">English</language> <language ident="en" xml:lang="nl">Engels</language> <language usage="45" ident="nl" xml:lang="en">Dutch</language> <language usage="45" ident="nl" xml:lang="nl">Nederlands</language> <language usage="55" ident="fr" xml:lang="en">French</language> <language usage="55" ident="fr" xml:lang="nl">Frans</language> </langUsage>
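Since each language is named once per metadata language, the same usage value appears on several <language> elements. The Python sketch below collects one percentage per ident and rejects conflicting values; the function name is hypothetical, and the expectation that the shares add up to at most 100 is our reading of the usage attribute as a percentage.

```python
import xml.etree.ElementTree as ET

def usage_by_language(lang_usage):
    """Collect one usage percentage per language identifier."""
    usage = {}
    for language in lang_usage.iter("language"):
        value = language.get("usage")
        if value is None:
            continue  # no share stated for this language
        ident, share = language.get("ident"), int(value)
        # The same ident is repeated once per label language; values must agree.
        if usage.setdefault(ident, share) != share:
            raise ValueError(f"conflicting usage values for {ident!r}")
    return usage

# The example above, reproduced without the TEI namespace.
sample_lang_usage = ET.fromstring(
    "<langUsage>"
    '<language ident="en" xml:lang="en">English</language>'
    '<language ident="en" xml:lang="nl">Engels</language>'
    '<language usage="45" ident="nl" xml:lang="en">Dutch</language>'
    '<language usage="45" ident="nl" xml:lang="nl">Nederlands</language>'
    '<language usage="55" ident="fr" xml:lang="en">French</language>'
    '<language usage="55" ident="fr" xml:lang="nl">Frans</language>'
    "</langUsage>"
)
shares = usage_by_language(sample_lang_usage)
print(shares, sum(shares.values()))
```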

4.4. Revision description

The revision description, <revisionDesc>, is the fourth and last element of the TEI header. It is an optional element that can appear in the corpus root or component, and documents the revisions made in the corpus or component. Its structure is illustrated below:

<revisionDesc> <change when="2021-06-11"> <name>Tomaž Erjavec</name>: Finalized encoding.</change> <change when="2021-05-28"> <name>Tomaž Erjavec</name>: Built corpus.</change> </revisionDesc>

The revision description consists of a series of <change> elements, with the attribute when giving the date of the change, and the content containing the <name> of the person responsible for the change, and a free-text description of the change.

5. Speakers and their organisations

ParlaMint places considerable emphasis on including in the corpora significant information about the persons giving the speeches contained in the transcriptions. This is why, even though this information is encoded in the <particDesc> element of the <teiHeader> of the corpus root (cf. the Section on Participant description), we treat it here in a separate Chapter. Below we first discuss the information on persons, including how they are affiliated with (political) organisations, and then explain the encoding of these organisations.

5.1. Speakers

The information on speakers is given in the <listPerson> element of the <particDesc> element (cf. the Section on Participant description). This element contains a series of <person> elements, each of which gives information on an individual speaker, as the example below illustrates:

<listPerson> <person xml:id="AccettoMatej"> <persName> <surname>Accetto</surname> <forename>Matej</forename> </persName> <sex value="M"/> </person> ... </listPerson>

Each <person> must have an xml:id attribute, so that it can be referred to from the transcription. The person's name, <persName>, gives the name of the person, which is further decomposed into the person's <surname>(s), <forename>(s) and possibly <addName>. A person's name can also change, typically because of marriage. In this case, the <person> should contain two (or, possibly, more) <persName> elements, each marked by the from and/or to temporal attributes, as shown in the following example:

<person xml:id="GlawischnigAnna"> <persName to="2016-06-01"> <surname>Glawischnig</surname> <forename>Anna</forename> </persName> <persName from="2016-06-02"> <surname>Glawischnig-Piesczek</surname> <forename>Anna</forename> </persName> ... </person>

Note that the from/to dates of the names should neither overlap nor leave gaps between them, as this would mean that the person had either two names at once, or none.
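The constraint on name periods can be sketched as a small check in Python. This is an illustration under stated assumptions, not part of the ParlaMint tooling: the function names are hypothetical, the dates are assumed to be full YYYY-MM-DD values, both end points are treated as inclusive, and names are compared only within the same xml:lang so that parallel names in several languages or scripts are not reported.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from datetime import date, timedelta

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def name_period_problems(person):
    """Return (kind, previous_to, next_from) tuples for overlaps and gaps."""
    periods = defaultdict(list)
    for name in person.findall("persName"):
        # A missing from/to is treated as an open-ended period.
        start = date.fromisoformat(name.get("from")) if name.get("from") else date.min
        end = date.fromisoformat(name.get("to")) if name.get("to") else date.max
        periods[name.get(XML_LANG)].append((start, end))
    problems = []
    for spans in periods.values():
        spans.sort()
        for (_, prev_end), (next_start, _) in zip(spans, spans[1:]):
            if next_start <= prev_end:
                problems.append(("overlap", prev_end.isoformat(), next_start.isoformat()))
            elif next_start - prev_end > timedelta(days=1):
                problems.append(("gap", prev_end.isoformat(), next_start.isoformat()))
    return problems

def person_with_names(first_to, second_from):
    """Build the example person above with the two given boundary dates."""
    return ET.fromstring(
        '<person xml:id="GlawischnigAnna">'
        f'<persName to="{first_to}"><surname>Glawischnig</surname>'
        "<forename>Anna</forename></persName>"
        f'<persName from="{second_from}"><surname>Glawischnig-Piesczek</surname>'
        "<forename>Anna</forename></persName>"
        "</person>"
    )

print(name_period_problems(person_with_names("2016-06-01", "2016-06-02")))
```

With the dates of the example above the check reports nothing, as the second name starts on the day after the first one ends.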

The person must also have the <sex> element, with the value attribute being one of the controlled values: M for male, F for female, O for other, N for none or U for unknown.

The person element can also contain other optional information, i.e. the date and place of <birth> (and <death>), their official Web page, link(s) to Wikipedia, their VIAF, or photo, as illustrated below:

<person xml:id="SayeedaWarsi"> <persName> <forename>Sayeeda</forename> <surname>Warsi</surname> </persName> <sex value="F"/> <birth when="1971-03-28"> <placeName ref="https://www.geonames.org/2651286/">Dewsbury</placeName> </birth> <idno type="URI" subtype="contact">https://members.parliament.uk/member/3839/contact</idno> <idno type="URI" subtype="wikimedia">https://en.wikipedia.org/wiki/Sayeeda_Warsi,_Baroness_Warsi</idno> <idno type="URI" subtype="wikimedia" xml:lang="es">https://es.wikipedia.org/wiki/Sayeeda_Warsi</idno> <idno type="URI" subtype="viaf">http://viaf.org/viaf/33149912470406211798</idno> <figure> <graphic url="https://api.parliament.uk/photo/Paa3j0vS.jpg?crop=CU_1:1"/> </figure> </person>

Finally, the name of the person, their place of birth etc. can also be written in several languages or scripts, as illustrated below:

<person xml:id="PlevnelievRosen"> <persName xml:lang="bg"> <forename>Росен</forename> <surname>Асенов</surname> <surname>Плевнелиев</surname> </persName> <persName xml:lang="en"> <forename>Rosen</forename> <surname>Asenov</surname> <surname>Plevneliev</surname> </persName> <sex value="M"/> <birth when="1964-05-14"> <placeName>Гоце Делчев</placeName> <placeName xml:lang="bg-Latn">Gotse Delchev</placeName> </birth> <education>Висшия машинно-електротехнически институт в София, със специалност „изчислителна техника“</education> <education xml:lang="bg-Latn">Visshiya mashinno-elektrotehnicheski institut v Sofiya, sas spetsialnost „izchislitelna tehnika“</education> <occupation>политик</occupation> <occupation xml:lang="bg-Latn">politik</occupation> <affiliation from="2012-01-22" ref="#republic.Bulgaria" role="member" to="2017-01-21"/> <affiliation from="2012-01-22" ref="#republic.Bulgaria" role="head" to="2017-01-21"> <roleName xml:lang="bg">4-и президент на Република България</roleName> <roleName xml:lang="bg-Latn">4-i prezident na Republika Balgariya</roleName> </affiliation> <idno type="URI" subtype="wikimedia">https://bg.wikipedia.org/wiki/Росен_Плевнелиев</idno> <idno type="URI" subtype="wikimedia">https://en.wikipedia.org/wiki/Rosen_Plevneliev</idno> </person>

5.1.1. Speaker affiliations

An important element of a person is <affiliation>, which associates the speakers with organisations, i.e. it specifies who is a member of the government, parliament, a parliamentary group of political parties² or political parties themselves, as well as who holds a relevant office, e.g. that they are the president, chairman, minister etc. in or of a given organisation. The following example shows the use of the <affiliation> element for specifying membership in organisations:

<person xml:id="BahŽibertAnja"> <persName> <forename>Anja</forename> <surname>Žibert</surname> </persName> <sex value="F"/> <affiliation role="member" ref="#parliamentSI" from="2014-08-01" to="2018-06-21" ana="#DZ.7"/> <affiliation role="member" ref="#parliamentSI" from="2018-06-22" ana="#DZ.8"/> </person>

The formal type of affiliation is given in the role attribute, in this case member. The ref attribute points to the ID of the organisation (cf. the Section on Organisations) in which the person has the specified role. For MPs, as is the case above, this will be the ID of the parliament (cf. the Section on The parliament organisations).

The example above also shows the use of the (optional) classification attribute ana, which points to the specification of the legislative period in which the person was affiliated with the specified organisation. Such legislative periods are typically given as <event> elements inside the government or parliament organisations, as further explained in the Sections on the government and parliament organisations.

The affiliation element can also have the usual from and to attributes, i.e. from and to when the person was affiliated with the organisation.
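To illustrate how the from and to attributes are typically used, the Python sketch below selects the affiliations of a person that hold on a given day. The function name is hypothetical; we assume full YYYY-MM-DD dates (so that plain string comparison orders them correctly), inclusive end points, and a missing attribute meaning that the period is open on that side.

```python
import xml.etree.ElementTree as ET

def active_affiliations(person, day):
    """Return (role, ref, ana) for each affiliation that holds on the given day."""
    active = []
    for affiliation in person.findall("affiliation"):
        start, end = affiliation.get("from"), affiliation.get("to")
        # ISO dates of equal length compare correctly as strings.
        if (start is None or start <= day) and (end is None or day <= end):
            active.append(
                (affiliation.get("role"), affiliation.get("ref"), affiliation.get("ana"))
            )
    return active

# The affiliations of the example above, without the name and sex elements.
sample_person = ET.fromstring(
    '<person xml:id="BahŽibertAnja">'
    '<affiliation role="member" ref="#parliamentSI" from="2014-08-01" '
    'to="2018-06-21" ana="#DZ.7"/>'
    '<affiliation role="member" ref="#parliamentSI" from="2018-06-22" ana="#DZ.8"/>'
    "</person>"
)
print(active_affiliations(sample_person, "2016-01-01"))
```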

The role attribute can have as its value one of the values given by the ParlaMint schema. For backward compatibility with ParlaMint I corpora, there are some roles that are only used by one corpus (cf. the definition of role for <affiliation>), but the main ones that should be used are:

member specifies that a person is a member of the organisation
head is used for the lead person in an organisation, regardless of how this role is named in the specific country and for the specific organisation, i.e. it can be used for the queen (head of the country organisation), president (head of the republic organisation), prime minister (head of government organisation), minister (head of ministry organisation), chairperson (head of committee organisation), etc.
deputyHead is the deputy head of the organisation, again, regardless of how this role is named in the specific country and for the specific organisation, i.e. it is used for the vice president, deputy chairperson, etc.
minister is used to indicate that the person is the minister in the government. Note that this only says that they are a minister, and when. To encode what they are a minister of, the ministry organisation needs to be defined, and the minister associated with it with the head role.

Because the values of the role attribute are quite generic, ParlaMint also allows the exact name of the affiliation role for a particular country or region to be specified using the <roleName> element in the content of <affiliation>, preferably both in the local language and in English, as illustrated in the following example, which specifies that somebody is a minister, and also what they are a minister of:

<affiliation role="minister" ref="#GOV" from="2020-08-01"> <roleName xml:lang="sl">Minister za obrambo</roleName> <roleName xml:lang="en">Minister of Defence</roleName> </affiliation> <affiliation role="head" ref="#MinistryOfDefence" from="2020-08-01"> <roleName xml:lang="sl">Minister za obrambo</roleName> <roleName xml:lang="en">Minister of Defence</roleName> </affiliation>

Finally, it is also possible to specify the name of the organisation that the person is affiliated with inside the <affiliation> element, using the <orgName> element³, again preferably both in the local language and in English, as illustrated in the following example, which specifies that somebody is a minister of a certain ministry:

<affiliation role="minister" ref="#GOV" from="2020-08-01"> <orgName xml:lang="sl">Ministrstvo za obrambo</orgName> <orgName xml:lang="en">Ministry of Defence</orgName> </affiliation>

It should be noted that ParlaMint makes no assumptions on the connection between various roles, e.g. we do not assume that if somebody has a minister role in the government that they are also a member of the government. Therefore it is necessary to specify all the desired affiliations with their particular roles, e.g. both as minister and as member.

It is important to give correct roles to the affiliations that associate a person with organisations. We list the most common roles and how they should be encoded, emphasising the ones that are obligatory in ParlaMint:

King or queen: affiliation/@role="head" → org/@role="country"
President (as opposed to Prime minister): affiliation/@role="head" → org/@role="republic"
Prime minister (or other head of government): affiliation/@role="head" & affiliation/@role="member" → org/@role="government" (cf. Section on The government organisation)
Deputy prime minister: affiliation/@role="deputyHead" & affiliation/@role="member" → org/@role="government"
Minister: affiliation/@role="minister" & affiliation/@role="member" → org/@role="government"
If ministries are defined, then also: affiliation/@role="head" & affiliation/@role="member" → org/@role="ministry"
Deputy minister: affiliation/@role="deputyMinister" & affiliation/@role="member" → org/@role="government"
If ministries are defined, then also: affiliation/@role="deputyHead" & affiliation/@role="member" → org/@role="ministry"
Leader of parliamentary group: affiliation/@role="head" & affiliation/@role="member" → org/@role="parliamentaryGroup" (cf. Section on Political parties and parliamentary groups)
Member of parliamentary group: affiliation/@role="member" → org/@role="parliamentaryGroup"
Leader of political party: affiliation/@role="head" & affiliation/@role="member" → org/@role="politicalParty"
Member of political party: affiliation/@role="member" → org/@role="politicalParty"
MP: affiliation/@role="member" → org/@role="parliament" (cf. Section on The parliament organisations)
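Since no role implies another, a common slip is to give a person a special role in an organisation without the accompanying member affiliation. The Python sketch below reports such cases for the organisation types where the list above pairs the special role with member; the function name and the org_roles mapping (organisation ID to the role of its <org>) are hypothetical, and the check deliberately ignores the from and to dates.

```python
import xml.etree.ElementTree as ET

# Organisation types for which the list above pairs a special role with "member".
MEMBER_REQUIRED = {"government", "ministry", "parliamentaryGroup", "politicalParty"}

def missing_memberships(person, org_roles):
    """Return (role, ref) pairs that lack a 'member' affiliation to the same org."""
    affiliations = person.findall("affiliation")
    member_of = {a.get("ref") for a in affiliations if a.get("role") == "member"}
    missing = []
    for affiliation in affiliations:
        role, ref = affiliation.get("role"), affiliation.get("ref")
        org_role = org_roles.get(ref.lstrip("#"))
        if role != "member" and org_role in MEMBER_REQUIRED and ref not in member_of:
            missing.append((role, ref))
    return missing

# Hypothetical mapping from organisation IDs to the role of their <org> element.
org_roles = {"GOV": "government", "MinistryOfDefence": "ministry", "republic": "republic"}

sample_minister = ET.fromstring(
    "<person>"
    '<affiliation role="minister" ref="#GOV" from="2020-08-01"/>'
    '<affiliation role="head" ref="#MinistryOfDefence" from="2020-08-01"/>'
    '<affiliation role="member" ref="#GOV" from="2020-08-01"/>'
    "</person>"
)
print(missing_memberships(sample_minister, org_roles))
```

In this sample the person is a member of the government but not of the ministry, so only the head affiliation to the ministry is reported.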

5.2. Organisations

Information on the government, parliament, political parties or parliamentary groups of political parties, as well as other optional parliamentary structures (e.g. ministries, committees) is given in the corpus root <listOrg> element of the <particDesc> (cf. the Section on Participant description). The <listOrg> element then contains a series of <org> elements, each giving information about one organisation. The organisations are followed by the <listRelations> element giving a list of relations between organisations:

We exemplify the structure of one organisation element by giving the basic information about a parliamentary group of political parties:

<org role="parliamentaryGroup" xml:id="group.LN-Aut"> <orgName full="yes" xml:lang="it">Lega Nord e Autonomie</orgName> <orgName full="abb">LN-Aut</orgName> <event from="2013-03-15"> <label xml:lang="en">existence</label> </event> </org>

First, each organisation must have an xml:id attribute, so that other elements (in particular, <person>s) can refer to it. The fact that the organisation is a parliamentary group of political parties is encoded in the role attribute, and we elaborate on this below. The name of the organisation is given in the <orgName> element, which also uses the full attribute to distinguish between the full name (yes) and the abbreviated name (abb) of the organisation.

Organisations are also created and dissolved, and this information is encoded in the <event> element, which has existence as its <label>, and where the start of its existence is given in the from attribute; as there is no to attribute, this also means that the organisation still exists.

The example above gives the minimal required information about an organisation but ParlaMint also allows further data to be added, as exemplified by the following example:

<org xml:id="party.PS" role="parliamentaryGroup"> <orgName full="yes" xml:lang="sl">Pozitivna Slovenija</orgName> <orgName full="yes" xml:lang="en">Positive Slovenia</orgName> <orgName full="abb">PS</orgName> <event from="2011-10-22"> <label xml:lang="en">existence</label> </event> <idno type="URI" subtype="wikimedia" xml:lang="sl">https://sl.wikipedia.org/wiki/Pozitivna_Slovenija</idno> <idno type="URI" subtype="wikimedia" xml:lang="en">https://en.wikipedia.org/wiki/Positive_Slovenia</idno> </org>

Here the full name of the organisation is also given in English, and there are two external links giving URLs to further information about the organisation. In the example above, these are the Slovene and English Wikipedia pages, as indicated by the appropriate values of the xml:lang attributes on the <idno> elements.

Returning to the role attribute, its set of allowed values is given by the ParlaMint schema (cf. the Section on Validating ParlaMint corpora). Currently, the list is quite long, as we left it up to the partners of ParlaMint I to determine the values. However, some values are common to all corpora and, for some of these, it is obligatory to have organisations with these roles in the organisation list of a corpus. Furthermore, it is recommended that the organisations are listed in the order given below, with the obligatory roles emphasised:

country: the country taken as an organisation, which can be used to specify the king or queen of the country as a <person> affiliated with the country organisation with the role head;
republic: the republic, which can be used to specify the president of the country as a <person> affiliated with the republic organisation with the role head;
government: the government of the country or region;
ministry: a particular ministry, which can be used to specify the minister as a <person> affiliated with the ministry organisation with the role head;
parliament: the parliament, upper or lower house of the country or region;
parliamentaryGroup: a grouping of political parties for the purpose of acting as one in the government;
politicalParty: a political party.

We discuss the obligatory types of organisations and political parties in the following sections.

5.2.1. The government organisation

The government organisation, distinguished by the role government, is required to be present in the <listOrg> of every corpus root and is, by convention, the first organisation in the list. It gives the ID and name of the country or region government, and also contains the list of specific governments by using the dedicated list element for events, <listEvent>, which then gives these governments as a series of <event> elements:

<org xml:id="ParlaMint-SI-GOV" role="government">
  <orgName xml:lang="sl" full="yes">Vlada Republike Slovenije</orgName>
  <orgName xml:lang="en" full="yes">Government of the Republic of Slovenia</orgName>
  <event from="1990-05-16">
    <label xml:lang="en">existence</label>
  </event>
  <idno type="URI" subtype="wikimedia" xml:lang="sl">https://sl.wikipedia.org/wiki/Vlada_Republike_Slovenije</idno>
  <idno type="URI" subtype="wikimedia" xml:lang="en">https://en.wikipedia.org/wiki/Government_of_Slovenia</idno>
  <listEvent>
    <event xml:id="GOV.11" from="2013-03-20" to="2014-09-18">
      <label xml:lang="sl">11. vlada Republike Slovenije (20. marec 2013 - 18. september 2014)</label>
      <label xml:lang="en">11th Government of the Republic of Slovenia (20 March 2013 - 18 September 2014)</label>
    </event>
    ...
    <event xml:id="GOV.14" from="2020-03-13">
      <label xml:lang="sl">14. vlada Republike Slovenije (13. marec 2020 - danes)</label>
      <label xml:lang="en">14th Government of the Republic of Slovenia (13 March 2020 - today)</label>
    </event>
  </listEvent>
</org>
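A typical use of this list is to find which government was in office on a given day. The sketch below illustrates one way of doing this; it assumes an abridged fragment without the TEI namespace, full YYYY-MM-DD dates in from and to, and that the to date is inclusive. The function names covers and governments_on are our own.

```python
import xml.etree.ElementTree as ET
from datetime import date

XML_NS = "{http://www.w3.org/XML/1998/namespace}"

# Abridged from the example above: only the 11th government is kept.
GOV_XML = """
<org xml:id="ParlaMint-SI-GOV" role="government">
  <listEvent>
    <event xml:id="GOV.11" from="2013-03-20" to="2014-09-18">
      <label xml:lang="en">11th Government of the Republic of Slovenia (20 March 2013 - 18 September 2014)</label>
    </event>
  </listEvent>
</org>
"""


def covers(event, day):
    """True if the <event> span includes the given day; a missing bound is open."""
    start = event.get("from")
    end = event.get("to")
    if start is not None and day < date.fromisoformat(start):
        return False
    if end is not None and day > date.fromisoformat(end):
        return False
    return True


def governments_on(org, day):
    """Return the IDs of the specific governments in office on the given day."""
    return [
        event.get(XML_NS + "id")
        for event in org.findall("listEvent/event")
        if covers(event, day)
    ]


gov = ET.fromstring(GOV_XML)
print(governments_on(gov, date(2014, 1, 1)))
```

The same function works for the legislative periods of a parliament organisation, as these use the same <listEvent> structure.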

5.2.2. The parliament organisations

The parliament organisations, distinguished by the role parliament, are also required to be present in the <listOrg> of every corpus root and are, by convention, the second and, where present, third organisation in the list. There will be two parliament organisations for bicameral systems, at least in cases where the corpus contains the transcripts of both the upper and the lower house. For unicameral systems, or if the corpus contains transcripts of the lower house only, there will be only one parliament organisation. Which of the three options (unicameral, lower house or upper house) an organisation encodes is given by the value of its ana attribute, which refers to the ID of the appropriate category in the legislature taxonomy (cf. the Section on Class declaration). The ana attribute should also specify whether the parliament is national or regional, with these categories likewise defined in the legislature taxonomy. In short, the theoretically possible values of ana for the parliament organisations are:

#parla.national #parla.uni: national parliament, unicameral system
#parla.national #parla.lower: national parliament, lower house
#parla.national #parla.upper: national parliament, upper house
#parla.regional #parla.uni: regional parliament, unicameral system
#parla.regional #parla.lower: regional parliament, lower house
#parla.regional #parla.upper: regional parliament, upper house

Otherwise, the structure of the parliament organisations is identical to the one for the government, i.e. it gives the ID and name of the parliament and encodes the successive parliaments, using the <event> element:

<org ana="#parla.national #parla.lower" role="parliament" xml:id="be_federal_parliament">
  <orgName full="yes" xml:lang="nl">Federaal Parlement van België</orgName>
  <orgName full="yes" xml:lang="en">Belgian Federal Parliament</orgName>
  <event from="1831-02-07">
    <label xml:lang="en">existence</label>
  </event>
  <listEvent>
    <head xml:lang="nl">Zittingsperiode</head>
    <head xml:lang="en">Legislative period</head>
    <event to="2007-05-02" from="2003-06-05" xml:id="period_51">
      <label xml:lang="nl">Zittingsperiode 51</label>
      <label xml:lang="en">Legislative period 51</label>
    </event>
    ...
    <event from="2019-06-20" xml:id="period_55">
      <label xml:lang="nl">Zittingsperiode 55</label>
      <label xml:lang="en">Legislative period 55</label>
    </event>
  </listEvent>
</org>
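As the ana attribute packs two pieces of information into one whitespace-separated value, a script has to split it before use. The following sketch shows one possible way; the function name describe_parliament and the returned labels are our own choices, and the order of the two pointers is not assumed to be fixed.

```python
# Pointers from the legislature taxonomy, as listed above.
LEVELS = {"#parla.national": "national", "#parla.regional": "regional"}
CHAMBERS = {
    "#parla.uni": "unicameral",
    "#parla.lower": "lower house",
    "#parla.upper": "upper house",
}


def describe_parliament(ana):
    """Split the ana value of a parliament <org> into (level, chamber)."""
    pointers = ana.split()
    levels = [LEVELS[p] for p in pointers if p in LEVELS]
    chambers = [CHAMBERS[p] for p in pointers if p in CHAMBERS]
    if len(levels) != 1 or len(chambers) != 1:
        raise ValueError(f"expected one level and one chamber pointer in {ana!r}")
    return levels[0], chambers[0]


print(describe_parliament("#parla.national #parla.lower"))
```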

5.2.3. Political parties and parliamentary groups

In the scope of ParlaMint, very important organisations are political parties (distinguished by the role politicalParty) and, even more so, parliamentary groups that represent political parties in the parliament (distinguished by the role parliamentaryGroup). These organisations are linked to <person> elements (i.e. speakers) so that it is known which political party or parliamentary group a speaker belongs to or represents at a given point in time, as further explained in the Section on Speaker affiliations.

ParlaMint requires that a corpus use parliamentary groups, while the use of political parties is optional. Note that if political parties are used, the corpus is also expected to encode which political parties constitute a parliamentary group; this is encoded via the <relation> element, as further explained in the Section on Relations between organisations.

The introduction to this chapter already gave examples of how organisations are encoded in general, so we here only give examples of the encoding of the additional metadata that can also be associated with political parties or parliamentary groups, i.e. their political orientation on the left-to-right scale and the variables of the Chapel Hill Expert Surveys for Europe, CHES for short. This additional metadata is encoded in the <state> element(s), which should be the last element(s) in the <org>.

5.2.3.1. Encoding political orientation

Political orientation is encoded with a <state> element whose type attribute has the value politicalOrientation. Each nested <state> element then gives, in its type attribute, who supplied the information, which can be either the (corpus) encoder or Wikipedia; in its source attribute, the source of the information (a pointer to the ID of the person for encoder, or the Wikipedia URL); and, in its ana attribute, the reference to the category definition in the politicalOrientation taxonomy, as illustrated in the example below:

<org role="parliamentaryGroup" xml:id="MR">
  <orgName full="abb">MR</orgName>
  <orgName full="yes">Mouvement Réformateur</orgName>
  <idno type="URI" subtype="wikimedia">https://en.wikipedia.org/wiki/Reformist_Movement</idno>
  <state type="politicalOrientation">
    <state type="encoder" source="#GrietDepoorter" ana="#orientation.CRR">
      <note xml:lang="en">Orientation determined by encoder, using own knowledge of the parliamentary group.</note>
    </state>
    <state type="Wikipedia" source="https://en.wikipedia.org/wiki/Reformist_Movement" ana="#orientation.CR">
      <note xml:lang="en">From 1992 the Reformist Movement (MR) consisted of: FDF, MCC, PRL and PFF. In September 2001, FDF decides to leave the alliance and chooses a new name, becoming DeFI.</note>
    </state>
  </state>
</org>

Note also that a <state> may have a note that gives further free-text information about the orientation.
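As an organisation can carry several, possibly differing, orientation values, a script will usually want to collect them by who supplied them. A minimal sketch, assuming an abridged fragment without the TEI namespace (the function name political_orientations is our own):

```python
import xml.etree.ElementTree as ET

# Abridged from the example above; notes and names are left out.
MR_XML = """
<org role="parliamentaryGroup" xml:id="MR">
  <state type="politicalOrientation">
    <state type="encoder" source="#GrietDepoorter" ana="#orientation.CRR"/>
    <state type="Wikipedia" source="https://en.wikipedia.org/wiki/Reformist_Movement" ana="#orientation.CR"/>
  </state>
</org>
"""


def political_orientations(org):
    """Map the supplier of each orientation (encoder, Wikipedia) to its category and source."""
    result = {}
    for outer in org.findall("state"):
        if outer.get("type") != "politicalOrientation":
            continue  # e.g. a CHES state, which is structured differently
        for inner in outer.findall("state"):
            result[inner.get("type")] = {
                "category": inner.get("ana"),
                "source": inner.get("source"),
            }
    return result


orientations = political_orientations(ET.fromstring(MR_XML))
print(orientations)
```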

5.2.3.2. Encoding CHES variables

The second type of metadata on organisations, in particular on political parties and parliamentary groups, comes from the Chapel Hill Expert Surveys for Europe (CHES), either from the 1999-2019 edition or from the 2019 edition. Here the top-level <state> element gives the type of the state, i.e. CHES, and the URL of the CSV source of the information, as well as the name of the political party in CHES (which typically differs from its name or ID in ParlaMint) in the key attribute and the year span that the CHES information covers in the from and to attributes.

Each subordinate <state> (of type variable) then encodes one CHES variable, which is given, via the ana attribute, as the reference to the appropriate category defined in the CHES taxonomy (cf. the Section on Class declaration and taxonomies). Finally, as CHES gives the values of its variables according to years, the third level of <state> (of type value) stores the periods of the variable together with its numeric value in the n attribute, as illustrated in the example below:
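The three levels of nesting can be sketched as follows. Note that the fragment is a made-up stand-in and not taken from any ParlaMint corpus: the URL, the key HYP, the category ID #ches.lrgen, the years and the numeric values are all hypothetical, and we assume that the URL is given in a source attribute and that the periods are given in from and to attributes.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: every attribute value below is invented for illustration.
CHES_XML = """
<state type="CHES" source="https://example.org/ches.csv" key="HYP" from="2014" to="2019">
  <state type="variable" ana="#ches.lrgen">
    <state type="value" from="2014" to="2014" n="5.5"/>
    <state type="value" from="2019" to="2019" n="6.0"/>
  </state>
</state>
"""


def ches_values(ches):
    """Map each CHES variable pointer to its list of (from, to, value) rows."""
    table = {}
    for variable in ches.findall("state[@type='variable']"):
        rows = table.setdefault(variable.get("ana"), [])
        for value in variable.findall("state[@type='value']"):
            rows.append((value.get("from"), value.get("to"), float(value.get("n"))))
    return table


ches_table = ches_values(ET.fromstring(CHES_XML))
print(ches_table)
```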

5.2.3.3. Relations between organisations

As mentioned, the relations between various organisations, in particular, which parliamentary groups or political parties are in the coalition or in opposition and when, are encoded in the final element of <listOrg>, namely <listRelation>, which then contains <relation> elements, as shown in the example below:

The type of relation is given in the name attribute. ParlaMint allows the following values of name:

coalition: the pointers to the organisations (i.e. parliamentary groups or political parties) are given in the mutual attribute (because a coalition is a mutual relation between its members);
opposition: the pointers to the organisations are given in the active attribute, as the organisations are in an active relation to the government, the pointer to which is given in the passive attribute;
representing: a parliamentary group representing one or more political parties in the parliament; the parliamentary group is given as the value of the active attribute, while the political parties are given as the value of the passive attribute;
renaming: the two organisations (typically political parties) referred to are essentially the same organisation, which has, however, been renamed at some point in time; the reference to the old organisation is given in the passive attribute, while the reference to the new one is given in the active attribute;
successor: an organisation (again, typically a political party), or several of them, ceased to exist, but a successor was created; as with renaming, the previous organisation is given as the value of the passive attribute, while the new one uses the active attribute.

For the relations it is typically also necessary to specify from and possibly to when the relation was in force. Finally, it is possible, but not necessary, to also give the legislative period and/or the government when this particular coalition or opposition existed.
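To make the use of the mutual, active and passive attributes more concrete, the sketch below reads a small relation list. The fragment is hypothetical: the organisation IDs (#party.A, #group.AB, #GOV etc.) and dates are invented, the TEI namespace is omitted, and we assume full YYYY-MM-DD dates so that they can be compared as strings. The function names are our own.

```python
import xml.etree.ElementTree as ET

# Hypothetical relation list; all IDs and dates are invented for illustration.
REL_XML = """
<listRelation>
  <relation name="coalition" mutual="#party.A #party.B" from="2013-03-20" to="2014-09-18"/>
  <relation name="opposition" active="#party.C" passive="#GOV" from="2013-03-20" to="2014-09-18"/>
  <relation name="representing" active="#group.AB" passive="#party.A #party.B" from="2013-03-20"/>
</listRelation>
"""


def in_force(relation, day):
    """True if the relation holds on the given YYYY-MM-DD day; missing bounds are open."""
    start = relation.get("from")
    end = relation.get("to")
    return (start is None or start <= day) and (end is None or day <= end)


def coalition_on(relations, day):
    """Pointers to all coalition members on the given day."""
    members = []
    for relation in relations.findall("relation[@name='coalition']"):
        if in_force(relation, day):
            members.extend(relation.get("mutual").split())
    return members


def represented_by(relations, group):
    """Pointers to the political parties that a parliamentary group represents."""
    parties = []
    for relation in relations.findall("relation[@name='representing']"):
        if group in relation.get("active").split():
            parties.extend(relation.get("passive").split())
    return parties


relations = ET.fromstring(REL_XML)
print(coalition_on(relations, "2014-01-01"))
print(represented_by(relations, "#group.AB"))
```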

6. Transcriptions

The transcriptions are encoded in the <text> element of corpus components. This element contains only the element <body>, which should then contain at least one division, <div>, as illustrated below:

As shown, the <text> element should be (as is the top level <TEI> element, as discussed in the Section on Attributes of top-level elements) marked with the ana attribute as to which subcorpus the text belongs to, with the subcorpora themselves defined in the appropriate taxonomy (cf. the Section on Class declaration).

6.1. Divisions

A text body contains a series of divisions, <div>, in cases when the source document can be reliably split into sections, which is typically done on the basis of headings identified in the source. When this is not possible, the complete body will be just one division.

In ParlaMint we have two types of divisions, which are distinguished by the value of their (required) type attribute. If its value is debateSection, then the division must contain at least one speech, while a division with the value commentSection must not contain any speeches, i.e. it contains transcriber (or other) comments only, e.g. the table of contents, references to laws etc.

Inside a debateSection-type division, the main elements of interest are speeches, encoded as the utterance element, <u>. However, this type of division can (and the commentSection-type must) also contain headings and notes by the transcribers that serve to structure and comment on the speeches, as well as page breaks, as illustrated below:

<text ana="#reference">
  <body>
    <div type="debateSection">
      <pb n="1"/>
      <head>Child Poverty Unit</head>
      <note>Question</note>
      <note>Asked by</note>
      <u>...</u>
      ...
      <pb n="2"/>
      ...
    </div>
    <div type="debateSection">
      <head>Trade Union Act 2016 (Political Funds)</head>
      <note>Motion to Approve</note>
      <note>Moved by</note>
      <u>...</u>
      ...
    </div>
    ...
  </body>
</text>

It should be noted that the <head> element can appear only at the start of a division while notes can be interspersed with speeches anywhere inside a division, and can also appear inside speeches, i.e. inside <u> elements. It is also possible to specify the type of note, and, in fact, to use more precise elements than just <note>, which is further explained in the Section on Transcriber comments.
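The constraint on the two division types is enforced by the ParlaMint schema, but it can also be checked with a few lines of code, e.g. while converting source data. The following is only a sketch of such a check: the fragment is abridged from the example above, its commentSection division is invented for illustration, the TEI namespace is omitted, and the function name division_problems is our own.

```python
import xml.etree.ElementTree as ET

# Abridged from the example above; the commentSection division is hypothetical.
TEXT_XML = """
<text ana="#reference">
  <body>
    <div type="debateSection">
      <head>Child Poverty Unit</head>
      <note>Question</note>
      <u>...</u>
    </div>
    <div type="commentSection">
      <note>Table of contents</note>
    </div>
  </body>
</text>
"""


def division_problems(text):
    """Return messages for divisions that break the debateSection/commentSection rule."""
    problems = []
    for number, div in enumerate(text.findall("body/div"), start=1):
        kind = div.get("type")
        speeches = len(div.findall("u"))  # utterances are direct children of <div>
        if kind == "debateSection":
            if speeches == 0:
                problems.append(f"div {number}: debateSection without a speech")
        elif kind == "commentSection":
            if speeches > 0:
                problems.append(f"div {number}: commentSection contains speeches")
        else:
            problems.append(f"div {number}: unexpected type {kind!r}")
    return problems


print(division_problems(ET.fromstring(TEXT_XML)))
```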

Page breaks, <pb>, are optional and can appear inside divisions as well as inside speeches or segments (and also inside the <note> element and, in the linguistically analysed version, inside sentences). They are used to preserve the page breaks from the digital source, possibly together with the page number, as the value of the n attribute. They can also point to the source of a particular page in the corpus via the source attribute and to their media file via the corresp attribute (cf. the Section on Source description and esp. the Example on the source description of a component file), as illustrated in the example below:

<div type="debateSection">
  <pb n="1" source="https://www.psp.cz/eknih/2013ps/stenprot/001schuz/s001001.htm" corresp="#ps2013-001-01-000-000.audio1"/>
  <note type="speaker">Předsedající Miroslava Němcová</note>
  ...
</div>

6.2. Utterances

A speech is marked up using the <u> (utterance) element, as illustrated below:

<u who="#DavidPrior" ana="#regular">
  <seg>I ask that the draft Regulations laid before the House on 5 December be approved.</seg>
  <seg>The relevant document is the 20th Report from the Legislation Committee.</seg>
</u>

The most important attribute of an utterance is who, which gives the pointer to the <person> element containing the metadata of the speaker, which is discussed in the Section on Speakers. Despite its importance in allowing analyses of speeches by speaker and their metadata, it does happen that the speaker of a speech cannot be determined; in such cases, the who attribute should be omitted.

The <u> element should also have the ana attribute giving a pointer to the typology of speaker types, which is especially important for distinguishing the speeches of a session chair (who mostly speaks on procedural matters) from those of regular and, possibly, guest speakers. Note that the #regular value is used not only for MPs but for all other speakers that can regularly speak in a parliament, e.g. ministers, the prime minister, members of parliamentary commissions etc. There is also a special type of speaker, called #interrupting, which we discuss further in the Section on Interrupted utterances.

The utterances are then segmented using the <seg> element, which encodes the paragraphs of the source transcription. Even if the source files do not contain paragraph markings, each speech should contain at least one segment.

Finally, an utterance (just as a division) can also contain transcriber comments (notes), as further detailed in the next section.
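The structure described in this section maps directly to a simple record per speech. The sketch below shows one way to extract such records; the first utterance is the example above, while the second, a chair speech without a who attribute, is invented for illustration. The TEI namespace is omitted and the function name speeches is our own.

```python
import xml.etree.ElementTree as ET

# First <u> copied from the example above; the second <u> is hypothetical.
U_XML = """
<div type="debateSection">
  <u who="#DavidPrior" ana="#regular">
    <seg>I ask that the draft Regulations laid before the House on 5 December be approved.</seg>
    <seg>The relevant document is the 20th Report from the Legislation Committee.</seg>
  </u>
  <u ana="#chair">
    <seg>The Question is that this Motion be agreed to.</seg>
  </u>
</div>
"""


def speeches(div):
    """Return one record per <u>: speaker pointer, speaker types and paragraphs."""
    rows = []
    for u in div.findall("u"):
        # itertext() also picks up the text of any notes nested in a segment.
        paragraphs = ["".join(seg.itertext()).strip() for seg in u.findall("seg")]
        rows.append({
            "who": u.get("who"),               # None if the speaker is unknown
            "types": u.get("ana", "").split(),
            "paragraphs": paragraphs,
        })
    return rows


rows = speeches(ET.fromstring(U_XML))
for row in rows:
    print(row["who"], row["types"], len(row["paragraphs"]))
```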

6.3. Transcriber comments

Transcriber comments give information on who spoke, what the time was, interruptions and the reason for them, what is happening in the chamber, results of voting, etc. While section headings can also be taken as a kind of transcriber comment, these serve to structure the transcription and are encoded as <head> elements, as explained at the start of this chapter, cf. the Example there. Another type of transcriber comment treated separately is the presence of gaps in the transcript; these are treated in the Section on Gaps.

Apart from heads and gaps, transcriber comments are encoded using the <note> element or one of several so-called ‘incident’ elements, as explained below. These elements can be placed directly inside <div>, <u>, <seg> or even <s> in the linguistically annotated version. They should be placed as far up the hierarchy as possible, i.e. if they would appear at the start or end of a segment or utterance, they should be encoded before the start or, respectively, after the end of this segment or utterance. If possible, it is especially convenient not to have them inside <seg> (which contains text), as placing these elements there leads to mixed content, which is more difficult to process further, in particular when linguistically annotating the corpus. Similarly, it is also better to move them outside <s> elements. However, if a transcriber comment was placed in the middle of the text for good reasons, then it can be encoded inside the segment or sentence. Note, however, that utterances can also be split on transcriber comments, as is explained in the Section on Interrupted utterances.

6.3.1. Notes

In general, transcriber comments are encoded using <note>, which can be further qualified via its type attribute. We do not currently specify what the valid values of this attribute are. Some comments can also be encoded using more precise TEI elements, as further explained below. The following example gives typical transcriber comments:

<note type="speaker">The president, Dr. Milan Brglez:</note>
...
<note type="vote-ayes">84 voted for the adoption of the measure.</note>
...
<note type="vote-noes">2 voted against the adoption of the measure.</note>
...
<note type="time">The session began at 10 o'clock.</note>
...

The first note simply gives the speaker of the utterance that would follow it, the second and third are notes on the voting results, while the fourth gives the time when the session started. Note that in this case we can also explicitly add the time when the session started, as in the following example:

<note type="time">The session began at <time when="2016-04-13T10:00:00">10 o'clock</time>.</note>

Note that, in ParlaMint, the when attribute of <time> must contain not only the time, but the date as well, so that users of the corpus do not need to infer it.
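A converter can guard against date-less or malformed when values before writing them out. The sketch below is one simple way of doing so, under the assumption that values have the form YYYY-MM-DDThh:mm:ss, as in the example; the function names parse_when and is_valid_when are our own.

```python
from datetime import datetime


def parse_when(value):
    """Parse a when value that must carry both the date and the time."""
    if "T" not in value:
        # datetime.fromisoformat() would accept a bare date, so reject it here.
        raise ValueError(f"date and time expected, got {value!r}")
    return datetime.fromisoformat(value)


def is_valid_when(value):
    """True if the value can be parsed as a combined date and time."""
    try:
        parse_when(value)
    except ValueError:
        return False
    return True


started = parse_when("2016-04-13T10:00:00")
print(started.date(), started.time())
print(is_valid_when("2016-04-13"))
```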

6.3.2. Incidents

Some types of transcriber comments, which we term incidents, can be encoded using more specific TEI elements. These elements can also be further qualified by the type attribute, with the values being determined by the ParlaMint schema. The three incident elements are:

<vocal> marks any vocalised but not necessarily lexical phenomenon, with the values of the (for this element) obligatory type attribute being, for example interruption, laughter, murmuring etc.
<kinesic> marks any communicative phenomenon, not necessarily vocalised, with the optional type values being e.g. applause, laughter, gesture.
<incident> marks any phenomenon or occurrence, not necessarily vocalised or communicative, with the optional type values being e.g. break, sound, action.

The example below illustrates the use of these three elements:

<vocal type="interruption">
  <desc>sounds from the chamber</desc>
</vocal>
...
<kinesic type="signal">
  <desc>signal for end of debate</desc>
</kinesic>
...
<incident type="action">
  <desc>minute of silence</desc>
</incident>

As the example shows, the original content of the transcriber comment is retained in the <desc> element. Note that in cases when the agent of an incident is known, they can be specified in the optional who attribute, just as on utterances.

While the incidents must have at least one <desc> element, they can also have several, so that the description of the incident can be, if so desired, translated into English, as illustrated in the following example:

<vocal type="interruption">
  <desc xml:lang="sl">oglašanje iz dvorane</desc>
  <desc xml:lang="en">sounds from the chamber</desc>
</vocal>

6.4. Gaps

The transcribers can also note that a part of the speech was not transcribed, typically because it was not understood, sometimes also noting the reason why, such as that the microphone was not turned on, that there was noise in the chamber, or that the speaker was speaking too quietly. These notes can be encoded as the <gap> element, which is then also marked by reason=inaudible. The original transcriber comment is left in the <desc> element, as illustrated below:

... I would further state that <gap reason="inaudible"> <desc>speaker spoke too quietly, not understood</desc> </gap> and furthermore ...

Another reason for omitting a part of the transcription can be an editorial decision of the corpus compilers. The transcript can, for example, contain material that they do not want to include in the corpus, such as tables, or parts of the transcription that for technical reasons cannot be converted to text. In these cases, the reason given should be editorial, while the <desc> should contain what has been omitted, as illustrated below.

<gap reason="editorial"> <desc xml:lang="en">Table omitted</desc> </gap>

Sometimes a passage of the transcription is in a foreign language, and, esp. as the corpus is to be linguistically annotated, the passage is best left out of the transcription proper. This can be achieved by encoding it as a gap in the transcription with the reason foreign, while the <desc> should contain the omitted text. In this case, therefore, the description does not give the reason for the omission, but rather the text that has been omitted. The language of the foreign passage should be indicated in the xml:lang attribute of <desc>. If the language has not been identified, the ISO 639 code for undetermined language, ‘und’, can be used, while in cases where more than one language is used in such a passage, the ‘mul’ code for multiple languages is used. All languages used on <desc> should of course be documented in the <langUsage> element. An example is given below:

<gap reason="foreign"> <desc xml:lang="und">Huliniahuanngittunga</desc> </gap>

As with incidents, gaps can also contain several descriptions so that they can be, if so desired, translated:

<gap reason="editorial">
  <desc xml:lang="de">Zitierte Druckfassung entfernt</desc>
  <desc xml:lang="en">Quoted printed matter omitted</desc>
</gap>
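As both incidents and gaps can carry several <desc> elements in different languages, a script has to decide which one to use. The following sketch prefers a given language and otherwise falls back to the first description; this fallback policy, the wrapping <seg> and the function names are our own assumptions, and the TEI namespace is omitted.

```python
import xml.etree.ElementTree as ET

XML_NS = "{http://www.w3.org/XML/1998/namespace}"

# The two gaps are taken from the examples above and wrapped in one <seg>.
GAPS_XML = """
<seg>
  <gap reason="editorial">
    <desc xml:lang="de">Zitierte Druckfassung entfernt</desc>
    <desc xml:lang="en">Quoted printed matter omitted</desc>
  </gap>
  <gap reason="foreign">
    <desc xml:lang="und">Huliniahuanngittunga</desc>
  </gap>
</seg>
"""


def description(element, lang="en"):
    """Return the <desc> in the preferred language, else the first one, else None."""
    descs = element.findall("desc")
    for desc in descs:
        if desc.get(XML_NS + "lang") == lang:
            return desc.text
    return descs[0].text if descs else None


def gaps_by_reason(root, lang="en"):
    """Group the descriptions of all gaps under the value of their reason attribute."""
    grouped = {}
    for gap in root.iter("gap"):
        grouped.setdefault(gap.get("reason"), []).append(description(gap, lang))
    return grouped


gaps = gaps_by_reason(ET.fromstring(GAPS_XML))
print(gaps)
```

The description function can be applied unchanged to <vocal>, <kinesic> and <incident> elements.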

6.5. Interrupted utterances

A special case occurs when a transcription note states that somebody interrupted the speaker and gives the transcript of the interruption, possibly with the name of the person who interrupted, with the main speaker then continuing with their speech, as in the following made-up snippet:

Boris Johnson: I propose a no-deal Brexit. /Jeremy Corbyn: Traitor!/ Because England does not want any dealings with the European Union.

The standard manner in which such interruptions are encoded is using the default <note> element, or, much better, the <vocal> element, as explained in the Section on Incidents, as below:

<u who="#BorisJohnson" ana="#regular">
  <seg>I propose a no-deal Brexit.
    <vocal type="interruption">
      <desc>Jeremy Corbyn: Traitor!</desc>
    </vocal>
    Because England does not want any dealings with the European Union.</seg>
</u>

This solution is relatively easy to implement and valid in ParlaMint; however, it has the disadvantage of leaving what is essentially a speech as the content of a comment. In cases where it is possible to consistently identify such ‘mini speeches’, an alternative and more useful encoding turns this comment into a separate speech, and splits the main utterance into two (or more) pieces. The example below illustrates how this is encoded:

<u who="#BorisJohnson" ana="#regular" xml:id="GB001.8.3" next="#GB001.8.5">I propose a no-deal Brexit.</u>
<u who="#JeremyCorbyn" ana="#regular #interrupting" xml:id="GB001.8.4">Traitor!</u>
<u who="#BorisJohnson" ana="#regular" xml:id="GB001.8.5" prev="#GB001.8.3">Because England does not want any dealings with the European Union.</u>

As can be seen, the split is indicated by the next attribute on the first part of the split utterance and by the prev attribute on the following part, while the fact that an utterance interrupts another one is signalled by the addition of #interrupting to the ana attribute. The values of the next and prev attributes are pointers to the identifiers of the next or previous part of the split utterance.⁴

In the example the speaker of the interrupting speech has also been identified and marked in the who attribute; in cases where this is not possible, this attribute can be omitted. As mentioned, this speaker should also have the value #interrupting in their ana attribute; this value comes from the appropriate category of the ParlaMint speaker type taxonomy. In case the speaker is identified, and their status can be determined, ana should also contain the type proper of the speaker, i.e. whether they are a #chair, #regular or #guest speaker.
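One benefit of this encoding is that the parts of a split speech can be put back together by following the next pointers. The sketch below illustrates this on the example above, wrapped in a <div> and without the TEI namespace; it assumes that all parts of a speech are in the same division and that the chain of pointers contains no cycles. The function name merged_speeches is our own.

```python
import xml.etree.ElementTree as ET

XML_NS = "{http://www.w3.org/XML/1998/namespace}"

# The three utterances from the example above, wrapped in a division.
SPLIT_XML = """
<div type="debateSection">
  <u who="#BorisJohnson" ana="#regular" xml:id="GB001.8.3" next="#GB001.8.5">I propose a no-deal Brexit.</u>
  <u who="#JeremyCorbyn" ana="#regular #interrupting" xml:id="GB001.8.4">Traitor!</u>
  <u who="#BorisJohnson" ana="#regular" xml:id="GB001.8.5" prev="#GB001.8.3">Because England does not want any dealings with the European Union.</u>
</div>
"""


def merged_speeches(div):
    """Return (who, text) pairs, with split utterances joined along their next chain."""
    utterances = div.findall("u")
    by_id = {u.get(XML_NS + "id"): u for u in utterances}
    merged = []
    for u in utterances:
        if u.get("prev") is not None:
            continue  # a continuation; it is collected from the first part
        parts = []
        current = u
        while current is not None:
            parts.append("".join(current.itertext()).strip())
            pointer = current.get("next")
            current = by_id.get(pointer.lstrip("#")) if pointer else None
        merged.append((u.get("who"), " ".join(parts)))
    return merged


merged = merged_speeches(ET.fromstring(SPLIT_XML))
for who, text in merged:
    print(who, text)
```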

7. Linguistic annotation

This section introduces the ParlaMint linguistic annotation. An important note is that a linguistically annotated ParlaMint corpus is stored separately from its base (or plain-text) version, i.e. the version that has been discussed in the preceding sections. The encoding of the linguistically annotated version differs from the plain-text one in the following:

All the corpus root and component file names have the extension .ana.xml. For example, if the plain-text root has the file name ParlaMint-CZ.xml, the linguistically annotated one should be ParlaMint-CZ.ana.xml; likewise, the component ParlaMint-CZ_2016-04-13.xml becomes ParlaMint-CZ_2016-04-13.ana.xml.
Because the file ID (i.e. the value of the top level element attribute xml:id, as explained in the Section on Attributes of top-level elements) should be the same as the file name (cf. the Section on File names and directory structure), the previous point also means that the linguistically annotated files should have the top level ID suffixed with .ana, e.g. <teiCorpus xml:id="ParlaMint-CZ.ana">
The corpus stamp in the main title of the corpus root or components (cf. the Section on Title statement) which is [ParlaMint] in the plain-text version, should be [ParlaMint.ana] for the linguistically annotated version.
All the plain text of the utterance segments (i.e. the text immediately contained by the <seg> elements) of the plain-text version should be linguistically annotated on the specified levels, as is further explained in the following Section on Linguistic markup.
The linguistically annotated version of the corpus should also have some added metadata in the TEI header of the corpus root, which is detailed in the following Section on Metadata for linguistic annotation.
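The first three points are mechanical and can be derived from the plain-text values, as the following sketch shows. The function names are our own, the title used in the usage example is invented, and we assume, following the example above, that the ID equals the file name without its final .xml.

```python
def ana_file_name(plain_name):
    """Derive the file name of the annotated version from the plain-text one."""
    if plain_name.endswith(".ana.xml"):
        return plain_name  # already the annotated name
    if not plain_name.endswith(".xml"):
        raise ValueError(f"not an XML file name: {plain_name!r}")
    return plain_name[: -len(".xml")] + ".ana.xml"


def file_id(file_name):
    """Derive the top-level xml:id from a file name by dropping the final .xml."""
    if not file_name.endswith(".xml"):
        raise ValueError(f"not an XML file name: {file_name!r}")
    return file_name[: -len(".xml")]


def ana_title(plain_title):
    """Replace the corpus stamp in a main title with the one for the annotated version."""
    return plain_title.replace("[ParlaMint]", "[ParlaMint.ana]")


name = ana_file_name("ParlaMint-CZ_2016-04-13.xml")
print(name, file_id(name))
print(ana_title("Example parliamentary corpus [ParlaMint]"))  # hypothetical title
```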

7.1. Linguistic markup

Linguistic annotation is added only to the text content of <seg> elements inside the speeches, i.e. <u> elements. For this text, ParlaMint requires the following additional markup to be present:

tokens: what is a word, and what is punctuation, with preserved information on inter-token spaces;
sentences: what is a sentence;
lemmas: what is the base form of each word;
Universal Dependencies (UD) part-of-speech and morphological features, and, optionally, part-of-speech tags from a different (local) tagset;
named entities (NE): what is a name, categorised at least into the standard four NE classes;
the UD dependency syntactic parse of the sentences;
USAS semantic annotations on words and phrases but only for the machine translated corpora.

Below, we explain the encoding of each of these levels.

7.1.1. Word-level annotation

Basic linguistic annotation comprises tokenisation, sentence segmentation, part-of-speech tagging and lemmatisation, and this mark-up is illustrated in the example below:

Sentences are marked up using the <s> element, words with the <w> element and punctuation symbols with the <pc> element. To retain the linguistically significant whitespace, the join attribute with the fixed value right is used, meaning there should be no whitespace to the right of the token. There can be (depending on the language and annotation tool used) an added complication with tokenisation, which is further taken up in the next Section on Syntactic words.
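The join attribute is what allows the original text of a sentence to be restored from its tokens. A minimal sketch, assuming a sentence without nested (syntactic) words and without the TEI namespace; the tokens are those of the sentence used in the Section on Syntactic parses, with other attributes left out, and the function name sentence_text is our own.

```python
import xml.etree.ElementTree as ET

# Tokens of the sentence "I support the amendment."; only join is kept.
S_XML = """
<s>
  <w>I</w>
  <w>support</w>
  <w>the</w>
  <w join="right">amendment</w>
  <pc>.</pc>
</s>
"""


def sentence_text(s):
    """Rebuild the sentence text, adding a space unless a token has join="right"."""
    pieces = []
    for token in s:
        if token.tag not in ("w", "pc"):
            continue  # skip notes, names are not handled in this sketch
        pieces.append((token.text or "").strip())
        if token.get("join") != "right":
            pieces.append(" ")
    return "".join(pieces).strip()


print(sentence_text(ET.fromstring(S_XML)))
```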

The base form, or lemma, of a word is given as the value of the lemma attribute, while punctuation characters, <pc>, do not have this attribute.

The UD part-of-speech and morphological features are both packed into the msd attribute, with the part-of-speech given as the value of the UPosTag linguistic attribute, and the features, each an attribute=value pair, separated by the vertical bar.
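Unpacking the msd value is straightforward, as sketched below on a value taken from the example in the Section on Syntactic words. The function name parse_msd is our own, and the sketch assumes that each feature has exactly one attribute=value form.

```python
def parse_msd(msd):
    """Split an msd value into the UD part-of-speech and a dictionary of features."""
    upos = None
    features = {}
    for pair in msd.split("|"):
        key, _, value = pair.partition("=")
        if key == "UPosTag":
            upos = value
        else:
            features[key] = value
    return upos, features


upos, features = parse_msd("UPosTag=AUX|Mood=Cnd|Number=Plur|Person=2|VerbForm=Fin")
print(upos, features)
```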

ParlaMint also allows (but does not require) part-of-speech tags from some other tagset⁵ to be added to the linguistic annotation. Where this information is encoded depends on the type of tagset.

Synthetic tagsets, such as the Penn Treebank tagset, have atomic tags that cannot always be decomposed into attribute-value pairs (e.g. the tag ‘TO’ for the word ‘to’); such tags should be encoded using the pos attribute on words and punctuation symbols, as shown in the example below:

For analytic tagsets, where a part-of-speech tag can be always decomposed into a set of attribute-values, the pointing attribute ana should be used. An example of such a collection of tagsets for various languages is given in the MULTEXT-East morphosyntactic specifications, and we give below an example that uses this tagset:

The mte: is a prefix that is, via the TEI extended pointer syntax as defined in the TEI header (cf. the Section on Prefix definitions) expanded so that the value of such an ana attribute points to the expansions of the given tag to a feature structure. For example, the value mte:Vmpr1p would be expanded to https://nl.ijs.si/ME/V6/msd/tables/msd-fslib2-sl.xml#Vmpr1p, which then resolves to the feature-structure below:
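The expansion itself is a simple pattern substitution. The sketch below mimics it for the mte prefix; we assume here that the prefix is declared with the match pattern (.+) and a replacement pattern ending in #$1, which is how TEI prefix definitions are commonly written, but the actual declaration in a corpus header may differ. The names PREFIX_DEFS and expand are our own.

```python
import re

# Assumed prefix definition: prefix -> (match pattern, replacement pattern).
PREFIX_DEFS = {
    "mte": ("(.+)", "https://nl.ijs.si/ME/V6/msd/tables/msd-fslib2-sl.xml#$1"),
}


def expand(pointer):
    """Expand a prefixed pointer such as mte:Vmpr1p; other pointers are returned as is."""
    prefix, colon, local = pointer.partition(":")
    if not colon or prefix not in PREFIX_DEFS:
        return pointer  # e.g. a local #id pointer or a full URL
    match_pattern, replacement = PREFIX_DEFS[prefix]
    match = re.fullmatch(match_pattern, local)
    if match is None:
        raise ValueError(f"{pointer!r} does not match the pattern of its prefix")
    # Substitute $1, $2, ... with the corresponding captured groups.
    return re.sub(r"\$(\d+)", lambda ref: match.group(int(ref.group(1))), replacement)


print(expand("mte:Vmpr1p"))
```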

7.1.2. Syntactic words

Certain frameworks, in particular the UD one (cf. their information on Tokenization and Word Segmentation and on Words, Tokens and Empty Nodes), allow for tokens to be decomposed into several words, and it is these syntactic words, and not tokens, that are further annotated.

To allow for such mismatches between word tokens and syntactic words, we use embedded empty words with associated norm attributes and the standard attributes with linguistic annotation. For example, Czech has the word ‘abyste’ which is in UD decomposed into two syntactic words, ‘aby’ and ‘byste’. This should be encoded as in the following example⁶:

<w>abyste
  <w norm="aby" lemma="aby" msd="UPosTag=SCONJ"/>
  <w norm="byste" lemma="být" msd="UPosTag=AUX|Mood=Cnd|Number=Plur|Person=2|VerbForm=Fin"/>
</w>

Note also that if such a multi-word token does not have a space following it, join="right" should be added to the top level word.

While we do not have examples of such a practice yet, there could also be cases where two (or more) tokens correspond to one syntactic word. In such cases, it is the syntactic word that is on the top level, while the inner words are the actual tokens. To take an example from historical language, Slovene used to form the superlative form of adjectives with the word ‘naj’ written separately (and often as ‘nar’), while in contemporary Slovene, the ‘naj’ is a prefix of the adjective. This would be encoded as follows:

<w norm="najlepši" lemma="lep">
  <w>nar</w>
  <w>lepši</w>
</w>

In this case, if such a multi-token syntactic word does not have a space following it, join="right" should be added to the last token, i.e. ‘lepši’.
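When processing such nested words, a script has to decide whether it wants the tokens as written or the syntactic words. The sketch below distinguishes the two cases by whether the norm attribute sits on the inner or on the outer <w>; this reading of the two examples above is our own, as are the function names, and the TEI namespace is omitted.

```python
import xml.etree.ElementTree as ET

# The two examples above: one token split into two syntactic words, and
# two tokens joined into one syntactic word.
CONTRACTION_XML = """
<w>abyste
  <w norm="aby" lemma="aby" msd="UPosTag=SCONJ"/>
  <w norm="byste" lemma="být" msd="UPosTag=AUX|Mood=Cnd|Number=Plur|Person=2|VerbForm=Fin"/>
</w>
"""
JOINED_XML = """
<w norm="najlepši" lemma="lep">
  <w>nar</w>
  <w>lepši</w>
</w>
"""


def syntactic_words(token):
    """Return the (form, lemma) pairs of the syntactic words of a top-level <w>."""
    inner = token.findall("w")
    if not inner:
        return [((token.text or "").strip(), token.get("lemma"))]
    if token.get("norm") is not None:
        # Several tokens form one syntactic word, described on the outer <w>.
        return [(token.get("norm"), token.get("lemma"))]
    # One token is decomposed into several syntactic words (empty inner <w>).
    return [(word.get("norm"), word.get("lemma")) for word in inner]


def surface_tokens(token):
    """Return the tokens as they appear in the text of a top-level <w>."""
    inner = token.findall("w")
    if inner and token.get("norm") is not None:
        return [(word.text or "").strip() for word in inner]
    return [(token.text or "").strip()]


contraction = ET.fromstring(CONTRACTION_XML)
joined = ET.fromstring(JOINED_XML)
print(surface_tokens(contraction), syntactic_words(contraction))
print(surface_tokens(joined), syntactic_words(joined))
```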

7.1.3. Named entities

ParlaMint also requires annotation of Named Entities (NE), which should be categorised into the following four types:

PER: person
LOC: location
ORG: organisation
MISC: miscellaneous

These types are also specified in a specialised taxonomy, as further explained in the Section on Linguistic taxonomies.

The identified names and their type are marked up as the <name> element with the appropriate value of its type attribute, as shown in the example below:

...
<w lemma="and" msd="UPosTag=CCONJ">and</w>
<name type="ORG">
  <w lemma="Westminster" msd="UPosTag=PROPN|Number=Sing">Westminster</w>
  <w join="right" lemma="Hall" msd="UPosTag=PROPN|Number=Sing">Hall</w>
</name>
<pc msd="UPosTag=PUNCT">,</pc>
...

ParlaMint also supports more complex NE annotation schemes, such as the one used for Czech data, which introduces very detailed NE types and also allows for nested named entities. The example below gives such a case, where the top level and ParlaMint-compatible person name also contains nested names:

<name type="PER" ana="ne:p"> <name ana="ne:pf"> <w>Františka</w> </name> <name ana="ne:ps"> <w>Laudáta</w> </name> </name>

Here, the language specific NE annotations are given as the value of the ana attribute, which are pointers using the TEI extended pointer syntax (cf. the Section on Prefix definitions) into a corpus-specific taxonomy that defines these local NE types (for how ParlaMint linguistic taxonomies are defined, cf. the Section on Linguistic taxonomies).
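
As a rough illustration of how the ParlaMint-level entities can be read independently of any nested, corpus-specific names, the sketch below collects only those <name> elements that carry a type attribute. The function name named_entities is hypothetical, the TEI namespace is omitted, and the tokens are joined with a space without consulting join, which is a simplification.

```python
import xml.etree.ElementTree as ET

def named_entities(root):
    """Yield (type, text) for every <name> carrying a ParlaMint type attribute."""
    for name in root.iter("name"):
        ne_type = name.get("type")
        if ne_type is None:
            continue  # nested, corpus-specific name without a ParlaMint type
        words = [w.text for w in name.iter("w") if w.text]
        yield ne_type, " ".join(words)

sample = ET.fromstring(
    '<s>'
    '<name type="ORG"><w>Westminster</w> <w join="right">Hall</w></name>'
    '<name type="PER" ana="ne:p">'
    '<name ana="ne:pf"><w>Františka</w></name> '
    '<name ana="ne:ps"><w>Laudáta</w></name>'
    '</name>'
    '</s>')

print(list(named_entities(sample)))
# [('ORG', 'Westminster Hall'), ('PER', 'Františka Laudáta')]
```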

7.1.4. Syntactic parses

Sentences are accompanied by a Universal Dependencies parse. These analyses are encoded inside their sentence mark-up, although in a stand-off manner. This means that each token must be given an ID, while the syntactic analysis is stored in the link group, <linkGrp> element containing a series of <link> elements. Each is labeled by a dependency label and joins two tokens. The example below illustrates the syntactic encoding:

<s xml:id="ParlaMint-GB_2021-01-06.seg393.8"> <w xml:id="ParlaMint-GB_2021-01-06.seg393.8.1">I</w> <w xml:id="ParlaMint-GB_2021-01-06.seg393.8.2">support</w> <w xml:id="ParlaMint-GB_2021-01-06.seg393.8.3">the</w> <w join="right" xml:id="ParlaMint-GB_2021-01-06.seg393.8.4">amendment</w> <pc xml:id="ParlaMint-GB_2021-01-06.seg393.8.5">.</pc> <linkGrp targFunc="head argument" type="UD-SYN"> <link ana="ud-syn:nsubj" target="#ParlaMint-GB_2021-01-06.seg393.8.2 #ParlaMint-GB_2021-01-06.seg393.8.1"/> <link ana="ud-syn:root" target="#ParlaMint-GB_2021-01-06.seg393.8 #ParlaMint-GB_2021-01-06.seg393.8.2"/> <link ana="ud-syn:det" target="#ParlaMint-GB_2021-01-06.seg393.8.4 #ParlaMint-GB_2021-01-06.seg393.8.3"/> <link ana="ud-syn:obj" target="#ParlaMint-GB_2021-01-06.seg393.8.2 #ParlaMint-GB_2021-01-06.seg393.8.4"/> <link ana="ud-syn:punct" target="#ParlaMint-GB_2021-01-06.seg393.8.2 #ParlaMint-GB_2021-01-06.seg393.8.5"/> </linkGrp> </s>

The example shows that each token, as well as the sentence element, should be given an xml:id attribute, that the link group comes at the end of (but inside) the sentence, that <linkGrp> has two attributes, targFunc and type, with the fixed values head argument and UD-SYN respectively, and that it contains a series of empty <link> elements.

The link elements then give, via the value of their target attribute, references to the head and argument tokens of the syntactic relation, which is specified in the ana attribute. By convention, the links are ordered so that the argument references follow the ordering of the tokens in the sentence, i.e. all the tokens in the sentence should appear in order in the second position. Note that for the top level root relation (of which there should be only one in the sentence), the head is the reference to the sentence ID.

The relations themselves are pointers which use the ud-syn: prefix. Via the TEI extended pointer syntax, as defined in the TEI header (cf. the Section on Prefix definitions), this prefix is expanded so that the value of such an ana attribute points to a category of the special UD syntactic taxonomy, which must be a part of the linguistically annotated version of the corpus; how to insert this taxonomy is specified in the Section on Linguistic taxonomies. There is one more detail to watch out for, namely that UD allows the colon symbol : to appear in extended relations, e.g. acl:relcl for relative clause modifier. As we already use the colon for the extended pointer prefix, the colons in the relations should be changed to underscores, e.g. to ud-syn:acl_relcl. Note, however, that the relations specified in a <link> ana attribute are just pointers, and could have any value; it is the UD taxonomy that actually determines the correct value of the relation.
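
To make the stand-off encoding concrete, the sketch below reads a sentence's <linkGrp> back into (relation, head, argument) triples and restores the colon in extended relations. The function names ud_relation and dependencies and the shortened IDs (s1, s1.1, ...) are hypothetical, and the TEI namespace is omitted.

```python
import xml.etree.ElementTree as ET

def ud_relation(ana):
    """Strip the pointer prefix and restore the colon of extended UD relations."""
    return ana.split(":", 1)[1].replace("_", ":")

def dependencies(sentence):
    """Return (relation, head_id, argument_id) triples from a sentence's <linkGrp>."""
    triples = []
    for link in sentence.findall("linkGrp[@type='UD-SYN']/link"):
        # The first reference is the head, the second the argument.
        head, argument = (ref.lstrip("#") for ref in link.get("target").split())
        triples.append((ud_relation(link.get("ana")), head, argument))
    return triples

sample = ET.fromstring(
    '<s xml:id="s1">'
    '<w xml:id="s1.1">I</w> <w xml:id="s1.2">support</w> '
    '<w xml:id="s1.3">the</w> <w join="right" xml:id="s1.4">amendment</w>'
    '<pc xml:id="s1.5">.</pc>'
    '<linkGrp targFunc="head argument" type="UD-SYN">'
    '<link ana="ud-syn:nsubj" target="#s1.2 #s1.1"/>'
    '<link ana="ud-syn:root" target="#s1 #s1.2"/>'
    '<link ana="ud-syn:det" target="#s1.4 #s1.3"/>'
    '<link ana="ud-syn:obj" target="#s1.2 #s1.4"/>'
    '<link ana="ud-syn:punct" target="#s1.2 #s1.5"/>'
    '</linkGrp>'
    '</s>')

for triple in dependencies(sample):
    print(triple)
print(ud_relation("ud-syn:acl_relcl"))  # acl:relcl
```

Because the argument references follow token order, the list of triples comes out in the order of the tokens in the sentence, which is convenient when producing a token-per-line format.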

7.1.5. Semantic annotation

The machine translated ParlaMint corpora (cf. the Section on Translations of corpora) have one more level of annotation, namely the semantic annotation of tokens and phrases (i.e. Multi-Word Expressions or MWEs) with USAS semantic tags. First, in order to be able to semantically tag MWEs, a new element was introduced inside sentences, namely phrase, <phr>⁷. The USAS semantic tags are then encoded in two attributes of tokens or MWEs, namely function and ana, as illustrated in the example below:

<w function="Z5" ana="sem:Z5">the</w> <w function="G1.1/S2mf,S9/S2mf" ana="sem:G1.1 sem:S2">Minister</w> <w function="Z5" ana="sem:Z5">of</w> <name type="ORG"> <w function="I1" ana="sem:I1">Finance</w> </name> <w function="Z5" ana="sem:Z5">and</w> <phr type="sem" function="Z1mf,Z3c" ana="sem:Z1"> <w function="Z1mf,Z3c" ana="sem:Z1">Deputy</w> <w lemma="Prime" function="Z1mf,Z3c" ana="sem:Z1">Prime</w> <w lemma="Minister" function="Z1mf,Z3c" ana="sem:Z1">Minister</w> </phr>

The first thing to note is that all the tokens inside an MWE receive the same semantic markup as the encompassing MWE <phr> element. Second, the function attribute gives the USAS tags exactly as output by the tool used for semantic tagging, which includes not only the tag computed to be the most appropriate for the given context, but also all the other lexically possible tags, with the comma being the separator. The first tag is then used to compute the value of the ana attribute. A conjunctive USAS tag (where the delimiter is the slash) is transformed into a series of references to the USAS taxonomy (cf. the Section on Linguistic taxonomies), with the tag qualifiers removed. For example, the tag G1.1/S2mf is transformed into sem:G1.1 sem:S2, i.e. the mf (for male and female) qualifiers are removed from S2mf. Note also that the prefix sem (cf. the Section on Prefix definitions) is used for pointing into the USAS taxonomy. With this set-up it is possible to encode the exact USAS tags as well as their ParlaMint categories, which give not only the tag, but also its gloss.
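
The derivation of ana from function can be sketched as follows. This is an illustration only, not the conversion actually used for the corpora: the function name usas_to_ana is hypothetical, and the handling of the + and - signs (mapped to a single p or n suffix, following the category IDs of the USAS taxonomy) is an assumption.

```python
import re

# Base USAS code (e.g. G1.1), then optional +/- signs; whatever follows is a qualifier.
USAS_TAG = re.compile(r"([A-Z]\d+(?:\.\d+)*)([+-]*)")

def usas_to_ana(function, prefix="sem"):
    """Compute the ana value from the raw USAS output stored in function."""
    first = function.split(",")[0]           # the contextually chosen tag
    refs = []
    for part in first.split("/"):            # conjunctive tags are slash-separated
        match = USAS_TAG.match(part)
        if match is None:
            continue                         # not a recognisable USAS code
        code, signs = match.groups()
        if signs:
            # Assumption: any run of signs collapses to one p(ositive) or n(egative).
            code += "p" if signs[0] == "+" else "n"
        refs.append(f"{prefix}:{code}")
    return " ".join(refs)

print(usas_to_ana("G1.1/S2mf,S9/S2mf"))  # sem:G1.1 sem:S2
print(usas_to_ana("Z1mf,Z3c"))           # sem:Z1
print(usas_to_ana("A1.1.1-"))            # sem:A1.1.1n
```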

7.2. Metadata for linguistic annotation

What kind of metadata a plain-text ParlaMint corpus should contain was explained in the Section on Corpus metadata and in this section we detail what additions must be made to the metadata for the linguistically annotated version. Note that the changes for this version have been already explained at the start of this Chapter. In short, there are three additional parts that should be added to the <teiHeader> of the corpus root, namely a description of the tool(s) used to linguistically annotate the corpus, two additional taxonomies (one for named entities, and one for UD syntactic relations) and the definition of the prefix expansions for UD syntactic relations. These descriptions should also serve as the point of departure for those that want to introduce their own prefixes and taxonomies for defining additional and corpus-specific part-of-speech tagging schemes or named entity classes.

7.2.1. Application information for linguistic processing

As the linguistic analysis of a ParlaMint corpus will be performed by a tool, the information on which tool (or tools) has been used should be documented in the corpus root TEI header. This information is encoded in the <appInfo> element of the <encodingDesc>, as shown in the example below:

<appInfo> <application version="1.0" ident="classla"> <label>CLASSLA</label> <desc xml:lang="en">Linguistic processing performed with CLASSLA trained for Slovene, available from <ref target="https://github.com/clarinsi/classla">https://github.com/clarinsi/classla</ref>.</desc> </application> </appInfo>

The <appInfo> element contains, in general, a series of <application> elements, each one giving the information on one tool. The <application> element gives the version number of the tool and specifies, via ident, an identifying code. It has two subordinate elements, with <label> giving the name of the tool and <desc> a short description of it, preferably with a pointer to the URL where it can be found or is at least documented.

7.2.2. Linguistic taxonomies

Some linguistic annotations have fixed vocabularies and these should be encoded as taxonomies in the TEI header of the linguistically analysed corpus root, similarly to other taxonomies, as discussed in the Section on the Class declaration.

The first taxonomy is the Named Entity types, which has - apart from translating the categories into the local language - a fixed structure, as follows:

<taxonomy xml:id="ParlaMint-taxonomy-NER.ana"> <desc xml:lang="en"> <term>Named entities</term> </desc> <category xml:id="PER"> <catDesc xml:lang="sl"> <term>oseba</term> </catDesc> <catDesc xml:lang="en"> <term>person</term> </catDesc> </category> <category xml:id="LOC"> <catDesc xml:lang="sl"> <term>lokacija</term> </catDesc> <catDesc xml:lang="en"> <term>location</term> </catDesc> </category> <category xml:id="ORG"> <catDesc xml:lang="sl"> <term>organizacija</term> </catDesc> <catDesc xml:lang="en"> <term>organisation</term> </catDesc> </category> <category xml:id="MISC"> <catDesc xml:lang="sl"> <term>drugo</term> </catDesc> <catDesc xml:lang="en"> <term>miscellaneous</term> </catDesc> </category> </taxonomy>

The second taxonomy to be inserted is the one for Universal Dependency relations. We currently do not use corpus specific taxonomies, even though different languages use different subsets of the UD syntactic relations, but rather a common taxonomy giving all the UD relations; the taxonomy has currently also not been localised, i.e. it is available in the English language only. Below we illustrate by giving a few relation definitions:

<taxonomy xml:id="ParlaMint-taxonomy-UD-SYN.ana"> <desc xml:lang="en"> <term>UD syntactic relations</term> </desc> <category xml:id="acl"> <catDesc xml:lang="en"> <term>acl</term>: Clausal modifier of noun (adjectival clause)</catDesc> </category> <category xml:id="cc_preconj"> <catDesc xml:lang="en"> <term>cc:preconj</term>: Preconjunct</catDesc> </category> <category xml:id="dep"> <catDesc xml:lang="en"> <term>dep</term>: Unspecified dependency</catDesc> </category> <category xml:id="punct"> <catDesc xml:lang="en"> <term>punct</term>: Punctuation</catDesc> </category> <category xml:id="root"> <catDesc xml:lang="en"> <term>root</term>: Root</catDesc> </category> </taxonomy>

The ID and description, <desc> of the <taxonomy> are fixed, and the <category> elements have the usual structure. Note that the ID of a category is identical to its name given in <term>, except that the colon, : in the official name of the relation must be substituted by the underscore, _, to enable correct referencing of these IDs, as discussed in the Section on Syntactic parses.
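
The correspondence between category IDs and relation names can be checked automatically. The sketch below is a hedged illustration rather than one of the project's validation scripts: the function name mismatched_categories is hypothetical, the TEI namespace is omitted, and the glosses in the sample are abbreviated.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

def mismatched_categories(taxonomy):
    """Return IDs of categories whose ID is not the term with ':' replaced by '_'."""
    bad = []
    for category in taxonomy.iter("category"):
        term = category.find("catDesc/term")
        if term is None or term.text.replace(":", "_") != category.get(XML_ID):
            bad.append(category.get(XML_ID))
    return bad

taxonomy = ET.fromstring(
    '<taxonomy xml:id="ParlaMint-taxonomy-UD-SYN.ana">'
    '<category xml:id="acl"><catDesc xml:lang="en">'
    '<term>acl</term>: Clausal modifier of noun</catDesc></category>'
    '<category xml:id="cc_preconj"><catDesc xml:lang="en">'
    '<term>cc:preconj</term>: Preconjunct</catDesc></category>'
    '</taxonomy>')

print(mismatched_categories(taxonomy))  # []
```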

The third taxonomy gives the sentiment classes with which version 5 of the ParlaMint corpora has been annotated. The sentiment annotation is given as a real number; however, for easier usage of sentiment information, the numbers have been converted to discrete classes. The sentiment taxonomy gives the three main classes (negative, neutral and positive sentiment) and six subordinate classes (e.g. "mixed negative"), and details the numeric interval corresponding to each class.

<taxonomy xml:id="ParlaMint-taxonomy-sentiment.ana" xml:lang="mul"> <desc xml:lang="en"> <term>Sentiment</term>: 3 and 6 class sentiment labels following ...</desc> <category xml:id="Neg"> <catDesc xml:lang="en"> <term>Negative</term>: value < 1.5</catDesc> <category xml:id="negneg"> <catDesc xml:lang="en"> <term>negative</term>: value < 0.5</catDesc> </category> <category xml:id="mixneg"> <catDesc xml:lang="en"> <term>mixed negative</term>: Mixed negative, interval [0.5, 1.5)</catDesc> </category> </category> ... </taxonomy>

The fourth taxonomy is only relevant for the machine translated corpora (cf. the Section on Translations of corpora) and gives the categories of the USAS semantic tags (for their encoding in the corpora cf. the Section on Semantic annotation). This taxonomy is only available in the English language and is identical for all the machine translated corpora. Below we illustrate its structure by giving the start of the taxonomy:

<taxonomy xml:id="ParlaMint-taxonomy-USAS.ana" xml:lang="en"> <desc xml:lang="en"> <term>USAS categories</term>: Semantic categories following the USAS Semantic tagset, ...</desc> <category xml:id="A1"> <catDesc> <term>A1</term>: General And Abstract Terms</catDesc> <category xml:id="A1.1.1"> <catDesc> <term>A1.1.1</term>: General actions / making</catDesc> <category xml:id="A1.1.1n"> <catDesc> <term>A1.1.1-</term>: Inaction</catDesc> </category> </category> <category xml:id="A1.1.2"> <catDesc> <term>A1.1.2</term>: Damaging and destroying</catDesc> <category xml:id="A1.1.2n"> <catDesc> <term>A1.1.2-</term>: Fixing and mending</catDesc> </category> </category> ... </category> ... </taxonomy>

The current USAS taxonomy covers a subset of all USAS semantic tags and was derived from the official list of USAS semantic subcategories. A category in the taxonomy can contain up to one positive (USAS = '+', taxonomy = 'p') or negative (USAS = '-', taxonomy = 'n') modifier. Other USAS modifiers (i.e. regex [mfnci%@]) are not retained. The taxonomy includes 455 categories, each with its USAS code and gloss.

7.2.3. Prefix definitions

Pointing attributes, such as ana, take as their value a series of references to the values of xml:id attributes in an XML document. If the ID is in the same document, then the reference is the hash character, #, prefixed to the particular ID, e.g. #parla.uni, and if it is in another XML document, then the hash is prefixed with the URL of the document, e.g. https://nl.ijs.si/ME/V6/msd/tables/msd-fslib2-sl.xml#Vmpr1p.

Because the complete URL tends to be long, which is especially inconvenient when such references are given for every token in a corpus, TEI introduces the so-called extended pointer syntax, whereby the reference to an ID can be given in the form of a prefix, which is separated by a colon from the local part of the ID reference, and the value of this prefix is determined via the <prefixDef> element in the <encodingDesc> of the TEI header.

ParlaMint uses this mechanism for all linguistic annotations with a closed vocabulary, in particular for the Universal Dependencies syntactic relations, for the optional and corpus-specific analytical part-of-speech tags (cf. the Sections on Syntactic parses and Word-level annotation), and for semantic annotation in the machine translated corpora (cf. the Section on Semantic annotation). The example below illustrates the prefix definitions for the obligatory UD syntactic relations and for the optional MULTEXT-East tags:

<listPrefixDef> <prefixDef ident="ud-syn" matchPattern="(.+)" replacementPattern="#$1"> <p xml:lang="en">Private URIs with this prefix point to elements giving their name. In this document they are simply local references into the UD-SYN taxonomy categories in the corpus root TEI header.</p> </prefixDef> <prefixDef ident="mte" matchPattern="(.+)" replacementPattern="http://nl.ijs.si/ME/V6/msd/tables/msd-fslib-sl.xml#$1"> <p xml:lang="en">Private URIs with this prefix point to feature-structure elements defining the Slovenian MULTEXT-East Version 6 MSDs.</p> </prefixDef> </listPrefixDef>

The specialised element for listing prefix definitions, <listPrefixDef>, gives a series of prefix definitions, i.e. <prefixDef> elements. Each prefix definition gives its prefix as the value of the ident attribute, a regular expression that matches the part of the ID reference after the prefix as the value of the matchPattern attribute, and the substitution as the value of the replacementPattern attribute. The first prefix definition thus defines the ud-syn prefix: for any ID reference with this prefix, e.g. ud-syn:acl_relcl, the part after the prefix (acl_relcl) is matched against (.+), and the matched part (here the entire relation acl_relcl) is substituted into #$1, i.e. the hash character followed by the original value, so that ud-syn:acl_relcl gives #acl_relcl. This substitution is of course trivial, and hardly necessary, but was implemented so that all fixed-vocabulary linguistic analyses have the same treatment.

More to the point is the second example, where very short ID references, such as mte:Vmpr1p are transformed to https://nl.ijs.si/ME/V6/msd/tables/msd-fslib2-sl.xml#Vmpr1p, as already explained in the Section on Word-level annotation.

Finally, each prefix definition also contains a possibly bi-lingual paragraph explaining the definition.
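
The expansion described in this section can be sketched in a few lines. The table PREFIX_DEFS simply restates the two prefix definitions from the example, and the function name resolve is hypothetical. Python's re module is used as an approximation of the regular expression dialect assumed by TEI, which is adequate for simple patterns such as (.+).

```python
import re

# ident -> (matchPattern, replacementPattern), copied from the example above.
PREFIX_DEFS = {
    "ud-syn": ("(.+)", "#$1"),
    "mte": ("(.+)", "http://nl.ijs.si/ME/V6/msd/tables/msd-fslib-sl.xml#$1"),
}

def resolve(pointer, prefix_defs=PREFIX_DEFS):
    """Expand a prefixed pointer; pointers with an unknown prefix are returned as-is."""
    prefix, _, local = pointer.partition(":")
    if prefix not in prefix_defs:
        return pointer
    match_pattern, replacement = prefix_defs[prefix]
    match = re.fullmatch(match_pattern, local)
    if match is None:
        return pointer
    # Replace each $n in the replacement pattern with the n-th matched group.
    return re.sub(r"\$(\d+)", lambda ref: match.group(int(ref.group(1))), replacement)

print(resolve("ud-syn:acl_relcl"))  # #acl_relcl
print(resolve("mte:Vmpr1p"))
print(resolve("#parla.uni"))        # #parla.uni
```

Ordinary references, whether local (#parla.uni) or full URLs, pass through unchanged because their scheme or leading part is not a defined prefix.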

8. Translations of corpora

The ParlaMint machine translated corpora are encoded similarly to corpora in their source language, i.e. they have an identically structured corpus root and components. The most obvious differences are the following:

The filenames are extended with the target language code, as described in the Section on Filenames, e.g. the linguistically analysed English translation of the Latvian corpus would have the root filename ParlaMint-LV-en.ana.xml and a component filename could be ParlaMint-LV-en_2014-11-04.ana.xml.
The top-level xml:id of the corpus root and of components must also be changed accordingly to e.g. ParlaMint-LV-en.ana or ParlaMint-LV-en_2014-11-04.ana.
The stamp in the main titles should be changed too by adding the target language suffix, e.g. to [ParlaMint-en.ana].
The top level xml:lang in the root and components should be set to the target language, e.g. xml:lang="en".

The structure of the <text>, including the transcriber comments and the linguistic analysis, is encoded the same as for the corpora in the source language. The two differences are that the aligned elements are linked to their corresponding elements in the source corpus, and, to simplify processing, the transcriber comments are moved outside sentences in the (rare) cases where they appeared inside them in the original language corpus. In the current machine translated corpora the alignments are given for utterances, segments, sentences, and transcriber comments, which, furthermore, always have a 1-1 mapping to the corresponding source element. Therefore the alignment is trivial, simply specifying the same xml:id value as that of the element in the source corpus, as illustrated in the following example:

<div type="debateSection" xml:lang="en"> <note type="speaker" xml:id="ParlaMint-LV_2019-01-31-PT13-516.ana.note1" corresp="mt-src:ParlaMint-LV_2019-01-31-PT13-516.ana.note1">Head of the sitting.</note> <u who="#ĀboltiņaSolvita" xml:id="ParlaMint-LV_2014-11.u1" ana="#chair" xml:lang="en" corresp="mt-src:ParlaMint-LV_2014-11.u1"> <seg xml:id="ParlaMint-LV_2014-11-04.seg1" xml:lang="en" corresp="mt-src:ParlaMint-LV_2014-11-04.seg1"> <s xml:id="ParlaMint-LV_2014-11-04.s1" corresp="mt-src:ParlaMint-LV_2014-11-04.s1"> <w xml:id="ParlaMint-LV_2014-11-04.s1.t1" msd="UPosTag=PROPN|Number=Sing" lemma="Mr.">Mr.</w> <w xml:id="ParlaMint-LV_2014-11-04.s1.t2" msd="UPosTag=PROPN|Number=Sing" lemma="President" join="right">President</w> ... </s> ... </seg> ... </u> ... </div>

As can be seen above, the alignment is specified in the corresp attribute. The alignment reference makes use of the TEI extended pointer syntax (cf. also the Section on Prefix definitions) with the mt-src prefix, which must resolve to the correct component file of the corpus in the source language.

In contrast to other linguistic annotations that specify one prefix definition for the complete corpus (so, inside the corpus root <teiHeader>), the translated corpora should specify a prefix definition inside each corpus component because its definition depends on the component file. For example, the prefix definition of the corpus component file ParlaMint-LV-en_2014-11-04.ana.xml should be as in the following example:

<listPrefixDef> <prefixDef ident="mt-src" matchPattern="(.+)" replacementPattern="../../ParlaMint-LV.TEI.ana/2014/ParlaMint-LV_2014-11-04.ana.xml#$1"> <p>Private URIs with this prefix point to aligned source elements of the MTed corpus.</p> </prefixDef> </listPrefixDef>

The assumption above is that the two corpora are available in the same directory (so, ./ParlaMint-LV.TEI.ana/ and ./ParlaMint-LV-en.TEI.ana/), so that corresp values with the mt-src prefix in the file ParlaMint-LV-en.TEI.ana/2014/ParlaMint-LV-en_2014-11-04.ana.xml will point to ParlaMint-LV-en.TEI.ana/2014/../../ParlaMint-LV.TEI.ana/2014/ParlaMint-LV_2014-11-04.ana.xml, i.e. to ./ParlaMint-LV.TEI.ana/2014/ParlaMint-LV_2014-11-04.ana.xml.
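
The relative resolution described above can be traced with a short sketch. The function name resolve_source is hypothetical, and the paths are those of the example, on the assumption that both corpora sit side by side in one directory.

```python
import posixpath

def resolve_source(component_path, replacement_pattern, local_id):
    """Resolve an mt-src reference relative to the translated component file."""
    relative = replacement_pattern.replace("$1", local_id)
    base_dir = posixpath.dirname(component_path)
    # Normalise only the file part; the fragment after '#' is the element ID.
    path, _, fragment = posixpath.join(base_dir, relative).partition("#")
    return posixpath.normpath(path) + "#" + fragment

target = resolve_source(
    "ParlaMint-LV-en.TEI.ana/2014/ParlaMint-LV-en_2014-11-04.ana.xml",
    "../../ParlaMint-LV.TEI.ana/2014/ParlaMint-LV_2014-11-04.ana.xml#$1",
    "ParlaMint-LV_2014-11-04.s1")
print(target)
# ParlaMint-LV.TEI.ana/2014/ParlaMint-LV_2014-11-04.ana.xml#ParlaMint-LV_2014-11-04.s1
```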

The final additional element of the translated corpora is the information on the application, <application>, (cf. also the Section on Application information for linguistic processing) i.e. on the program that was used to translate the corpora, as illustrated by the following example:

<application ident="EasyNMT" version="2.0"> <label>EasyNMT (OPUS-MT model)</label> <desc>Translation to English done with EasyNMT (<ref target="https://github.com/UKPLab/EasyNMT">https://github.com/UKPLab/EasyNMT</ref>) with OPUS-MT model bat (<ref target="https://github.com/Helsinki-NLP/Opus-MT">https://github.com/Helsinki-NLP/Opus-MT</ref>)</desc> </application>

This element should be given in the corpus root, together with all the other information on applications inside the application information (<appInfo>) element.

9. Validation and conversion

This chapter explains how to validate and finalise a ParlaMint corpus, and introduces scripts for converting a ParlaMint corpus to other, derived formats.

9.1. Validating ParlaMint corpora

The XML structure of ParlaMint corpora can be validated via RelaxNG schemas, which exist in two versions, one that was produced as a customisation of the TEI Guidelines, and a set of schemas that were made from scratch for ParlaMint.

The TEI customisation is written as a TEI ODD document, which is, in fact, the XML version of this document, and is available in the TEI/ directory of the ParlaMint GitHub repository. The XML contains not only the prose guidelines, but also the formal specification of the TEI schema; in the on-line version this specification is rendered as Appendix A, a reference to all the elements, attributes and classes used in ParlaMint corpora. The ODD document is not immediately useful for XML validation: it first has to be converted with the TEI XSLT stylesheets to obtain a RelaxNG schema. This schema is also available in the same directory under the names ParlaMint.rng (in RelaxNG XML syntax) and ParlaMint.rnc (in RelaxNG compact syntax), and should be used to check that ParlaMint component files validate against TEI.

However, it is difficult to constrain a TEI ODD-derived XML schema to allow only the kinds of nestings and attributes that should appear in a ParlaMint corpus, so this schema allows (and Appendix A lists) nestings of elements, as well as attributes, that are in fact forbidden in ParlaMint corpora.

For this reason, we have also developed a set of RelaxNG schemas from scratch, which do allow only those elements, attributes and content models that are in fact valid for a ParlaMint corpus. There are all together four such schemas, one for a "plain-text" corpus root, one for its corpus components, one for the linguistically annotated corpus root, and one for its components. These schemas can be found in the Schema/ directory of the ParlaMint GitHub repository, with the README file giving instructions on how to use them.

Validating with XML schemas checks the formal structure of XML files but is less successful in validating other aspects of conformance, such as the textual content or linking of pointer attributes. For this reason, we have also developed an XSLT script that assumes a schema-validated ParlaMint file on its input, and checks various other aspects of conformance. These validation scripts can be found in the Scripts/ directory of the ParlaMint GitHub repository, with the README file listing them.
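
To illustrate the kind of check that a schema cannot express, the sketch below reports local pointers that do not resolve to any xml:id in the same document. It is not the project's XSLT script: the function name dangling_pointers and the selection of attributes are assumptions, the TEI namespace is omitted, and references into other files (e.g. from a component to the corpus root) or prefixed references are deliberately ignored.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
POINTER_ATTRIBUTES = ("ana", "who", "corresp", "target", "ref")  # illustrative selection

def dangling_pointers(root):
    """Return (element, attribute, reference) for local references without a target."""
    ids = {el.get(XML_ID) for el in root.iter() if el.get(XML_ID)}
    dangling = []
    for el in root.iter():
        for attribute in POINTER_ATTRIBUTES:
            for ref in el.get(attribute, "").split():
                if ref.startswith("#") and ref[1:] not in ids:
                    dangling.append((el.tag, attribute, ref))
    return dangling

sample = ET.fromstring(
    '<s xml:id="s1"><w xml:id="s1.1">I</w> <w xml:id="s1.2">agree</w>'
    '<linkGrp targFunc="head argument" type="UD-SYN">'
    '<link ana="ud-syn:nsubj" target="#s1.2 #s1.1"/>'
    '<link ana="ud-syn:root" target="#s1 #s1.3"/>'
    '</linkGrp></s>')

print(dangling_pointers(sample))  # [('link', 'target', '#s1.3')]
```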

It should be noted that it is not necessary to run the validation scripts directly, as the validation can be performed by the main Makefile of the project. The Makefile is self-documenting, i.e. to see how to use it, please run make help in the top level directory of the ParlaMint project.

While each contributor of a corpus should validate their files with the ParlaMint schemas and validation script, there also exist further stages of validation, which are also applied to ParlaMint corpora:

The corpora are converted to derived formats, in particular, the linguistically annotated version of the corpus to CoNLL-U and to the so called vertical format for CQP-type concordancers. The Universal Dependencies project provides a program for validating the formatting and linguistic analyses in CoNLL-U files, and this validation is used on the CoNLL-U files derived from their XML source, up to level 2 conformance. The vertical files, on the other hand, are first compiled with manatee (the back end of (no)Sketch Engine) and this compilation can also expose various errors.
The last stage in validation is ‘human validation’ where e.g. simply looking at various produced metadata files or at the concordances of a corpus exposes errors.

9.2. Finalisation of corpora

While the vast majority of converting source encodings into the ParlaMint corpus format is left to the compilers of a corpus, there are a few metadata elements that can be produced by a common script on the basis of nearly finished corpora, which then results in the final version of the corpus for a particular release. This includes setting the date, edition and handle under which the corpus will be distributed, and also calculating the size of the corpus (cf. the Sections on Extents and on Tags declaration). The script for finalisation can be found in the Scripts/ directory of the ParlaMint GitHub repository and the README file briefly explains its function; more comments can be found in the script itself.

9.3. Conversions

A TEI encoded document is, in general, not meant to be used directly by software programs; rather, it serves as an interchange and storage format. The ParlaMint project has produced various scripts to down-convert the XML encoded corpora to other formats and they can be found in the Scripts/ directory of the ParlaMint GitHub repository, with the README file listing them and explaining their function. In short, the scripts convert the ParlaMint XML to plain text, to CoNLL-U, and to vertical format. There is also a script that takes a ParlaMint corpus and makes from it a sample for inclusion in the ParlaMint GitHub repository.

10. Contributing to ParlaMint

The ParlaMint GitHub repository contains these guidelines, the ParlaMint XML schemas, the scripts used to validate, finalise and convert the ParlaMint TEI XML corpora to derived formats, and samples of the ParlaMint corpora. The main branches in the repository are the following:

main is the default branch used for the synchronisation of other branches. It is also used for releasing sample files that correspond to published corpora.
data serves as a pushing place for new sample files in ./Data/ParlaMint-XX directories.
devel is used for the development of scripts and documentation.

The validation procedure for corpora is explained in the Section on Validating ParlaMint corpora, while the technical aspects of contributing corpora are further explained in the CONTRIBUTING file of the repository.

11. Acknowledgements

The work on these recommendations was funded by the CLARIN Research Infrastructure for Language Resources and Tools.

Appendix A Formal specification

Appendix A.1 Elements

Appendix A.1.1 <TEI>

<TEI> (TEI document) contains a single TEI-conformant document, combining a single TEI header with one or more members of the model.resource class. Multiple <TEI> elements may be combined within a <TEI> (or <teiCorpus>) element. [4. Default Text Structure 16.1. Varieties of Composite Text]

Module

textstructure — Formal specification

Attributes

att.global.linking
- synch
- next
- prev
- @corresp

xml:id

Status	Required
Datatype	ID

xml:lang

Status	Required
Datatype	teidata.language

ana

Status	Required
Datatype	1–∞ occurrences of teidata.pointer separated by whitespace

Contained by

core: teiCorpus

May contain

header: teiHeader

textstructure: text

Note

As with all elements in the TEI scheme (except <egXML>) this element is in the TEI namespace (see 5.7.2. Namespaces). Thus, when it is used as the outermost element of a TEI document, it is necessary to specify the TEI namespace on it. This is customarily achieved by including http://www.tei-c.org/ns/1.0 as the value of the XML namespace declaration (xmlns), without indicating a prefix, and then not using a prefix on TEI elements in the rest of the document. For example: <TEI version="4.8.1" xml:lang="it" xmlns="http://www.tei-c.org/ns/1.0">.

Example

Example of ParlaMint corpus component:

Content model

<content>
 <elementRef key="teiHeader"/>
 <elementRef key="text"/>
</content>

Schema Declaration

element TEI
{
   tei_att.global.linking.attribute.corresp,
   attribute xml:id { text },
   attribute xml:lang { text },
   attribute ana { list { + } },
   tei_teiHeader,
   tei_text
}

Appendix A.1.2 <addName>

<addName> (additional name) contains an additional name component, such as a nickname, epithet, or alias, or any other descriptive phrase used within a personal name. [14.2.1. Personal Names]
Module	namesdates — Formal specification
Attributes	att.global xml:id n xml:base xml:space @xml:lang
Member of	model.persNamePart
Contained by	namesdates: persName
May contain	Character data only
Example	<persName> <surname>Möderndorfer</surname> <forename>Jani</forename> <addName>Janko</addName> </persName>
Content model	<content> <textNode/> </content>
Schema Declaration	element addName { tei_att.global.attribute.xmllang, text }

Appendix A.1.3 <affiliation>

<affiliation> (affiliation) optionally contains the role name corresponding to the affiliation, the name of the organisation that the person is affiliated with, and notes giving further informal information about the affiliation. [16.2.2. The Participant Description]

Module

namesdates — Formal specification

Attributes

att.global.analytic
- @ana
att.global.source
- @source
att.datable.w3c
- notBefore
- notAfter
- @when
- @from
- @to
att.canonical
- key
- @ref

role

Status	Required
Legal values are:	head minister member academician alternateOfDelegation associateMember candidateChairman constitutionalJudge deputyHead deputyMinister ministerDelegate nonAttachedMember observer ombudsman prosecutorGeneral publicDefenderOfRights replacement representative secretary secretaryGeneral secretaryOfState verifier vicePublicDefenderOfRights

Member of

model.addressLike

Contained by

namesdates: person

May contain

core: note

namesdates: orgName roleName

Note

If included, the name of an organization may be tagged using either the <name> element as above, or the more specific <orgName> element.

Example

<person xml:id="AdamKalous.1979"> <persName> <surname>Kalous</surname> <forename>Adam</forename> </persName> <sex value="M"/> <birth when="1979-10-06"/> <idno type="URI">https://www.psp.cz/sqw/detail.sqw?id=6497</idno> <affiliation ref="#subcommittee.PEFPS.1414" role="head" from="2018-03-14T00:00:00" to="2021-10-21T00:00:00"> <roleName xml:lang="en">Chair Person</roleName> </affiliation> <affiliation ref="#subcommittee.PEFPS.1414" role="member" from="2018-03-14T00:00:00" to="2021-10-21T00:00:00"> <roleName xml:lang="en">Member</roleName> </affiliation> <affiliation ref="#committee.VSR.1315" role="deputyHead" from="2017-12-06T16:00:00" to="2021-10-21T00:00:00"> <roleName xml:lang="en">Vice Chairman</roleName> </affiliation> <affiliation ref="#committee.VSR.1315" role="member" from="2017-11-28T16:00:00" to="2021-10-21T00:00:00"> <roleName xml:lang="en">Member</roleName> </affiliation> <affiliation ref="#parliamentaryGroup.ANO.1292" role="member" from="2017-10-24T00:00:00" to="2021-10-21T00:00:00"> <roleName xml:lang="en">Member</roleName> </affiliation> <affiliation ref="#politicalParty.ANO2011.1104" role="representative" from="2017-10-21" to="2021-10-21"> <roleName xml:lang="en">Candidate MP</roleName> </affiliation> <affiliation ref="#parliament" ana="#parliament.PSP8" role="member" from="2017-10-21T14:00:00" to="2021-10-21T00:00:00"> <roleName xml:lang="en">MP</roleName> </affiliation> </person>

Example

The affiliation element can also include an ana attribute, which points to the appropriate legislative period when the person was affiliated with the specified organisation: <person xml:id="BahŽibertAnja"> <persName> <surname>Bah</surname> <surname>Žibert</surname> <forename>Anja</forename> </persName> <sex value="F"/> <affiliation role="member" ref="#DZ" from="2014-08-01" to="2018-06-21" ana="#DZ.7"> <roleName xml:lang="en">MP</roleName> </affiliation> <affiliation role="member" ref="#party.SDS.2" from="2014-08-01" to="2018-06-21" ana="#DZ.7"> <roleName xml:lang="en">Member</roleName> </affiliation> <affiliation role="member" ref="#DZ" from="2018-06-22" ana="#DZ.8"> <roleName xml:lang="en">MP</roleName> </affiliation> </person>

Content model

<content>
 <elementRef key="roleName" minOccurs="0"
  maxOccurs="unbounded"/>
 <elementRef key="orgName" minOccurs="0"
  maxOccurs="unbounded"/>
 <elementRef key="note" minOccurs="0"
  maxOccurs="unbounded"/>
</content>

Schema Declaration

element affiliation
{
   tei_att.global.analytic.attribute.ana,
   tei_att.global.source.attribute.source,
   tei_att.datable.w3c.attribute.when,
   tei_att.datable.w3c.attribute.from,
   tei_att.datable.w3c.attribute.to,
   tei_att.canonical.attribute.ref,
   attribute role
   {
      "head"
    | "minister"
    | "member"
    | "academician"
    | "alternateOfDelegation"
    | "associateMember"
    | "candidateChairman"
    | "constitutionalJudge"
    | "deputyHead"
    | "deputyMinister"
    | "ministerDelegate"
    | "nonAttachedMember"
    | "observer"
    | "ombudsman"
    | "prosecutorGeneral"
    | "publicDefenderOfRights"
    | "replacement"
    | "representative"
    | "secretary"
    | "secretaryGeneral"
    | "secretaryOfState"
    | "verifier"
    | "vicePublicDefenderOfRights"
   },
   tei_roleName*,
   tei_orgName*,
   tei_note*
}
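
The constraints in the schema declaration above can be checked mechanically. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree module; the helper name check_affiliation and the sample element are hypothetical and not part of any ParlaMint tooling. It verifies that role carries one of the legal values and that the children follow the sequence roleName*, orgName*, note*.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Legal values of @role, copied from the schema declaration above.
LEGAL_ROLES = {
    "head", "minister", "member", "academician", "alternateOfDelegation",
    "associateMember", "candidateChairman", "constitutionalJudge",
    "deputyHead", "deputyMinister", "ministerDelegate", "nonAttachedMember",
    "observer", "ombudsman", "prosecutorGeneral", "publicDefenderOfRights",
    "replacement", "representative", "secretary", "secretaryGeneral",
    "secretaryOfState", "verifier", "vicePublicDefenderOfRights",
}

# Child elements in the order required by the content model.
CHILD_ORDER = ["roleName", "orgName", "note"]


def check_affiliation(elem):
    """Return a list of problems found in one <affiliation> (hypothetical helper)."""
    problems = []
    role = elem.get("role")
    if role not in LEGAL_ROLES:
        problems.append(f"illegal role: {role!r}")
    position = 0
    for child in elem:
        name = child.tag.split("}")[-1]  # drop the namespace part, if any
        if name not in CHILD_ORDER:
            problems.append(f"unexpected child: <{name}>")
            continue
        index = CHILD_ORDER.index(name)
        if index < position:
            problems.append(f"<{name}> out of order")
        position = max(position, index)
    return problems


# Hypothetical sample, modelled on the examples above.
sample = ET.fromstring(
    f'<affiliation xmlns="{TEI_NS}" ref="#DZ" role="member" from="2014-08-01">'
    '<roleName xml:lang="en">MP</roleName></affiliation>'
)
print(check_affiliation(sample))
```

An empty list means that no problem was found. A full validation should still be run against the ParlaMint schema itself; the sketch covers only the two constraints named here.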

Appendix A.1.4 <appInfo>

<appInfo> (application information) records information about an application which has edited the TEI file. [2.3.11. The Application Information Element]
Module	header — Formal specification
Contained by	header: encodingDesc
May contain	header: application
Example	<appInfo> <application version="4.0" ident="stanford-corenlp"> <label>Stanford CoreNLP</label> <desc>Tokenisation, POS tagging, NER and dependency parsed using Stanford CoreNLP <ref target="https://stanfordnlp.github.io/CoreNLP/">https://stanfordnlp.github.io/CoreNLP/</ref>.</desc> </application> </appInfo>
Example	<appInfo> <application version="1.0" ident="reldi-tokeniser"> <label>ReLDI tokeniser</label> </application> <application version="1.0" ident="classla-stanfordnlp"> <label>CLASSLA-StanfordNLP</label> </application> <application version="1.0" ident="janes-ner"> <label>NER system for South Slavic languages</label> </application> </appInfo>
Content model	<content> <elementRef key="application" minOccurs="1" maxOccurs="unbounded"/> </content>
Schema Declaration	element appInfo { tei_application+ }

Appendix A.1.5 <application>

<application> provides information about an application which has acted upon the document. [2.3.11. The Application Information Element]

Module

header — Formal specification

Attributes

ident

supplies an identifier for the application, independent of its version number or display name.

Status	Required
Datatype	teidata.name

version

supplies a version number for the application, independent of its identifier or display name.

Status	Required
Datatype	teidata.versionNumber

Contained by

header: appInfo

May contain

core: desc label

Example

<appInfo> <application version="1" ident="app-stanza"> <label>Stanza</label> <desc xml:lang="en"> <ref target="https://stanfordnlp.github.io/stanza/index.html">Stanza</ref>: a jointly trained neural tagger, lemmatizer and dependency parser. Pretrained model based on the italian-isdt-ud-2.5 treebank</desc> </application> <application version="1" ident="app-t2k"> <label>T2K</label> <desc xml:lang="en"> <ref target="http://www.italianlp.it/demo/t2k-text-to-knowledge/">T2K</ref>: contains a named entity recognition module for Italian.</desc> </application> <application version="1" ident="conll-U2TEIXML"> <label>CoNLL-U 2 TEI XML</label> <desc xml:lang="en"> <ref target="http://conllu2teixml">CoNLL-U 2 TEI XML</ref>: converter from CoNLL-U format to (ParlaClarin/ParlaMint) Tei XML Format</desc> </application> </appInfo>

Example

<appInfo> <application version="4.0" ident="stanford-corenlp"> <label>Stanford CoreNLP</label> <desc>Tokenisation, POS tagging, NER and dependency parsed using Stanford CoreNLP <ref target="https://stanfordnlp.github.io/CoreNLP/">https://stanfordnlp.github.io/CoreNLP/</ref>.</desc> </application> </appInfo>

Content model

<content>
 <elementRef key="label"/>
 <elementRef key="desc" minOccurs="1"
  maxOccurs="unbounded"/>
</content>

Schema Declaration

element application
{
   attribute ident { text },
   attribute version { text },
   tei_label,
   tei_desc+
}

Appendix A.1.6 <availability>

<availability> (availability) supplies information about the availability of a text, for example any restrictions on its use or distribution, its copyright status, any licence applying to it, etc. [2.2.4. Publication, Distribution, Licensing, etc.]

Module

header — Formal specification

Attributes

status

Status	Required
Legal values are:	free

Contained by

header: publicationStmt

May contain

core: p

header: licence

Note

A consistent format should be adopted.

Example

<availability status="free"> <licence>http://creativecommons.org/licenses/by/4.0/</licence> <p xml:lang="hr">Ovaj rad je dostupan pod <ref target="http://creativecommons.org/licenses/by/4.0/">međunarodnom licencom Creative Commons Imenovanje 4.0</ref> </p> <p xml:lang="en">This work is licensed under the <ref target="http://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</ref> </p> </availability>

Content model

<content>
 <elementRef key="licence"/>
 <elementRef key="p" minOccurs="1"
  maxOccurs="unbounded"/>
</content>

Schema Declaration

element availability { attribute status { "free" }, tei_licence, tei_p+ }

Appendix A.1.7 <bibl>

<bibl> (bibliographic citation) contains a loosely-structured bibliographic citation of which the sub-components may or may not be explicitly tagged. [3.12.1. Methods of Encoding Bibliographic References and Lists of References 2.2.7. The Source Description 16.3.2. Declarable Elements]
Module	core — Formal specification
Member of	model.biblLike
Contained by	header: sourceDesc
May contain	core: date publisher title header: edition idno
Note	Contains phrase-level elements, together with any combination of elements from the model.biblPart class
Example	<bibl> <title type="main">Minutes of the National Assembly of the Republic of Bulgaria</title> <date when="2020-03-11">2020-03-11</date> </bibl>
Example	<bibl> <title type="main" xml:lang="en">https://www.tbmm.gov.tr/tutanak/donem24/yil2/bas/b013m.htm</title> <edition xml:lang="en">Official session record</edition> <publisher xml:lang="en">The Turkish Parliament</publisher> <idno type="URI">https://www.tbmm.gov.tr/</idno> <date when="2011-10-27">2011-10-27</date> </bibl>
Content model	<content> <elementRef key="title" minOccurs="1" maxOccurs="unbounded"/> <alternate minOccurs="1" maxOccurs="unbounded"> <elementRef key="edition" minOccurs="0" maxOccurs="1"/> <elementRef key="publisher" minOccurs="0" maxOccurs="1"/> <elementRef key="idno" minOccurs="0" maxOccurs="unbounded"/> <elementRef key="date" minOccurs="1" maxOccurs="1"/> </alternate> </content>
Schema Declaration	element bibl { tei_title+, ( tei_edition? | tei_publisher? | tei_idno* | tei_date )+ }

Appendix A.1.8 <birth>

<birth> (birth) contains information about a person's birth, obligatorily its date and optionally the place. Note that there can be several placeNames, all referring to the same place, but written in different languages or scripts. [16.2.2. The Participant Description]

Module

namesdates — Formal specification

Attributes

when

supplies the value of the date or time in a standard form, e.g. yyyy-mm-dd.

Derived from	att.datable.w3c
Status	Required
Datatype	teidata.temporal.w3c

Contained by

namesdates: person

May contain

namesdates: placeName

Example

Content model

<content>
 <alternate minOccurs="1" maxOccurs="1">
  <elementRef key="placeName" minOccurs="0"
   maxOccurs="unbounded"/>
 </alternate>
</content>

Schema Declaration

element birth { attribute when { text }, ( tei_placeName* ) }

Appendix A.1.9 <body>

<body> (text body) contains the whole body of a single unitary text, excluding any front or back matter. [4. Default Text Structure]
Module	textstructure — Formal specification
Contained by	textstructure: text
May contain	textstructure: div
Example	<body> <div type="debateSection">...</div> <div type="debateSection">...</div> ... </body>
Content model	<content> <elementRef key="div" minOccurs="1" maxOccurs="unbounded"/> </content>
Schema Declaration	element body { tei_div+ }

Appendix A.1.10 <catDesc>

<catDesc> (category description) describes some category within a taxonomy or text typology, either in the form of a brief prose description or in terms of the situational parameters used by the TEI formal <textDesc>. [2.3.7. The Classification Declaration]
Module	header — Formal specification
Attributes	att.global xml:id n xml:base xml:space @xml:lang
Contained by	header: category
May contain	core: ref term character data
Example	<category xml:id="parla.organisation"> <catDesc xml:lang="en"> <term>Organisation</term> </catDesc> <catDesc xml:lang="bg"> <term>Организация</term> </catDesc> </category>
Content model	<content> <sequence minOccurs="1" maxOccurs="1"> <elementRef key="term"/> <alternate minOccurs="1" maxOccurs="unbounded"> <textNode/> <elementRef key="ref"/> </alternate> </sequence> </content>
Schema Declaration	element catDesc { tei_att.global.attribute.xmllang, ( tei_term, ( text | tei_ref )+ ) }

Appendix A.1.11 <catRef>

<catRef> (category reference) specifies one or more defined categories within some taxonomy or text typology. [2.4.3. The Text Classification]

Module

header — Formal specification

Attributes

target

specifies the destination of the reference by supplying one or more URI References.

Derived from	att.pointing
Status	Required
Datatype	1–∞ occurrences of teidata.pointer separated by whitespace

scheme

identifies the classification scheme within which the set of categories concerned is defined, for example by a <taxonomy> element, or by some other resource.

Status	Required
Datatype	teidata.pointer

Contained by

header: textClass

May contain

Empty element

Note

The scheme attribute needs to be supplied only if more than one taxonomy has been declared.

Example

<textClass> <catRef scheme="#parla.legislature" target="#parla.uni"/> </textClass> ... elsewhere ... <taxonomy xml:id="parla.legislature"> ... <category xml:id="parla.uni"> <catDesc xml:lang="lt"> <term>Vienų rūmų parlamentas</term> </catDesc> <catDesc xml:lang="en"> <term>Unicameralism</term> </catDesc> </category> </taxonomy>

Content model

<content>
 <empty/>
</content>

Schema Declaration

element catRef
{
   attribute target { list { + } },
   attribute scheme { text },
   empty
}
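
Because target holds one or more whitespace-separated pointers, it is easy to check that each of them resolves to a declared category. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree module and a header in which the taxonomies have already been included; the function name unresolved_targets, the flattened sample fragment and the identifier parla.missing are hypothetical.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def unresolved_targets(root):
    """Return the local pointers in catRef/@target that match no category/@xml:id."""
    declared = {category.get(XML_ID) for category in root.iter(TEI + "category")}
    missing = []
    for cat_ref in root.iter(TEI + "catRef"):
        # @target may hold several pointers separated by whitespace.
        for pointer in cat_ref.get("target", "").split():
            if pointer.startswith("#") and pointer[1:] not in declared:
                missing.append(pointer)
    return missing


# Hypothetical, flattened fragment; "#parla.missing" is deliberately undeclared.
doc = ET.fromstring(
    '<teiHeader xmlns="http://www.tei-c.org/ns/1.0">'
    '<textClass><catRef scheme="#parla.legislature" '
    'target="#parla.uni #parla.missing"/></textClass>'
    '<taxonomy xml:id="parla.legislature"><category xml:id="parla.uni"/></taxonomy>'
    '</teiHeader>'
)
print(unresolved_targets(doc))
```

Only local pointers (those starting with #) are checked; pointers into other files or private URI schemes are ignored by this sketch.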

Appendix A.1.12 <category>

<category> (category) contains an individual descriptive category, possibly nested within a superordinate category, within a user-defined taxonomy. [2.3.7. The Classification Declaration]

Module

header — Formal specification

Attributes

att.global
- xml:id
- xml:lang
- xml:base
- xml:space
- @n

xml:id

(identifier) provides a unique identifier for the element bearing the attribute.

Derived from	att.global
Status	Required
Datatype	ID

ana

(analysis) indicates one or more elements containing interpretations of the element on which the ana attribute appears.

Derived from	att.global.analytic
Status	Optional
Datatype	1–∞ occurrences of teidata.pointer separated by whitespace

Contained by

header: category taxonomy

May contain

header: catDesc category

Example

<category xml:id="parla.session"> <catDesc xml:lang="en"> <term>Session</term>: A parliamentary year, which always begins on the first Tuesday in October at 12.00 o’clock noon and ends on the same date at the same time the following year. However, parliamentary work at Christiansborg is organised in such a way that it primarily takes place from October to June.</catDesc> </category>

Example

<category xml:id="parla.term"> <catDesc xml:lang="nl"> <term>Zittingsperiode</term> </catDesc> <catDesc xml:lang="en"> <term>Legislative period</term> </catDesc> </category>

Content model

<content>
 <elementRef key="catDesc" minOccurs="1"
  maxOccurs="unbounded"/>
 <elementRef key="category" minOccurs="0"
  maxOccurs="unbounded"/>
</content>

Schema Declaration

element category
{
   tei_att.global.attribute.n,
   attribute xml:id { text },
   attribute ana { list { + } }?,
   tei_catDesc+,
   tei_category*
}
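
Since a category may contain further categories, processing a taxonomy usually means walking it recursively. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree module; the function name flatten_categories and the identifiers example.parent and example.child are hypothetical. It yields the nesting depth, the xml:id and the term in the requested language for every category.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML = "{http://www.w3.org/XML/1998/namespace}"


def flatten_categories(parent, lang="en", depth=0):
    """Yield (depth, xml:id, term) for each <category> below parent, depth-first."""
    for category in parent.findall(TEI + "category"):
        term = None
        for cat_desc in category.findall(TEI + "catDesc"):
            if cat_desc.get(XML + "lang") == lang:
                term_elem = cat_desc.find(TEI + "term")
                term = term_elem.text if term_elem is not None else None
        yield depth, category.get(XML + "id"), term
        # Nested categories are handled by the same function, one level deeper.
        yield from flatten_categories(category, lang, depth + 1)


# Hypothetical taxonomy with one nested category.
taxonomy = ET.fromstring(
    '<taxonomy xmlns="http://www.tei-c.org/ns/1.0" xml:id="example">'
    '<category xml:id="example.parent">'
    '<catDesc xml:lang="en"><term>Parent</term></catDesc>'
    '<category xml:id="example.child">'
    '<catDesc xml:lang="en"><term>Child</term></catDesc>'
    '</category></category></taxonomy>'
)
for depth, ident, term in flatten_categories(taxonomy):
    print("  " * depth + f"{ident}: {term}")
```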

Appendix A.1.13 <change>

<change> (change) documents a change or set of changes made during the production of a source document, or during the revision of an electronic file. [2.6. The Revision Description 2.4.1. Creation 12.7. Identifying Changes and Revisions]
Module	header — Formal specification
Attributes	att.global xml:id xml:base xml:space @n @xml:lang att.datable.w3c notBefore notAfter from to @when
Contained by	header: revisionDesc
May contain	core: name character data
Note	The who attribute may be used to point to any other element, but will typically specify a <respStmt> or <person> element elsewhere in the header, identifying the person responsible for the change and their role in making it. It is recommended that changes be recorded with the most recent first. The status attribute may be used to indicate the status of a document following the change documented.
Example	<revisionDesc> <change when="2021-01-28"> <name>Tommaso Agnoloni</name>: Generated corpus in ParlaMint.</change> <change when="2021-02-26"> <name>Tommaso Agnoloni</name>, <name>Francesca Frontini</name>: Corpus revision, fixing</change> </revisionDesc>
Content model	<content> <alternate minOccurs="1" maxOccurs="unbounded"> <elementRef key="name"/> <textNode/> </alternate> </content>
Schema Declaration	element change { tei_att.global.attribute.n, tei_att.global.attribute.xmllang, tei_att.datable.w3c.attribute.when, ( tei_name | text )+ }

Appendix A.1.14 <classDecl>

<classDecl> (classification declarations) contains taxonomies defining classificatory codes used elsewhere in the text. Note that in ParlaMint the taxonomies are typically stored in separate files. [2.3.7. The Classification Declaration 2.3. The Encoding Description]
Module	header — Formal specification
Contained by	header: encodingDesc
May contain	derived-module-parlamint: include header: taxonomy
Example	<classDecl> <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="ParlaMint-SI-taxonomy-parla.legislature.xml"/> <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="ParlaMint-SI-taxonomy.xml-speaker_types"/> ... </classDecl>
Content model	<content> <alternate minOccurs="1" maxOccurs="unbounded"> <elementRef key="taxonomy"/> <elementRef key="include"/> </alternate> </content>
Schema Declaration	element classDecl { ( tei_taxonomy | tei_include )+ }
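
When the taxonomies are stored in separate files, the XInclude directives have to be resolved before the categories can be looked up. The following is a minimal sketch, assuming Python's standard xml.etree.ElementInclude module; to keep it self-contained it replaces the file system with an in-memory dictionary, and the file name taxonomy-example.xml, its content and the loader name load_from_memory are hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Stand-in for the taxonomy files on disk (hypothetical name and content).
FILES = {
    "taxonomy-example.xml": (
        f'<taxonomy xmlns="{TEI_NS}" xml:id="parla.legislature">'
        '<category xml:id="parla.uni"><catDesc xml:lang="en">'
        '<term>Unicameralism</term></catDesc></category></taxonomy>'
    ),
}


def load_from_memory(href, parse, encoding=None):
    """Loader in the form ElementInclude expects; reads from FILES instead of disk."""
    return ET.fromstring(FILES[href])


class_decl = ET.fromstring(
    f'<classDecl xmlns="{TEI_NS}" xmlns:xi="http://www.w3.org/2001/XInclude">'
    '<xi:include href="taxonomy-example.xml"/></classDecl>'
)

# Replace every xi:include child, in place, with the element it points to.
ElementInclude.include(class_decl, loader=load_from_memory)
child_names = [child.tag.split("}")[-1] for child in class_decl]
print(child_names)
```

With real corpus files the loader argument can be omitted, in which case the default loader opens the referenced href as a file path.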

Appendix A.1.15 <correction>

<correction> (correction principles) states how and under what circumstances corrections have been made in the text. [2.3.3. The Editorial Practices Declaration 16.3.2. Declarable Elements]
Module	header — Formal specification
Contained by	header: editorialDecl
May contain	core: p
Note	May be used to note the results of proof reading the text against its original, indicating (for example) whether discrepancies have been silently rectified, or recorded using the editorial tags described in section 3.5. Simple Editorial Changes.
Example	<editorialDecl> <correction> <p>No correction of source texts was performed.</p> </correction> </editorialDecl>
Content model	<content> <elementRef key="p" minOccurs="1" maxOccurs="unbounded"/> </content>
Schema Declaration	element correction { tei_p+ }

Appendix A.1.16 <date>

<date> (date) contains a date in any format. [3.6.4. Dates and Times 2.2.4. Publication, Distribution, Licensing, etc. 2.6. The Revision Description 3.12.2.4. Imprint, Size of a Document, and Reprint Information 16.2.3. The Setting Description 14.4. Dates]
Module	core — Formal specification
Attributes	att.typed @type @subtype att.global n xml:base xml:space @xml:id @xml:lang att.global.analytic @ana att.datable.w3c notBefore notAfter @when @from @to
Member of	model.dateLike
Contained by	analysis: s core: bibl date name unit corpus: setting header: publicationStmt
May contain	analysis: pc w core: date character data
Example	The element <date> gives the date in the when attribute in the ISO 8601 format, while the textual content is not constrained: <date when="2021-06-08">2021-06-08</date>
Example	The textual content can be given according to the conventions used in the local language: <date when="2018-04-13" xml:lang="sl">13.4.2018</date>
Content model	<content> <alternate minOccurs="1" maxOccurs="unbounded"> <elementRef key="w"/> <elementRef key="pc"/> <elementRef key="date"/> <textNode/> </alternate> </content>
Schema Declaration	element date { tei_att.global.attribute.xmlid, tei_att.global.attribute.xmllang, tei_att.global.analytic.attribute.ana, tei_att.datable.w3c.attribute.when, tei_att.datable.w3c.attribute.from, tei_att.datable.w3c.attribute.to, tei_att.typed.attributes, ( tei_w | tei_pc | tei_date | text )+ }
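
Because the textual content of the element is unconstrained, scripts should read the date from the when attribute. The following is a minimal sketch, assuming Python's standard datetime module; the function name parse_when is hypothetical, and the four accepted formats are an assumption that covers the values appearing in the examples of this document, not the whole of teidata.temporal.w3c.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Assumed subset of ISO 8601 forms: date-time, full date, year-month, year.
FORMATS = ("%Y-%m-%dT%H:%M:%S", "%Y-%m-%d", "%Y-%m", "%Y")


def parse_when(value):
    """Parse an attribute value such as @when; raise ValueError if unsupported."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    raise ValueError(f"not a supported ISO 8601 value: {value!r}")


# The second example above: local-language content, ISO date in @when.
elem = ET.fromstring('<date when="2018-04-13" xml:lang="sl">13.4.2018</date>')
parsed = parse_when(elem.get("when"))
print(parsed.date())
```

The same function can be applied to the from and to attributes, which use the same datatype.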

Appendix A.1.17 <death>

<death> (death) contains information about a person's death, obligatorily its date and optionally the place. Note that there can be several placeNames, all referring to the same place, but written in different languages or scripts. [16.2.2. The Participant Description]

Module

namesdates — Formal specification

Attributes

when

supplies the value of the date or time in a standard form, e.g. yyyy-mm-dd.

Derived from	att.datable.w3c
Status	Required
Datatype	teidata.temporal.w3c

Contained by

namesdates: person

May contain

namesdates: placeName

Example

Content model

<content>
 <alternate minOccurs="1" maxOccurs="1">
  <elementRef key="placeName" minOccurs="0"
   maxOccurs="unbounded"/>
 </alternate>
</content>

Schema Declaration

element death { attribute when { text }, ( tei_placeName* ) }

Appendix A.1.18 <desc>

<desc> (description) contains a short description of the purpose, function, or use of its parent element, or when the parent is a documentation element, describes or defines the object being documented. [23.4.1. Description of Components]
Module	core — Formal specification
Attributes	att.global xml:id n xml:base xml:space @xml:lang
Member of	model.labelLike
Contained by	core: gap header: application taxonomy namesdates: org spoken: incident kinesic vocal
May contain	core: ref term character data
Note	When used in a specification element such as <elementSpec>, TEI convention requires that this be expressed as a finite clause, beginning with an active verb.
Example	Example of <desc> elements for transcriber comments: <gap reason="inaudible"> <desc>speaker spoke too quietly, not understood</desc> </gap> <kinesic type="applause"> <desc xml:lang="sl">ploskanje</desc> </kinesic> <vocal type="interruption"> <desc>sounds from the chamber</desc> </vocal> ... <kinesic type="signal"> <desc>signal for end of debate</desc> </kinesic> ... <incident type="action"> <desc>minute of silence</desc> </incident>
Example	Example of <desc> elements used as a part of taxonomy: <taxonomy xml:id="parla.legislature"> <desc xml:lang="sl"> <term>Zakonodajna oblast</term> </desc> <desc> <term>Legislature</term> </desc> ... </taxonomy>
Example	Element <desc> can also be used to describe tool(s) used to linguistically annotate the corpus: <application version="1.0" ident="reldi-tokeniser"> <label>ReLDI tokeniser</label> <desc xml:lang="en">Tokenisation and sentence segmentation with ReLDI tokeniser, available from <ref target="https://github.com/clarinsi/reldi-tokeniser">https://github.com/clarinsi/reldi-tokeniser</ref>.</desc> </application>
Schematron	A <desc> with a type of deprecationInfo should only occur when its parent element is being deprecated. Furthermore, it should always occur in an element that is being deprecated when <desc> is a valid child of that element. <sch:rule context="tei:desc[ @type eq 'deprecationInfo']"> <sch:assert test="../@validUntil">Information about a deprecation should only be present in a specification element that is being deprecated: that is, only an element that has a @validUntil attribute should have a child <desc type="deprecationInfo">.</sch:assert> </sch:rule>
Content model	<content> <sequence minOccurs="1" maxOccurs="1"> <elementRef minOccurs="0" key="term"/> <alternate minOccurs="1" maxOccurs="unbounded"> <textNode/> <elementRef key="ref"/> </alternate> </sequence> </content>
Schema Declaration	element desc { tei_att.global.attribute.xmllang, ( tei_term?, ( text | tei_ref )+ ) }

Appendix A.1.19 <div>

<div> (text division) contains a division of the body of a corpus component. [4.1. Divisions of the Body]

Module

textstructure — Formal specification

Attributes

att.global
- xml:base
- xml:space
- @xml:id
- @n
- @xml:lang
att.global.linking
- synch
- next
- prev
- @corresp
att.typed
- type
- @subtype

type

Status

Required

Legal values are:

debateSection: General purpose text division for all parts of parliamentary proceedings. It should include at least one utterance. If needed, the @subtype attribute can be used for additional content classification.
commentSection: A special purpose text division used as a container for transcriber comments. Should not contain any utterances. If needed, the @subtype attribute can be used for additional content classification.

Contained by

textstructure: body

May contain

core: gap head note pb

spoken: incident kinesic u vocal

Example

<div type="debateSection"> <head>Devolution of Power (Cities)</head> <u xml:id="ParlaMint-GB_2015-01-06-commons.u1">...</u> <u xml:id="ParlaMint-GB_2015-01-06-commons.u2">...</u> ... <note>House adjourned.</note> </div>

Schematron

<sch:rule context="tei:l//tei:div"> <sch:assert test="ancestor::tei:floatingText"> Abstract model violation: Metrical lines may not contain higher-level structural elements such as div, unless div is a descendant of floatingText. </sch:assert> </sch:rule>

Schematron

<sch:rule context="tei:div"> <sch:report test="(ancestor::tei:p or ancestor::tei:ab) and not(ancestor::tei:floatingText)"> Abstract model violation: p and ab may not contain higher-level structural elements such as div, unless div is a descendant of floatingText. </sch:report> </sch:rule>

Content model

<content>
 <elementRef key="head" minOccurs="0"
  maxOccurs="unbounded"/>
 <alternate minOccurs="1"
  maxOccurs="unbounded">
  <elementRef key="note"/>
  <elementRef key="vocal"/>
  <elementRef key="kinesic"/>
  <elementRef key="incident"/>
  <elementRef key="gap"/>
  <elementRef key="pb"/>
  <elementRef key="u"/>
 </alternate>
</content>

Schema Declaration

element div
{
   tei_att.global.attribute.xmlid,
   tei_att.global.attribute.n,
   tei_att.global.attribute.xmllang,
   tei_att.global.linking.attribute.corresp,
   tei_att.typed.attribute.subtype,
   attribute type { "debateSection" | "commentSection" },
   tei_head*,
   (
      tei_note
    | tei_vocal
    | tei_kinesic
    | tei_incident
    | tei_gap
    | tei_pb
    | tei_u
   )+
}
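
The schema restricts type to two values, but the accompanying prose adds a rule that the content model alone does not express: a debateSection should include at least one utterance, while a commentSection should contain none. The following is a minimal sketch of that check, assuming Python's standard xml.etree.ElementTree module; the function name check_div and the sample fragments are hypothetical.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"


def check_div(div):
    """Return a list of problems concerning @type and the presence of <u> children."""
    problems = []
    div_type = div.get("type")
    has_utterance = div.find(TEI + "u") is not None  # direct children only
    if div_type == "debateSection":
        if not has_utterance:
            problems.append("debateSection without any <u>")
    elif div_type == "commentSection":
        if has_utterance:
            problems.append("commentSection containing <u>")
    else:
        problems.append(f"illegal type: {div_type!r}")
    return problems


# Hypothetical fragments: a well-formed debate section and a misused comment section.
good = ET.fromstring(
    '<div xmlns="http://www.tei-c.org/ns/1.0" type="debateSection">'
    '<head>Devolution of Power (Cities)</head><u>...</u>'
    '<note>House adjourned.</note></div>'
)
bad = ET.fromstring(
    '<div xmlns="http://www.tei-c.org/ns/1.0" type="commentSection"><u>...</u></div>'
)
print(check_div(good), check_div(bad))
```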

Appendix A.1.20 <edition>

<edition> (edition) describes the particularities of one edition of a text. [2.2.2. The Edition Statement]
Module	header — Formal specification
Attributes	att.global xml:id n xml:base xml:space @xml:lang
Contained by	core: bibl header: editionStmt
May contain	Character data only
Example	<edition>2.1</edition>
Content model	<content> <textNode/> </content>
Schema Declaration	element edition { tei_att.global.attribute.xmllang, text }

Appendix A.1.21 <editionStmt>

<editionStmt> (edition statement) groups information relating to one edition of a text. [2.2.2. The Edition Statement 2.2. The File Description]
Module	header — Formal specification
Contained by	header: fileDesc
May contain	header: edition
Example	<editionStmt> <edition>2.1</edition> </editionStmt>
Content model	<content> <elementRef key="edition" minOccurs="1" maxOccurs="1"/> </content>
Schema Declaration	element editionStmt { tei_edition }

Appendix A.1.22 <editorialDecl>

<editorialDecl> (editorial practice declaration) provides details of editorial principles and practices applied during the encoding of a text. [2.3.3. The Editorial Practices Declaration 2.3. The Encoding Description 16.3.2. Declarable Elements]
Module	header — Formal specification
Contained by	header: encodingDesc
May contain	header: correction hyphenation normalization quotation segmentation
Example	<editorialDecl> <correction> <p>No correction of source texts was performed.</p> </correction> <normalization> <p>Text has not been normalised, except for spacing.</p> </normalization> <hyphenation> <p>Hyphenation has not been altered with respect to the source files.</p> </hyphenation> <quotation> <p>Quotation marks have been left in the text and are not explicitly marked up.</p> </quotation> <segmentation> <p>The texts are segmented into utterances (contributions) and segments (corresponding to paragraphs in the source transcription).</p> </segmentation> </editorialDecl>
Content model	<content> <alternate minOccurs="1" maxOccurs="unbounded"> <elementRef key="correction"/> <elementRef key="normalization"/> <elementRef key="hyphenation"/> <elementRef key="quotation"/> <elementRef key="segmentation"/> </alternate> </content>
Schema Declaration	element editorialDecl { ( tei_correction | tei_normalization | tei_hyphenation | tei_quotation | tei_segmentation )+ }

Appendix A.1.23 <education>

<education> (education) contains a description of the educational experience of a person. [16.2.2. The Participant Description]
Module	namesdates — Formal specification
Attributes	att.global xml:id xml:base xml:space @n @xml:lang att.datable.w3c notBefore notAfter @when @from @to
Contained by	namesdates: person
May contain	Character data only
Example	<education>Bachelor of Science, Electrical and Information Technology Engineer</education>
Content model	<content> <textNode/> </content>
Schema Declaration	element education { tei_att.global.attribute.n, tei_att.global.attribute.xmllang, tei_att.datable.w3c.attribute.when, tei_att.datable.w3c.attribute.from, tei_att.datable.w3c.attribute.to, text }

Appendix A.1.24 <email>

<email> (electronic mail address) contains an email address identifying a location to which email messages can be delivered. [3.6.2. Addresses]
Module	core — Formal specification
Attributes	att.global n xml:base xml:space @xml:id @xml:lang att.global.analytic @ana
Member of	model.addressLike
Contained by	core: unit
May contain	analysis: pc w character data
Note	The format of a modern Internet email address is defined in RFC 2822
Example	The element can be used for fine-grained Named Entities which include e-mail addresses: <email ana="ne:me" xml:id="ParlaMint-CZ_2014-12-09-ps2013-023-05-003-133.ne87"> <w xml:id="ParlaMint-CZ_2014-12-09-ps2013-023-05-003-133.u4.p9.s3.w13" lemma="namraza@cd.cz" msd="UPosTag=NOUN|Case=Gen|Gender=Fem|Number=Plur|Polarity=Pos">namraza@cd.cz</w> </email>
Content model	<content> <alternate minOccurs="1" maxOccurs="unbounded"> <elementRef key="w"/> <elementRef key="pc"/> <textNode/> </alternate> </content>
Schema Declaration	element email { tei_att.global.attribute.xmlid, tei_att.global.attribute.xmllang, tei_att.global.analytic.attribute.ana, ( tei_w | tei_pc | text )+ }

Appendix A.1.25 <encodingDesc>

<encodingDesc> (encoding description) documents the relationship between an electronic text and the source or sources from which it was derived. [2.3. The Encoding Description 2.1.1. The TEI Header and Its Components]
Module	header — Formal specification
Contained by	header: teiHeader
May contain	header: appInfo classDecl editorialDecl listPrefixDef projectDesc tagsDecl
Example	General structure of an encoding description: <encodingDesc> <projectDesc>...</projectDesc> <editorialDecl>...</editorialDecl> <tagsDecl>...</tagsDecl> <classDecl>...</classDecl> </encodingDesc>
Example	Structure of an encoding description for unannotated corpus root: <encodingDesc> <projectDesc> <p xml:lang="sl"> <ref target="https://www.clarin.eu/content/parlamint">ParlaMint</ref> </p> <p xml:lang="en"> <ref target="https://www.clarin.eu/content/parlamint">ParlaMint</ref> is a project that aims to (1) create a multilingual set of comparable corpora of parliamentary proceedings uniformly encoded...</p> </projectDesc> <editorialDecl> <correction>...</correction> <normalization>...</normalization> <hyphenation>...</hyphenation> <quotation>...</quotation> <segmentation>...</segmentation> </editorialDecl> <tagsDecl> <namespace name="http://www.tei-c.org/ns/1.0"> <tagUsage gi="body" occurs="414"/> <tagUsage gi="desc" occurs="10234"/> <tagUsage gi="div" occurs="414"/> </namespace> </tagsDecl> <classDecl>...</classDecl> </encodingDesc>
Example	Example of encoding description of an annotated corpus root. The structure includes two additional elements, <listPrefixDef> and <appInfo>. <encodingDesc> <projectDesc>... </projectDesc> <editorialDecl>...</editorialDecl> <tagsDecl>...</tagsDecl> <classDecl>...</classDecl> <listPrefixDef> <prefixDef ident="mte" matchPattern="(.+)" replacementPattern="http://nl.ijs.si/ME/V6/msd/tables/msd-fslib-sl.xml#$1"> <p xml:lang="en">Private URIs with this prefix point to feature-structure elements defining the Slovenian MULTEXT-East Version 6 MSDs.</p> </prefixDef> </listPrefixDef> <appInfo> <application>...</application> </appInfo> </encodingDesc>
Example	Example of encoding description of a corpus component (annotated or unannotated). In contrast to the corpus root, the encoding description of a corpus component contains only two elements, namely, the <projectDesc> and the <tagsDecl>. <encodingDesc> <projectDesc>...</projectDesc> <tagsDecl>...</tagsDecl> </encodingDesc>
Content model	<content> <elementRef key="projectDesc"/> <elementRef key="editorialDecl" minOccurs="0" maxOccurs="1"/> <elementRef key="tagsDecl"/> <elementRef key="classDecl" minOccurs="0" maxOccurs="1"/> <elementRef key="listPrefixDef" minOccurs="0" maxOccurs="1"/> <elementRef key="appInfo" minOccurs="0" maxOccurs="1"/> </content>
Schema Declaration	element encodingDesc { tei_projectDesc, tei_editorialDecl?, tei_tagsDecl, tei_classDecl?, tei_listPrefixDef?, tei_appInfo? }
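
The content model above is a fixed sequence with optional members, so the order of the children can be tested with a regular expression over their names. The following is a minimal sketch, assuming Python's standard re and xml.etree.ElementTree modules; the function name is_valid_encoding_desc is hypothetical, and the check covers only the order of the direct children, not their content.

```python
import re
import xml.etree.ElementTree as ET

# The sequence from the schema declaration, written as a pattern over child names.
SEQUENCE = re.compile(
    r"projectDesc( editorialDecl)? tagsDecl( classDecl)?( listPrefixDef)?( appInfo)?"
)


def is_valid_encoding_desc(elem):
    """True if the direct children of <encodingDesc> appear in the required order."""
    names = " ".join(child.tag.split("}")[-1] for child in elem)
    return SEQUENCE.fullmatch(names) is not None


# Corpus component: only <projectDesc> and <tagsDecl>.
component = ET.fromstring(
    "<encodingDesc><projectDesc/><tagsDecl/></encodingDesc>"
)
# Same children in the wrong order.
swapped = ET.fromstring(
    "<encodingDesc><tagsDecl/><projectDesc/></encodingDesc>"
)
print(is_valid_encoding_desc(component), is_valid_encoding_desc(swapped))
```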

Appendix A.1.26 <equipment>

<equipment> (equipment) provides technical details of the equipment and media used for an audio or video recording used as the source for a spoken text. [8.2. Documenting the Source of Transcribed Speech 16.3.2. Declarable Elements]
Module	spoken — Formal specification
Attributes	att.global @xml:id @n @xml:lang @xml:base @xml:space att.global.analytic @ana att.global.linking @corresp @synch @next @prev att.global.rendition @rend @style @rendition att.global.responsibility @resp att.global.source @source att.declarable @default
Contained by	—
May contain	core: p
Example	<equipment> <p>"Hi-8" 8 mm NTSC camcorder with integral directional microphone and windshield and stereo digital sound recording channel. </p> </equipment>
Example	<equipment> <p>8-track analogue transfer mixed down to 19 cm/sec audio tape for cassette mastering</p> </equipment>
Content model	<content> <classRef key="model.pLike" minOccurs="1" maxOccurs="unbounded"/> </content>
Schema Declaration	element equipment { tei_att.global.attributes, tei_att.declarable.attributes, tei_model.pLike+ }

Appendix A.1.28 <event>

<event> (event) contains data relating to any kind of significant event associated with a person, place, or organisation. [14.3.1. Basic Principles]
Module	namesdates — Formal specification
Attributes	att.global n xml:lang xml:base xml:space @xml:id att.datable.w3c notBefore notAfter @when @from @to
Contained by	namesdates: listEvent org
May contain	core: label
Example	<event xml:id="PoGB.55" from="2010-05-18" to="2015-03-30"> <label>Fifty-fifth Parliament of the United Kingdom</label> </event>
Example	<org xml:id="government.HR" role="government"> <orgName xml:lang="hr" full="yes">Vlada Republike Hrvatske</orgName> <orgName xml:lang="en" full="yes">Government of the Republic of Croatia</orgName> <event from="1990-05-30"> <label xml:lang="en">existence</label> </event> </org>
Content model	<content> <elementRef key="label" minOccurs="1" maxOccurs="unbounded"/> </content> ⚓
Schema Declaration	element event { tei_att.global.attribute.xmlid, tei_att.datable.w3c.attribute.when, tei_att.datable.w3c.attribute.from, tei_att.datable.w3c.attribute.to, tei_label+ }⚓
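
Because an <event> may carry @from without @to (as in the "existence" example), a script that tests whether a date falls within an event has to treat missing bounds as open-ended. The following Python sketch shows one possible check. The function name event_covers is hypothetical, and the sketch assumes full YYYY-MM-DD values and ignores @when.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Event copied from the first example above, with the TEI namespace added.
EVENT_XML = """
<event xmlns="http://www.tei-c.org/ns/1.0" xml:id="PoGB.55"
       from="2010-05-18" to="2015-03-30">
  <label>Fifty-fifth Parliament of the United Kingdom</label>
</event>
"""


def event_covers(event: ET.Element, day: date) -> bool:
    """Return True if `day` lies within the event's @from/@to interval.

    A missing @from or @to is treated as an open-ended bound.
    """
    start = event.get("from")
    end = event.get("to")
    if start is not None and day < date.fromisoformat(start):
        return False
    if end is not None and day > date.fromisoformat(end):
        return False
    return True


event = ET.fromstring(EVENT_XML)
print(event_covers(event, date(2012, 1, 1)))
print(event_covers(event, date(2016, 1, 1)))
```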

Appendix A.1.29 <extent>

<extent> (extent) describes the approximate size of a text stored on some carrier medium or of some other object, digital or non-digital, specified in any convenient units. [2.2.3. Type and Extent of File 2.2. The File Description 3.12.2.4. Imprint, Size of a Document, and Reprint Information 11.7.1. Object Description]
Module	header — Formal specification
Contained by	header: fileDesc
May contain	core: measure
Example	<extent> <measure unit="speeches" quantity="75122" xml:lang="sl">75.122 govorov</measure> <measure unit="speeches" quantity="75122" xml:lang="en">75,122 speeches</measure> <measure unit="words" quantity="20190034" xml:lang="sl">20.190.034 besed</measure> <measure unit="words" quantity="20190034" xml:lang="en">20,190,034 words</measure> </extent>
Content model	<content> <elementRef key="measure" minOccurs="1" maxOccurs="unbounded"/> </content> ⚓
Schema Declaration	element extent { tei_measure+ }⚓
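
Since each unit is repeated once per language, the machine-readable size is best taken from @quantity rather than from the formatted text. The following Python sketch collects one quantity per @unit and rejects conflicting values. The function name quantities_by_unit is hypothetical, and the TEI namespace is added to the sample as an assumption about how the element appears in a real file.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"

# Extent copied from the example above, with the TEI namespace added.
EXTENT_XML = """
<extent xmlns="http://www.tei-c.org/ns/1.0">
  <measure unit="speeches" quantity="75122" xml:lang="sl">75.122 govorov</measure>
  <measure unit="speeches" quantity="75122" xml:lang="en">75,122 speeches</measure>
  <measure unit="words" quantity="20190034" xml:lang="sl">20.190.034 besed</measure>
  <measure unit="words" quantity="20190034" xml:lang="en">20,190,034 words</measure>
</extent>
"""


def quantities_by_unit(extent: ET.Element) -> dict:
    """Map each @unit to its @quantity, rejecting conflicting values."""
    result = {}
    for measure in extent.iter(TEI + "measure"):
        unit = measure.get("unit")
        quantity = int(measure.get("quantity"))
        # setdefault stores the first value seen and returns the stored one.
        if result.setdefault(unit, quantity) != quantity:
            raise ValueError(f"conflicting quantities for unit {unit!r}")
    return result


print(quantities_by_unit(ET.fromstring(EXTENT_XML)))
```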

Appendix A.1.30 <figure>

<figure> (figure) groups elements representing or containing graphic information such as an illustration, formula, or figure. [15.4. Specific Elements for Graphic Images]
Module	figures — Formal specification
Member of	model.global
Contained by	namesdates: person
May contain	core: graphic head
Example	<figure> <graphic url="https://www.psp.cz/eknih/cdrom/2017ps/eknih/2017ps/poslanci/i6497.jpg"/> </figure>
Content model	<content> <elementRef key="head" minOccurs="0" maxOccurs="1"/> <elementRef key="graphic" minOccurs="1" maxOccurs="1"/> </content> ⚓
Schema Declaration	element figure { tei_head?, tei_graphic }⚓

Appendix A.1.31 <fileDesc>

<fileDesc> (file description) contains a full bibliographic description of an electronic file. [2.2. The File Description 2.1.1. The TEI Header and Its Components]
Module	header — Formal specification
Contained by	header: teiHeader
May contain	header: editionStmt extent publicationStmt sourceDesc titleStmt
Note	The major source of information for those seeking to create a catalogue entry or bibliographic citation for an electronic file. As such, it provides a title and statements of responsibility together with details of the publication or distribution of the file, of any series to which it belongs, and detailed bibliographic notes for matters not addressed elsewhere in the header. It also contains a full bibliographic description for the source or sources from which the electronic text was derived.
Example	Basic structure of the <fileDesc> element: <fileDesc> <titleStmt>...</titleStmt> <editionStmt>...</editionStmt> <extent>...</extent> <publicationStmt>...</publicationStmt> <sourceDesc>...</sourceDesc> </fileDesc>
Example	Example of the <fileDesc> element in a corpus root: <fileDesc> <titleStmt> <title type="main" xml:lang="en">Dutch parliamentary corpus ParlaMint-NL [ParlaMint]</title> <title type="main" xml:lang="nl">Corpus van het Nederlandse Parlement ParlaMint-NL [ParlaMint]</title> <title type="sub" xml:lang="en">Minutes of the Eerste Kamer and Tweede Kamer of The Netherlands (2015-2020)</title> <title type="sub" xml:lang="nl">Minuten van de Eerste en Tweede Kamer van Nederland (2015-2020)</title> <meeting n="28-lower" ana="#parla.lower #parla.term">28ste Tweede Kamer</meeting> <meeting n="29-lower" ana="#parla.lower #parla.term">29ste Tweede Kamer</meeting> <meeting n="34-upper" ana="#parla.upper #parla.term">34ste Eerste Kamer</meeting> <meeting n="35-upper" ana="#parla.upper #parla.term">35ste Eerste Kamer</meeting> <meeting n="36-upper" ana="#parla.upper #parla.term">36ste Eerste Kamer</meeting> <respStmt> <persName xml:id="RubenvanHeusden" xml:lang="nl">Ruben van Heusden</persName> <resp xml:lang="en">Downloading and converting the corpus to TEI format</resp> </respStmt> <funder> <orgName xml:lang="en">The CLARIN research infrastructure</orgName> </funder> </titleStmt> <editionStmt> <edition>2.1</edition> </editionStmt> <extent> <measure unit="speeches" xml:lang="nl" quantity="474964">474,964 toespraken</measure> <measure unit="speeches" xml:lang="en" quantity="474964">474,964 speeches</measure> <measure unit="words" xml:lang="nl" quantity="51451191">51,451,191 woorden</measure> <measure unit="words" xml:lang="en" quantity="51451191">51,451,191 words</measure> </extent> <publicationStmt> <publisher> <orgName xml:lang="en">CLARIN research infrastructure</orgName> <ref target="https://www.clarin.eu/">www.clarin.eu</ref> </publisher> <idno subtype="handle" type="URI">http://hdl.handle.net/11356/1432</idno> <availability status="free"> <licence>http://creativecommons.org/licenses/by/4.0/</licence> <p xml:lang="en">This work is licensed under the<ref 
target="http://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</ref> </p> </availability> <date when="2021-06-10">June 10, 2021</date> </publicationStmt> <sourceDesc> <bibl> <title type="main">Minutes of the Eerste Kamer of The Netherlands</title> <idno type="URI">https://www.eerstekamer.nl/</idno> <date from="2014-12-15" to="2020-11-03">2014-12-15 - 2020-11-03</date> </bibl> <bibl> <title type="main">Minutes of the Tweede Kamer of The Netherlands</title> <idno type="URI">https://www.tweedekamer.nl/</idno> <date from="2014-04-16" to="2020-10-14">2014-04-16 - 2020-10-14</date> </bibl> </sourceDesc> </fileDesc>
Example	Example of the <fileDesc> element in a corpus component: <fileDesc> <titleStmt> <title type="main" xml:lang="en">Dutch parliamentary corpus ParlaMint-NL, Lower House 2014-04-16 [ParlaMint]</title> <title type="main" xml:lang="nl">Corpus van het Nederlandse parlement ParlaMint-NL, Tweede Kamer 2014-04-16 [ParlaMint]</title> <title type="sub" xml:lang="en">Report of the meeting of the Dutch Lower House, Meeting 76, Session 2 (2014-04-16)</title> <title type="sub" xml:lang="nl">Verslag van de vergadering van de Tweede Kamer, Meeting 76, Session 2 (2014-04-16)</title> <meeting ana="#parla.lower #parla.meeting.regular" corresp="#TK" n="76">Meeting 76</meeting> <meeting ana="#parla.lower #parla.session" corresp="#TK" n="2">Session 2</meeting> <meeting ana="#parla.lower #parla.term #TK.28" corresp="#TK" n="28-lower">Meeting of the 28th Tweede Kamer</meeting> <respStmt> <persName xml:id="RubenvanHeusden" xml:lang="nl">Ruben van Heusden</persName> <resp xml:lang="en">Downloading and converting the corpus to TEI format</resp> </respStmt> <funder> <orgName xml:lang="en">The CLARIN research infrastructure</orgName> </funder> </titleStmt> <editionStmt> <edition>2.1</edition> </editionStmt> <extent> <measure unit="speeches" xml:lang="nl" quantity="18">18 toespraken</measure> <measure unit="speeches" xml:lang="en" quantity="18">18 speeches</measure> <measure unit="words" xml:lang="nl" quantity="1094">1,094 woorden</measure> <measure unit="words" xml:lang="en" quantity="1094">1,094 words</measure> </extent> <publicationStmt> <publisher> <orgName xml:lang="en">CLARIN research infrastructure</orgName> <ref target="https://www.clarin.eu/">www.clarin.eu</ref> </publisher> <idno subtype="handle" type="URI">http://hdl.handle.net/11356/1432</idno> <availability status="free"> <licence>http://creativecommons.org/licenses/by/4.0/</licence> <p xml:lang="en">This work is licensed under the<ref target="http://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 
International License</ref>.</p> </availability> <date when="2021-06-10">June 10, 2021</date> </publicationStmt> <sourceDesc> <bibl> <title type="main">Minutes of the Tweede Kamer of The Netherlands</title> <idno type="URI">https://www.tweedekamer.nl/</idno> <date when="2014-04-16">2014-04-16</date> </bibl> </sourceDesc> </fileDesc>
Content model	<content> <elementRef key="titleStmt"/> <elementRef key="editionStmt"/> <elementRef key="extent"/> <elementRef key="publicationStmt"/> <elementRef key="sourceDesc"/> </content> ⚓
Schema Declaration	element fileDesc { tei_titleStmt, tei_editionStmt, tei_extent, tei_publicationStmt, tei_sourceDesc }⚓
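
The content model above is a fixed sequence of five obligatory children. As a rough illustration, the Python sketch below checks that sequence on a parsed <fileDesc>. It is no substitute for schema validation; the function name has_expected_children is hypothetical, and the empty child elements in the sample are placeholders.

```python
import xml.etree.ElementTree as ET

# Child elements in the order required by the content model above.
EXPECTED = ["titleStmt", "editionStmt", "extent", "publicationStmt", "sourceDesc"]

# Skeleton with placeholder (empty) children, for illustration only.
FILE_DESC_XML = """
<fileDesc xmlns="http://www.tei-c.org/ns/1.0">
  <titleStmt/><editionStmt/><extent/><publicationStmt/><sourceDesc/>
</fileDesc>
"""


def has_expected_children(file_desc: ET.Element) -> bool:
    """Return True if the children are exactly the expected sequence."""
    # Strip the "{namespace}" part that ElementTree puts in front of tag names.
    names = [child.tag.rpartition("}")[2] for child in file_desc]
    return names == EXPECTED


print(has_expected_children(ET.fromstring(FILE_DESC_XML)))
```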

Appendix A.1.32 <forename>

<forename> (forename) contains a forename, given or baptismal name. [14.2.1. Personal Names]
Module	namesdates — Formal specification
Attributes	att.global xml:id n xml:base xml:space @xml:lang
Member of	model.persNamePart
Contained by	namesdates: persName
May contain	Character data only
Example	<persName> <surname>Bongiorno</surname> <forename>Giulia</forename> </persName>
Content model	<content> <textNode/> </content> ⚓
Schema Declaration	element forename { tei_att.global.attribute.xmllang, text }⚓

Appendix A.1.33 <funder>

<funder> (funding body) specifies the name of an individual, institution, or organisation responsible for the funding of a project or text. [2.2.1. The Title Statement]
Module	header — Formal specification
Contained by	header: titleStmt
May contain	core: ref namesdates: orgName
Note	Funders provide financial support for a project; they are distinct from sponsors (see element <sponsor>), who provide intellectual support and authority.
Example	<funder> <orgName xml:lang="es">CLARIN infraestructura de investigación científica</orgName> <orgName xml:lang="en">The CLARIN research infrastructure</orgName> </funder>
Content model	<content> <elementRef key="orgName" minOccurs="1" maxOccurs="unbounded"/> <elementRef key="ref" minOccurs="0" maxOccurs="1"/> </content> ⚓
Schema Declaration	element funder { tei_orgName+, tei_ref? }⚓

Appendix A.1.34 <gap>

<gap> (gap) indicates a point where material has been omitted in a transcription, whether for editorial reasons described in the TEI header, as part of sampling practice, or because the material is illegible, invisible, or inaudible. [3.5.3. Additions, Deletions, and Omissions]

Module

core — Formal specification

Attributes

att.global
- xml:base
- xml:space
- @xml:id
- @n
- @xml:lang
att.global.linking
- synch
- next
- prev
- @corresp

reason

Status	Recommended
Legal values are:	inaudible editorial foreign

Member of

model.global.edit

Contained by

analysis: s

core: name unit

linking: seg

spoken: u

textstructure: div

May contain

core: desc

Note

The <gap>, <unclear>, and <del> core tag elements may be closely allied in use with the <damage> and <supplied> elements, available when using the additional tagset for transcription of primary sources. See section 12.3.3.2. Use of the gap, del, damage, unclear, and supplied Elements in Combination for discussion of which element is appropriate for which circumstance.

The <gap> tag simply signals the editor's decision to omit, or inability to transcribe, a span of text. Other information, such as the interpretation that text was deliberately erased or covered, should be indicated using the relevant tags, such as <del> in the case of deliberate deletion.

Example

<gap reason="inaudible"> <desc>microphone muted</desc> </gap>

Example

<gap reason="editorial"> <desc xml:lang="de">Zitierte Druckfassung entfernt</desc> <desc xml:lang="en">Quoted printed matter omitted</desc> </gap>

Example

<gap reason="foreign"> <desc xml:lang="und">Huliniahuanngittunga</desc> </gap>

Content model

<content>
 <elementRef key="desc" minOccurs="1"
  maxOccurs="unbounded"/>
</content>
    ⚓

Schema Declaration

element gap
{
   tei_att.global.attribute.xmlid,
   tei_att.global.attribute.n,
   tei_att.global.attribute.xmllang,
   tei_att.global.linking.attribute.corresp,
   attribute reason { "inaudible" | "editorial" | "foreign" }?,
   tei_desc+
}⚓

Appendix A.1.35 <graphic>

<graphic> (graphic) indicates the location of a graphic or illustration, either forming part of a text, or providing an image of it. [3.10. Graphics and Other Non-textual Components 12.1. Digital Facsimiles]
Module	core — Formal specification
Attributes	att.resourced @url att.media width height @scale
Member of	model.graphicLike
Contained by	figures: figure
May contain	Empty element
Note	The mimeType attribute should be used to supply the MIME media type of the image specified by the url attribute. Within the body of a text, a <graphic> element indicates the presence of a graphic component in the source itself. Within the context of a <facsimile> or <sourceDoc> element, however, a <graphic> element provides an additional digital representation of some part of the source being encoded.
Example	<figure> <graphic url="https://www.dekamer.be//site/wwwroot/images/cv/06595.gif"/> </figure>
Content model	<content> <empty/> </content> ⚓
Schema Declaration	element graphic { tei_att.media.attribute.scale, tei_att.resourced.attributes, empty }⚓

Appendix A.1.36 <head>

<head> (heading) contains any type of heading, for example the title of a section, or the heading of a list, glossary, manuscript description, etc. [4.2.1. Headings and Trailers]
Module	core — Formal specification
Attributes	att.global n xml:base xml:space @xml:id @xml:lang att.global.linking synch next prev @corresp att.typed subtype @type
Contained by	figures: figure namesdates: listEvent listOrg listPerson org textstructure: div
May contain	Character data only
Note	The <head> element is used for headings at all levels; software which treats (e.g.) chapter headings, section headings, and list titles differently must determine the proper processing of a <head> element based on its structural position. A <head> occurring as the first element of a list is the title of that list; one occurring as the first element of a <div1> is the title of that chapter or section.
Example	The most common use for the <head> element is to mark the headings of sections: <div type="debateSection"> <head>Regulation of Health and Social Care Professions Etc. Bill [HL]</head> ... </div>
Example	The <head> element may also be used to give the title to specialised lists: <listEvent> <head xml:lang="nl">Zittingsperiode</head> <head xml:lang="en">Legislative period</head> <event to="2007-05-02" from="2003-06-05" xml:id="period_51"> <label xml:lang="nl">Zittingsperiode 51</label> <label xml:lang="en">Legislative period 51</label> </event> ... </listEvent>
Content model	<content> <textNode/> </content> ⚓
Schema Declaration	element head { tei_att.global.attribute.xmlid, tei_att.global.attribute.xmllang, tei_att.global.linking.attribute.corresp, tei_att.typed.attribute.type, text }⚓

Appendix A.1.37 <hyphenation>

<hyphenation> (hyphenation) summarizes the way in which hyphenation in a source text has been treated in an encoded version of it. [2.3.3. The Editorial Practices Declaration 16.3.2. Declarable Elements]
Module	header — Formal specification
Contained by	header: editorialDecl
May contain	core: p
Example	<editorialDecl> ... <hyphenation> <p xml:lang="en">No end-of-line hyphens were present in the source.</p> </hyphenation> ... </editorialDecl>
Content model	<content> <elementRef key="p" minOccurs="1" maxOccurs="unbounded"/> </content> ⚓
Schema Declaration	element hyphenation { tei_p+ }⚓

Appendix A.1.38 <idno>

<idno> (identifier) supplies an identifier used to identify some object, such as a person or organisation. If it is a URL, it should have @type="URI". [14.3.1. Basic Principles 2.2.4. Publication, Distribution, Licensing, etc. 2.2.5. The Series Statement 3.12.2.4. Imprint, Size of a Document, and Reprint Information]

Module header — Formal specification

Attributes

att.global
- xml:id
- n
- xml:base
- xml:space
- @xml:lang

type

categorizes the identifier.

Status	Required
Legal values are:
- URI: Uniform Resource Identifier. In ParlaMint this should be a resolvable URL, with the subtype classifying the type of web site.
- VIAF: The URL in the Virtual International Authority File, assigned to link the different names used in catalogs around the world for the same entity.

subtype

Status	Optional
Legal values are:
- handle: The permanent identifier of type handle.
- government: A governmental web site.
- politicalParty: The web site of a political party.
- parliament: A web site of the parliament.
- ministry: The web site of a ministry.
- personal: The personal web site of a person.
- business: A web site belonging to a business.
- publicService: The web site of a public service.
- wikimedia: A web site of Wikimedia, e.g. Wikipedia.
- facebook: A Facebook web site.
- twitter: A Twitter web site.
- tiktok: A TikTok web site.
- instagram: An Instagram web site.
Note	This attribute should always be used together with type="URI".

Member of

model.nameLike

Contained by

core: bibl

header: publicationStmt

namesdates: org person

May contain Character data only

Note

<idno> should be used for labels which identify an object or concept in a formal cataloguing system such as a database or an RDF store, or in a distributed system such as the World Wide Web. Some suggested values for type on <idno> are ISBN, ISSN, DOI, and URI.

Example

<publicationStmt> ... <idno type="URI" subtype="handle">http://hdl.handle.net/11356/1432</idno> ... </publicationStmt>

Example

<sourceDesc> <bibl> <title type="main" xml:lang="sl">Zapisi sej Državnega zbora Republike Slovenije</title> ... <idno type="URI">https://www.dz-rs.si</idno> ... </bibl> </sourceDesc>

Example

<idno type="URI" subtype="wikimedia" xml:lang="sl">https://sl.wikipedia.org/wiki/Pozitivna_Slovenija</idno> <idno type="URI" subtype="wikimedia" xml:lang="en">https://en.wikipedia.org/wiki/Positive_Slovenia</idno>

Content model

<content>
 <textNode/>
</content>
    ⚓

Schema Declaration

element idno
{
   tei_att.global.attribute.xmllang,
   attribute type { "URI" | "VIAF" },
   attribute subtype
   {
      "handle"
    | "government"
    | "politicalParty"
    | "parliament"
    | "ministry"
    | "personal"
    | "business"
    | "publicService"
    | "wikimedia"
    | "facebook"
    | "twitter"
    | "tiktok"
    | "instagram"
   }?,
   text
}⚓
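
The schema declaration alone does not enforce the note that subtype should only be used with type="URI". The following Python sketch shows how a script might check both the legal values and that note on the attributes of a single <idno>. The function name check_idno and the message wording are hypothetical.

```python
# Legal values copied from the attribute descriptions above.
ALLOWED_TYPES = {"URI", "VIAF"}
ALLOWED_SUBTYPES = {
    "handle", "government", "politicalParty", "parliament", "ministry",
    "personal", "business", "publicService", "wikimedia", "facebook",
    "twitter", "tiktok", "instagram",
}


def check_idno(attributes: dict) -> list:
    """Return a list of problems found in the attributes of one <idno>."""
    problems = []
    id_type = attributes.get("type")
    subtype = attributes.get("subtype")
    if id_type not in ALLOWED_TYPES:
        problems.append(f"@type must be one of {sorted(ALLOWED_TYPES)}, got {id_type!r}")
    if subtype is not None:
        if subtype not in ALLOWED_SUBTYPES:
            problems.append(f"unknown @subtype {subtype!r}")
        if id_type != "URI":
            problems.append('@subtype should only be used with type="URI"')
    return problems


print(check_idno({"type": "URI", "subtype": "handle"}))
print(check_idno({"type": "VIAF", "subtype": "wikimedia"}))
```

With ElementTree, the dictionary can be taken directly from the attrib property of a parsed <idno> element.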

Appendix A.1.39 <incident>

<incident> (incident) marks any phenomenon or occurrence, not necessarily vocalized or communicative, for example incidental noises or other events affecting communication. [8.3.3. Vocal, Kinesic, Incident]

Module

spoken — Formal specification

Attributes

att.ascribed
- @who
att.global
- xml:base
- xml:space
- @xml:id
- @n
- @xml:lang
att.global.linking
- synch
- next
- prev
- @corresp
att.typed
- type
- @subtype

type

Status	Recommended
Legal values are:	action incident leaving entering break pause sound editorial

Member of

model.global.spoken

Contained by

analysis: s

core: name unit

linking: seg

spoken: u

textstructure: div

May contain

core: desc

Example

<incident type="action"> <desc>He stands and with him the whole Assembly</desc> </incident>

Example

<incident type="sound"> <desc>The Assembly observed a minute of silence. Applause.</desc> </incident>

Example

<incident type="entering"> <desc>Arrival of the President of the Republic of Poland</desc> </incident>

Content model

<content>
 <elementRef key="desc" minOccurs="1"
  maxOccurs="unbounded"/>
</content>
    ⚓

Schema Declaration

element incident
{
   tei_att.global.attribute.xmlid,
   tei_att.global.attribute.n,
   tei_att.global.attribute.xmllang,
   tei_att.global.linking.attribute.corresp,
   tei_att.ascribed.attributes,
   tei_att.typed.attribute.subtype,
   attribute type
   {
      "action"
    | "incident"
    | "leaving"
    | "entering"
    | "break"
    | "pause"
    | "sound"
    | "editorial"
   }?,
   tei_desc+
}⚓

Appendix A.1.40 <include>

<include> is an element from the namespace of the XML Inclusions (XInclude) W3C recommendation. It is used to include into a ParlaMint <teiCorpus> root file those elements of the corpus that are stored as separate files. These are the <TEI> corpus components and parts of the corpus root <teiHeader>, namely <listPerson> and <listOrg> inside <particDesc>, and <taxonomy> inside <classDecl>.

Namespace

http://www.w3.org/2001/XInclude

Module

derived-module-parlamint

Attributes

href

Status	Optional
Datatype	teidata.pointer

Contained by

core: teiCorpus

corpus: particDesc

header: classDecl

May contain

Empty element

Example

Using XInclude in ParlaMint to include corpus components into the corpus root:

<teiCorpus xml:lang="en" xml:id="ParlaMint-GB" xmlns="http://www.tei-c.org/ns/1.0"> <teiHeader> ...TEI header of the corpus... </teiHeader> <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="2015/ParlaMint-GB_2015-01-05-commons.xml"/> <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="2015/ParlaMint-GB_2015-01-06-commons.xml"/> ... </teiCorpus>
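
A script that needs the list of component files can read the @href values without resolving the inclusions. The following Python sketch does this for the example above; the function name component_hrefs is hypothetical, and the elided parts of the example are replaced by an empty <teiHeader> placeholder.

```python
import xml.etree.ElementTree as ET

XI_INCLUDE = "{http://www.w3.org/2001/XInclude}include"

# Corpus root from the example above, with the header reduced to a placeholder.
ROOT_XML = """
<teiCorpus xml:lang="en" xml:id="ParlaMint-GB" xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader/>
  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude"
              href="2015/ParlaMint-GB_2015-01-05-commons.xml"/>
  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude"
              href="2015/ParlaMint-GB_2015-01-06-commons.xml"/>
</teiCorpus>
"""


def component_hrefs(root: ET.Element) -> list:
    """Return @href of the xi:include elements that are direct children of the root.

    Inclusions nested inside the <teiHeader> are not direct children
    and are therefore skipped.
    """
    return [include.get("href") for include in root.findall(XI_INCLUDE)]


print(component_hrefs(ET.fromstring(ROOT_XML)))
```

To actually expand the inclusions, an XInclude-aware processor is needed; Python's standard library offers the xml.etree.ElementInclude module for this.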

Appendix A.1.41 <kinesic>

<kinesic> (kinesic) marks any communicative phenomenon, not necessarily vocalized, for example a gesture, frown, etc. [8.3.3. Vocal, Kinesic, Incident]

Module

spoken — Formal specification

Attributes

att.global
- xml:base
- xml:space
- @xml:id
- @n
- @xml:lang
att.global.linking
- synch
- next
- prev
- @corresp
att.ascribed
- @who
att.typed
- type
- @subtype

type

Status	Recommended
Legal values are:	kinesic applause ringing signal playback gesture smiling laughter snapping noise

Member of

model.global.spoken

Contained by

analysis: s

core: name unit

linking: seg

spoken: u

textstructure: div

May contain

core: desc

Example

<kinesic type="signal"> <desc>sign for the end of discussion</desc> </kinesic>

Example

<kinesic type="laughter"> <desc xml:lang="hr">smijeh.</desc> </kinesic>

Example

<kinesic type="applause"> <desc xml:lang="sl">ploskanje</desc> </kinesic>

Content model

<content>
 <elementRef key="desc" minOccurs="1"
  maxOccurs="unbounded"/>
</content>
    ⚓

Schema Declaration

element kinesic
{
   tei_att.global.attribute.xmlid,
   tei_att.global.attribute.n,
   tei_att.global.attribute.xmllang,
   tei_att.global.linking.attribute.corresp,
   tei_att.ascribed.attribute.who,
   tei_att.typed.attribute.subtype,
   attribute type
   {
      "kinesic"
    | "applause"
    | "ringing"
    | "signal"
    | "playback"
    | "gesture"
    | "smiling"
    | "laughter"
    | "snapping"
    | "noise"
   }?,
   tei_desc+
}⚓

Appendix A.1.42 <label>

<label> (label) contains any label or heading used to identify part of a text, typically but not exclusively in a list or glossary. [3.8. Lists]
Module	core — Formal specification
Attributes	att.global xml:id n xml:base xml:space @xml:lang
Member of	model.labelLike
Contained by	header: application namesdates: event
May contain	namesdates: orgName character data
Example	Labels denote the existence of organisations and connected events: <org xml:id="DZ" role="parliament" ana="#parla.national #parla.lower"> <orgName xml:lang="sl" full="yes">Državni zbor Republike Slovenije</orgName> <orgName xml:lang="en" full="yes">National Assembly of the Republic of Slovenia</orgName> <event from="1992-12-23"> <label xml:lang="en">existence</label> </event> ... <listEvent> <head xml:lang="sl">Mandatno obdobje</head> <head xml:lang="en">Legislative period</head> <event xml:id="DZ.7" from="2014-08-01" to="2018-06-21"> <label xml:lang="sl">7. mandat</label> <label xml:lang="en">Term 7</label> </event> <event xml:id="DZ.8" from="2018-06-22"> <label xml:lang="sl">8. mandat</label> <label xml:lang="en">Term 8</label> </event> </listEvent> </org>
Example	Labels may also be used to give a name to the tools used in compiling the corpus: <application ident="int-tagger" version="1.0"> <label>INT Tagger, lemmatizer and Tokenizer</label> <desc xml:lang="en">INT Tagger, lemmatizer and Tokenizer for modern Dutch, based on old-school machine learning (SVM). It provides the legacy PoS tags (encoded in w/@ana) and the lemmata for Dutch. Not publicly available.</desc> </application>
Example	Labels may also be used for other structured list items: <listEvent> <head xml:lang="lv">Saeimas sasaukumi</head> <head xml:lang="en">Legislative period</head> <event xml:id="PT.12" from="2014-11-04" to="2018-11-05"> <label xml:lang="lv">12. Saeima</label> <label xml:lang="en">Term 12</label> </event> <event xml:id="PT.13" from="2018-11-06"> <label xml:lang="lv">13. Saeima</label> <label xml:lang="en">Term 13</label> </event> </listEvent>
Content model	<content> <alternate minOccurs="1" maxOccurs="1"> <textNode/> <elementRef key="orgName"/> </alternate> </content> ⚓
Schema Declaration	element label { tei_att.global.attribute.xmllang, ( text | tei_orgName ) }⚓

Appendix A.1.43 <langUsage>

<langUsage> (language usage) describes the languages, sublanguages, registers, dialects, etc. represented within a text. [2.4.2. Language Usage 2.4. The Profile Description 16.3.2. Declarable Elements]
Module	header — Formal specification
Contained by	header: profileDesc
May contain	header: language
Example	<langUsage> <language ident="sl" xml:lang="sl">slovenski</language> <language ident="en" xml:lang="sl">angleški</language> <language ident="sl" xml:lang="en">Slovenian</language> <language ident="en" xml:lang="en">English</language> </langUsage>
Content model	<content> <elementRef key="language" minOccurs="1" maxOccurs="unbounded"/> </content> ⚓
Schema Declaration	element langUsage { tei_language+ }⚓

Appendix A.1.44 <language>

<language> (language) characterizes a single language or sublanguage used within a text. [2.4.2. Language Usage]

Module

header — Formal specification

Attributes

att.global
- xml:id
- n
- xml:base
- xml:space
- @xml:lang

ident

(identifier) Supplies a language code constructed as defined in BCP 47 which is used to identify the language documented by this element, and which may be referenced by the global xml:lang attribute.

Status	Required
Datatype	teidata.language

usage

specifies the approximate percentage of the text which uses this language.

Status	Optional
Datatype	nonNegativeInteger

Contained by

header: langUsage

May contain

Character data only

Note

Particularly for sublanguages, an informal prose characterization should be supplied as content for the element.

Example

<langUsage> <language ident="es" xml:lang="es">Español</language> <language ident="es" xml:lang="en">Spanish</language> </langUsage>

Example

<langUsage> <language ident="bg-Latn" xml:lang="en">Bulgarian in Latin script</language> <language ident="bg" xml:lang="bg">български</language> <language ident="bg" xml:lang="en">Bulgarian</language> <language ident="en" xml:lang="bg">английски</language> <language ident="en" xml:lang="en">English</language> <language ident="fr" xml:lang="bg">френски</language> <language ident="fr" xml:lang="en">French</language> </langUsage>

Content model

<content>
 <textNode/>
</content>
    ⚓

Schema Declaration

element language
{
   tei_att.global.attribute.xmllang,
   attribute ident { text },
   attribute usage { text }?,
   text
}⚓
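
Each language is named once per description language, so a name is identified by the pair of @ident and @xml:lang. The following Python sketch builds such a lookup table from the first example above. The function name language_names is hypothetical, and the TEI namespace is added to the sample as an assumption about how the element appears in a real file.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Copied from the first example above, with the TEI namespace added.
LANG_USAGE_XML = """
<langUsage xmlns="http://www.tei-c.org/ns/1.0">
  <language ident="es" xml:lang="es">Español</language>
  <language ident="es" xml:lang="en">Spanish</language>
</langUsage>
"""


def language_names(lang_usage: ET.Element) -> dict:
    """Map (documented language, language of the name) to the name itself."""
    return {
        (language.get("ident"), language.get(XML_LANG)): (language.text or "").strip()
        for language in lang_usage.iter(TEI + "language")
    }


names = language_names(ET.fromstring(LANG_USAGE_XML))
print(names[("es", "en")])
```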

Appendix A.1.45 <licence>

<licence> contains information about a licence or other legal agreement applicable to the text. [2.2.4. Publication, Distribution, Licensing, etc.]
Module	header — Formal specification
Contained by	header: availability
May contain	XSD anyURI
Note	A <licence> element should be supplied for each licence agreement applicable to the text in question. The target attribute may be used to reference a full version of the licence. The when, notBefore, notAfter, from or to attributes may be used in combination to indicate the date or dates of applicability of the licence.
Example	The <licence> element specifies the fixed CC BY 4.0 URL, and the paragraph that follows gives a prose description of the licence: <licence>http://creativecommons.org/licenses/by/4.0/</licence> <p>This work is licensed under the <ref target="http://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</ref> </p>
Example	The textual information on the licence can be given in more than one language: <licence>http://creativecommons.org/licenses/by/4.0/</licence> <p xml:lang="hr">Ovaj rad je dostupan pod <ref target="http://creativecommons.org/licenses/by/4.0/">međunarodnom licencom Creative Commons Imenovanje 4.0</ref> </p> <p xml:lang="en">This work is licensed under the <ref target="http://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</ref> </p>
Content model	<content> <dataRef name="anyURI"/> </content> ⚓
Schema Declaration	element licence { xsd:anyURI }⚓

Appendix A.1.46 <link>

Module

linking — Formal specification

Attributes

ana

Status	Required
Datatype	teidata.pointer

target

Status	Required
Datatype	1–2 occurrences of teidata.pointer separated by whitespace

Member of

model.global.meta

Contained by

linking: linkGrp

May contain

Empty element

Note

This element should only be used to encode associations not otherwise provided for by more specific elements.

The location of this element within a document has no significance, unless it is included within a <linkGrp>, in which case it may inherit the value of the type attribute from the value given on the <linkGrp>.

Example

Element <link>, given in <linkGrp> joins two tokens according to their syntactic dependency. The example below illustrating this is given, for readability, without the word-level linguistic attributes and with shortened IDs:

Schematron

<sch:rule context="tei:link"> <sch:assert test="contains(normalize-space(@target),' ')">You must supply at least two values for @target or on <sch:name/> </sch:assert> </sch:rule>

Content model

<content>
 <empty/>
</content>
    ⚓

Schema Declaration

element link { attribute ana { text }, attribute target { list { ? } }, empty }⚓
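
Taken together, the datatype of target (at most two pointers) and the Schematron rule (at least two values) mean that a <link> should carry exactly two pointers. The following Python sketch mirrors these constraints for the attributes of a single <link>. The function name check_link is hypothetical, and the attribute values in the usage lines are invented for illustration.

```python
def check_link(attributes: dict) -> list:
    """Return a list of problems found in the attributes of one <link>."""
    problems = []
    if not attributes.get("ana"):
        problems.append("missing required @ana")
    # split() without arguments also normalises repeated whitespace.
    pointers = attributes.get("target", "").split()
    if len(pointers) != 2:
        problems.append(f"@target needs exactly two pointers, found {len(pointers)}")
    return problems


# Invented attribute values, for illustration only.
print(check_link({"ana": "ud-syn:nsubj", "target": "#w2 #w1"}))
print(check_link({"ana": "ud-syn:nsubj", "target": "#w2"}))
```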

Appendix A.1.47 <linkGrp>

<linkGrp> (link group) defines a collection of associations or hypertextual links. [17.1. Links]

Module

linking — Formal specification

Attributes

targFunc

Status	Required
Legal values are:	head argument

type

Status	Required
Legal values are:	UD-SYN

Member of

model.global.meta

Contained by

analysis: s

May contain

linking: link

Note

May contain one or more <link> or <ptr> elements.

A web or link group is an administrative convenience, which should be used to collect a set of links together for any purpose, not simply to supply a default value for the type attribute.

Example

Syntactic analysis is stored in the link group, <linkGrp> element, which is then composed of <link> elements. The example below illustrating this is given, for readability, without the word-level linguistic attributes and with shortened IDs:

Content model

<content>
 <elementRef maxOccurs="unbounded"
  key="link"/>
</content>

Schema Declaration

element linkGrp
{
   attribute targFunc { "head argument" },
   attribute type { "UD-SYN" },
   tei_link+
}

Appendix A.1.48 <listEvent>

<listEvent> (list of events) contains a list of descriptions, each of which provides information about an identifiable event. [14.3.1. Basic Principles]
Module	namesdates — Formal specification
Member of	model.listLike
Contained by	namesdates: org
May contain	core: head namesdates: event
Example	<listEvent> <event xml:id="GOV.11" from="2013-03-20" to="2014-09-18"> <label xml:lang="sl">11. vlada Republike Slovenije (20. marec 2013 - 18. september 2014)</label> <label xml:lang="en">11th Government of the Republic of Slovenia (20 March 2013 - 18 September 2014)</label> </event> ... <event xml:id="GOV.14" from="2020-03-13"> <label xml:lang="sl">14. vlada Republike Slovenije (13. marec 2020 - danes)</label> <label xml:lang="en">14th Government of the Republic of Slovenia (13 March 2020 - today)</label> </event> </listEvent>
Example	<org ana="#parla.national #parla.upper" role="parliament" xml:id="LEG"> <orgName full="yes" xml:lang="it">Senato della Repubblica Italiana</orgName> <orgName full="yes" xml:lang="en">Senate of the Republic of Italy</orgName> ... <listEvent> <event from="2013-03-15" to="2018-03-22" xml:id="LEG.17"> <label xml:lang="it">XVII Legislatura</label> <label xml:lang="en">XVII Legislative Term</label> </event> <event from="2018-03-23" xml:id="LEG.18"> <label xml:lang="it">XVIII Legislatura</label> <label xml:lang="en">XVIII Legislative Term</label> </event> </listEvent> </org>
Content model	<content> <sequence minOccurs="1" maxOccurs="1"> <elementRef key="head" minOccurs="0" maxOccurs="unbounded"/> <elementRef key="event" minOccurs="0" maxOccurs="unbounded"/> </sequence> </content>
Schema Declaration	element listEvent { tei_head*, tei_event* }

Appendix A.1.49 <listOrg>

<listOrg> (list of organizations) contains a list of elements, each of which provides information about an identifiable organisation. [14.2.2. Organizational Names]
Module	namesdates — Formal specification
Attributes	att.global n xml:base xml:space @xml:id @xml:lang
Member of	model.listLike
Contained by	corpus: particDesc
May contain	core: head namesdates: listRelation org
Note	The type attribute may be used to distinguish lists of organizations of a particular type if convenient.
Example	<listOrg> <org xml:id="government.GB" role="government"> ... </org> <org xml:id="PoGB" role="parliament"> ... </org> <org role="parliamentaryGroup" xml:id="party.LI"> ... </org> ... <listRelation> ... </listRelation> </listOrg>
Content model	<content> <sequence minOccurs="1" maxOccurs="1"> <elementRef key="head" minOccurs="0" maxOccurs="unbounded"/> <elementRef key="org" minOccurs="1" maxOccurs="unbounded"/> <elementRef key="listRelation" minOccurs="0" maxOccurs="1"/> </sequence> </content>
Schema Declaration	element listOrg { tei_att.global.attribute.xmlid, tei_att.global.attribute.xmllang, ( tei_head*, tei_org+, tei_listRelation? ) }

Appendix A.1.50 <listPerson>

<listPerson> (list of persons) contains a list of descriptions, each of which provides information about an identifiable person or a group of people, for example the participants in a language interaction, or the people referred to in a historical source. [14.3.2. The Person Element 16.2. Contextual Information 2.4. The Profile Description 16.3.2. Declarable Elements]
Module	namesdates — Formal specification
Attributes	att.global n xml:base xml:space @xml:id @xml:lang
Member of	model.listLike
Contained by	corpus: particDesc
May contain	core: head namesdates: person
Note	The type attribute may be used to distinguish lists of people of a particular type if convenient.
Example	<listPerson> <head>List of speakers</head> <person xml:id="SayeedaWarsi"> ... </person> <person xml:id="DavidHamilton"> ... </person> ... </listPerson>
Content model	<content> <sequence minOccurs="1" maxOccurs="1"> <elementRef key="head" minOccurs="0" maxOccurs="unbounded"/> <elementRef key="person" minOccurs="1" maxOccurs="unbounded"/> </sequence> </content>
Schema Declaration	element listPerson { tei_att.global.attribute.xmlid, tei_att.global.attribute.xmllang, ( tei_head*, tei_person+ ) }

Appendix A.1.51 <listPrefixDef>

<listPrefixDef> (list of prefix definitions) contains a list of definitions of prefixing schemes used in teidata.pointer values, showing how abbreviated URIs using each scheme may be expanded into full URIs. [17.2.3. Using Abbreviated Pointers]
Module	header — Formal specification
Contained by	header: encodingDesc
May contain	header: prefixDef
Example	In this example, two private URI scheme prefixes are defined and patterns are provided for dereferencing them. Each prefix is also supplied with a human-readable explanation in a <p> element. <listPrefixDef> <prefixDef ident="ud-syn" matchPattern="(.+)" replacementPattern="#$1"> <p>Private URIs with this prefix point to elements giving their name. In this document they are simply local references into the UD-SYN taxonomy categories in the corpus root TEI header.</p> </prefixDef> <prefixDef ident="ne" matchPattern="(.+)" replacementPattern="#NER.cnec2.0.$1"> <p>Taxonomy for named entities (cnec2.0)</p> </prefixDef> </listPrefixDef>
Content model	<content> <elementRef key="prefixDef" minOccurs="1" maxOccurs="unbounded"/> </content>
Schema Declaration	element listPrefixDef { tei_prefixDef+ }
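
To make the expansion mechanism of the two prefix definitions in the example above concrete, the following sketch shows how an abbreviated pointer would be resolved. The token IDs and the category name "pf" are hypothetical and serve only to illustrate the pattern substitution:

```xml
<linkGrp targFunc="head argument" type="UD-SYN">
  <!-- ana="ud-syn:nsubj": the part after the prefix, "nsubj", is matched by
       matchPattern "(.+)" and rewritten with replacementPattern "#$1",
       giving the local reference "#nsubj" -->
  <link ana="ud-syn:nsubj" target="#s1.t2 #s1.t1"/>
</linkGrp>
<!-- In the same way, a value such as ana="ne:pf" would expand to
     "#NER.cnec2.0.pf" under the "ne" prefix definition -->
```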

Appendix A.1.52 <listRelation>

<listRelation> provides information about relationships identified amongst people, places, and organisations, either informally as prose or as formally expressed relation links. [14.3.2.3. Personal Relationships]
Module	namesdates — Formal specification
Member of	model.listLike
Contained by	namesdates: listOrg
May contain	namesdates: relation
Note	May contain a prose description organized as paragraphs, or a sequence of <relation> elements.
Example	<listOrg> <org role="parliamentaryGroup" xml:id="party.LD"> <orgName full="yes">Liberal Democrat</orgName> <orgName full="abb">LD</orgName> </org> <org role="parliamentaryGroup" xml:id="party.I"> <orgName full="yes">Independent</orgName> <orgName full="abb">I</orgName> </org> <org role="parliamentaryGroup" xml:id="party.0UBS"> <orgName full="yes">Independent Conservative</orgName> <orgName full="abb">0UBS</orgName> </org> <org>... </org> <listRelation> <relation name="coalition" mutual="#party.CON #party.LD" from="2010-05-06" to="2015-05-07"/> <relation name="opposition" active="#party.LAB #party.SO0T #party.64RT #party.SDLP #party.L1QU #party.0UBS #party.BI #party.LI #party.LB #party.LJ95 #party.IGC #party.NPBE #party.CB #party.QMZZ #party.IL #party.UUP #party.FZPG #party.A #party.GP #party.SNP #party.I #party.L8TA #party.CON #party.NA #party.DUP #party.UUSL #party.ZKPW #party.UKIP #party.PC" passive="#government.GB" from="2010-05-06" to="2015-05-07"/> <relation>...</relation> ... </listRelation> </listOrg>
Content model	<content> <elementRef key="relation" minOccurs="1" maxOccurs="unbounded"/> </content>
Schema Declaration	element listRelation { tei_relation+ }

Appendix A.1.53 <measure>

<measure> (measure) either gives (in teiHeader//extent) the number of occurrences of certain items (typically elements) in the corpus or corpus component, or gives the score of an annotation. ParlaMint currently uses the latter for the sentiment score of utterances and sentences. In this case measure should have empty content. [3.6.3. Numbers and Measures]

Module

core — Formal specification

Attributes

att.global
- xml:id
- n
- xml:base
- xml:space
- @xml:lang
att.global.analytic
- @ana
att.global.linking
- synch
- next
- prev
- @corresp

unit

Status	Optional
Legal values are:	speeches words tokens optional value

quantity

(quantity) specifies the number of the specified units that comprise the measurement

Derived from	att.measurement
Status	Required
Datatype	teidata.numeric

type

specifies the type of measurement in any convenient typology.

Derived from	att.typed
Status	Optional
Datatype	teidata.enumerated

Member of

model.measureLike

Contained by

analysis: s

header: extent

linking: seg

spoken: u

May contain

Character data only

Example

<measure unit="speeches" quantity="75122" xml:lang="sl">75.122 govorov</measure> <measure unit="speeches" quantity="75122" xml:lang="en">75,122 speeches</measure> <measure unit="words" quantity="20190034" xml:lang="sl">20.190.034 besed</measure> <measure unit="words" quantity="20190034" xml:lang="en">20,190,034 words</measure>

Example

Sentiment score of a sentence:

<s xml:id="ParlaMint-SI_2000-10-27-SDZ3-Redna-01.ana.seg1.2"> <measure type="sentiment" quantity="4.1" ana="senti:mixpos" corresp="#ParlaMint-SI_2000-10-27-SDZ3-Redna-01.ana.seg1.2"/> <w xml:id="ParlaMint-SI_2000-10-27-SDZ3-Redna-01.ana.seg1.2.1" msd="UPosTag=ADJ|Case=Nom|Degree=Pos|Gender=Fem|Number=Plur|VerbForm=Part" ana="mte:Appfpn" lemma="spoštovan">Spoštovane</w> ... </s>

Content model

<content>
 <textNode/>
</content>

Schema Declaration

element measure
{
   tei_att.global.attribute.xmllang,
   tei_att.global.analytic.attribute.ana,
   tei_att.global.linking.attribute.corresp,
   attribute unit { "speeches" | "words" | "tokens" }?,
   attribute quantity { text },
   attribute type { text }?,
   text
}

Appendix A.1.54 <media>

<media> indicates the location of any form of external media such as an audio or video clip etc. [3.10. Graphics and Other Non-textual Components]

Module

core — Formal specification

Attributes

att.resourced
- @url
att.global
- n
- xml:lang
- xml:base
- xml:space
- @xml:id
att.global.source
- @source

mimeType

(MIME media type) specifies the applicable multimedia internet mail extension (MIME) media type.

Derived from	att.internetMedia
Status	Required
Datatype	1–∞ occurrences of teidata.word separated by whitespace

Member of

model.graphicLike

Contained by

spoken: recording

May contain

Empty element

Note

The attributes available for this element are not appropriate in all cases. For example, it makes no sense to specify the temporal duration of a graphic. Such errors are not currently detected.

The mimeType attribute must be used to specify the MIME media type of the resource specified by the url attribute.

Example
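
A hedged sketch of how a recording could point to its audio file is given here. The url and source values are placeholders rather than real resources, and the xml:id merely mirrors the kind of identifier that a corresp attribute on <pb> can point to; none of these values is taken from an actual corpus:

```xml
<recording>
  <!-- url and source are placeholder values for illustration only -->
  <media xml:id="ps2013-017-09-003-036.audio1"
         mimeType="audio/mpeg"
         source="https://example.org/parliament/audio/"
         url="https://example.org/parliament/audio/session-017-09.mp3"/>
</recording>
```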

Content model

<content>
 <empty/>
</content>

Schema Declaration

element media
{
   tei_att.global.attribute.xmlid,
   tei_att.global.source.attribute.source,
   tei_att.resourced.attributes,
   attribute mimeType { list { + } },
   empty
}

Appendix A.1.55 <meeting>

<meeting> contains the formalized descriptive title for a meeting or conference, for use in a bibliographic description for an item derived from such a meeting, or as a heading or preamble to publications emanating from it. [3.12.2.2. Titles, Authors, and Editors]
Module	core — Formal specification
Attributes	att.global xml:id xml:base xml:space @n @xml:lang att.global.analytic @ana att.global.linking synch next prev @corresp
Contained by	header: titleStmt
May contain	Character data only
Example	The particular sessions that the corpus or corpus component contains are specified with <meeting>: <meeting n="7" corresp="#DZ" ana="#parla.lower #parla.term #DZ.7">7. mandat</meeting> <meeting n="8" corresp="#DZ" ana="#parla.lower #parla.term #DZ.8">8. mandat</meeting>
Content model	<content> <textNode/> </content>
Schema Declaration	element meeting { tei_att.global.attribute.n, tei_att.global.attribute.xmllang, tei_att.global.analytic.attribute.ana, tei_att.global.linking.attribute.corresp, text }

Appendix A.1.56 <name>

<name> (name, proper noun) contains a proper noun or noun phrase. [3.6.1. Referring Strings]

Module

core — Formal specification

Attributes

att.global
- n
- xml:base
- xml:space
- @xml:id
- @xml:lang
att.global.analytic
- @ana
att.personal
- @full
att.canonical
- @key
- @ref
att.typed
- type
- @subtype

type

Status	Optional
Legal values are:	PER LOC ORG MISC city country address org place

Member of

model.nameLike.agent

Contained by

analysis: s

core: name unit

corpus: setting

header: change

namesdates: placeName

May contain

analysis: pc w

core: date gap name note num pb time

spoken: incident kinesic vocal

character data

Note

Proper nouns referring to people, places, and organizations may be tagged instead with <persName>, <placeName>, or <orgName>, when the TEI module for names and dates is included.

Example

The element is used to mark up Named Entities in the linguistically analysed corpus, in which case it should have the type attribute with one of the allowed values. It can also have a ref attribute to link it to a definition:

... <w lemma="and" msd="UPosTag=CCONJ">and</w> <name type="ORG" ref="https://en.wikipedia.org/wiki/Westminster"> <w join="right" lemma="Westminster" msd="UPosTag=PROPN|Number=Sing">Westminster</w> </name> <w lemma="," msd="UPosTag=PUNCT">,</w> ...

Example

Element <name> is used in the TEI header to specify the location of the parliament:

<name type="place">Westminster</name> <name type="city">London</name> <name type="country" key="GB">U.K.</name>

Example

The element is used in the TEI header to denote a person's responsibility for changes:

<revisionDesc> <change when="2021-06-11"> <name>Tomaž Erjavec</name>: Finalized encoding.</change> <change when="2021-05-28"> <name>Tomaž Erjavec</name>: Built corpus.</change> </revisionDesc>

Content model

<content>
 <alternate minOccurs="1"
  maxOccurs="unbounded">
  <elementRef key="w"/>
  <elementRef key="pc"/>
  <elementRef key="name"/>
  <elementRef key="date"/>
  <elementRef key="num"/>
  <elementRef key="time"/>
  <elementRef key="note"/>
  <elementRef key="vocal"/>
  <elementRef key="kinesic"/>
  <elementRef key="incident"/>
  <elementRef key="gap"/>
  <elementRef key="pb"/>
  <textNode/>
 </alternate>
</content>

Schema Declaration

element name
{
   tei_att.global.attribute.xmlid,
   tei_att.global.attribute.xmllang,
   tei_att.global.analytic.attribute.ana,
   tei_att.personal.attribute.full,
   tei_att.canonical.attribute.key,
   tei_att.canonical.attribute.ref,
   tei_att.typed.attribute.subtype,
   attribute type
   {
      "PER"
    | "LOC"
    | "ORG"
    | "MISC"
    | "city"
    | "country"
    | "address"
    | "org"
    | "place"
   }?,
   (
      tei_w
    | tei_pc
    | tei_name
    | tei_date
    | tei_num
    | tei_time
    | tei_note
    | tei_vocal
    | tei_kinesic
    | tei_incident
    | tei_gap
    | tei_pb
    | text
   )+
}

Appendix A.1.57 <nameLink>

<nameLink> (name link) contains a connecting phrase or link used within a name but not regarded as part of it, such as van der or of. [14.2.1. Personal Names]
Module	namesdates — Formal specification
Attributes	att.global xml:id n xml:base xml:space @xml:lang
Member of	model.persNamePart
Contained by	namesdates: persName
May contain	Character data only
Example	<person xml:id="PicóAntoni"> <persName> <forename>Antoni</forename> <surname>Picó</surname> <nameLink>i</nameLink> <surname>Azanza</surname> </persName> ... </person>
Content model	<content> <textNode/> </content>
Schema Declaration	element nameLink { tei_att.global.attribute.xmllang, text }

Appendix A.1.58 <namespace>

<namespace> (namespace) supplies the formal name of the namespace to which the elements documented by its children belong. [2.3.4. The Tagging Declaration]

Module

header — Formal specification

Attributes

name

Status	Required
Legal values are:	http://www.tei-c.org/ns/1.0

Contained by

header: tagsDecl

May contain

header: tagUsage

Example

To distinguish the TEI elements from the possible use of elements from other namespaces, a <namespace> element giving the TEI namespace is introduced first:
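
A minimal sketch of such a declaration follows. The selection of elements and the occurs counts are placeholders chosen for illustration, not figures from an actual corpus:

```xml
<tagsDecl>
  <namespace name="http://www.tei-c.org/ns/1.0">
    <!-- occurs values are illustrative placeholders -->
    <tagUsage gi="text" occurs="1"/>
    <tagUsage gi="body" occurs="1"/>
    <tagUsage gi="div" occurs="2"/>
    <tagUsage gi="u" occurs="120"/>
    <tagUsage gi="seg" occurs="340"/>
  </namespace>
</tagsDecl>
```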

Content model

<content>
 <elementRef key="tagUsage" minOccurs="1"
  maxOccurs="unbounded"/>
</content>

Schema Declaration

element namespace
{
   attribute name { "http://www.tei-c.org/ns/1.0" },
   tei_tagUsage+
}

Appendix A.1.59 <normalization>

<normalization> (normalization) indicates the extent of normalization or regularization of the original source carried out in converting it to electronic form. [2.3.3. The Editorial Practices Declaration 16.3.2. Declarable Elements]
Module	header — Formal specification
Contained by	header: editorialDecl
May contain	core: p
Example	<editorialDecl> ... <normalization> <p xml:lang="en">Text has not been normalised, except for spacing.</p> </normalization> ... </editorialDecl>
Content model	<content> <elementRef key="p" minOccurs="1" maxOccurs="unbounded"/> </content>
Schema Declaration	element normalization { tei_p+ }

Appendix A.1.60 <note>

<note> (note) contains a note or annotation. [3.9.1. Notes and Simple Annotation 2.2.6. The Notes Statement 3.12.2.8. Notes and Statement of Language 10.3.5.4. Notes within Entries]

Module

core — Formal specification

Attributes

att.global
- xml:base
- xml:space
- @xml:id
- @n
- @xml:lang
att.global.linking
- synch
- next
- prev
- @corresp
att.typed
- type
- @subtype

type

Status

Recommended

Sample values include:

narrative: Description in the third person of events taking place in the meeting, e.g. "Mr X. takes the Chair".
summary: Summaries of speeches that are individually not interesting, e.g. "Question put and agreed to".
speaker: Name, role and possible description of the person giving the speech
vote: Outcome of a vote
location: The location of the speaker, who was not on the podium
date: Date of the session
president: Chairman of a meeting
comment: Comment of parliamentary reporter
time: Date and time of the beginning and end of the debate
quorum: The presence of the members of parliament
debate: Comments on the conduct of debates

Member of

model.noteLike

Contained by

analysis: s

core: name unit

linking: seg

namesdates: affiliation state

spoken: u

textstructure: div

May contain

core: pb time

character data

Example

The <note> element is used to encode transcriber comments such as who spoke, what the time was, interruptions, notes on what is happening in the chamber, results of voting etc.:

<note type="speaker">The president, Dr. Milan Brglez:</note> ... <note type="time">The session began at 10 o'clock.</note> ... <note type="vote-ayes">84 voted for the adoption of the measure.</note> ... <note type="vote-noes">2 voted against the adoption of the measure.</note> ...

Example

The <note> element can be further qualified by the <time> element to specify the date and time recorded in the note; and can also contain a page break, <pb>:

<note type="time">The session began <pb/> at <time when="2016-04-13T10:00:00">10 o'clock</time>.</note>

Example

The <note> element may also be used to mark any additional information on debate sections:

<div type="debateSection"> <head>Business Before Questions</head> <note>Death of a Member</note> <u xml:id="ParlaMint-GB_2019-02-18-commons.u1">...</u> ... <note>End of debateSection.</note> </div>

Content model

<content>
 <alternate minOccurs="0"
  maxOccurs="unbounded">
  <textNode/>
  <elementRef key="pb"/>
  <elementRef key="time"/>
 </alternate>
</content>

Schema Declaration

element note
{
   tei_att.global.attribute.xmlid,
   tei_att.global.attribute.n,
   tei_att.global.attribute.xmllang,
   tei_att.global.linking.attribute.corresp,
   tei_att.typed.attribute.subtype,
   attribute type { text }?,
   ( text | tei_pb | tei_time )*
}

Appendix A.1.61 <num>

<num> (number) contains a number, written in any form. [3.6.3. Numbers and Measures]

Module

core — Formal specification

Attributes

att.global
- n
- xml:base
- xml:space
- @xml:id
- @xml:lang
att.global.analytic
- @ana
att.typed
- type
- @subtype

type

indicates the type of numeric value.

Derived from	att.typed
Status	Optional
Datatype	teidata.enumerated
Suggested values include:	cardinal (absolute number, e.g. 21, 21.5); ordinal (ordinal number, e.g. 21st); fraction (fraction, e.g. one half or three-quarters); percentage (a percentage)
Note	If a different typology is desired, other values can be used for this attribute.

Member of

model.measureLike

Contained by

analysis: s

core: name unit

May contain

analysis: pc w

character data

Note

Detailed analyses of quantities and units of measure in historical documents may also use the feature structure mechanism described in chapter 19. Feature Structures. The <num> element is intended for use in simple applications.

Example

The element can be used for fine-grained Named Entities which include numbers:
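
As a hedged illustration (the sentence, the shortened IDs and the morphosyntactic values are invented for this sketch), a numeric expression spanning two tokens could be marked up like this:

```xml
<s xml:id="s1">
  <!-- the two tokens together form one fractional expression -->
  <num type="fraction">
    <w xml:id="s1.t1" lemma="two" msd="UPosTag=NUM|NumType=Card">Two</w>
    <w xml:id="s1.t2" lemma="third" msd="UPosTag=NOUN|Number=Plur">thirds</w>
  </num>
  <w xml:id="s1.t3" lemma="agree" msd="UPosTag=VERB|Tense=Past|VerbForm=Fin" join="right">agreed</w>
  <pc xml:id="s1.t4" msd="UPosTag=PUNCT">.</pc>
</s>
```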

Content model

<content>
 <alternate minOccurs="1"
  maxOccurs="unbounded">
  <elementRef key="w"/>
  <elementRef key="pc"/>
  <textNode/>
 </alternate>
</content>

Schema Declaration

element num
{
   tei_att.global.attribute.xmlid,
   tei_att.global.attribute.xmllang,
   tei_att.global.analytic.attribute.ana,
   tei_att.typed.attribute.subtype,
   attribute type { "cardinal" | "ordinal" | "fraction" | "percentage" }?,
   ( tei_w | tei_pc | text )+
}

Appendix A.1.62 <occupation>

<occupation> (occupation) contains an informal description of a person's trade, profession or occupation. [16.2.2. The Participant Description]
Module	namesdates — Formal specification
Attributes	att.global xml:id n xml:base xml:space @xml:lang att.datable.w3c notBefore notAfter @when @from @to
Contained by	namesdates: person
May contain	Character data only
Note	The content of this element may be used as an alternative to the more formal specification made possible by its attributes; it may also be used to supplement the formal specification with commentary or clarification.
Example	<person n="2678" xml:id="SimeonovValeri"> <persName xml:lang="bg"> <forename>Валери</forename> <surname>Симеонов</surname> </persName> <sex value="M"/> <birth when="1955-03-14"> <placeName>Долни Чифлик, България</placeName> </birth> <education>инженер</education> <occupation>политик</occupation> ... </person>
Content model	<content> <textNode/> </content>
Schema Declaration	element occupation { tei_att.global.attribute.xmllang, tei_att.datable.w3c.attribute.when, tei_att.datable.w3c.attribute.from, tei_att.datable.w3c.attribute.to, text }

Appendix A.1.63 <org>

<org> (organization) provides information about an identifiable organisation such as the government, political party, ministry etc. [14.3.3. Organizational Data]

Module

namesdates — Formal specification

Attributes

att.global
- xml:id
- n
- xml:base
- xml:space
- @xml:lang
att.global.analytic
- @ana

xml:id

(identifier) provides a unique identifier for the element bearing the attribute.

Derived from	att.global
Status	Required
Datatype	ID

role

Status

Required

Legal values are:

country
federatedState
republic
government
ministry
parliament
politicalParty
parliamentaryGroup
conferenceOfChairs
boardOfParliament
ngo
institution
senate
committee
subcommittee
commission
delegation
supervisoryBoard
workingGroup
interparliamentaryFriendshipGroup
nationalCouncil
chamberOfThePeople
chamberOfTheNations
europeanCommission
europeanParliament
europeanInstitution
internationalOrganisation
boardOfDirectors
ethnicCommunity

Contained by

namesdates: listOrg

May contain

core: desc head

header: idno

namesdates: event listEvent orgName state

Example

<org xml:id="government.BE" role="government"> <orgName xml:lang="en" full="yes">Federal Government of Belgium</orgName> <orgName xml:lang="nl" full="yes">Federale regering</orgName> <orgName xml:lang="fr" full="yes">Gouvernement fédéral</orgName> </org> <org ana="#parla.federal #parla.lower" role="parliament" xml:id="be_federal_parliament"> <orgName full="yes" xml:lang="nl">Federaal Parlement van België</orgName> <orgName full="yes" xml:lang="en">Belgian Federal Parliament</orgName> <event from="1831-02-07"> <label xml:lang="en">existence</label> </event> ... </org>

Example

<org xml:id="party.PS2" role="parliamentaryGroup"> <orgName full="yes" xml:lang="sl">Pozitivna Slovenija</orgName> <orgName full="yes" xml:lang="en">Positive Slovenia</orgName> <orgName full="abb">PS</orgName> <event from="2011-10-22"> <label xml:lang="en">existence</label> </event> <idno type="URI" xml:lang="sl" subtype="wikimedia">https://sl.wikipedia.org/wiki/Pozitivna_Slovenija</idno> <idno type="URI" xml:lang="en" subtype="wikimedia">https://en.wikipedia.org/wiki/Positive_Slovenia</idno> </org>

Content model

<content>
 <sequence minOccurs="1" maxOccurs="1">
  <elementRef key="head" minOccurs="0"
   maxOccurs="unbounded"/>
  <elementRef key="orgName" minOccurs="1"
   maxOccurs="unbounded"/>
  <elementRef key="event" minOccurs="0"
   maxOccurs="unbounded"/>
  <elementRef key="idno" minOccurs="0"
   maxOccurs="unbounded"/>
  <elementRef key="desc" minOccurs="0"
   maxOccurs="1"/>
  <elementRef key="listEvent" minOccurs="0"
   maxOccurs="1"/>
  <elementRef key="state" minOccurs="0"
   maxOccurs="unbounded"/>
 </sequence>
</content>

Schema Declaration

element org
{
   tei_att.global.attribute.xmllang,
   tei_att.global.analytic.attribute.ana,
   attribute xml:id { text },
   attribute role
   {
      "country"
    | "federatedState"
    | "republic"
    | "government"
    | "ministry"
    | "parliament"
    | "politicalParty"
    | "parliamentaryGroup"
    | "conferenceOfChairs"
    | "boardOfParliament"
    | "ngo"
    | "institution"
    | "senate"
    | "committee"
    | "subcommittee"
    | "commission"
    | "delegation"
    | "supervisoryBoard"
    | "workingGroup"
    | "interparliamentaryFriendshipGroup"
    | "nationalCouncil"
    | "chamberOfThePeople"
    | "chamberOfTheNations"
    | "europeanCommission"
    | "europeanParliament"
    | "europeanInstitution"
    | "internationalOrganisation"
    | "boardOfDirectors"
    | "ethnicCommunity"
   },
   (
      tei_head*,
      tei_orgName+,
      tei_event*,
      tei_idno*,
      tei_desc?,
      tei_listEvent?,
      tei_state*
   )
}

Appendix A.1.64 <orgName>

<orgName> (organization name) contains an organisational name. [14.2.2. Organizational Names]

Module

namesdates — Formal specification

Attributes

att.global
- xml:id
- n
- xml:base
- xml:space
- @xml:lang
att.canonical
- key
- @ref

from

indicates the starting point of the period in standard form, e.g. yyyy-mm-dd.

Derived from	att.datable.w3c
Status	Optional
Datatype	teidata.temporal.w3c
Note	Used when "the same" party changes its name

to

indicates the ending point of the period in standard form, e.g. yyyy-mm-dd.

Derived from	att.datable.w3c
Status	Optional
Datatype	teidata.temporal.w3c
Note	Used when "the same" party changes its name

full

Status	Optional
Legal values are:	yes abb

Member of

model.nameLike.agent

Contained by

core: label publisher

header: funder

namesdates: affiliation org

May contain

Character data only

Example

<funder> <orgName xml:lang="en">The CLARIN research infrastructure</orgName> <orgName xml:lang="sl">Raziskovalna infrastruktura CLARIN</orgName> </funder>

Example

<org xml:id="party.PS1" role="parliamentaryGroup"> <orgName full="yes" xml:lang="en">Positive Slovenia</orgName> <orgName full="yes" xml:lang="sl">Pozitivna Slovenija</orgName> <orgName full="abb" xml:lang="sl">PS</orgName> </org>
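
The from and to attributes described above can be sketched for the case of a renamed party. The party, its names and the dates below are hypothetical and only show how two successive names of "the same" organisation might be recorded:

```xml
<org xml:id="party.EX" role="politicalParty">
  <!-- hypothetical party that changed its name on 2015-03-07 -->
  <orgName full="yes" xml:lang="en" to="2015-03-06">Example Party</orgName>
  <orgName full="yes" xml:lang="en" from="2015-03-07">Example Centre Party</orgName>
  <orgName full="abb" xml:lang="en">EX</orgName>
</org>
```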

Content model

<content>
 <textNode/>
</content>

Schema Declaration

element orgName
{
   tei_att.global.attribute.xmllang,
   tei_att.canonical.attribute.ref,
   attribute from { text }?,
   attribute to { text }?,
   attribute full { "yes" | "abb" }?,
   text
}

Appendix A.1.65 <p>

<p> (paragraph) marks paragraphs in prose. [3.1. Paragraphs 7.2.5. Speech Contents]
Module	core — Formal specification
Attributes	att.global xml:id n xml:base xml:space @xml:lang
Member of	model.pLike
Contained by	header: availability correction hyphenation normalization prefixDef projectDesc quotation segmentation spoken: equipment
May contain	core: ref character data
Example	<projectDesc> <p> <ref target="https://www.clarin.eu/content/parlamint">ParlaMint</ref> </p> </projectDesc>
Example	<availability status="free"> <licence>http://creativecommons.org/licenses/by/4.0/</licence> <p>This work is licensed under the <ref target="http://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</ref>.</p> <p>This work is also licensed under the <ref target="https://www.parliament.uk/site-information/copyright-parliament/open-parliament-licence/">Open Parliament Licence v3.0</ref>.</p> </availability>
Schematron	<sch:rule context="tei:p"> <sch:report test="(ancestor::tei:ab or ancestor::tei:p) and not( ancestor::tei:floatingText | parent::tei:exemplum | parent::tei:item | parent::tei:note | parent::tei:q | parent::tei:quote | parent::tei:remarks | parent::tei:said | parent::tei:sp | parent::tei:stage | parent::tei:cell | parent::tei:figure )"> Abstract model violation: Paragraphs may not occur inside other paragraphs or ab elements. </sch:report> </sch:rule>
Schematron	<sch:rule context="tei:l//tei:p"> <sch:assert test="ancestor::tei:floatingText | parent::tei:figure | parent::tei:note"> Abstract model violation: Metrical lines may not contain higher-level structural elements such as div, p, or ab, unless p is a child of figure or note, or is a descendant of floatingText. </sch:assert> </sch:rule>
Content model	<content> <alternate minOccurs="1" maxOccurs="unbounded"> <elementRef key="ref"/> <textNode/> </alternate> </content>
Schema Declaration	element p { tei_att.global.attribute.xmllang, ( tei_ref | text )+ }

Appendix A.1.66 <particDesc>

<particDesc> (participation description) describes the identifiable speakers and organisations in a ParlaMint corpus. This information is given in the corpus root teiHeader. Note that the listPerson and listOrg elements are typically stored in separate files. [16.2. Contextual Information]
Module	corpus — Formal specification
Contained by	header: profileDesc
May contain	derived-module-parlamint: include namesdates: listOrg listPerson
Note	May contain a prose description organized as paragraphs, or a structured list of persons and person groups, with an optional formal specification of any relationships amongst them.
Example	<particDesc> <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="ParlaMint-SI-listOrg.xml"/> <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="ParlaMint-SI-listPerson.xml"/> </particDesc>
Content model	<content> <sequence minOccurs="1" maxOccurs="1"> <alternate minOccurs="1" maxOccurs="1"> <elementRef key="listOrg"/> <elementRef key="include"/> </alternate> <alternate minOccurs="1" maxOccurs="1"> <elementRef key="listPerson"/> <elementRef key="include"/> </alternate> </sequence> </content>
Schema Declaration	element particDesc { ( tei_listOrg | tei_include ), ( tei_listPerson | tei_include ) }

Appendix A.1.67 <pb>

<pb> (page beginning) marks the beginning of a new page in a paginated document. [3.11.3. Milestone Elements]
Module	core — Formal specification
Attributes	att.global xml:lang xml:base xml:space @xml:id @n att.global.linking synch next prev @corresp att.global.source @source
Member of	model.milestoneLike
Contained by	analysis: phr s core: name note linking: seg spoken: u textstructure: div
May contain	Empty element
Note	A <pb> element should appear at the start of the page which it identifies. The global n attribute indicates the number or other value associated with this page. This will normally be the page number or signature printed on it, since the physical sequence number is implicit in the presence of the <pb> element itself. The type attribute may be used to characterize the page beginning in any respect. The more specialized attributes break, ed, or edRef should be preferred when the intent is to indicate whether or not the page beginning is word-breaking, or to note the source from which it derives.
Example	<body> <div type="debateSection"> <pb source="https://www.psp.cz/eknih/2013ps/stenprot/017schuz/s017357.htm" n="1" xml:id="ParlaMint-CZ_2014-10-01-ps2013-017-09-003-036.pb1" corresp="#ps2013-017-09-003-036.audio1"/> ... </div> </body>
Content model	<content> <empty/> </content> ⚓
Schema Declaration	element pb { tei_att.global.attribute.xmlid, tei_att.global.attribute.n, tei_att.global.linking.attribute.corresp, tei_att.global.source.attribute.source, empty }⚓

Appendix A.1.68 <pc>

<pc> (punctuation character) contains a character or string of characters regarded as constituting a single punctuation mark. [18.1.2. Below the Word Level 18.4.2. Lightweight Linguistic Annotation]

Module

analysis — Formal specification

Attributes

att.global
- xml:id
- n
- xml:base
- xml:space
- @xml:lang
att.global.analytic
- @ana
att.linguistic
- lemma
- msd
- @pos
- @join
att.lexicographic.normalized
- @norm
att.segLike
- @function

xml:id

Status	Required
Datatype	ID

msd

Status	Required
Datatype	teidata.text

Member of

model.segLike

Contained by

analysis: phr s

core: date email name num time unit

May contain

Character data only

Example

Content model

<content>
 <textNode/>
</content>
    ⚓

Schema Declaration

element pc
{
   tei_att.global.attribute.xmllang,
   tei_att.global.analytic.attribute.ana,
   tei_att.linguistic.attribute.pos,
   tei_att.linguistic.attribute.join,
   tei_att.lexicographic.normalized.attribute.norm,
   tei_att.segLike.attribute.function,
   attribute xml:id { text },
   attribute msd { text },
   text
}⚓
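
Because xml:id and msd are the two required attributes on <pc>, a corpus can be checked for them with a short script. The following is only a minimal sketch and not part of the ParlaMint tooling: it assumes Python's standard xml.etree.ElementTree module, and the function name pc_missing_required and the identifier demo.pc1 are hypothetical.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
# ElementTree expands the predefined xml: prefix to this namespace URI.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def pc_missing_required(root):
    """Return (text, missing attribute names) for each <pc> lacking xml:id or msd."""
    problems = []
    for pc in root.iter(f"{{{TEI_NS}}}pc"):
        missing = []
        if XML_ID not in pc.attrib:
            missing.append("xml:id")
        if "msd" not in pc.attrib:
            missing.append("msd")
        if missing:
            problems.append((pc.text, missing))
    return problems


# Hypothetical sample: the second <pc> has no xml:id.
SAMPLE = (
    f'<s xmlns="{TEI_NS}">'
    '<pc xml:id="demo.pc1" msd="UPosTag=PUNCT">,</pc>'
    '<pc msd="UPosTag=PUNCT">.</pc>'
    "</s>"
)

print(pc_missing_required(ET.fromstring(SAMPLE)))
```

A schema validator remains the authoritative check; a script like this is mainly useful for quick diagnostics while a corpus is being built.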

Appendix A.1.69 <persName>

<persName> (personal name) contains a proper noun or proper-noun phrase referring to a person, possibly including one or more of the person's forenames, surnames, honorifics, added names, etc. [14.2.1. Personal Names]
Module	namesdates — Formal specification
Attributes	att.global n xml:base xml:space @xml:id @xml:lang att.datable.w3c when notBefore notAfter @from @to att.canonical key @ref
Member of	model.nameLike.agent
Contained by	core: respStmt namesdates: person
May contain	core: term namesdates: addName forename nameLink roleName surname character data
Note	Special persons (like 'anonymous', 'group' etc.) have their name in <term>.
Example	<persName> <surname>Broekers-Knol</surname> <forename>Ankie</forename> </persName>
Example	<respStmt> <persName>Matthew Coole</persName> <resp>TEI corpus encoding</resp> </respStmt>
Content model	<content> <alternate minOccurs="1" maxOccurs="1"> <alternate minOccurs="1" maxOccurs="unbounded"> <elementRef key="forename" minOccurs="1" maxOccurs="unbounded"/> <elementRef key="addName" minOccurs="0" maxOccurs="unbounded"/> <elementRef key="nameLink" minOccurs="0" maxOccurs="1"/> <elementRef key="roleName" minOccurs="0" maxOccurs="unbounded"/> <elementRef key="surname" minOccurs="1" maxOccurs="unbounded"/> </alternate> <alternate minOccurs="1" maxOccurs="unbounded"> <elementRef key="term"/> </alternate> <alternate minOccurs="1" maxOccurs="1"> <textNode/> </alternate> </alternate> </content> ⚓
Schema Declaration	element persName { tei_att.global.attribute.xmlid, tei_att.global.attribute.xmllang, tei_att.datable.w3c.attribute.from, tei_att.datable.w3c.attribute.to, tei_att.canonical.attribute.ref, ( ( tei_forename+ | tei_addName* | tei_nameLink? | tei_roleName* | tei_surname+ )+ | tei_term+ | ( text ) ) }⚓

Appendix A.1.70 <person>

<person> (person) provides information about a speaker in the corpus, at the very least their name and sex. [14.3.2. The Person Element 16.2.2. The Participant Description]
Module	namesdates — Formal specification
Attributes	att.global xml:base xml:space @xml:id @n @xml:lang
Contained by	namesdates: listPerson
May contain	figures: figure header: idno namesdates: affiliation birth death education occupation persName sex
Note	May contain either a prose description organized as paragraphs, or a sequence of more specific demographic elements drawn from the model.personPart class.
Example	<person xml:id="AliciaKearns"> <persName> <forename>Alicia</forename> <forename>Alexandra Martha</forename> <surname>Kearns</surname> </persName> <sex value="F"/> <affiliation from="2019-12-12" ref="#parla.lower" role="member"/> <affiliation from="2019-12-12" ref="#party.CON" role="member"/> <idno subtype="contact" type="URI">https://members.parliament.uk/member/4805/contact</idno> </person>
Example	<person xml:id="AdamowiczPiotr"> <persName> <forename>Piotr</forename> <surname>Adamowicz</surname> </persName> <birth when="1961-06-26">26.06.1961</birth> <sex value="M"/> <affiliation role="member" ref="#party.KO"/> </person>
Content model	<content> <sequence minOccurs="1" maxOccurs="1"> <elementRef key="persName" minOccurs="1" maxOccurs="unbounded"/> <alternate minOccurs="1" maxOccurs="unbounded"> <elementRef key="sex" minOccurs="1" maxOccurs="1"/> <elementRef key="birth" minOccurs="0" maxOccurs="1"/> <elementRef key="death" minOccurs="0" maxOccurs="1"/> <elementRef key="affiliation" minOccurs="0" maxOccurs="unbounded"/> <elementRef key="occupation" minOccurs="0" maxOccurs="unbounded"/> <elementRef key="education" minOccurs="0" maxOccurs="unbounded"/> <elementRef key="idno" minOccurs="0" maxOccurs="unbounded"/> <elementRef key="figure" minOccurs="0" maxOccurs="unbounded"/> </alternate> </sequence> </content> ⚓
Schema Declaration	element person { tei_att.global.attribute.xmlid, tei_att.global.attribute.n, tei_att.global.attribute.xmllang, ( tei_persName+, ( tei_sex | tei_birth? | tei_death? | tei_affiliation* | tei_occupation* | tei_education* | tei_idno* | tei_figure* )+ ) }⚓
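
To illustrate how the <person> structure can be read by a script, the sketch below extracts the identifier, name, sex and affiliations of one speaker. It is an illustration under stated assumptions rather than project code: it uses Python's standard xml.etree.ElementTree, assumes the TEI namespace is declared on the root, and the function name summarise_person is hypothetical. The sample reuses the second example above.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def summarise_person(person):
    """Collect id, display name, sex and affiliations from one <person>."""
    name = person.find(f"{TEI}persName")
    parts = [e.text for e in name.findall(f"{TEI}forename")]
    parts += [e.text for e in name.findall(f"{TEI}surname")]
    if not parts:
        # persName may instead hold a <term> or plain text.
        parts = ["".join(name.itertext()).strip()]
    sex = person.find(f"{TEI}sex")
    return {
        "id": person.get(XML_ID),
        "name": " ".join(parts),
        "sex": sex.get("value") if sex is not None else None,
        "affiliations": [
            (a.get("role"), a.get("ref"))
            for a in person.findall(f"{TEI}affiliation")
        ],
    }


SAMPLE = """
<listPerson xmlns="http://www.tei-c.org/ns/1.0">
  <person xml:id="AdamowiczPiotr">
    <persName><forename>Piotr</forename><surname>Adamowicz</surname></persName>
    <birth when="1961-06-26">26.06.1961</birth>
    <sex value="M"/>
    <affiliation role="member" ref="#party.KO"/>
  </person>
</listPerson>
"""

root = ET.fromstring(SAMPLE)
people = [summarise_person(p) for p in root.iter(f"{TEI}person")]
print(people)
```

The name is assembled as forenames followed by surnames purely for display; the element order inside <persName> is not changed.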

Appendix A.1.71 <phr>

<phr> (phrase) contains a semantic multi-word unit. [18.1. Linguistic Segment Categories]

Module

analysis — Formal specification

Attributes

att.global
- n
- xml:base
- xml:space
- @xml:id
- @xml:lang

ana

(analysis) indicates one or more elements containing interpretations of the element on which the ana attribute appears.

Derived from	att.global.analytic
Status	Required
Datatype	1–∞ occurrences of teidata.pointer separated by whitespace

function

(function) characterizes the function of the segment.

Derived from	att.segLike
Status	Required
Datatype	teidata.enumerated

type

Status	Optional
Legal values are:	sem

Member of

model.segLike

Contained by

analysis: s

May contain

analysis: pc w

core: pb

character data

Note

The type attribute may be used to indicate the type of phrase, taking values such as noun, verb, preposition, etc. as appropriate.

Example

The element is used to mark multi-word units (MWEs) which have a semantic interpretation. The type should be set to sem. The MWE should be marked with the function (all semantic tags) and ana (semantic categories) attributes:

... ... <phr type="sem" function="Z4" ana="sem:Z4"> <w pos="IN" msd="UPosTag=ADP" lemma="on" function="Z4" ana="sem:Z4">On</w> <w pos="DT" msd="UPosTag=DET|Definite=Def|PronType=Art" lemma="the" function="Z4" ana="sem:Z4">the</w> <w pos="JJ" msd="UPosTag=ADJ|Degree=Pos" lemma="other" function="Z4" ana="sem:Z4">other</w> <w pos="NN" msd="UPosTag=NOUN|Number=Sing" lemma="hand" function="Z4" ana="sem:Z4" join="right">hand</w> </phr> ...

Content model

<content>
 <alternate minOccurs="1"
  maxOccurs="unbounded">
  <elementRef key="w"/>
  <elementRef key="pc"/>
  <elementRef key="pb"/>
  <textNode/>
 </alternate>
</content>
    ⚓

Schema Declaration

element phr
{
   tei_att.global.attribute.xmlid,
   tei_att.global.attribute.xmllang,
   attribute ana { list { teidata.pointer+ } },
   attribute function { text },
   attribute type { "sem" }?,
   ( tei_w | tei_pc | tei_pb | text )+
}⚓
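
As an illustration of how the function and ana attributes on <phr> might be consumed, the following sketch lists the multi-word units found in a sentence. It assumes Python's standard xml.etree.ElementTree; the function name semantic_mwes is hypothetical, and the sample is abbreviated from the example above (most token attributes are omitted for brevity).

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"


def semantic_mwes(root):
    """Return (surface text, function, list of ana pointers) for each <phr>."""
    units = []
    for phr in root.iter(f"{TEI}phr"):
        # Only direct <w> and <pc> children contribute to the surface text.
        tokens = [t.text or "" for t in phr if t.tag in (f"{TEI}w", f"{TEI}pc")]
        units.append((" ".join(tokens), phr.get("function"), phr.get("ana", "").split()))
    return units


SAMPLE = """
<s xmlns="http://www.tei-c.org/ns/1.0">
  <phr type="sem" function="Z4" ana="sem:Z4">
    <w lemma="on">On</w>
    <w lemma="the">the</w>
    <w lemma="other">other</w>
    <w lemma="hand" join="right">hand</w>
  </phr>
  <pc>,</pc>
</s>
"""

mwes = semantic_mwes(ET.fromstring(SAMPLE))
print(mwes)
```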

Appendix A.1.72 <placeName>

<placeName> (place name) contains a place name. [14.2.3. Place Names]
Module	namesdates — Formal specification
Attributes	att.global xml:id n xml:base xml:space @xml:lang att.canonical key @ref
Member of	model.placeNamePart
Contained by	namesdates: birth death
May contain	core: name character data
Example	<placeName ref="https://www.geonames.org/2523918">Palermo</placeName>
Example	<placeName>Tours-Saint-Symphorien, Indre-et-Loire</placeName>
Content model	<content> <alternate minOccurs="1" maxOccurs="1"> <elementRef key="name" minOccurs="0" maxOccurs="1"/> <textNode/> </alternate> </content> ⚓
Schema Declaration	element placeName { tei_att.global.attribute.xmllang, tei_att.canonical.attribute.ref, ( tei_name? | text ) }⚓

Appendix A.1.73 <prefixDef>

<prefixDef> (prefix definition) defines a prefixing scheme used in teidata.pointer values, showing how abbreviated URIs using the scheme may be expanded into full URIs. [17.2.3. Using Abbreviated Pointers]

Module

header — Formal specification

Attributes

matchPattern

specifies a regular expression against which the values of other attributes can be matched.

Derived from	att.patternReplacement
Status	Required
Datatype	teidata.pattern

replacementPattern

specifies a ‘replacement pattern’, that is, the skeleton of a relative or absolute URI containing references to groups in the matchPattern which, once subpattern substitution has been performed, complete the URI.

Derived from	att.patternReplacement
Status	Required
Datatype	teidata.replacement
Note	Using TEI-defined XPointer schemes is not allowed.

ident

supplies a name which functions as the prefix for an abbreviated pointing scheme such as a private URI scheme. The prefix constitutes the text preceding the first colon.

Status	Required
Datatype	teidata.prefix
Note	The value is limited to teidata.prefix so that it may be mapped directly to a URI prefix.

Contained by

header: listPrefixDef

May contain

core: p

Note

The abbreviated pointer may be dereferenced to produce either an absolute or a relative URI reference. In the latter case it is combined with the value of xml:base in force at the place where the pointing attribute occurs to form an absolute URI in the usual manner as prescribed by XML Base.

Example

<prefixDef ident="mte" matchPattern="(.+)" replacementPattern="http://nl.ijs.si/ME/V6/msd/tables/msd-fslib-hbs.xml#$1"> <p xml:lang="en">Private URIs with this prefix point to feature-structure elements defining the Serbocroatian MULTEXT-East Version 6 MSDs.</p> </prefixDef>

Content model

<content>
 <elementRef key="p" minOccurs="1"
  maxOccurs="unbounded"/>
</content>
    ⚓

Schema Declaration

element prefixDef
{
   attribute matchPattern { text },
   attribute replacementPattern { text },
   attribute ident { text },
   tei_p+
}⚓
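
The expansion of abbreviated pointers described above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: it assumes Python's standard re and xml.etree.ElementTree modules, treats matchPattern as a Python regular expression (the two regex dialects differ in details), and rewrites $1, $2, ... in replacementPattern by hand. The function names and the MSD value Ncmsn are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"


def load_prefix_defs(root):
    """Map each prefixDef ident to its (matchPattern, replacementPattern)."""
    return {
        pd.get("ident"): (pd.get("matchPattern"), pd.get("replacementPattern"))
        for pd in root.iter(f"{TEI}prefixDef")
    }


def expand_pointer(pointer, prefix_defs):
    """Expand 'prefix:local' into a full URI; return other pointers unchanged."""
    prefix, sep, local = pointer.partition(":")
    if not sep or prefix not in prefix_defs:
        return pointer
    match_pattern, replacement = prefix_defs[prefix]
    match = re.fullmatch(match_pattern, local)
    if match is None:
        return pointer
    # Substitute $1, $2, ... with the corresponding captured groups.
    return re.sub(r"\$(\d+)", lambda g: match.group(int(g.group(1))), replacement)


SAMPLE = """
<listPrefixDef xmlns="http://www.tei-c.org/ns/1.0">
  <prefixDef ident="mte" matchPattern="(.+)"
             replacementPattern="http://nl.ijs.si/ME/V6/msd/tables/msd-fslib-hbs.xml#$1">
    <p xml:lang="en">Private URIs with this prefix point to MSD definitions.</p>
  </prefixDef>
</listPrefixDef>
"""

defs = load_prefix_defs(ET.fromstring(SAMPLE))
print(expand_pointer("mte:Ncmsn", defs))
print(expand_pointer("#parla.lower", defs))
```

Pointers without a colon, or with a prefix that has no <prefixDef> (for example https), are returned as they are, so ordinary URIs and local references pass through untouched.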

Appendix A.1.74 <profileDesc>

<profileDesc> (text-profile description) provides a detailed description of non-bibliographic aspects of a text, specifically the languages and sublanguages used, the situation in which it was produced, the participants and their setting. [2.4. The Profile Description 2.1.1. The TEI Header and Its Components]
Module	header — Formal specification
Contained by	header: teiHeader
May contain	corpus: particDesc settingDesc header: langUsage textClass
Note	Although the content model permits it, it is rarely meaningful to supply multiple occurrences for any of the child elements of <profileDesc> unless these are documenting multiple texts.
Example	General structure of the element <profileDesc>: <profileDesc> <settingDesc>...</settingDesc> <textClass>...</textClass> <particDesc>...</particDesc> <langUsage>...</langUsage> </profileDesc>
Example	Profile description of a corpus root: <profileDesc> <settingDesc> <setting> <name type="address">Šubičeva ulica 4</name> <name type="city">Ljubljana</name> <name type="country" key="SI">Slovenia</name> <date from="2014-08-01" to="2020-07-16">1.8.2014 - 16.7.2020</date> </setting> </settingDesc> <textClass> <catRef scheme="#parla.legislature" target="#parla.bi #parla.lower"/> </textClass> <particDesc> <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="ParlaMint-SI-listOrg.xml"/> <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="ParlaMint-SI-listPerson.xml"/> </particDesc> <langUsage> <language ident="sl" xml:lang="sl">slovenski</language> <language ident="en" xml:lang="sl">angleški</language> <language ident="sl" xml:lang="en">Slovenian</language> <language ident="en" xml:lang="en">English</language> </langUsage> </profileDesc>
Example	Profile description for a corpus component. In contrast to the corpus root, corpus components use only the first child element, <settingDesc>: <profileDesc> <settingDesc> <setting> <name type="city">Ljubljana</name> <name type="country" key="SI">Slovenija</name> <date when="2014-08-28" ana="#parla.sitting">28.8.2014</date> </setting> </settingDesc> </profileDesc>
Content model	<content> <elementRef key="settingDesc"/> <elementRef key="textClass" minOccurs="0" maxOccurs="1"/> <elementRef key="particDesc" minOccurs="0" maxOccurs="1"/> <elementRef key="langUsage" minOccurs="0" maxOccurs="1"/> </content> ⚓
Schema Declaration	element profileDesc { tei_settingDesc, tei_textClass?, tei_particDesc?, tei_langUsage? }⚓

Appendix A.1.75 <projectDesc>

<projectDesc> (project description) describes in detail the aim or purpose for which an electronic file was encoded, together with any other relevant information concerning the process by which it was assembled or collected. [2.3.1. The Project Description 2.3. The Encoding Description 16.3.2. Declarable Elements]
Module	header — Formal specification
Contained by	header: encodingDesc
May contain	core: p
Example	<projectDesc> <p xml:lang="sl">Glavni cilji projekta <ref target="https://www.clarin.eu/content/parlamint">ParlaMint</ref> so (1) izdelati večjezično množico na enak način kodiranih korpusov zapiskov parlamentarnih sej, ...</p> <p xml:lang="en">The <ref target="https://www.clarin.eu/content/parlamint">ParlaMint</ref> project aims to (1) create a multilingual set of uniformly encoded comparable corpora of parliamentary proceedings, ...</p> </projectDesc>
Content model	<content> <elementRef key="p" minOccurs="1" maxOccurs="unbounded"/> </content> ⚓
Schema Declaration	element projectDesc { tei_p+ }⚓

Appendix A.1.76 <pubPlace>

<pubPlace> (publication place) contains the name of the place where a bibliographic item was published. [3.12.2.4. Imprint, Size of a Document, and Reprint Information]
Module	core — Formal specification
Contained by	header: publicationStmt
May contain	core: ref character data
Example	<pubPlace> <ref target="https://github.com/clarin-eric/ParlaMint">https://github.com/clarin-eric/ParlaMint</ref> </pubPlace>
Content model	<content> <alternate minOccurs="1" maxOccurs="1"> <elementRef key="ref"/> <textNode/> </alternate> </content> ⚓
Schema Declaration	element pubPlace { tei_ref | text }⚓

Appendix A.1.77 <publicationStmt>

<publicationStmt> (publication statement) groups information concerning the publication or distribution of an electronic or other text. [2.2.4. Publication, Distribution, Licensing, etc. 2.2. The File Description]
Module	header — Formal specification
Contained by	header: fileDesc
May contain	core: date pubPlace publisher header: availability idno
Note	Where a publication statement contains several members of the model.publicationStmtPart.agency or model.publicationStmtPart.detail classes rather than one or more paragraphs or anonymous blocks, care should be taken to ensure that the repeated elements are presented in a meaningful order. It is a conformance requirement that elements supplying information about publication place, address, identifier, availability, and date be given following the name of the publisher, distributor, or authority concerned, and preferably in that order.
Example	<publicationStmt> <publisher> <orgName xml:lang="sl">Raziskovalna infrastruktura CLARIN</orgName> <orgName xml:lang="en">CLARIN research infrastructure</orgName> <ref target="https://www.clarin.eu/">www.clarin.eu</ref> </publisher> <idno type="URI" subtype="handle">http://hdl.handle.net/11356/1432</idno> <availability status="free"> <licence>http://creativecommons.org/licenses/by/4.0/</licence> <p xml:lang="sl">To delo je ponujeno pod <ref target="http://creativecommons.org/licenses/by/4.0/">Creative Commons Priznanje avtorstva 4.0 mednarodna licenca</ref>.</p> <p xml:lang="en">This work is licensed under the <ref target="http://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</ref>.</p> </availability> <date when="2021-06-11">11. 6. 2021</date> </publicationStmt>
Content model	<content> <elementRef key="publisher"/> <elementRef key="idno"/> <elementRef key="pubPlace" minOccurs="0" maxOccurs="1"/> <elementRef key="availability"/> <elementRef key="date"/> </content> ⚓
Schema Declaration	element publicationStmt { tei_publisher, tei_idno, tei_pubPlace?, tei_availability, tei_date }⚓

Appendix A.1.78 <publisher>

<publisher> (publisher) provides the name of the organisation responsible for the publication or distribution of a bibliographic item. [3.12.2.4. Imprint, Size of a Document, and Reprint Information 2.2.4. Publication, Distribution, Licensing, etc.]
Module	core — Formal specification
Attributes	att.global xml:id n xml:base xml:space @xml:lang
Contained by	core: bibl header: publicationStmt
May contain	core: ref namesdates: orgName character data
Note	Use the full form of the name by which a company is usually referred to, rather than any abbreviation of it which may appear on a title page
Example	<publisher> <orgName>CLARIN research infrastructure</orgName> <ref target="https://www.clarin.eu/">www.clarin.eu</ref> </publisher>
Content model	<content> <alternate minOccurs="1" maxOccurs="1"> <sequence minOccurs="1" maxOccurs="1"> <elementRef key="orgName" minOccurs="1" maxOccurs="unbounded"/> <elementRef key="ref" minOccurs="0" maxOccurs="unbounded"/> </sequence> <textNode/> </alternate> </content> ⚓
Schema Declaration	element publisher { tei_att.global.attribute.xmllang, ( ( tei_orgName+, tei_ref* ) | text ) }⚓

Appendix A.1.79 <quotation>

<quotation> (quotation) specifies editorial practice adopted with respect to quotation marks in the original. [2.3.3. The Editorial Practices Declaration 16.3.2. Declarable Elements]
Module	header — Formal specification
Contained by	header: editorialDecl
May contain	core: p
Example	<editorialDecl> ... <quotation> <p xml:lang="en">Quotation marks have been left in the text and are not explicitly marked up.</p> </quotation> </editorialDecl>
Schematron	<sch:rule context="tei:quotation"> <sch:report test="not( @marks ) and not( tei:p )"> On <sch:name/>, either the @marks attribute should be used, or a paragraph of description provided </sch:report> </sch:rule>
Content model	<content> <elementRef key="p" minOccurs="1" maxOccurs="unbounded"/> </content> ⚓
Schema Declaration	element quotation { tei_p+ }⚓

Appendix A.1.80 <recording>

<recording> (recording event) provides details of an audio or video recording event used as the source of a spoken text, either directly or from a public broadcast. [8.2. Documenting the Source of Transcribed Speech 16.3.2. Declarable Elements]

Module

spoken — Formal specification

Attributes

type

the kind of recording.

Derived from	att.typed
Status	Optional
Datatype	teidata.enumerated
Legal values are:	audio (audio recording) [Default]; video (audio and video recording)

Contained by

spoken: recordingStmt

May contain

core: media

Note

The dur attribute is used to indicate the original duration of the recording.

Example

Content model

<content>
 <elementRef key="media" minOccurs="1"
  maxOccurs="unbounded"/>
</content>
    ⚓

Schema Declaration

element recording { attribute type { "audio" | "video" }?, tei_media+ }⚓

Appendix A.1.81 <recordingStmt>

<recordingStmt> (recording statement) describes a set of recordings used as the basis for transcription of a spoken text. [8.2. Documenting the Source of Transcribed Speech 2.2.7. The Source Description]
Module	spoken — Formal specification
Contained by	header: sourceDesc
May contain	spoken: recording
Example	<recordingStmt> <recording type="audio"> <media xml:id="ps2017-020-09-004-010.audio1" mimeType="audio/mp3" source="https://www.psp.cz/eknih/2017ps/audio/2018/11/13/2018111318081822.mp3" url="2017ps/audio/2018/11/13/2018111318081822.mp3"/> <media xml:id="ps2017-020-09-004-010.audio2" mimeType="audio/mp3" source="https://www.psp.cz/eknih/2017ps/audio/2018/11/13/2018111318181832.mp3" url="2017ps/audio/2018/11/13/2018111318181832.mp3"/> <media xml:id="ps2017-020-09-004-010.audio3" mimeType="audio/mp3" source="https://www.psp.cz/eknih/2017ps/audio/2018/11/13/2018111318281842.mp3" url="2017ps/audio/2018/11/13/2018111318281842.mp3"/> ... </recording> </recordingStmt>
Content model	<content> <elementRef key="recording" minOccurs="1" maxOccurs="unbounded"/> </content> ⚓
Schema Declaration	element recordingStmt { tei_recording+ }⚓
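
For orientation, the sketch below shows one way to list the <media> files declared in a <recordingStmt>. It is an illustration only, assuming Python's standard xml.etree.ElementTree; the function name list_media is hypothetical, and the fallback to "audio" when type is absent follows the default given for the type attribute of <recording>. The sample keeps only the first <media> of the example above.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def list_media(root):
    """Return one dict per <media>, tagged with the type of its <recording>."""
    return [
        {
            "id": media.get(XML_ID),
            "type": rec.get("type", "audio"),  # "audio" is the declared default
            "mimeType": media.get("mimeType"),
            "source": media.get("source"),
            "url": media.get("url"),
        }
        for rec in root.iter(f"{TEI}recording")
        for media in rec.findall(f"{TEI}media")
    ]


SAMPLE = """
<recordingStmt xmlns="http://www.tei-c.org/ns/1.0">
  <recording type="audio">
    <media xml:id="ps2017-020-09-004-010.audio1" mimeType="audio/mp3"
           source="https://www.psp.cz/eknih/2017ps/audio/2018/11/13/2018111318081822.mp3"
           url="2017ps/audio/2018/11/13/2018111318081822.mp3"/>
  </recording>
</recordingStmt>
"""

media = list_media(ET.fromstring(SAMPLE))
print(media)
```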

Appendix A.1.82 <ref>

<ref> (reference) defines a reference to another location, possibly modified by additional text or comment. [3.7. Simple Links and Cross-References 17.1. Links]

Module

core — Formal specification

Attributes

att.global
- xml:id
- n
- xml:base
- xml:space
- @xml:lang

target

specifies the destination of the reference by supplying one or more URI References.

Derived from	att.pointing
Status	Recommended
Datatype	1–∞ occurrences of teidata.pointer separated by whitespace

Member of

model.ptrLike

Contained by

core: desc p pubPlace publisher unit

header: catDesc funder

May contain

Character data only

Note

The target and cRef attributes are mutually exclusive.

Example

<projectDesc> <p> <ref target="https://www.clarin.eu/content/parlamint">ParlaMint</ref> is a project that aims to create a multilingual set of comparable corpora of parliamentary proceedings uniformly encoded according to the <ref target="https://github.com/clarin-eric/parla-clarin">Parla-CLARIN recommendations</ref> and ...</p> </projectDesc>

Schematron

<sch:rule context="tei:ref"> <sch:report test="@target and @cRef">Only one of the attributes @target and @cRef may be supplied on <sch:name/>.</sch:report> </sch:rule>

Content model

<content>
 <textNode/>
</content>
    ⚓

Schema Declaration

element ref
{
   tei_att.global.attribute.xmllang,
   attribute target { list { teidata.pointer+ } }?,
   text
}⚓

Appendix A.1.83 <relation>

<relation> (relationship) describes a relationship between two organisations. [14.3.2.3. Personal Relationships]

Module

namesdates — Formal specification

Attributes

att.global.analytic
- @ana
att.datable.w3c
- notBefore
- notAfter
- @when
- @from
- @to

name

Status	Required
Legal values are:	coalition opposition renaming successor representing

active

identifies the ‘active’ participants in a non-mutual relationship, or all the participants in a mutual one.

Status	Optional
Datatype	1–∞ occurrences of teidata.pointer separated by whitespace

mutual

supplies a list of participants amongst all of whom the relationship holds equally.

Status	Optional
Datatype	1–∞ occurrences of teidata.pointer separated by whitespace

passive

identifies the ‘passive’ participants in a non-mutual relationship.

Status	Optional
Datatype	1–∞ occurrences of teidata.pointer separated by whitespace

Contained by

namesdates: listRelation

May contain

Empty element

Note

Only one of the attributes active and mutual may be supplied; the attribute passive may be supplied only if the attribute active is supplied. Not all of these constraints can be enforced in all schema languages.

Example

Specification of coalition and opposition political parties (or parliamentary groups) in a given time period and legislative period:

Example

Specification of parliamentary group representing political parties in the parliament:

Schematron

<sch:rule context="tei:relation"> <sch:assert test="@ref or @key or @name">One of the attributes @name, @ref or @key must be supplied</sch:assert> </sch:rule>

Schematron

<sch:rule context="tei:relation"> <sch:report test="@active and @mutual">Only one of the attributes @active and @mutual may be supplied</sch:report> </sch:rule>

Schematron

<sch:rule context="tei:relation"> <sch:report test="@passive and not(@active)">the attribute @passive may be supplied only if the attribute @active is supplied</sch:report> </sch:rule>

Content model

<content>
 <empty/>
</content>
    ⚓

Schema Declaration

element relation
{
   tei_att.global.analytic.attribute.ana,
   tei_att.datable.w3c.attribute.when,
   tei_att.datable.w3c.attribute.from,
   tei_att.datable.w3c.attribute.to,
   attribute name
   {
      "coalition" | "opposition" | "renaming" | "successor" | "representing"
   },
   ( attribute active { list { teidata.pointer+ } }? | attribute mutual { list { teidata.pointer+ } }? ),
   attribute passive { list { teidata.pointer+ } }?,
   empty
}⚓
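
The attribute constraints on <relation> (the closed list of name values, and the Schematron rules on active, mutual and passive) can be mirrored in a script, for instance as a pre-validation step. The following is a hedged sketch using Python's standard xml.etree.ElementTree; the function name relation_errors, the error messages and the party identifiers in the sample are all hypothetical.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
LEGAL_NAMES = {"coalition", "opposition", "renaming", "successor", "representing"}


def relation_errors(rel):
    """Return a list of constraint violations for one <relation> element."""
    errors = []
    if rel.get("name") not in LEGAL_NAMES:
        errors.append("name missing or not a legal value")
    if rel.get("active") is not None and rel.get("mutual") is not None:
        errors.append("only one of active and mutual may be supplied")
    if rel.get("passive") is not None and rel.get("active") is None:
        errors.append("passive requires active")
    return errors


# Hypothetical sample: the first relation is fine, the other two break a rule each.
SAMPLE = """
<listRelation xmlns="http://www.tei-c.org/ns/1.0">
  <relation name="coalition" mutual="#party.A #party.B" from="2020-01-01"/>
  <relation name="opposition" passive="#party.C"/>
  <relation name="renaming" active="#party.D" mutual="#party.E" passive="#party.F"/>
</listRelation>
"""

root = ET.fromstring(SAMPLE)
report = [relation_errors(rel) for rel in root.iter(f"{TEI}relation")]
print(report)
```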

Appendix A.1.84 <resp>

<resp> (responsibility) contains a phrase describing the nature of a person's intellectual responsibility, or an organisation's role in the production or distribution of a work. [3.12.2.2. Titles, Authors, and Editors 2.2.1. The Title Statement 2.2.2. The Edition Statement 2.2.5. The Series Statement]
Module	core — Formal specification
Attributes	att.global xml:id n xml:base xml:space @xml:lang
Contained by	core: respStmt
May contain	Character data only
Note	The attribute ref, inherited from the class att.canonical may be used to indicate the kind of responsibility in a normalized form by referring directly to a standardized list of responsibility types, such as that maintained by a naming authority, for example the list maintained at http://www.loc.gov/marc/relators/relacode.html for bibliographic usage.
Example	<respStmt> <persName>Andrej Pančur</persName> <resp>Kodiranje TEI</resp> <resp xml:lang="en">TEI corpus encoding</resp> </respStmt>
Content model	<content> <textNode/> </content> ⚓
Schema Declaration	element resp { tei_att.global.attribute.xmllang, text }⚓

Appendix A.1.85 <respStmt>

<respStmt> (statement of responsibility) supplies a statement of responsibility for the intellectual content of a text, edition, recording, or series, where the specialized elements for authors, editors, etc. do not suffice or do not apply. May also be used to encode information about individuals or organisations which have played a role in the production or distribution of a bibliographic work. [3.12.2.2. Titles, Authors, and Editors 2.2.1. The Title Statement 2.2.2. The Edition Statement 2.2.5. The Series Statement]
Module	core — Formal specification
Contained by	header: titleStmt
May contain	core: resp namesdates: persName
Example	<respStmt> <persName>Matthew Coole</persName> <resp>Data retrieval, Parla-CLARIN TEI XML corpus encoding and linguistic annotation.</resp> </respStmt>
Example	<respStmt> <persName ref="https://orcid.org/0000-0003-3063-2239">Tommaso Agnoloni</persName> <persName ref="https://orcid.org/0000-0002-8126-6294">Francesca Frontini</persName> <persName ref="https://orcid.org/0000-0002-2953-8619">Simonetta Montemagni</persName> <persName ref="https://orcid.org/0000-0002-1321-5444">Valeria Quochi</persName> <persName ref="https://orcid.org/0000-0001-5849-0979">Giulia Venturi</persName> <resp xml:lang="it">Definizione del progetto e metodologia</resp> <resp xml:lang="en">Project set-up and methodology</resp> </respStmt> <respStmt> <persName>Manuela Ruisi</persName> <persName>Carlo Marchetti</persName> <persName>Roberto Battistoni</persName> <resp xml:lang="it">Recupero dei dati</resp> <resp xml:lang="en">Data retrieval</resp> </respStmt> <respStmt> <persName>Tommaso Agnoloni</persName> <resp xml:lang="it">Codifica corpus in ParlaMint TEI XML</resp> <resp xml:lang="en">ParlaMint TEI XML corpus encoding</resp> <resp xml:lang="it">Pulizia, normalizzazione e conversione in ParlaMint TEI XML</resp> <resp xml:lang="en">Cleaning, normalisation and conversion to ParlaMint TEI XML</resp> </respStmt> ...
Content model	<content> <elementRef key="persName" minOccurs="1" maxOccurs="unbounded"/> <elementRef key="resp" minOccurs="1" maxOccurs="unbounded"/> </content> ⚓
Schema Declaration	element respStmt { tei_persName+, tei_resp+ }⚓

Appendix A.1.86 <revisionDesc>

<revisionDesc> (revision description) summarizes the revision history for a file [2.6. The Revision Description 2.1.1. The TEI Header and Its Components]
Module	header — Formal specification
Attributes	att.global xml:id n xml:base xml:space @xml:lang
Contained by	header: teiHeader
May contain	header: change
Note	If present on this element, the status attribute should indicate the current status of the document. The same attribute may appear on any <change> to record the status at the time of that change. Conventionally <change> elements should be given in reverse date order, with the most recent change at the start of the list.
Example	<revisionDesc> <change when="2021-06-11"> <name>Tomaž Erjavec</name>: Finalized encoding.</change> <change when="2021-05-28"> <name>Tomaž Erjavec</name>: Built corpus.</change> </revisionDesc>
Content model	<content> <elementRef key="change" minOccurs="1" maxOccurs="unbounded"/> </content> ⚓
Schema Declaration	element revisionDesc { tei_att.global.attribute.xmllang, tei_change+ }⚓

Appendix A.1.87 <roleName>

<roleName> (role name) contains a name component which indicates that the referent has a particular role or position in society, such as an official title or rank. [14.2.1. Personal Names]
Module	namesdates — Formal specification
Attributes	att.global xml:id n xml:base xml:space @xml:lang
Member of	model.persNamePart
Contained by	namesdates: affiliation persName
May contain	Character data only
Note	A <roleName> may be distinguished from an <addName> by virtue of the fact that, like a title, it typically exists independently of its holder.
Example	<persName> <surname>Murgel</surname> <forename>Jasna</forename> <roleName>dr.</roleName> </persName>
Example	<affiliation role="minister" ref="#GOV" from="2020-08-01"> <roleName xml:lang="sl">Minister za obrambo</roleName> <roleName xml:lang="en">Minister of Defence</roleName> </affiliation>
Content model	<content> <textNode/> </content> ⚓
Schema Declaration	element roleName { tei_att.global.attribute.xmllang, text }⚓

Appendix A.1.88 <s>

<s> (s-unit) contains a sentence-like division of a text. [18.1. Linguistic Segment Categories 8.4.1. Segmentation]
Module	analysis — Formal specification
Attributes	att.global xml:base xml:space @xml:id @n @xml:lang att.global.analytic @ana att.global.linking synch next prev @corresp
Member of	model.segLike
Contained by	linking: seg
May contain	analysis: pc phr w core: date gap measure name note num pb time linking: linkGrp spoken: incident kinesic vocal
Note	The <s> element may be used to mark orthographic sentences, or any other segmentation of a text, provided that the segmentation is end-to-end, complete, and non-nesting. For segmentation which is partial or recursive, the <seg> should be used instead. The type attribute may be used to indicate the type of segmentation intended, according to any convenient typology.
Example	<s xml:id="ParlaMint-GB_2017-10-30-lords.seg4.1"> <w lemma="I" msd="UPosTag=PRON|Case=Nom|Number=Sing|Person=1|PronType=Prs" pos="PRP">I</w> <w lemma="support" msd="UPosTag=VERB|Mood=Ind|Tense=Pres|VerbForm=Fin" pos="VBP">support</w> <w lemma="the" msd="UPosTag=DET|Definite=Def|PronType=Art" pos="DT">the</w> <w lemma="amendment" msd="UPosTag=NOUN|Number=Sing" pos="NN" join="right">amendment</w> <pc msd="UPosTag=PUNCT" pos=".">.</pc> </s>
Schematron	<sch:rule context="tei:s"> <sch:report test="tei:s">You may not nest one s element within another: use seg instead</sch:report> </sch:rule>
Content model	<content> <elementRef key="measure" minOccurs="0"/> <alternate minOccurs="1" maxOccurs="unbounded"> <elementRef key="w"/> <elementRef key="pc"/> <elementRef key="name"/> <elementRef key="phr"/> <elementRef key="num"/> <elementRef key="date"/> <elementRef key="time"/> <elementRef key="note"/> <elementRef key="vocal"/> <elementRef key="kinesic"/> <elementRef key="incident"/> <elementRef key="gap"/> <elementRef key="pb"/> </alternate> <elementRef key="linkGrp" minOccurs="0" maxOccurs="1"/> </content> ⚓
Schema Declaration	element s { tei_att.global.attribute.xmlid, tei_att.global.attribute.n, tei_att.global.attribute.xmllang, tei_att.global.analytic.attribute.ana, tei_att.global.linking.attribute.corresp, tei_measure?, ( tei_w | tei_pc | tei_name | tei_phr | tei_num | tei_date | tei_time | tei_note | tei_vocal | tei_kinesic | tei_incident | tei_gap | tei_pb )+, tei_linkGrp? }⚓
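
Because a linguistically annotated <s> holds tokens rather than running text, scripts usually need to rebuild the surface string. The sketch below does this with Python's standard xml.etree.ElementTree. It rests on an assumption drawn from the example above, namely that join="right" means no space follows the token; the function name sentence_text is hypothetical, and the sample omits the msd and pos attributes for brevity.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
TOKEN_TAGS = (f"{TEI}w", f"{TEI}pc")


def sentence_text(s):
    """Rebuild the text of an <s> from its <w> and <pc> tokens."""
    parts = []
    # iter() walks descendants in document order, so tokens nested in
    # <phr> or <name> are included as well.
    for el in s.iter():
        if el.tag in TOKEN_TAGS:
            parts.append(el.text or "")
            if el.get("join") != "right":
                parts.append(" ")
    return "".join(parts).strip()


SAMPLE = """
<s xmlns="http://www.tei-c.org/ns/1.0" xml:id="ParlaMint-GB_2017-10-30-lords.seg4.1">
  <w lemma="I">I</w>
  <w lemma="support">support</w>
  <w lemma="the">the</w>
  <w lemma="amendment" join="right">amendment</w>
  <pc>.</pc>
</s>
"""

text = sentence_text(ET.fromstring(SAMPLE))
print(text)
```

Only the text of <w> and <pc> is used, so transcriber notes, gaps and incidents inside the sentence are skipped.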

Appendix A.1.89 <seg>

<seg> (arbitrary segment) represents any segmentation of text below the ‘chunk’ level. [17.3. Blocks, Segments, and Anchors 6.2. Components of the Verse Line 7.2.5. Speech Contents]
Module	linking — Formal specification
Attributes	att.global xml:base xml:space @xml:id @n @xml:lang att.global.linking synch next prev @corresp
Member of	model.segLike
Contained by	spoken: u
May contain	analysis: s core: gap measure note pb spoken: incident kinesic vocal character data
Note	The <seg> element may be used at the encoder's discretion to mark any segments of the text of interest for processing. One use of the element is to mark text features for which no appropriate markup is otherwise defined. Another use is to provide an identifier for some segment which is to be pointed at by some other element—i.e. to provide a target, or a part of a target, for a <ptr> or other similar element.
Example	<u who="#DavidPrior" ana="#regular"> <seg>I ask that the draft Regulations laid before the House on 5 December be approved.</seg> <seg>The relevant document is the 20th Report from the Legislation Committee.</seg> </u>
Content model	<content> <elementRef key="measure" minOccurs="0"/> <alternate minOccurs="1" maxOccurs="unbounded"> <elementRef key="note"/> <elementRef key="vocal"/> <elementRef key="kinesic"/> <elementRef key="incident"/> <elementRef key="gap"/> <elementRef key="pb"/> <alternate minOccurs="0" maxOccurs="unbounded"> <textNode/> <elementRef key="s"/> </alternate> </alternate> </content>
Schema Declaration	element seg { tei_att.global.attribute.xmlid, tei_att.global.attribute.n, tei_att.global.attribute.xmllang, tei_att.global.linking.attribute.corresp, tei_measure?, ( tei_note | tei_vocal | tei_kinesic | tei_incident | tei_gap | tei_pb | ( text | tei_s )* )+ }

Appendix A.1.90 <segmentation>

<segmentation> (segmentation) describes the principles according to which the text has been segmented, for example into sentences, tone-units, graphemic strata, etc. [2.3.3. The Editorial Practices Declaration 16.3.2. Declarable Elements]
Module	header — Formal specification
Contained by	header: editorialDecl
May contain	core: p
Example	<editorialDecl> <segmentation> <p xml:lang="en">The texts are segmented into utterances (speeches) and segments (corresponding to paragraphs in the source transcription).</p> </segmentation> </editorialDecl>
Content model	<content> <elementRef key="p" minOccurs="1" maxOccurs="unbounded"/> </content>
Schema Declaration	element segmentation { tei_p+ }

Appendix A.1.91 <setting>

<setting> describes one particular setting in which a language interaction takes place. [16.2.3. The Setting Description]
Module	corpus — Formal specification
Contained by	corpus: settingDesc
May contain	core: date name
Note	If the who attribute is not supplied, the setting is assumed to be that of all participants in the language interaction.
Example	<setting> <name type="place">Commons Chamber</name> <name type="place">Westminster</name> <name type="city">London</name> <name type="country" key="GB">U.K.</name> <date when="2019-02-18">February 18th, 2019</date> </setting>
Content model	<content> <elementRef key="name" minOccurs="1" maxOccurs="unbounded"/> <elementRef key="date"/> </content>
Schema Declaration	element setting { tei_name+, tei_date }

Appendix A.1.92 <settingDesc>

<settingDesc> (setting description) describes the setting or settings within which a language interaction takes place, or other places otherwise referred to in a text, edition, or metadata. [16.2. Contextual Information 2.4. The Profile Description]
Module	corpus — Formal specification
Contained by	header: profileDesc
May contain	corpus: setting
Note	May contain a prose description organized as paragraphs, or a series of <setting> elements. If used to record not settings of language interactions, but other places mentioned in the text, then <place> optionally grouped by <listPlace> inside <standOff> should be preferred.
Example	<settingDesc> <setting> <name type="address">Trg sv. Marka 6</name> <name type="city">Zagreb</name> <name type="country" key="HR">Croatia</name> <date from="2016-11-15" to="2020-05-18">15.11.2016 - 18.5.2020</date> </setting> </settingDesc>
Content model	<content> <elementRef key="setting" minOccurs="1" maxOccurs="unbounded"/> </content>
Schema Declaration	element settingDesc { tei_setting+ }

Appendix A.1.93 <sex>

<sex> (sex) specifies the sex of a person. [14.3.2.1. Personal Characteristics]

Module

namesdates — Formal specification

Attributes

value

Status	Required
Legal values are:	M F U O N

Contained by

namesdates: person

May contain

Empty element

Note

As with other culturally-constructed traits such as age and gender, the way in which this concept is described in different cultural contexts varies. The normalizing attributes are provided only as an optional means of simplifying that variety for purposes of interoperability or project-internal taxonomies for consistency, and should not be used where that is inappropriate or unhelpful. The content of the element may be used to describe the intended concept in more detail.

Example

Content model

<content>
 <empty/>
</content>

Schema Declaration

element sex { attribute value { "M" | "F" | "U" | "O" | "N" }, empty }
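
The closed value list above lends itself to a simple check. The following Python sketch reports <sex> elements whose value attribute is missing or outside the legal set; the <person> entries and their identifiers are invented for illustration, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Legal values taken from the attribute table above.
LEGAL_SEX_VALUES = {"M", "F", "U", "O", "N"}

# ElementTree exposes xml:id under the predefined XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical person entries, invented for this sketch.
SAMPLE = """<listPerson>
  <person xml:id="PersonA"><sex value="F"/></person>
  <person xml:id="PersonB"><sex value="X"/></person>
  <person xml:id="PersonC"><sex/></person>
</listPerson>"""


def invalid_sex_values(root):
    """Return (person id, offending value) pairs; None marks a missing attribute."""
    problems = []
    for person in root.iter("person"):
        for sex in person.findall("sex"):
            value = sex.get("value")
            if value not in LEGAL_SEX_VALUES:
                problems.append((person.get(XML_ID), value))
    return problems


problems = invalid_sex_values(ET.fromstring(SAMPLE))
print(problems)
```

The sample is not in the TEI namespace; a real corpus file would require namespace-qualified element names.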

Appendix A.1.94 <sourceDesc>

<sourceDesc> (source description) describes the source(s) from which an electronic text was derived or generated, typically a bibliographic description in the case of a digitized text, or a phrase such as "born digital" for a text which has no previous existence. [2.2.7. The Source Description]
Module	header — Formal specification
Contained by	header: fileDesc
May contain	core: bibl spoken: recordingStmt
Example	The source description <sourceDesc> of the corpus root encodes the original digital source of the ParlaMint corpus: <sourceDesc> <bibl> <title type="main" xml:lang="sl">Zapisi sej Državnega zbora Republike Slovenije</title> <title type="main" xml:lang="en">Minutes of the National Assembly of the Republic of Slovenia</title> <idno type="URI">https://www.dz-rs.si</idno> <date from="2014-08-01" to="2020-07-16">1.8.2014 - 16.7.2020</date> </bibl> </sourceDesc>
Example	For corpus components the source description is very similar to the one for the corpus root, except that it describes the specific meeting. Furthermore, if the audio or video of the meeting is available, this information can also be given: <sourceDesc> <bibl> <title type="main" xml:lang="cs">Parlament České republiky, Poslanecká sněmovna</title> <title type="main" xml:lang="en">Parliament of the Czech Republic, Chamber of Deputies</title> <idno type="URI">https://www.psp.cz/eknih/2013ps/stenprot/044schuz/s044033.htm</idno> <date when="2016-04-13">13.04.2016</date> </bibl> <recordingStmt> <recording type="audio"> <media xml:id="ps2013-044-02-000-000.audio1" mimeType="audio/mp3" source="https://www.psp.cz/eknih/2013ps/audio/2016/04/13/2016041308580912.mp3" url="2013ps/audio/2016/04/13/2016041308580912.mp3"/> </recording> </recordingStmt> </sourceDesc>
Content model	<content> <elementRef key="bibl" minOccurs="1" maxOccurs="unbounded"/> <elementRef key="recordingStmt" minOccurs="0" maxOccurs="1"/> </content>
Schema Declaration	element sourceDesc { tei_bibl+, tei_recordingStmt? }

Appendix A.1.95 <state>

<state> (state) defines additional metadata on a political party or parliamentary group, e.g. its political orientation. [14.3.1. Basic Principles 14.3.2.1. Personal Characteristics]

Module

namesdates — Formal specification

Attributes

att.global
- xml:id
- xml:lang
- xml:base
- xml:space
- @n
att.global.source
- @source
att.datable.w3c
- when
- notBefore
- notAfter
- @from
- @to
att.canonical
- ref
- @key

ana

(analysis) indicates one or more elements containing interpretations of the element on which the ana attribute appears.

Derived from	att.global.analytic
Status	Optional
Datatype	teidata.pointer

type

Status	Required
Legal values are:	politicalOrientation encoder Wikipedia CHES variable value

Member of

model.placeStateLike

Contained by

namesdates: org state

May contain

core: note

namesdates: state

Note

Where there is confusion between <trait> and <state> the more general purpose element <state> should be used even for unchanging characteristics. If you wish to distinguish between characteristics that are generally perceived to be time-bound states and those assumed to be fixed traits, then <trait> is available for the more static of these. The <state> element encodes characteristics which are sometimes assumed to change, often at specific times or over a date range, whereas the <trait> elements are used to record characteristics, such as eye-colour, which are less subject to change. Traits are typically, but not necessarily, independent of the volition or action of the holder.

Example

Encoding political orientation as entered by an encoder:

<state type="politicalOrientation"> <state type="encoder" source="#GrietDepoorter" ana="#orientation.CRR"> <note xml:lang="en">Orientation determined by encoder, using own knowledge of the parliamentary group.</note> </state> </state>

Example

Encoding Wikipedia-sourced political orientation:

Example

Encoding CHES-sourced variables and their values, with @key containing the CHES name for the political party:

Content model

<content>
 <sequence minOccurs="1" maxOccurs="1">
  <elementRef key="note" minOccurs="0"
   maxOccurs="unbounded"/>
  <elementRef key="state" minOccurs="0"
   maxOccurs="unbounded"/>
 </sequence>
</content>

Schema Declaration

element state
{
   tei_att.global.attribute.n,
   tei_att.global.source.attribute.source,
   tei_att.datable.w3c.attribute.from,
   tei_att.datable.w3c.attribute.to,
   tei_att.canonical.attribute.key,
   attribute ana { text }?,
   attribute type
   {
      "politicalOrientation"
    | "encoder"
    | "Wikipedia"
    | "CHES"
    | "variable"
    | "value"
   },
   ( tei_note*, tei_state* )
}
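
To show how the nested <state> structure can be read, the following Python sketch extracts the origin, source and analysis pointer of each political orientation recorded for an organisation. The inner <state> is copied from the encoder example above; the wrapping <org> and its xml:id are hypothetical, as is the function name.

```python
import xml.etree.ElementTree as ET

# The <state> content follows the encoder example above;
# the <org> wrapper and its identifier are invented for this sketch.
SAMPLE = """<org xml:id="party.EX">
  <state type="politicalOrientation">
    <state type="encoder" source="#GrietDepoorter" ana="#orientation.CRR">
      <note xml:lang="en">Orientation determined by encoder.</note>
    </state>
  </state>
</org>"""


def political_orientations(org):
    """List the nested states found under each politicalOrientation state."""
    results = []
    for outer in org.findall("state[@type='politicalOrientation']"):
        for inner in outer.findall("state"):
            results.append({
                "origin": inner.get("type"),    # encoder, Wikipedia or CHES
                "source": inner.get("source"),  # pointer to who or what supplied it
                "ana": inner.get("ana"),        # pointer to the orientation category
            })
    return results


orientations = political_orientations(ET.fromstring(SAMPLE))
print(orientations)
```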

Appendix A.1.96 <surname>

<surname> (surname) contains a family (inherited) name, as opposed to a given, baptismal, or nick name. [14.2.1. Personal Names]

Module

namesdates — Formal specification

Attributes

att.global
- xml:id
- n
- xml:base
- xml:space
- @xml:lang

type

Status	Optional
Legal values are:	birth patronym married

Member of

model.persNamePart

Contained by

namesdates: persName

May contain

Character data only

Example

<persName> <surname>Accetto</surname> <forename>Matej</forename> </persName>

Example

<persName> <forename>Ірина</forename> <surname type="patronym">Борисівна</surname> <surname>Щеняєва</surname> </persName> <persName xml:lang="uk-Latn"> <forename>Iryna</forename> <surname type="patronym">Borysivna</surname> <surname>Ščenjajeva</surname> </persName>

Content model

<content>
 <textNode/>
</content>

Schema Declaration

element surname
{
   tei_att.global.attribute.xmllang,
   attribute type { "birth" | "patronym" | "married" }?,
   text
}

Appendix A.1.97 <tagUsage>

<tagUsage> (element usage) documents the usage of a specific element within a specified document. [2.3.4. The Tagging Declaration]

Module

header — Formal specification

Attributes

gi

(generic identifier) specifies the name (generic identifier) of the element indicated by the tag, within the namespace indicated by the parent <namespace> element. Counts must be included for the <text> element and for all of its descendants.

Status	Required
Datatype	teidata.name

occurs

specifies the number of occurrences of this element within the text.

Status	Required
Datatype	teidata.count

Contained by

header: namespace

May contain

Empty element

Example

Content model

<content>
 <empty/>
</content>

Schema Declaration

element tagUsage { attribute gi { text }, attribute occurs { text }, empty }

Appendix A.1.98 <tagsDecl>

<tagsDecl> (tagging declaration) provides detailed information about the tagging applied to a document. [2.3.4. The Tagging Declaration 2.3. The Encoding Description]
Module	header — Formal specification
Contained by	header: encodingDesc
May contain	header: namespace
Example	The tagging declaration <tagsDecl> gives the count of every XML element used in the data part of the corpus (i.e. not in the TEI header): in the corpus root the counts cover the whole corpus, while in a corpus component they cover only that component. <encodingDesc> ... <tagsDecl> <namespace name="http://www.tei-c.org/ns/1.0"> <tagUsage gi="text" occurs="414"/> <tagUsage gi="body" occurs="414"/> <tagUsage gi="div" occurs="414"/> ... </namespace> </tagsDecl> </encodingDesc>
Content model	<content> <elementRef key="namespace"/> </content>
Schema Declaration	element tagsDecl { tei_namespace }
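
The counts recorded in <tagUsage> can be derived mechanically from the data part of a file. The following Python sketch counts the <text> element and all of its descendants in a small, invented corpus component and builds the corresponding <tagsDecl>. The sample content and the function names are assumptions made for illustration; the TEI header is omitted from the sample for brevity.

```python
from collections import Counter
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Invented minimal component; the obligatory teiHeader is left out here.
SAMPLE = f"""<TEI xmlns="{TEI_NS}">
  <text>
    <body>
      <div type="debateSection">
        <note type="speaker">Speaker:</note>
        <u who="#DavidPrior" ana="#regular">
          <seg>First paragraph.</seg>
          <seg>Second paragraph.</seg>
        </u>
      </div>
    </body>
  </text>
</TEI>"""


def tag_usage(tei_root):
    """Count <text> and every descendant element in the TEI namespace."""
    text = tei_root.find(f"{{{TEI_NS}}}text")
    counts = Counter()
    for elem in text.iter():
        namespace, _, local = elem.tag[1:].partition("}")
        if namespace == TEI_NS:
            counts[local] += 1
    return counts


def tags_decl(counts):
    """Build a <tagsDecl> element holding one <tagUsage> per counted name."""
    decl = ET.Element("tagsDecl")
    namespace = ET.SubElement(decl, "namespace", name=TEI_NS)
    for gi, occurs in counts.items():
        ET.SubElement(namespace, "tagUsage", gi=gi, occurs=str(occurs))
    return decl


counts = tag_usage(ET.fromstring(SAMPLE))
print(ET.tostring(tags_decl(counts), encoding="unicode"))
```

For a corpus root, the per-component counters could be summed before calling tags_decl, which is one plausible way to arrive at corpus-wide totals.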

Appendix A.1.99 <taxonomy>

<taxonomy> (taxonomy) defines a typology explicitly by a structured taxonomy. [2.3.7. The Classification Declaration]

Module

header — Formal specification

Attributes

att.global
- xml:id
- n
- xml:base
- xml:space
- @xml:lang

xml:id

(identifier) provides a unique identifier for the element bearing the attribute.

Derived from	att.global
Status	Required
Datatype	ID

Contained by

header: classDecl

May contain

core: desc

header: category

Note

Nested taxonomies are common in many fields, so the <taxonomy> element can be nested.

Example

<taxonomy xml:id="subcorpus"> <desc xml:lang="sl"> <term>Podkorpusi</term> </desc> <desc xml:lang="en"> <term>Subcorpora</term> </desc> <category xml:id="reference"> <catDesc xml:lang="sl"> <term>Referenca</term>: referenčni podkorpus, do 2020-01-30</catDesc> <catDesc xml:lang="en"> <term>Reference</term>: reference subcorpus, until 2020-01-30</catDesc> </category> <category xml:id="covid"> <catDesc xml:lang="sl"> <term>COVID</term>: COVID podkorpus, od 2020-01-31 dalje</catDesc> <catDesc xml:lang="en"> <term>COVID</term>: COVID subcorpus, from 2020-01-31 onwards</catDesc> </category> </taxonomy>

Example

<taxonomy xml:id="parla.legislature"> <desc xml:lang="it"> <term>Legislatura</term> </desc> <desc xml:lang="en"> <term>Legislature</term> </desc> <category xml:id="parla.geo-political"> <catDesc xml:lang="it"> <term>Unità geo-politica o amministrativa</term> </catDesc> <catDesc xml:lang="en"> <term>Geo-political or administrative units</term> </catDesc> <category xml:id="parla.supranational"> <catDesc xml:lang="it"> <term>Legislatura sovranazionale</term> </catDesc> <catDesc xml:lang="en"> <term>Supranational legislature</term> </catDesc> </category> <category xml:id="parla.national"> <catDesc xml:lang="it"> <term>Legislatura nazionale</term> </catDesc> <catDesc xml:lang="en"> <term>National legislature</term> </catDesc> </category> ... </category> </taxonomy> ... <org ana="#parla.national #parla.upper" role="parliament" xml:id="LEG"> <orgName full="yes" xml:lang="it">Senato della Repubblica Italiana</orgName> <orgName full="yes" xml:lang="en">Senate of the Republic of Italy</orgName> </org>

Content model

<content>
 <elementRef key="desc" minOccurs="1"
  maxOccurs="unbounded"/>
 <elementRef key="category" minOccurs="1"
  maxOccurs="unbounded"/>
</content>

Schema Declaration

element taxonomy
{
   tei_att.global.attribute.xmllang,
   attribute xml:id { text },
   tei_desc+,
   tei_category+
}
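
Because categories are referenced from elsewhere through ana pointers, it can be useful to check that every local pointer resolves to an xml:id. The following Python sketch does this for a trimmed version of the legislature example above; the <root> wrapper and the function name are hypothetical, and only pointers of the form #id are considered (prefixed pointers and taxonomies kept in separate files are outside the scope of the sketch).

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:id under the predefined XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Trimmed from the example above; <root> is an invented wrapper.
# The category parla.upper is deliberately absent, as in the elided example.
SAMPLE = """<root>
  <taxonomy xml:id="parla.legislature">
    <desc xml:lang="en"><term>Legislature</term></desc>
    <category xml:id="parla.geo-political">
      <catDesc xml:lang="en"><term>Geo-political or administrative units</term></catDesc>
      <category xml:id="parla.national">
        <catDesc xml:lang="en"><term>National legislature</term></catDesc>
      </category>
    </category>
  </taxonomy>
  <org ana="#parla.national #parla.upper" role="parliament" xml:id="LEG"/>
</root>"""


def unresolved_pointers(root):
    """Return local ana pointers (#id) that match no xml:id in the tree."""
    known_ids = {e.get(XML_ID) for e in root.iter() if e.get(XML_ID)}
    missing = []
    for elem in root.iter():
        for pointer in (elem.get("ana") or "").split():
            if pointer.startswith("#") and pointer[1:] not in known_ids:
                missing.append(pointer)
    return missing


missing = unresolved_pointers(ET.fromstring(SAMPLE))
print(missing)
```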

Appendix A.1.100 <teiCorpus>

<teiCorpus> (TEI corpus) contains one whole corpus, stored in the corpus root file comprising the corpus header and XInclude references to corpus component files, each containing a <TEI> element. [4. Default Text Structure 16.1. Varieties of Composite Text]

Module

core — Formal specification

Attributes

att.global.linking
- synch
- next
- prev
- @corresp

xml:id

Status	Required
Datatype	ID

xml:lang

Status	Required
Datatype	teidata.language

Contained by

—

May contain

derived-module-parlamint: include

header: teiHeader

textstructure: TEI

Note

Should contain one <teiHeader> for the corpus, and a series of <TEI> elements, one for each text.

As with all elements in the TEI scheme (except <egXML>) this element is in the TEI namespace (see 5.7.2. Namespaces). Thus, when it is used as the outermost element of a TEI document, it is necessary to specify the TEI namespace on it. This is customarily achieved by including http://www.tei-c.org/ns/1.0 as the value of the XML namespace declaration (xmlns), without indicating a prefix, and then not using a prefix on TEI elements in the rest of the document. For example: <teiCorpus version="4.8.1" xml:lang="en" xmlns="http://www.tei-c.org/ns/1.0">.

Example

General structure of a ParlaMint corpus root:

Content model

<content>
 <elementRef key="teiHeader"/>
 <alternate minOccurs="1"
  maxOccurs="unbounded">
  <elementRef key="TEI"/>
  <elementRef key="include"/>
 </alternate>
</content>

Schema Declaration

element teiCorpus
{
   tei_att.global.linking.attribute.corresp,
   attribute xml:id { text },
   attribute xml:lang { text },
   tei_teiHeader,
   ( tei_TEI | tei_include )+
}
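
The content model above (one <teiHeader> followed by one or more <TEI> or include elements) can be checked while collecting the component files that the corpus root references. The following Python sketch assumes that the include elements are in the standard XInclude namespace; the corpus identifier, the file names and the function name are hypothetical, and the header is left empty for brevity.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XI_NS = "http://www.w3.org/2001/XInclude"  # assumed namespace of <include>

# Invented corpus root; a real header would not be empty.
SAMPLE = f"""<teiCorpus xmlns="{TEI_NS}" xmlns:xi="{XI_NS}"
    xml:id="ParlaMint-XX" xml:lang="en">
  <teiHeader/>
  <xi:include href="2020/ParlaMint-XX_2020-01-01.xml"/>
  <xi:include href="2020/ParlaMint-XX_2020-01-02.xml"/>
</teiCorpus>"""


def component_hrefs(corpus_root):
    """Check the child order of <teiCorpus> and return the XInclude hrefs."""
    children = list(corpus_root)
    if not children or children[0].tag != f"{{{TEI_NS}}}teiHeader":
        raise ValueError("teiCorpus must start with a teiHeader")
    allowed = {f"{{{TEI_NS}}}TEI", f"{{{XI_NS}}}include"}
    rest = children[1:]
    if not rest or any(child.tag not in allowed for child in rest):
        raise ValueError("teiHeader must be followed by TEI or include elements")
    return [c.get("href") for c in rest if c.tag == f"{{{XI_NS}}}include"]


hrefs = component_hrefs(ET.fromstring(SAMPLE))
print(hrefs)
```

Note that ElementTree does not resolve the includes on its own here; the sketch only reads the href values.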

Appendix A.1.101 <teiHeader>

<teiHeader> (TEI header) supplies descriptive and declarative metadata associated with a digital resource or set of resources. [2.1.1. The TEI Header and Its Components 16.1. Varieties of Composite Text]
Module	header — Formal specification
Attributes	att.global xml:id n xml:base xml:space @xml:lang
Contained by	core: teiCorpus textstructure: TEI
May contain	header: encodingDesc fileDesc profileDesc revisionDesc
Note	One of the few elements unconditionally required in any TEI document.
Example	Basic structure of the <teiHeader>: <teiHeader> <fileDesc>...</fileDesc> <encodingDesc>...</encodingDesc> <profileDesc>...</profileDesc> <revisionDesc>...</revisionDesc> </teiHeader>
Example	Example of a ParlaMint corpus component <teiHeader>: <teiHeader> <fileDesc> <titleStmt> <title type="main" xml:lang="lv">Latvijas parlamenta corpus ParlaMint-LV, 12. Saeima, 2014-11-04 [ParlaMint]</title> <title type="main" xml:lang="en">Latvian parliamentary corpus ParlaMint-LV, 12th Term, 2014-11-04 [ParlaMint]</title> <meeting corresp="#PT" ana="#parla.meeting.regular">Regulārā</meeting> <meeting n="13" corresp="#PT" ana="#parla.term #PT.13">13. sasaukums</meeting> </titleStmt> <editionStmt> <edition>2.1</edition> </editionStmt> <extent> <measure unit="speeches" quantity="257" xml:lang="en">257 speeches</measure> <measure unit="words" quantity="11847" xml:lang="en">11,847 words</measure> <measure unit="tokens" quantity="14628" xml:lang="en">14628 tokens</measure> </extent> <publicationStmt> <publisher> <orgName xml:lang="en">CLARIN research infrastructure</orgName> <ref target="https://www.clarin.eu/">www.clarin.eu</ref> </publisher> <idno subtype="handle" type="URI">http://hdl.handle.net/11356/1432</idno> <availability status="free"> <licence>http://creativecommons.org/licenses/by/4.0/</licence> <p xml:lang="en">This work is licensed under the <ref target="http://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</ref>.</p> </availability> <date when="2021-06-10">June 10, 2021</date> </publicationStmt> <sourceDesc> <bibl> <title type="main" xml:lang="lv">Saeimas sēžu stenogrammas</title> <idno type="URI">https://www.saeima.lv/lv/transcripts/view/264</idno> </bibl> </sourceDesc> </fileDesc> <encodingDesc> <projectDesc> <p xml:lang="en"> <ref target="https://www.clarin.eu/content/parlamint">ParlaMint</ref> </p> </projectDesc> <tagsDecl> <namespace name="http://www.tei-c.org/ns/1.0"> <tagUsage gi="text" occurs="1"/> <tagUsage gi="body" occurs="1"/> <tagUsage gi="div" occurs="1"/> <tagUsage gi="head" occurs="2"/> <tagUsage gi="note" occurs="257"/> <tagUsage gi="u" occurs="257"/> <tagUsage gi="seg" occurs="647"/> 
</namespace> </tagsDecl> </encodingDesc> <profileDesc> <settingDesc> <setting> <name type="city">Rīga</name> <name type="country" key="LV">Latvija</name> <date when="2014-11-04" ana="#parla.sitting">2014-11-04</date> </setting> </settingDesc> </profileDesc> </teiHeader>
Content model	<content> <elementRef key="fileDesc"/> <elementRef key="encodingDesc"/> <elementRef key="profileDesc"/> <elementRef key="revisionDesc" minOccurs="0" maxOccurs="1"/> </content>
Schema Declaration	element teiHeader { tei_att.global.attribute.xmllang, tei_fileDesc, tei_encodingDesc, tei_profileDesc, tei_revisionDesc? }

Appendix A.1.102 <term>

<term> (term) contains a single-word, multi-word, or symbolic designation which is regarded as a technical term. [3.4.1. Terms and Glosses]
Module	core — Formal specification
Member of	model.emphLike
Contained by	core: desc header: catDesc namesdates: persName
May contain	Character data only
Note	When this element appears within an <index> element, it is understood to supply the form under which an index entry is to be made for that location. Elsewhere, it is understood simply to indicate that its content is to be regarded as a technical or specialised term. It may be associated with a <gloss> element by means of its ref attribute; alternatively a <gloss> element may point to a <term> element by means of its target attribute. In formal terminological work, there is frequently discussion over whether terms must be atomic or may include multi-word lexical items, symbolic designations, or phraseological units. The <term> element may be used to mark any of these. No position is taken on the philosophical issue of what a term can be; the looser definition simply allows the <term> element to be used by practitioners of any persuasion. As with other members of the att.canonical class, instances of this element occurring in a text may be associated with a canonical definition, either by means of a URI (using the ref attribute), or by means of some system-specific code value (using the key attribute). Because the mutually exclusive target and cRef attributes overlap with the function of the ref attribute, they are deprecated and may be removed at a subsequent release.
Example	<term> is used inside taxonomies to name the taxonomy and its categories: <taxonomy xml:id="subcorpus"> <desc xml:lang="sl"> <term>Podkorpusi</term> </desc> <desc xml:lang="en"> <term>Subcorpora</term> </desc> <category xml:id="reference"> <catDesc xml:lang="sl"> <term>Referenca</term>: referenčni podkorpus, do 2020-01-30</catDesc> <catDesc xml:lang="en"> <term>Reference</term>: reference subcorpus, until 2020-01-30</catDesc> </category> ... </taxonomy>
Example	<catDesc xml:lang="en"> <term>acl</term>: Clausal modifier of noun (adjectival clause) </catDesc> <catDesc xml:lang="en"> <term>dep</term>: Unspecified dependency </catDesc> <catDesc xml:lang="en"> <term>punct</term>: Punctuation </catDesc>
Content model	<content> <textNode/> </content>
Schema Declaration	element term { text }

Appendix A.1.103 <text>

<text> (text) contains a single text of any kind, whether unitary or composite, for example a poem or drama, a collection of essays, a novel, a dictionary, or a corpus sample. [4. Default Text Structure 16.1. Varieties of Composite Text]
Module	textstructure — Formal specification
Attributes	att.global xml:base xml:space @xml:id @n @xml:lang att.global.analytic @ana att.global.source @source
Contained by	textstructure: TEI
May contain	textstructure: body
Note	This element should not be used to represent a text which is inserted at an arbitrary point within the structure of another, for example as in an embedded or quoted narrative; the <floatingText> is provided for this purpose.
Example	<text ana="#reference"> <body> <div type="debateSection">...</div> <div type="debateSection">...</div> ... </body> </text>
Content model	<content> <elementRef key="body"/> </content>
Schema Declaration	element text { tei_att.global.attribute.xmlid, tei_att.global.attribute.n, tei_att.global.attribute.xmllang, tei_att.global.analytic.attribute.ana, tei_att.global.source.attribute.source, tei_body }

Appendix A.1.104 <textClass>

<textClass> (text classification) groups information which describes the nature or topic of a text in terms of a standard classification scheme, thesaurus, etc. [2.4.3. The Text Classification]
Module	header — Formal specification
Contained by	header: profileDesc
May contain	header: catRef
Example	<textClass> <catRef scheme="#parla.legislature" target="#parla.bi #parla.lower #parla.upper"/> </textClass>
Content model	<content> <elementRef key="catRef"/> </content>
Schema Declaration	element textClass { tei_catRef }

Appendix A.1.105 <time>

<time> (time) contains a phrase defining a time of day in any format. [3.6.4. Dates and Times]
Module	core — Formal specification
Attributes	att.typed @type @subtype att.global n xml:base xml:space @xml:id @xml:lang att.global.analytic @ana att.datable.w3c notBefore notAfter @when @from @to
Member of	model.dateLike
Contained by	analysis: s core: name note unit
May contain	analysis: pc w character data
Example	A note giving the time when e.g. the session started: <note type="time"> <time when="2016-04-13T09:10:00">(9.10 hodin)</time> </note>
Content model	<content> <alternate minOccurs="1" maxOccurs="unbounded"> <elementRef key="w"/> <elementRef key="pc"/> <textNode/> </alternate> </content>
Schema Declaration	element time { tei_att.global.attribute.xmlid, tei_att.global.attribute.xmllang, tei_att.global.analytic.attribute.ana, tei_att.datable.w3c.attribute.when, tei_att.datable.w3c.attribute.from, tei_att.datable.w3c.attribute.to, tei_att.typed.attributes, ( tei_w | tei_pc | text )+ }
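
The normalised value in the when attribute is what makes a <time> element machine-readable. The following Python sketch reads it from the note shown in the example above; the function name is hypothetical, and the sketch assumes a full date-and-time value, since other W3C formats permitted by the attribute (for instance a bare time of day) may need different parsing.

```python
from datetime import datetime
import xml.etree.ElementTree as ET

# Note copied from the example above.
SAMPLE = """<note type="time">
  <time when="2016-04-13T09:10:00">(9.10 hodin)</time>
</note>"""


def time_points(note):
    """Return the @when values of all <time> descendants as datetime objects.

    Assumption: each value is a complete ISO 8601 date-time.
    """
    return [
        datetime.fromisoformat(t.get("when"))
        for t in note.iter("time")
        if t.get("when")
    ]


points = time_points(ET.fromstring(SAMPLE))
print(points[0].isoformat())
```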

Appendix A.1.106 <title>

<title> (title) contains a title for any kind of work. [3.12.2.2. Titles, Authors, and Editors 2.2.1. The Title Statement 2.2.5. The Series Statement]

Module

core — Formal specification

Attributes

att.global
- xml:id
- n
- xml:base
- xml:space
- @xml:lang

type

Status	Recommended
Legal values are:	main sub
Note	Attribute is required in <titleStmt> context.

Member of

model.emphLike

Contained by

core: bibl

header: titleStmt

May contain

Character data only

Note

The attributes key and ref, inherited from the class att.canonical may be used to indicate the canonical form for the title; the former, by supplying (for example) the identifier of a record in some external library system; the latter by pointing to an XML element somewhere containing the canonical form of the title.

Example

The <title> element as used in the <titleStmt> of the corpus root <teiHeader>:

<title type="main" xml:lang="cs">Český parlamentní korpus ParlaMint-CZ [ParlaMint]</title> <title type="main" xml:lang="en">Czech parliamentary corpus ParlaMint-CZ [ParlaMint]</title> <title type="sub" xml:lang="cs">Parlament České republiky, Poslanecká sněmovna</title> <title type="sub" xml:lang="en">Parliament of the Czech Republic, Chamber of Deputies</title>

Example

The <title> element as used in the <titleStmt> of the corpus component <teiHeader>:

<title type="main" xml:lang="cs">Český parlamentní korpus ParlaMint-CZ, 2013-11-25 ps2013-001-01-000-000 [ParlaMint]</title> <title type="main" xml:lang="en">Czech parliamentary corpus ParlaMint-CZ, 2013-11-25 ps2013-001-01-000-000 [ParlaMint]</title> <title type="sub" xml:lang="cs">Parlament České republiky, Poslanecká sněmovna, 2013-11-25, Začátek schůze Poslanecké sněmovny 25. listopadu 2013 ve 14.05 hodin Přítomno: 199 poslanců</title> <title type="sub" xml:lang="en">Parliament of the Czech Republic, Chamber of Deputies, 2013-11-25</title>

Content model

<content>
 <textNode/>
</content>

Schema Declaration

element title
{
   tei_att.global.attribute.xmllang,
   attribute type { "main" | "sub" }?,
   text
}

Appendix A.1.107 <titleStmt>

<titleStmt> (title statement) groups information about the title of a work and those responsible for its content. [2.2.1. The Title Statement 2.2. The File Description]
Module	header — Formal specification
Contained by	header: fileDesc
May contain	core: meeting respStmt title header: funder
Example	The <titleStmt> element gives the title of the corpus root or component, along with the specification of the particular session(s) of the parliament contained, the persons responsible for compiling the corpus and the funder(s) of the project: <titleStmt> <title type="main">Slovenski parlamentarni korpus ParlaMint-SI [ParlaMint]</title> <title type="main" xml:lang="en">Slovenian parliamentary corpus ParlaMint-SI [ParlaMint]</title> <title type="sub">Zapisi sej Državnega zbora Republike Slovenije, 7. in 8. mandat (2014 - 2020)</title> <title type="sub" xml:lang="en">Minutes of the National Assembly of the Republic of Slovenia, Term 7 and 8 (2014 - 2020)</title> <meeting n="7" corresp="#DZ" ana="#parla.lower #parla.term #DZ.7">7. mandat</meeting> <meeting n="8" corresp="#DZ" ana="#parla.lower #parla.term #DZ.8">8. mandat</meeting> <respStmt> <persName ref="https://orcid.org/0000-0001-6143-6877">Andrej Pančur</persName> <persName ref="https://orcid.org/0000-0002-1560-4099">Tomaž Erjavec</persName> <resp>Kodiranje ParlaMint TEI XML</resp> <resp xml:lang="en">ParlaMint TEI XML corpus encoding</resp> </respStmt> <funder> <orgName>Raziskovalna infrastruktura CLARIN</orgName> <orgName xml:lang="en">The CLARIN research infrastructure</orgName> </funder> <funder> <orgName>Slovenska raziskovalna infrastruktura CLARIN.SI</orgName> <orgName xml:lang="en">The Slovenian research infrastructure CLARIN.SI</orgName> </funder> </titleStmt>
Content model	<content> <elementRef key="title" minOccurs="1" maxOccurs="unbounded"/> <elementRef key="meeting" minOccurs="1" maxOccurs="unbounded"/> <elementRef key="respStmt" minOccurs="0" maxOccurs="unbounded"/> <elementRef key="funder" minOccurs="0" maxOccurs="unbounded"/> </content>
Schema Declaration	element titleStmt { tei_title+, tei_meeting+, tei_respStmt*, tei_funder* }
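
The content model prescribes a fixed order of children: one or more <title>, one or more <meeting>, then any number of <respStmt> and <funder>. The following Python sketch checks that order by matching the sequence of child names against a regular expression; both sample statements are heavily trimmed and invented for illustration, and the function name is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Child order taken from the content model above, one name per token.
ORDER = re.compile(r"(title )+(meeting )+(respStmt )*(funder )*")

# Invented, trimmed samples: the second puts <funder> before <meeting>.
GOOD = """<titleStmt>
  <title type="main" xml:lang="en">Example corpus [ParlaMint]</title>
  <meeting n="7" corresp="#DZ" ana="#parla.term #DZ.7">7. mandat</meeting>
  <respStmt/>
  <funder/>
  <funder/>
</titleStmt>"""

BAD = """<titleStmt>
  <title type="main" xml:lang="en">Example corpus [ParlaMint]</title>
  <funder/>
  <meeting n="7" corresp="#DZ" ana="#parla.term #DZ.7">7. mandat</meeting>
</titleStmt>"""


def follows_content_model(title_stmt):
    """True if the children appear in the order title+, meeting+, respStmt*, funder*."""
    names = "".join(child.tag.rsplit("}", 1)[-1] + " " for child in title_stmt)
    return ORDER.fullmatch(names) is not None


print(follows_content_model(ET.fromstring(GOOD)))
print(follows_content_model(ET.fromstring(BAD)))
```

A schema validator remains the authoritative check; the sketch only illustrates what the content model requires.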

Appendix A.1.108 <u>

<u> (utterance) contains a stretch of speech usually preceded and followed by silence or by a change of speaker. [8.3.1. Utterances]
Module	spoken — Formal specification
Attributes	att.global xml:base xml:space @xml:id @n @xml:lang att.global.analytic @ana att.global.linking synch @corresp @next @prev att.global.source @source att.ascribed @who
Member of	model.divPart.spoken
Contained by	textstructure: div
May contain	core: gap measure note pb linking: seg spoken: incident kinesic vocal
Note	Prose and a mixture of speech elements. Although individual transcriptions may consistently use <u> elements for turns or other units, and although in most cases a <u> will be delimited by pause or change of speaker, <u> is not required to represent a turn or any communicative event, nor to be bounded by pauses or change of speaker. At a minimum, a <u> is some phonetic production by a given speaker.
Example	The element <u> marks up a speech, as illustrated below: <u who="#DavidPrior" ana="#regular"> <seg>I ask that the draft Regulations laid before the House on 5 December be approved.</seg> <seg>The relevant document is the 20th Report from the Legislation Committee.</seg> </u>
Content model	<content> <elementRef key="measure" minOccurs="0"/> <alternate minOccurs="1" maxOccurs="unbounded"> <elementRef key="note"/> <elementRef key="vocal"/> <elementRef key="kinesic"/> <elementRef key="incident"/> <elementRef key="gap"/> <elementRef key="pb"/> <elementRef key="seg"/> </alternate> </content>
Schema Declaration	element u { tei_att.global.attribute.xmlid, tei_att.global.attribute.n, tei_att.global.attribute.xmllang, tei_att.global.analytic.attribute.ana, tei_att.global.linking.attribute.corresp, tei_att.global.linking.attribute.next, tei_att.global.linking.attribute.prev, tei_att.global.source.attribute.source, tei_att.ascribed.attribute.who, tei_measure?, ( tei_note | tei_vocal | tei_kinesic | tei_incident | tei_gap | tei_pb | tei_seg )+ }
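
As a small illustration of how a speech can be read out of the markup, the following Python sketch takes the <u> from the example above and returns the speaker reference, the speaker role pointer and the text of each <seg>. The function name and the shape of the returned dictionary are assumptions made for this sketch.

```python
import xml.etree.ElementTree as ET

# Utterance copied from the example above.
SAMPLE = """<u who="#DavidPrior" ana="#regular">
  <seg>I ask that the draft Regulations laid before the House on 5 December be approved.</seg>
  <seg>The relevant document is the 20th Report from the Legislation Committee.</seg>
</u>"""


def speech(u_elem):
    """Return the speaker id, role pointer and segment texts of one <u>."""
    return {
        # Drop the leading '#' of the local pointer to get the person's id.
        "speaker": (u_elem.get("who") or "").lstrip("#"),
        "role": u_elem.get("ana"),
        # itertext() also collects text nested in child elements of <seg>.
        "segments": ["".join(seg.itertext()).strip() for seg in u_elem.findall("seg")],
    }


result = speech(ET.fromstring(SAMPLE))
print(result["speaker"], len(result["segments"]))
```

In a real corpus file the elements are in the TEI namespace, so the findall call would need a namespace-qualified name.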

Appendix A.1.109 <unit>

<unit> contains a symbol, a word or a phrase referring to a unit of measurement in any kind of formal or informal system. [3.6.3. Numbers and Measures]
Module	core — Formal specification
Attributes	att.global xml:base xml:space @xml:id @n @xml:lang att.global.analytic @ana
Member of	model.measureLike
Contained by	core: unit
May contain	analysis: pc w core: date email gap name note num ref time unit spoken: incident kinesic vocal
Example	The element can be used for fine-grained Named Entities which include units: <num ana="ne:nc" xml:id="ParlaMint-CZ_2013-12-06-ps2013-003-01-001-001.ne53"> <w xml:id="ParlaMint-CZ_2013-12-06-ps2013-003-01-001-001.u2.p10.s1.w9" lemma="3" msd="UPosTag=NUM|NumForm=Digit|NumType=Card">3</w> <w xml:id="ParlaMint-CZ_2013-12-06-ps2013-003-01-001-001.u2.p10.s1.w10" lemma="miliarda" msd="UPosTag=NOUN|Case=Gen|Gender=Fem|Number=Sing|Polarity=Pos">miliardy</w> </num> <unit ana="ne:om" xml:id="ParlaMint-CZ_2013-12-06-ps2013-003-01-001-001.ne54"> <w xml:id="ParlaMint-CZ_2013-12-06-ps2013-003-01-001-001.u2.p10.s1.w11" lemma="Kč" msd="UPosTag=NOUN|Gender=Fem|Polarity=Pos" join="right">Kč</w> </unit>
Content model	<content> <alternate minOccurs="1" maxOccurs="unbounded"> <elementRef key="w"/> <elementRef key="pc"/> <elementRef key="name"/> <elementRef key="date"/> <elementRef key="time"/> <elementRef key="num"/> <elementRef key="unit"/> <elementRef key="email"/> <elementRef key="ref"/> <elementRef key="note"/> <elementRef key="gap"/> <elementRef key="kinesic"/> <elementRef key="incident"/> <elementRef key="vocal"/> </alternate> </content>
Schema Declaration	element unit { tei_att.global.attribute.xmlid, tei_att.global.attribute.n, tei_att.global.attribute.xmllang, tei_att.global.analytic.attribute.ana, ( tei_w | tei_pc | tei_name | tei_date | tei_time | tei_num | tei_unit | tei_email | tei_ref | tei_note | tei_gap | tei_kinesic | tei_incident | tei_vocal )+ }

Appendix A.1.110 <vocal>

<vocal> (vocal) marks any vocalized but not necessarily lexical phenomenon, for example voiced pauses, non-lexical backchannels, etc. [8.3.3. Vocal, Kinesic, Incident]

Module

spoken — Formal specification

Attributes

att.global
- xml:base
- xml:space
- @xml:id
- @n
- @xml:lang
att.global.linking
- synch
- next
- prev
- @corresp
att.ascribed
- @who
att.typed
- type
- @subtype

type

Status	Recommended
Legal values are:	greeting question clarification speaking interruption exclamat laughter shouting murmuring noise signal

Member of

model.global.spoken

Contained by

analysis: s

core: name unit

linking: seg

spoken: u

textstructure: div

May contain

core: desc

Example

<vocal type="interruption"> <desc>Interruption from the chair: Your time is up.</desc> </vocal>

Content model

<content>
 <elementRef key="desc" minOccurs="1"
  maxOccurs="unbounded"/>
</content>

Schema Declaration

element vocal
{
   tei_att.global.attribute.xmlid,
   tei_att.global.attribute.n,
   tei_att.global.attribute.xmllang,
   tei_att.global.linking.attribute.corresp,
   tei_att.ascribed.attribute.who,
   tei_att.typed.attribute.subtype,
   attribute type
   {
      "greeting"
    | "question"
    | "clarification"
    | "speaking"
    | "interruption"
    | "exclamat"
    | "laughter"
    | "shouting"
    | "murmuring"
    | "noise"
    | "signal"
   }?,
   tei_desc+
}
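
As an illustration of the declaration above, the following Python sketch reports two kinds of problems in a <vocal> element: a type value outside the legal list, and content other than one or more <desc> elements. The function name vocal_problems and the message wording are hypothetical, and other attributes are not checked.

```python
import xml.etree.ElementTree as ET

# Legal values of @type on <vocal>, copied from the schema declaration above.
VOCAL_TYPES = {
    "greeting", "question", "clarification", "speaking", "interruption",
    "exclamat", "laughter", "shouting", "murmuring", "noise", "signal",
}


def vocal_problems(vocal: ET.Element) -> list:
    """Return a list of human-readable problems found in a <vocal> element."""
    problems = []
    vtype = vocal.get("type")
    # @type is optional in the declaration, so only a present value is checked.
    if vtype is not None and vtype not in VOCAL_TYPES:
        problems.append(f"illegal type value: {vtype!r}")
    children = [child.tag.rsplit("}", 1)[-1] for child in vocal]
    if not children:
        problems.append("at least one <desc> is required")
    elif any(name != "desc" for name in children):
        problems.append("only <desc> children are allowed")
    return problems


ok = ET.fromstring(
    '<vocal type="interruption">'
    "<desc>Interruption from the chair: Your time is up.</desc></vocal>"
)
# Invalid on both counts: unknown type value and no <desc>.
broken = ET.fromstring('<vocal type="whistling"/>')
```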

Appendix A.1.111 <w>

<w> (word) represents a grammatical (not necessarily orthographic) word. [18.1. Linguistic Segment Categories 18.4.2. Lightweight Linguistic Annotation]
Module	analysis — Formal specification
Attributes	att.linguistic @lemma @pos @msd @join att.lexicographic.normalized @norm att.global n xml:base xml:space @xml:id @xml:lang att.global.analytic @ana att.segLike @function
Member of	model.segLike
Contained by	analysis: phr s w core: date email name num time unit
May contain	analysis: w character data
Example	<s xml:id="ParlaMint-GB_2017-10-30-lords.seg4.1"> <w lemma="I" msd="UPosTag=PRON|Case=Nom|Number=Sing|Person=1|PronType=Prs" pos="PRP">I</w> <w lemma="support" msd="UPosTag=VERB|Mood=Ind|Tense=Pres|VerbForm=Fin" pos="VBP">support</w> <w lemma="the" msd="UPosTag=DET|Definite=Def|PronType=Art" pos="DT">the</w> <w lemma="amendment" msd="UPosTag=NOUN|Number=Sing" pos="NN" join="right">amendment</w> <pc msd="UPosTag=PUNCT" pos=".">.</pc> </s>
Example	Certain frameworks, in particular Universal Dependencies (UD), allow tokens to be decomposed into several words, and it is these syntactic words, not the tokens, that are further annotated. For example, the Czech word ‘abyste’ is decomposed in UD into two syntactic words, ‘aby’ and ‘byste’, which can be encoded as <w> elements nested in the <w> element of the token: <w>abyste <w norm="aby" lemma="aby" msd="UPosTag=SCONJ"/> <w norm="byste" lemma="být" msd="UPosTag=AUX|Mood=Cnd|Number=Plur|Person=2|VerbForm=Fin"/> </w>
Content model	<content> <alternate minOccurs="1" maxOccurs="unbounded"> <textNode/> <elementRef key="w"/> </alternate> </content>
Schema Declaration	element w { tei_att.global.attribute.xmlid, tei_att.global.attribute.xmllang, tei_att.global.analytic.attribute.ana, tei_att.linguistic.attributes, tei_att.segLike.attribute.function, ( text | tei_w )+ }
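
A script that reads the annotation has to treat plain and decomposed tokens differently. The Python sketch below shows one possible way to do so, assuming that the nested <w> elements carry norm and lemma as in the second example above; the function name syntactic_words is hypothetical.

```python
import xml.etree.ElementTree as ET


def syntactic_words(token: ET.Element) -> list:
    """Return (form, lemma) pairs for the syntactic words of one top-level <w>."""
    inner = [child for child in token if child.tag.rsplit("}", 1)[-1] == "w"]
    if not inner:
        # Plain token: the token itself is the only syntactic word.
        return [((token.text or "").strip(), token.get("lemma"))]
    # Decomposed token: the nested <w> elements carry the annotation,
    # with the word form given in @norm.
    return [(w.get("norm"), w.get("lemma")) for w in inner]


DECOMPOSED = """<w>abyste
  <w norm="aby" lemma="aby" msd="UPosTag=SCONJ"/>
  <w norm="byste" lemma="být" msd="UPosTag=AUX|Mood=Cnd|Number=Plur|Person=2|VerbForm=Fin"/>
</w>"""
PLAIN = '<w lemma="support" msd="UPosTag=VERB" pos="VBP">support</w>'

decomposed_words = syntactic_words(ET.fromstring(DECOMPOSED))
plain_words = syntactic_words(ET.fromstring(PLAIN))
```

Here decomposed_words holds the pairs for ‘aby’ and ‘byste’, while plain_words holds a single pair for ‘support’.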

Appendix A.2 Model classes

Appendix A.2.1 model.addressLike

model.addressLike groups elements used to represent a postal or email address. [1. The TEI Infrastructure]
Module	tei — Formal specification
Used by	model.pPart.data
Members	affiliation email

Appendix A.2.2 model.attributable

model.attributable groups elements that contain a word or phrase that can be attributed to a source. [3.3.3. Quotation 4.3.2. Floating Texts]
Module	tei — Formal specification
Used by	model.inter
Members	model.quoteLike

Appendix A.2.3 model.biblLike

model.biblLike groups elements containing a bibliographic description. [3.12. Bibliographic Citations and References]
Module	tei — Formal specification
Used by	model.inter
Members	bibl

Appendix A.2.4 model.dateLike

model.dateLike groups elements containing temporal expressions. [3.6.4. Dates and Times 14.4. Dates]
Module	tei — Formal specification
Used by	model.pPart.data
Members	date time

Appendix A.2.5 model.divPart

model.divPart groups paragraph-level elements appearing directly within divisions. [1.3. The TEI Class System]
Module	tei — Formal specification
Used by
Members	model.divPart.spoken[u] model.lLike model.pLike[p]
Note	Note that this element class does not include members of the model.inter class, which can appear either within or between paragraph-level items.

Appendix A.2.6 model.divPart.spoken

model.divPart.spoken groups elements structurally analogous to paragraphs within spoken texts. [8.1. General Considerations and Overview]
Module	spoken — Formal specification
Used by	model.divPart
Members	u
Note	Spoken texts may be structured in many ways; elements in this class are typically larger units such as turns or utterances.

Appendix A.2.7 model.emphLike

model.emphLike groups phrase-level elements which are typographically distinct and to which a specific function can be attributed. [3.3. Highlighting and Quotation]
Module	tei — Formal specification
Used by	model.highlighted model.limitedPhrase
Members	term title

Appendix A.2.8 model.global

model.global groups elements which may appear at any point within a TEI text. [1.3. The TEI Class System]
Module	tei — Formal specification
Used by	model.paraPart
Members	model.global.edit[gap] model.global.meta[link linkGrp] model.global.spoken[incident kinesic vocal] model.milestoneLike[pb] model.noteLike[note] figure
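
The bracketed notation in the Members row nests classes inside classes. As an illustration, the Python sketch below flattens model.global into the elements it ultimately admits. The CLASS_MEMBERS table is transcribed by hand from the row above, and the function name expand is hypothetical.

```python
# Direct members of each class, transcribed from the Members row above.
CLASS_MEMBERS = {
    "model.global": [
        "model.global.edit", "model.global.meta", "model.global.spoken",
        "model.milestoneLike", "model.noteLike", "figure",
    ],
    "model.global.edit": ["gap"],
    "model.global.meta": ["link", "linkGrp"],
    "model.global.spoken": ["incident", "kinesic", "vocal"],
    "model.milestoneLike": ["pb"],
    "model.noteLike": ["note"],
}


def expand(name: str) -> list:
    """Recursively replace class names by the elements they contain."""
    members = CLASS_MEMBERS.get(name)
    if members is None:
        # A class without a table entry has no members here; any other
        # name is taken to be an element.
        return [] if name.startswith("model.") else [name]
    elements = []
    for member in members:
        elements.extend(expand(member))
    return elements


global_elements = expand("model.global")
```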

Appendix A.2.9 model.global.edit

model.global.edit groups globally available elements which perform a specifically editorial function. [1.3. The TEI Class System]
Module	tei — Formal specification
Used by	model.global
Members	gap

Appendix A.2.10 model.global.meta

model.global.meta groups globally available elements which describe the status of other elements. [1.3. The TEI Class System]
Module	tei — Formal specification
Used by	model.global
Members	link linkGrp
Note	Elements in this class are typically used to hold groups of links or of abstract interpretations, or to provide indications of certainty etc. It may be convenient to localize all metadata elements, for example to contain them within the same division as the elements that they relate to; or to locate them all in a division of their own. They may however appear at any point in a TEI text.

Appendix A.2.11 model.global.spoken

model.global.spoken groups elements which may appear globally within spoken texts. [8.1. General Considerations and Overview]
Module	spoken — Formal specification
Used by	model.global
Members	incident kinesic vocal
Note	This class groups elements which can appear anywhere within transcribed speech.

Appendix A.2.12 model.graphicLike

model.graphicLike groups elements containing images, formulae, and similar objects. [3.10. Graphics and Other Non-textual Components]
Module	tei — Formal specification
Used by	model.phrase
Members	graphic media

Appendix A.2.13 model.highlighted

model.highlighted groups phrase-level elements which are typographically distinct. [3.3. Highlighting and Quotation]
Module	tei — Formal specification
Used by	model.phrase
Members	model.emphLike[term title] model.hiLike

Appendix A.2.14 model.inter

model.inter groups elements which can appear either within or between paragraph-like elements. [1.3. The TEI Class System]
Module	tei — Formal specification
Used by	model.paraPart
Members	model.attributable[model.quoteLike] model.biblLike[bibl] model.egLike model.labelLike[desc label] model.listLike[listEvent listOrg listPerson listRelation] model.oddDecl model.stageLike

Appendix A.2.15 model.labelLike

model.labelLike groups elements used to gloss or explain other parts of a document.
Module	tei — Formal specification
Used by	model.inter
Members	desc label

Appendix A.2.16 model.limitedPhrase

model.limitedPhrase groups phrase-level elements excluding those elements primarily intended for transcription of existing sources. [1.3. The TEI Class System]
Module	tei — Formal specification
Used by
Members	model.emphLike[term title] model.hiLike model.pPart.data[model.addressLike[affiliation email] model.dateLike[date time] model.measureLike[measure num unit] model.nameLike[model.nameLike.agent[name orgName persName] model.offsetLike model.persNamePart[addName forename nameLink roleName surname] model.placeStateLike[model.placeNamePart[placeName] state] idno]] model.pPart.editorial model.pPart.msdesc model.phrase.xml model.ptrLike[ref]

Appendix A.2.17 model.listLike

model.listLike groups list-like elements. [3.8. Lists]
Module	tei — Formal specification
Used by	model.inter
Members	listEvent listOrg listPerson listRelation

Appendix A.2.18 model.measureLike

model.measureLike groups elements which denote a number, a quantity, a measurement, or similar piece of text that conveys some numerical meaning. [3.6.3. Numbers and Measures]
Module	tei — Formal specification
Used by	model.pPart.data
Members	measure num unit

Appendix A.2.19 model.milestoneLike

model.milestoneLike groups milestone-style elements used to represent reference systems. [1.3. The TEI Class System 3.11.3. Milestone Elements]
Module	tei — Formal specification
Used by	model.global
Members	pb

Appendix A.2.20 model.nameLike

model.nameLike groups elements which name or refer to a person, place, or organization.
Module	tei — Formal specification
Used by	model.pPart.data
Members	model.nameLike.agent[name orgName persName] model.offsetLike model.persNamePart[addName forename nameLink roleName surname] model.placeStateLike[model.placeNamePart[placeName] state] idno
Note	A superset of the naming elements that may appear in datelines, addresses, statements of responsibility, etc.

Appendix A.2.21 model.nameLike.agent

model.nameLike.agent groups elements which contain names of individuals or corporate bodies. [3.6. Names, Numbers, Dates, Abbreviations, and Addresses]
Module	tei — Formal specification
Used by	model.nameLike
Members	name orgName persName
Note	This class is used in the content model of elements which reference names of people or organizations.

Appendix A.2.22 model.noteLike

model.noteLike groups globally-available note-like elements. [3.9. Notes, Annotation, and Indexing]
Module	tei — Formal specification
Used by	model.global
Members	note

Appendix A.2.23 model.pLike

model.pLike groups paragraph-like elements.
Module	tei — Formal specification
Used by	equipment model.divPart
Members	p

Appendix A.2.24 model.pPart.data

model.pPart.data groups phrase-level elements containing names, dates, numbers, measures, and similar data. [3.6. Names, Numbers, Dates, Abbreviations, and Addresses]
Module	tei — Formal specification
Used by	model.limitedPhrase model.phrase
Members	model.addressLike[affiliation email] model.dateLike[date time] model.measureLike[measure num unit] model.nameLike[model.nameLike.agent[name orgName persName] model.offsetLike model.persNamePart[addName forename nameLink roleName surname] model.placeStateLike[model.placeNamePart[placeName] state] idno]

Appendix A.2.25 model.pPart.edit

model.pPart.edit groups phrase-level elements for simple editorial correction and transcription. [3.5. Simple Editorial Changes]
Module	tei — Formal specification
Used by	model.phrase
Members	model.pPart.editorial model.pPart.transcriptional

Appendix A.2.26 model.paraPart

model.paraPart groups elements that may appear in paragraphs and similar elements. [3.1. Paragraphs]
Module	tei — Formal specification
Used by
Members	model.gLike model.global[model.global.edit[gap] model.global.meta[link linkGrp] model.global.spoken[incident kinesic vocal] model.milestoneLike[pb] model.noteLike[note] figure] model.inter[model.attributable[model.quoteLike] model.biblLike[bibl] model.egLike model.labelLike[desc label] model.listLike[listEvent listOrg listPerson listRelation] model.oddDecl model.stageLike] model.lLike model.phrase[model.graphicLike[graphic media] model.highlighted[model.emphLike[term title] model.hiLike] model.lPart model.pPart.data[model.addressLike[affiliation email] model.dateLike[date time] model.measureLike[measure num unit] model.nameLike[model.nameLike.agent[name orgName persName] model.offsetLike model.persNamePart[addName forename nameLink roleName surname] model.placeStateLike[model.placeNamePart[placeName] state] idno]] model.pPart.edit[model.pPart.editorial model.pPart.transcriptional] model.pPart.msdesc model.phrase.xml model.ptrLike[ref] model.segLike[pc phr s seg w] model.specDescLike]

Appendix A.2.27 model.persNamePart

model.persNamePart groups elements which form part of a personal name. [14.2.1. Personal Names]
Module	namesdates — Formal specification
Used by	model.nameLike
Members	addName forename nameLink roleName surname

Appendix A.2.28 model.phrase

model.phrase groups elements which can occur at the level of individual words or phrases. [1.3. The TEI Class System]
Module	tei — Formal specification
Used by	model.paraPart
Members	model.graphicLike[graphic media] model.highlighted[model.emphLike[term title] model.hiLike] model.lPart model.pPart.data[model.addressLike[affiliation email] model.dateLike[date time] model.measureLike[measure num unit] model.nameLike[model.nameLike.agent[name orgName persName] model.offsetLike model.persNamePart[addName forename nameLink roleName surname] model.placeStateLike[model.placeNamePart[placeName] state] idno]] model.pPart.edit[model.pPart.editorial model.pPart.transcriptional] model.pPart.msdesc model.phrase.xml model.ptrLike[ref] model.segLike[pc phr s seg w] model.specDescLike
Note	This class of elements can occur within paragraphs, list items, lines of verse, etc.

Appendix A.2.29 model.placeNamePart

model.placeNamePart groups elements which form part of a place name. [14.2.3. Place Names]
Module	tei — Formal specification
Used by	model.placeStateLike
Members	placeName

Appendix A.2.30 model.placeStateLike

model.placeStateLike groups elements which describe changing states of a place.
Module	tei — Formal specification
Used by	model.nameLike
Members	model.placeNamePart[placeName] state

Appendix A.2.31 model.ptrLike

model.ptrLike groups elements used for purposes of location and reference. [3.7. Simple Links and Cross-References]
Module	tei — Formal specification
Used by	model.limitedPhrase model.phrase
Members	ref

Appendix A.2.32 model.segLike

model.segLike groups elements used for arbitrary segmentation. [17.3. Blocks, Segments, and Anchors 18.1. Linguistic Segment Categories]
Module	tei — Formal specification
Used by	model.phrase
Members	pc phr s seg w
Note	The principles on which segmentation is carried out, and any special codes or attribute values used, should be defined explicitly in the <segmentation> element of the <encodingDesc> within the associated TEI header.

Appendix A.3 Attribute classes

Appendix A.3.1 att.ascribed

att.ascribed provides attributes for elements representing speech or action that can be ascribed to a specific individual. [3.3.3. Quotation 8.3. Elements Unique to Spoken Texts]

Module

tei — Formal specification

Members

att.ascribed.directed[kinesic u vocal] change incident setting

Attributes

who

indicates the person, or group of people, to whom the element content is ascribed.

Status	Optional
Datatype	1–∞ occurrences of teidata.pointer separated by whitespace
In the following example from Hamlet, speeches (<sp>) in the body of the play are linked to <role> elements in the <castList> using the who attribute. <castItem type="role"> <role xml:id="Barnardo">Bernardo</role> </castItem> <castItem type="role"> <role xml:id="Francisco">Francisco</role> <roleDesc>a soldier</roleDesc> </castItem> <!-- ... --> <sp who="#Barnardo"> <speaker>Bernardo</speaker> <l n="1">Who's there?</l> </sp> <sp who="#Francisco"> <speaker>Francisco</speaker> <l n="2">Nay, answer me: stand, and unfold yourself.</l> </sp>
Note	For transcribed speech, this will typically identify a participant or participant group; in other contexts, it will point to any identified <person> element.
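
Because the who attribute holds one or more whitespace-separated pointers, a processing script has to split the value before resolving it. The Python sketch below is a minimal illustration; the function name local_ids is hypothetical, and pointers into other files are simply skipped.

```python
def local_ids(who: str) -> list:
    """Return the xml:id values targeted by the local ('#id') pointers in @who."""
    # str.split() with no argument splits on any run of whitespace,
    # which matches the 'separated by whitespace' datatype.
    return [pointer[1:] for pointer in who.split() if pointer.startswith("#")]


single = local_ids("#DavidPrior")
several = local_ids("#Barnardo   #Francisco")
```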

Appendix A.3.2 att.canonical

att.canonical provides attributes that can be used to associate a representation such as a name or title with canonical information about the object being named or referenced. [14.1.1. Linking Names and Their Referents]

Module tei — Formal specification

Members att.naming[att.personal[addName forename name orgName persName placeName roleName surname] affiliation birth death education event occupation pubPlace state] bibl catDesc date funder meeting publisher relation resp respStmt term time title

Attributes

key

provides an externally-defined means of identifying the entity (or entities) being named, using a coded value of some kind.

Status	Optional
Datatype	teidata.text
<author> <name key="Hugo, Victor (1802-1885)" ref="http://www.idref.fr/026927608">Victor Hugo</name> </author>
Note	The value may be a unique identifier from a database, or any other externally-defined string identifying the referent. No particular syntax is proposed for the values of the key attribute, since its form will depend entirely on practice within a given project.

ref

(reference) provides an explicit means of locating a full definition or identity for the entity being named by means of one or more URIs.

Status	Optional
Datatype	1–∞ occurrences of teidata.pointer separated by whitespace
<name ref="http://viaf.org/viaf/109557338" type="person">Seamus Heaney</name>
Note	The value must point directly to one or more XML elements or other resources by means of one or more URIs, separated by whitespace. If more than one is supplied the implication is that the name identifies several distinct entities.

Example

In this contrived example, a canonical reference to the same organisation is provided in four different ways.

<author n="1"> <name ref="http://nzetc.victoria.ac.nz/tm/scholarly/name-427308.html" type="organisation">New Zealand Parliament, Legislative Council</name> </author> <author n="2"> <name ref="nzvn:427308" type="organisation">New Zealand Parliament, Legislative Council</name> </author> <author n="3"> <name ref="./named_entities.xml#o427308" type="organisation">New Zealand Parliament, Legislative Council</name> </author> <author n="4"> <name key="name-427308" type="organisation">New Zealand Parliament, Legislative Council</name> </author>

The first presumes the availability of an internet connection and a processor that can resolve a URI (most can). The second requires, in addition, a <prefixDef> that declares how the nzvn prefix should be interpreted. The third does not require an internet connection, but does require that a file named named_entities.xml be in the same directory as the TEI document. The fourth requires that an entire external system for key resolution be available.

Note

The key attribute is more flexible and general-purpose, but its use in interchange requires that documentation about how the key is to be resolved be sent to the recipient of the TEI document. In contrast values of the ref attribute are resolved using the widely accepted protocols for a URI, and thus less documentation, if any, is likely required by the recipient in data interchange.

These guidelines provide no semantic basis or suggested precedence when both key and ref are provided. For this reason simultaneous use of both is not recommended unless documentation explaining the use is provided, probably in an ODD customization, for interchange.

Appendix A.3.3 att.datable.custom

att.datable.custom provides attributes for normalization of elements that contain datable events to a custom dating system (i.e. other than the Gregorian used by W3 and ISO). [14.4. Dates]

Module

namesdates — Formal specification

Members

att.datable[affiliation application birth change date death education event funder idno licence meeting name occupation orgName persName placeName relation resp sex state time title]

Attributes

when-custom

supplies the value of a date or time in some custom standard form.

Status	Optional
Datatype	1–∞ occurrences of teidata.word separated by whitespace
The following are examples of custom date or time formats that are not valid ISO or W3C format normalizations, normalized to a different dating system: <p>Alhazen died in Cairo on the <date when="1040-03-06" when-custom="431-06-12"> 12th day of Jumada t-Tania, 430 AH </date>.</p> <p>The current world will end at the <date when="2012-12-21" when-custom="13.0.0.0.0">end of B'ak'tun 13</date>.</p> <p>The Battle of Meggidu (<date when-custom="Thutmose_III:23">23rd year of reign of Thutmose III</date>).</p> <p>Esidorus bixit in pace annos LXX plus minus sub <date when-custom="Ind:4-10-11">die XI mensis Octobris indictione IIII</date> </p> Not all custom date formulations will have Gregorian equivalents. The when-custom attribute and the other custom dating attributes are not constrained to a datatype by the TEI, but individual projects are recommended to regularize and document their dating formats.

notBefore-custom

specifies the earliest possible date for the event in some custom standard form.

Status	Optional
Datatype	1–∞ occurrences of teidata.word separated by whitespace

notAfter-custom

specifies the latest possible date for the event in some custom standard form.

Status	Optional
Datatype	1–∞ occurrences of teidata.word separated by whitespace

from-custom

indicates the starting point of the period in some custom standard form.

Status	Optional
Datatype	1–∞ occurrences of teidata.word separated by whitespace
<event xml:id="FIRE1" datingMethod="#julian" from-custom="1666-09-02" to-custom="1666-09-05"> <head>The Great Fire of London</head> <p>The Great Fire of London burned through a large part of the city of London.</p> </event>

to-custom

indicates the ending point of the period in some custom standard form.

Status	Optional
Datatype	1–∞ occurrences of teidata.word separated by whitespace

datingPoint

supplies a pointer to some location defining a named point in time with reference to which the datable item is understood to have occurred.

Status	Optional
Datatype	teidata.pointer

datingMethod

supplies a pointer to a <calendar> element or other means of interpreting the values of the custom dating attributes.

Status	Optional
Datatype	teidata.pointer
Contayning the Originall, Antiquity, Increaſe, Moderne eſtate, and deſcription of that Citie, written in the yeare <date when-custom="1598" calendar="#julian" datingMethod="#julian">1598</date>. by Iohn Stow Citizen of London. In this example, the calendar attribute points to a <calendar> element for the Julian calendar, specifying that the text content of the <date> element is a Julian date, and the datingMethod attribute also points to the Julian calendar to indicate that the content of the when-custom attribute value is Julian too.
<date when="1382-06-28" when-custom="6890-06-20" datingMethod="#creationOfWorld"> μηνὶ Ἰουνίου εἰς <num>κ</num> ἔτους <num>ςωϞ</num> </date> In this example, a date is given in a Mediaeval text measured ‘from the creation of the world’, which is normalized (in when) to the Gregorian date, but is also normalized (in when-custom) to a machine-actionable, numeric version of the date from the Creation.
Note	Note that the datingMethod attribute (unlike calendar defined in att.datable) defines the calendar or dating system to which the date described by the parent element is normalized (i.e. in the when-custom or other X-custom attributes), not the calendar of the original date in the element.

Appendix A.3.4 att.datable.iso

att.datable.iso provides attributes for normalization of elements that contain datable events using the ISO 8601:2004 standard. [3.6.4. Dates and Times 14.4. Dates]

Module

namesdates — Formal specification

Members

att.datable[affiliation application birth change date death education event funder idno licence meeting name occupation orgName persName placeName relation resp sex state time title]

Attributes

when-iso

supplies the value of a date or time in a standard form.

Status	Optional
Datatype	teidata.temporal.iso
The following are examples of ISO date, time, and date & time formats that are not valid W3C format normalizations. <date when-iso="1996-09-24T07:25+00">Sept. 24th, 1996 at 3:25 in the morning</date> <date when-iso="1996-09-24T03:25-04">Sept. 24th, 1996 at 3:25 in the morning</date> <time when-iso="1999-01-04T20:42-05">4 Jan 1999 at 8:42 pm</time> <time when-iso="1999-W01-1T20,70-05">4 Jan 1999 at 8:42 pm</time> <date when-iso="2006-05-18T10:03">a few minutes after ten in the morning on Thu 18 May</date> <time when-iso="03:00">3 A.M.</time> <time when-iso="14">around two</time> <time when-iso="15,5">half past three</time> All of the examples of the when attribute in the att.datable.w3c class are also valid with respect to this attribute.
He likes to be punctual. I said <q> <time when-iso="12">around noon</time> </q>, and he showed up at <time when-iso="12:00:00">12 O'clock</time> on the dot. The second occurrence of <time> could have been encoded with the when attribute, as 12:00:00 is a valid time with respect to the W3C XML Schema Part 2: Datatypes Second Edition specification. The first occurrence could not.

notBefore-iso

specifies the earliest possible date for the event in standard form, e.g. yyyy-mm-dd.

Status	Optional
Datatype	teidata.temporal.iso

notAfter-iso

specifies the latest possible date for the event in standard form, e.g. yyyy-mm-dd.

Status	Optional
Datatype	teidata.temporal.iso

from-iso

indicates the starting point of the period in standard form.

Status	Optional
Datatype	teidata.temporal.iso

to-iso

indicates the ending point of the period in standard form.

Status	Optional
Datatype	teidata.temporal.iso

Note

The value of these attributes should be a normalized representation of the date, time, or combined date & time intended, in any of the standard formats specified by ISO 8601:2004, using the Gregorian calendar.

If both when-iso and dur-iso are specified, the values should be interpreted as indicating a span of time by its starting time (or date) and duration. That is, <date when-iso="2007-06-01" dur-iso="P8D"/> indicates the same time period as <date when-iso="2007-06-01/P8D"/>.

In providing a ‘regularized’ form, no claim is made that the form in the source text is incorrect; the regularized form is simply that chosen as the main form for purposes of unifying variant forms under a single heading.
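
The Note above treats a start value plus a duration as a single span of time. The Python sketch below illustrates that reading under narrow assumptions: the start is a plain yyyy-mm-dd date, the duration is a whole number of days in the ISO 8601 form PnD, and the function names iso_interval and span_end are hypothetical.

```python
import re
from datetime import date, timedelta


def iso_interval(when_iso: str, dur_iso: str) -> str:
    """Join a start and a duration into the ISO 8601 'start/duration' form."""
    return f"{when_iso}/{dur_iso}"


def span_end(when_iso: str, dur_iso: str) -> date:
    """Return the date on which the span ends (start plus duration).

    Only durations of the form PnD (whole days) are handled in this sketch.
    """
    match = re.fullmatch(r"P(\d+)D", dur_iso)
    if match is None:
        raise ValueError(f"unsupported duration: {dur_iso!r}")
    return date.fromisoformat(when_iso) + timedelta(days=int(match.group(1)))


combined = iso_interval("2007-06-01", "P8D")
end = span_end("2007-06-01", "P8D")
```

Durations with months, years or time components need calendar-aware arithmetic and are deliberately left out.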

Appendix A.3.5 att.datable.w3c

att.datable.w3c provides attributes for normalization of elements that contain datable events conforming to the W3C XML Schema Part 2: Datatypes Second Edition. [3.6.4. Dates and Times 14.4. Dates]

Module tei — Formal specification

Members att.datable[affiliation application birth change date death education event funder idno licence meeting name occupation orgName persName placeName relation resp sex state time title]

Attributes

when

supplies the value of the date or time in a standard form, e.g. yyyy-mm-dd.

Status	Optional
Datatype	teidata.temporal.w3c
Examples of W3C date, time, and date & time formats. <p> <date when="1945-10-24">24 Oct 45</date> <date when="1996-09-24T07:25:00Z">September 24th, 1996 at 3:25 in the morning</date> <time when="1999-01-04T20:42:00-05:00">Jan 4 1999 at 8 pm</time> <time when="14:12:38">fourteen twelve and 38 seconds</time> <date when="1962-10">October of 1962</date> <date when="--06-12">June 12th</date> <date when="---01">the first of the month</date> <date when="--08">August</date> <date when="2006">MMVI</date> <date when="0056">AD 56</date> <date when="-0056">56 BC</date> </p>
This list begins in the year 1632, more precisely on Trinity Sunday, i.e. the Sunday after Pentecost, in that year the <date calendar="#julian" when="1632-06-06">27th of May (old style)</date>.
<opener> <dateline> <placeName>Dorchester, Village,</placeName> <date when="1828-03-02">March 2d. 1828.</date> </dateline> <salute>To Mrs. Cornell,</salute> Sunday <time when="12:00:00">noon.</time> </opener>

notBefore

specifies the earliest possible date for the event in standard form, e.g. yyyy-mm-dd.

Status	Optional
Datatype	teidata.temporal.w3c

notAfter

specifies the latest possible date for the event in standard form, e.g. yyyy-mm-dd.

Status	Optional
Datatype	teidata.temporal.w3c

from

indicates the starting point of the period in standard form, e.g. yyyy-mm-dd.

Status	Optional
Datatype	teidata.temporal.w3c

to

indicates the ending point of the period in standard form, e.g. yyyy-mm-dd.

Status	Optional
Datatype	teidata.temporal.w3c

Schematron

<sch:rule context="tei:*[@when]"> <sch:report test="@notBefore|@notAfter|@from|@to" role="nonfatal">The @when attribute cannot be used with any other att.datable.w3c attributes.</sch:report> </sch:rule>

Schematron

<sch:rule context="tei:*[@from]"> <sch:report test="@notBefore" role="nonfatal">The @from and @notBefore attributes cannot be used together.</sch:report> </sch:rule>

Schematron

<sch:rule context="tei:*[@to]"> <sch:report test="@notAfter" role="nonfatal">The @to and @notAfter attributes cannot be used together.</sch:report> </sch:rule>
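
The three Schematron rules above restrict which of these attributes may appear together. The Python sketch below applies the same tests to a single element, for cases where a Schematron processor is not at hand. The function name w3c_date_conflicts is hypothetical; the messages are copied from the rules.

```python
import xml.etree.ElementTree as ET


def w3c_date_conflicts(elem: ET.Element) -> list:
    """Return the Schematron messages that the element's attributes trigger."""
    attrs = elem.attrib
    messages = []
    # Rule 1: @when excludes the four other attributes of the class.
    if "when" in attrs and any(
        name in attrs for name in ("notBefore", "notAfter", "from", "to")
    ):
        messages.append(
            "The @when attribute cannot be used with any other "
            "att.datable.w3c attributes."
        )
    # Rule 2: @from and @notBefore are mutually exclusive.
    if "from" in attrs and "notBefore" in attrs:
        messages.append("The @from and @notBefore attributes cannot be used together.")
    # Rule 3: @to and @notAfter are mutually exclusive.
    if "to" in attrs and "notAfter" in attrs:
        messages.append("The @to and @notAfter attributes cannot be used together.")
    return messages


valid = ET.fromstring('<date from="1863-05-28" to="1863-06-01">28 May through 1 June 1863</date>')
# Invalid: @when combined with @to.
invalid = ET.fromstring('<date when="1863-05-28" to="1863-06-01"/>')
```

Since the rules carry role="nonfatal", a script may prefer to log these messages as warnings rather than reject the file.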

Example

<date from="1863-05-28" to="1863-06-01">28 May through 1 June 1863</date>

Note

The value of these attributes should be a normalized representation of the date, time, or combined date & time intended, in any of the standard formats specified by XML Schema Part 2: Datatypes Second Edition, using the Gregorian calendar.

The most commonly-encountered format for the date portion of a temporal attribute is yyyy-mm-dd, but yyyy, --mm, ---dd, yyyy-mm, or --mm-dd may also be used. For the time part, the form hh:mm:ss is used.

Note that this format does not currently permit use of the value 0000 to represent the year 1 BCE; instead the value -0001 should be used.
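
The date forms listed in this Note can be recognised with a regular expression. The Python sketch below covers only the date portion; it does not handle time values or time zones, years with more than four digits, or calendar validity such as the number of days in a month. The function name is_w3c_date_form is hypothetical.

```python
import re

# A four-digit year, optionally negative; 0000 is excluded as the Note requires.
YEAR = r"-?(?!0000)\d{4}"
MONTH = r"(?:0[1-9]|1[0-2])"
DAY = r"(?:0[1-9]|[12]\d|3[01])"

DATE_FORMS = re.compile(
    rf"(?:{YEAR}(?:-{MONTH}(?:-{DAY})?)?"  # yyyy, yyyy-mm, yyyy-mm-dd
    rf"|--{MONTH}(?:-{DAY})?"              # --mm, --mm-dd
    rf"|---{DAY})"                         # ---dd
)


def is_w3c_date_form(value: str) -> bool:
    """Return True if value has one of the date forms named in the Note."""
    return DATE_FORMS.fullmatch(value) is not None


accepted = [v for v in ("1945-10-24", "1962-10", "--06-12", "---01", "--08", "2006", "-0056")
            if is_w3c_date_form(v)]
rejected = [v for v in ("0000", "1945-13-01", "45-10-24", "1945/10/24")
            if not is_w3c_date_form(v)]
```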

Appendix A.3.6 att.datcat

att.datcat provides attributes that are used to align XML elements or attributes with the appropriate Data Categories (DCs) defined by an external taxonomy, in this way establishing the identity of information containers and values, and providing means of interpreting them. [10.5.2. Lexical View 19.3. Other Atomic Feature Values]

Module tei — Formal specification

Members att.segLike[pc phr s seg w] category tagUsage taxonomy

Attributes

datcat

provides a pointer to a definition of, and/or general information about, (a) an information container (element or attribute) or (b) a value of an information container (element content or attribute value), by referencing an external taxonomy or ontology. If valueDatcat is present in the immediate context, this attribute takes on role (a), while valueDatcat performs role (b).

Status	Optional
Datatype	1–∞ occurrences of teidata.pointer separated by whitespace

valueDatcat

provides a definition of, and/or general information about a value of an information container (element content or attribute value), by reference to an external taxonomy or ontology. Used especially where a contrast with datcat is needed.

Status	Optional
Datatype	1–∞ occurrences of teidata.pointer separated by whitespace

targetDatcat

provides a definition of, and/or general information about, information structure of an object referenced or modeled by the containing element, by reference to an external taxonomy or ontology. This attribute has the characteristics of the datcat attribute, except that it addresses not its containing element, but an object that is being referenced or modeled by its containing element.

Status	Optional
Datatype	1–∞ occurrences of teidata.pointer separated by whitespace

Example

The example below presents the TEI encoding of the name-value pair <part of speech, common noun>, where the name (key) ‘part of speech’ is abbreviated as ‘POS’, and the value, ‘common noun’, is symbolized by ‘NN’. The entire name-value pair is encoded by means of the element <f>. In TEI XML, that element acts as the container, labeled with the name attribute. Its contents may be complex or simple. In the case at hand, the content is the symbol ‘NN’. The datcat attribute relates the feature name (i.e., the key) to the data category ‘part of speech’, while the attribute valueDatcat relates the feature value to the data category common noun. Both these data categories should be defined in an external and preferably open reference taxonomy or ontology.
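A minimal sketch of the encoding just described, assembled from the two CLARIN Concept Registry identifiers quoted elsewhere in this section. It assumes a TEI schema in which <fs>, <f> and <symbol> are available; <f> is not among the att.datcat members listed above for this customisation.

```xml
<fs>
  <!-- datcat anchors the container (the feature named POS);
       valueDatcat anchors its value (the symbol NN) -->
  <f name="POS"
     datcat="http://hdl.handle.net/11459/CCR_C-396_5a972b93-2294-ab5c-a541-7c344c5f26c3"
     valueDatcat="http://hdl.handle.net/11459/CCR_C-1256_7ec6083c-23d4-224d-6f94-eecbe6861545">
    <symbol value="NN"/>
  </f>
</fs>
```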

‘NN’ is the symbol for common noun used e.g. in the CLAWS-7 tagset defined by the University Centre for Computer Corpus Research on Language at the University of Lancaster. The very same data category, as used for tagging an early version of the British National Corpus with the BNC Basic (C5) tagset, is symbolized by ‘NN0’ (rather than ‘NN’). Making these values semantically interoperable would be extremely difficult without a human expert if they were not anchored in a single point of an established reference taxonomy of morphosyntactic data categories. In the case at hand, the string ‘http://hdl.handle.net/11459/CCR_C-1256_7ec6083c-23d4-224d-6f94-eecbe6861545’ is both a persistent identifier of the data category in question and a pointer to a shared definition of common noun.

While the symbols ‘NN’, ‘NN0’, and many others (often coming from languages other than English) are implicitly members of the container category ‘part of speech’, it is sometimes useful not to rely on such an implicit relationship but rather to use an explicit identifier for that data category, to distinguish it from other morphosyntactic data categories, such as gender, tense, etc. For that purpose, the above example uses the datcat attribute to reference a definition of part of speech. The reference taxonomy in this example is the CLARIN Concept Registry.

If the feature structure markup exemplified above is to be repeated many times in a single document, it is much more efficient to gather the persistent identifiers in a single place and to only reference them, implicitly or directly, from feature structure markup. The following example is much more concise than the one above and relies on the concepts of feature structure declaration and feature value library, discussed in chapter 19. Feature Structures.

The assumption here is that the relevant feature values are collected in a place that the annotation document in question has access to, preferably a single document per linguistic resource: for example an <fsdDecl> that is XIncluded as a sibling of <text> or a child of <encodingDesc>; a <taxonomy> available resource-wide (e.g., in a shared header) is also an option.

The example below presents an <fvLib> element that collects the relevant feature values (most of them omitted). At the same time, this example shows one way of encoding a tagset, i.e., an established inventory of values of (in the case at hand) morphosyntactic categories.
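A sketch of such a feature value library, with a feature structure that points into it. The xml:id values and the n label are hypothetical; the two persistent identifiers are the ones used for ‘NN’ and ‘NP’ elsewhere in this section.

```xml
<!-- Shared library of feature values, declared once per resource -->
<fvLib n="POS values">
  <symbol xml:id="commonNoun" value="NN"
          datcat="http://hdl.handle.net/11459/CCR_C-1256_7ec6083c-23d4-224d-6f94-eecbe6861545"/>
  <symbol xml:id="properNoun" value="NP"
          datcat="http://hdl.handle.net/11459/CCR_C-1371_fbebd9ec-a7f4-9a36-d6e9-88ee16b944ae"/>
  <!-- further values omitted -->
</fvLib>

<!-- The annotation itself only points at the library entry -->
<fs>
  <f name="POS" fVal="#commonNoun"/>
</fs>
```

Because the persistent identifier is stated once in the library, every <f> that references the entry through fVal stays short.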

Note that these Guidelines do not prescribe a specific choice between datcat and valueDatcat in such cases. The former is the generic way of referencing a data category, whereas the latter is more specific, in that it references a data category that represents a value. The choice between them comes into play where a single element (or a tight element complex, such as the <f>/<symbol> complex illustrated above) makes it necessary or useful to distinguish between the container data category and its value.

Example

In the context of dictionaries designed with semantic interoperability in mind, the following example ensures that the <pos> element is interpreted as the same information container as in the case of the example of <f name="POS"> above.
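A sketch of such a dictionary entry, written out with the full identifiers that the <prefixDef> example later in this section abbreviates. The entry content is taken from that example; the layout is illustrative only.

```xml
<entry>
  <form>
    <orth>isotope</orth>
  </form>
  <gramGrp>
    <!-- datcat: the container (part of speech); valueDatcat: its value (adjective) -->
    <pos datcat="http://hdl.handle.net/11459/CCR_C-396_5a972b93-2294-ab5c-a541-7c344c5f26c3"
         valueDatcat="http://hdl.handle.net/11459/CCR_C-1230_23653c21-fca1-edf8-fd7c-3df2d6499157">adj</pos>
  </gramGrp>
</entry>
```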

Such interoperable markup is most efficient when the references to the particular data categories are provided in a single place within the dictionary (or a single place within the project), rather than being repeated inside every entry. For the container elements, this can be achieved at the level of <tagUsage>, although here the targetDatcat attribute should be used, because it is not the <tagUsage> element that is associated with the relevant data category, but rather the element <pos> (or <case>, etc.) that is described by <tagUsage>:

<tagsDecl partial="true">  <namespace name="http://www.tei-c.org/ns/1.0"> <tagUsage gi="pos" targetDatcat="http://hdl.handle.net/11459/CCR_C-396_5a972b93-2294-ab5c-a541-7c344c5f26c3">Contains the part of speech.</tagUsage> <tagUsage gi="case" targetDatcat="http://hdl.handle.net/11459/CCR_C-1840_9f4e319c-f233-6c90-9117-7270e215f039">Contains information about the grammatical case that the described form is inflected for.</tagUsage>  </namespace> </tagsDecl>

Another possibility is to shorten the URIs by means of the <prefixDef> mechanism, as illustrated below:

<listPrefixDef> <prefixDef ident="ccr" matchPattern="pos" replacementPattern="http://hdl.handle.net/11459/CCR_C-396_5a972b93-2294-ab5c-a541-7c344c5f26c3"/> <prefixDef ident="ccr" matchPattern="adj" replacementPattern="http://hdl.handle.net/11459/CCR_C-1230_23653c21-fca1-edf8-fd7c-3df2d6499157"/> </listPrefixDef>  <entry>  <form> <orth>isotope</orth> </form> <gramGrp> <pos datcat="ccr:pos" valueDatcat="ccr:adj">adj</pos> </gramGrp>  </entry>

This mechanism can create unwanted implications. In the case at hand, it suggests that the identifiers ‘pos’ and ‘adj’ belong to a namespace associated with the CLARIN Concept Registry (CCR), whereas it is solely a shorthand mechanism whose scope is the current resource. Documenting this clearly in the header of the dictionary is therefore advised.

Yet another possibility is to record the relationship between a TEI markup element and the data category that it is intended to model already at the level of modeling the dictionary resource, that is, in the ODD, by means of the <equiv> element that is a child of <elementSpec> or <attDef>.
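The ODD-level option could look roughly like the sketch below. It assumes an ODD customisation that modifies the <pos> element of the dictionaries module; the mode value and the placement are illustrative, not taken from any ParlaMint ODD.

```xml
<elementSpec ident="pos" module="dictionaries" mode="change">
  <!-- equiv states that this element models the external data category -->
  <equiv uri="http://hdl.handle.net/11459/CCR_C-396_5a972b93-2294-ab5c-a541-7c344c5f26c3"/>
</elementSpec>
```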

Example

The <taxonomy> element is a handy tool for encoding taxonomies that are later referenced by att.datcat attributes, but it can also act as an intermediary device, for example holding a fragment of an external taxonomy (or ‘flattening’ an external ontology) that is relevant to the project or document at hand. (It is also imaginable that, for the purpose of the project at hand, the local <taxonomy> element combines vocabularies that originate from more than one external taxonomy or ontology.) In such cases, the <taxonomy> creates a local layer of indirection: the att.datcat attributes internal to the resource may reference the <category> elements stored in the header (as well as the <taxonomy> element itself), whereas these same <category> and <taxonomy> elements use att.datcat attributes to reference the original taxonomy or ontology.

<encodingDesc>  <classDecl>  <taxonomy xml:id="UD-SYN" datcat="https://universaldependencies.org/u/dep/index.html"> <desc> <term>UD syntactic relations</term> </desc> <category xml:id="acl" valueDatcat="https://universaldependencies.org/u/dep/acl.html"> <catDesc> <term>acl</term>: Clausal modifier of noun (adjectival clause)</catDesc> </category> <category xml:id="acl_relcl" valueDatcat="https://universaldependencies.org/u/dep/acl-relcl.html"> <catDesc> <term>acl:relcl</term>: relative clause modifier</catDesc> </category> <category xml:id="advcl" valueDatcat="https://universaldependencies.org/u/dep/advcl.html"> <catDesc> <term>advcl</term>: Adverbial clause modifier</catDesc> </category>  </taxonomy> </classDecl> </encodingDesc>

The above fragment was excerpted from the GB subset of the ParlaMint project in April 2023, and enriched with att.datcat attributes for the purpose of illustrating the mechanism described here.

Note that, in the ideal case, the values of att.datcat attributes should be persistent identifiers, and that the addressing scheme of Universal Dependencies is treated here as persistent for the sake of illustration. Note also that the contrast between datcat used on <taxonomy> on the one hand, and the valueDatcat used on <category> on the other, is not mandatory: both kinds of relations could be encoded by means of the generic datcat attribute, but using the former for the container and the latter for the content is more user-friendly.

Example

The targetDatcat attribute is designed to be used in, e.g., feature structure declarations, and is analogous to the targetLang attribute of the att.pointing class, in that it describes the object that is being referenced, rather than the referencing object.

<fDecl name="POS" targetDatcat="http://hdl.handle.net/11459/CCR_C-396_5a972b93-2294-ab5c-a541-7c344c5f26c3"> <fDescr>part of speech (morphosyntactic category)</fDescr> <vRange> <vAlt> <symbol value="NN" datcat="http://hdl.handle.net/11459/CCR_C-1256_7ec6083c-23d4-224d-6f94-eecbe6861545"/> <symbol value="NP" datcat="http://hdl.handle.net/11459/CCR_C-1371_fbebd9ec-a7f4-9a36-d6e9-88ee16b944ae"/>  </vAlt> </vRange> </fDecl>

Above, the <fDecl> uses targetDatcat, because if it were to use datcat, it would be asserting that it is an instance of the container data category part of speech, whereas it is not — it models a container (<f>) that encodes a part of speech. Note also that it is the <f> that is modeled above, not its values, which are used as direct references to data categories; hence the use of datcat in the <symbol> element.

Example

The att.datcat attributes can be used for any sort of taxonomies. The example below illustrates their usefulness for describing usage domain labels in dictionaries on the example of the Diccionario da Lingua Portugueza by António de Morais Silva, retro-digitised in the MORDigital project.

<encodingDesc> <classDecl> <taxonomy xml:id="domains">  <category xml:id="domain.medical_and_health_sciences"> <catDesc xml:lang="en">Medical and Health Sciences</catDesc> <catDesc xml:lang="pt">Ciências Médicas e da Saúde</catDesc> <category xml:id="domain.medical_and_health_sciences.medicine" valueDatcat="https://vocabs.rossio.fcsh.unl.pt/pub/morais_domains/pt/page/0025"> <catDesc xml:lang="en"> <term>Medicine</term> <gloss>  </gloss> </catDesc> <catDesc xml:lang="pt"> <term>Medicina</term> <gloss>  </gloss> </catDesc> </category> </category>  </taxonomy> </classDecl> </encodingDesc>  <usg type="domain" valueDatcat="#domain.medical_and_health_sciences.medicine">Med.</usg>

In the Morais dictionary, the relevant domain labels are declared in the header and referenced from <usg> elements inside the dictionary. The vocabulary used for dictionary-internal labelling is in turn anchored in the MORDigital controlled vocabulary service of the NOVA University of Lisbon – School of Social Sciences and Humanities (NOVA FCSH).

Note

The TEI Abstract Model can be expressed as a hierarchy of attribute-value matrices (AVMs) of various types and of various levels of complexity, nested or grouped in various ways. At the most abstract level, an AVM consists of an information container and the value (contents) of that container.

A simple example of an XML serialization of such structures is, on the one hand, the opening and closing tags that delimit and name the container, and, on the other, the content enclosed by the two tags that constitutes the value. An analogous example is an attribute name and the value of that attribute.

In a TEI XML example of two equivalent serializations expressing the name-value pair <part-of-speech,common-noun>, namely <pos>commonNoun</pos> and pos="common-noun", one would classify the element <pos> and the attribute pos as containers (mapping onto the first member of the relevant name-value pair), while the character data content of <pos> or the value of pos would be seen as mapping onto the second member of the pair.

The att.datcat class provides means of addressing the containers and their values, while at the same time providing a way to interpret them in the context of external taxonomies or ontologies. Aligning e.g. both the <pos> element and the pos attribute with the same value of an external reference point (i.e., an entry in an agreed taxonomy) affirms the identity of the concept serialised by both the element container and the attribute container, and optionally provides a definition of that concept (in the case at hand, the concept part of speech).

The value of the att.datcat attributes should be a PID (persistent identifier) that points to a specific — and, ideally, shared — taxonomy or ontology. Among the resources that can, to a lesser or greater extent, be used as inventories of (more or less) standardized linguistic categories are the GOLD ontology, CLARIN CCR, OLiA, or TermWeb's DatCatInfo, and also the Universal Dependencies inventory, on the assumption that its URIs are going to persist. It is imaginable that a project may choose to address a local taxonomy store instead, but this risks losing the advantage of interchangeability with other projects.

Historically, datcat and valueDatcat originate from the (now obsolete) ISO 12620:2009 standard, describing the data model and procedures for a Data Category Registry (DCR). The current version of that standard, ISO 12620-1, does not standardize the serialization of pointers, merely mentioning the TEI att.datcat as an example.

Note that no constraint prevents the occurrence of a combination of att.datcat attributes: the <fDecl> element, which is a natural bearer of the targetDatcat attribute, is an instance of a specific modeling element, and, in principle, could be semantically fixed by an appropriate reference taxonomy of modeling devices.

Appendix A.3.7 att.declarable

att.declarable provides attributes for those elements in the TEI header which may be independently selected by means of the special purpose decls attribute. [16.3. Associating Contextual Information with a Text]

Module

tei — Formal specification

Members

availability bibl correction editorialDecl equipment hyphenation langUsage listEvent listOrg listPerson normalization particDesc projectDesc quotation recording segmentation settingDesc sourceDesc textClass

Attributes

default

indicates whether or not this element is selected by default when its parent is selected.

Status	Optional
Datatype	teidata.truthValue
Legal values are:
true	This element is selected if its parent is selected.
false	This element can only be selected explicitly, unless it is the only one of its kind, in which case it is selected if its parent is selected. [Default]

Note

The rules governing the association of declarable elements with individual parts of a TEI text are fully defined in chapter 16.3. Associating Contextual Information with a Text. Only one element of a particular type may have a default attribute with a value of true.
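A sketch of how default interacts with explicit selection. It assumes that the decls attribute (from the att.declaring class, not described in this section) is available on <div> in the schema in use; the xml:id values and the descriptive text are hypothetical.

```xml
<encodingDesc>
  <!-- selected wherever nothing else is chosen explicitly -->
  <editorialDecl xml:id="ed.default" default="true">
    <p>Transcription normalised to standard orthography.</p>
  </editorialDecl>
  <!-- selected only when a text or division points at it -->
  <editorialDecl xml:id="ed.verbatim">
    <p>Transcription kept verbatim.</p>
  </editorialDecl>
</encodingDesc>

<!-- elsewhere, in the text -->
<body>
  <div>
    <!-- no decls: ed.default applies -->
  </div>
  <div decls="#ed.verbatim">
    <!-- explicit selection overrides the default -->
  </div>
</body>
```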

Appendix A.3.8 att.duration

att.duration provides attributes for normalization of elements that contain datable events.
Module	spoken — Formal specification
Members	att.timed[gap incident kinesic media u vocal] date recording time
Attributes	att.duration.iso @dur-iso att.duration.w3c @dur
Note	This ‘superclass’ provides attributes that can be used to provide normalized values of temporal information. By default, the attributes from the att.duration.w3c class are provided. If the module for names & dates is loaded, this class also provides attributes from the att.duration.iso class. In general, the possible values of attributes restricted to the W3C datatypes form a subset of those values available via the ISO 8601 standard. However, the greater expressiveness of the ISO datatypes is rarely needed, and there exists much greater software support for the W3C datatypes.

Appendix A.3.9 att.duration.iso

att.duration.iso provides attributes for recording normalized temporal durations. [3.6.4. Dates and Times 14.4. Dates]

Module

tei — Formal specification

Members

att.duration[att.timed[gap incident kinesic media u vocal] date recording time]

Attributes

dur-iso

(duration) indicates the length of this element in time.

Status	Optional
Datatype	teidata.duration.iso

Note

If both when and dur or dur-iso are specified, the values should be interpreted as indicating a span of time by its starting time (or date) and duration. In order to represent a time range by a duration and its ending time the when-iso attribute must be used.

Appendix A.3.10 att.duration.w3c

att.duration.w3c provides attributes for recording normalized temporal durations. [3.6.4. Dates and Times 14.4. Dates]

Module

tei — Formal specification

Members

att.duration[att.timed[gap incident kinesic media u vocal] date recording time]

Attributes

dur

(duration) indicates the length of this element in time.

Status	Optional
Datatype	teidata.duration.w3c

Note

If both when and dur are specified, the values should be interpreted as indicating a span of time by its starting time (or date) and duration. In order to represent a time range by a duration and its ending time the when-iso attribute must be used.
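The combination of a starting point and a duration can be sketched as follows. The values are invented for illustration; dur uses the W3C duration syntax (P for the period, T before the time components).

```xml
<p>
  <!-- starts at 09:00 and lasts two and a half hours -->
  <time when="09:00:00" dur="PT2H30M">from nine o'clock, for two and a half hours</time>
  <!-- starts on 2 March 2020 and lasts five days -->
  <date when="2020-03-02" dur="P5D">the five days from 2 March 2020</date>
</p>
```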

Appendix A.3.11 att.fragmentable

att.fragmentable provides attributes for representing fragmentation of a structural element, typically as a consequence of some overlapping hierarchy.

Module

tei — Formal specification

Members

att.divLike[div] att.segLike[pc phr s seg w] p

Attributes

part

specifies whether or not its parent element is fragmented in some way, typically by some other overlapping structure: for example a speech which is divided between two or more verse stanzas, a paragraph which is split across a page division, a verse line which is divided between two speakers.

Status	Optional
Datatype	teidata.enumerated
Legal values are:
Y	(yes) the element is fragmented in some (unspecified) respect
N	(no) the element is not fragmented, or no claim is made as to its completeness [Default]
I	(initial) this is the initial part of a fragmented element
M	(medial) this is a medial part of a fragmented element
F	(final) this is the final part of a fragmented element
Note	The values I, M, or F should be used only where it is clear how the element may be reconstituted.
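A sketch of the part attribute on a sentence that the source splits across two segments. The speaker reference, identifiers and wording are hypothetical.

```xml
<u who="#SpeakerA" xml:id="u1">
  <seg xml:id="u1.seg1">
    <s xml:id="u1.s1">Thank you, Madam President.</s>
    <!-- initial part of a sentence that continues in the next segment -->
    <s xml:id="u1.s2a" part="I">I would like to remind the House that</s>
  </seg>
  <seg xml:id="u1.seg2">
    <!-- final part: together with u1.s2a it forms one sentence -->
    <s xml:id="u1.s2b" part="F">the vote will take place tomorrow.</s>
  </seg>
</u>
```

The two fragments are adjacent and in document order, so it is clear how the sentence is to be reconstituted, which is the condition for using I and F rather than the unspecific Y.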

Appendix A.3.12 att.global

att.global provides attributes common to all elements in the TEI encoding scheme. [1.3.1.1. Global Attributes]

Module

tei — Formal specification

Members

TEI addName affiliation appInfo application availability bibl birth body catDesc catRef category change classDecl correction date death desc div edition editionStmt editorialDecl education email encodingDesc equipment equipment event extent figure fileDesc forename funder gap graphic head hyphenation idno incident kinesic label langUsage language licence link linkGrp listEvent listOrg listPerson listPrefixDef listRelation measure media meeting name nameLink namespace normalization note num occupation org orgName p particDesc pb pc persName person phr placeName prefixDef profileDesc projectDesc pubPlace publicationStmt publisher quotation recording recordingStmt ref relation resp respStmt revisionDesc roleName s seg segmentation setting settingDesc sex sourceDesc state surname tagUsage tagsDecl taxonomy teiCorpus teiHeader term text textClass time title titleStmt u unit vocal w

Attributes

att.global.analytic
- @ana
att.global.linking
- @corresp
- @synch
- @next
- @prev
att.global.rendition
- @rend
- @style
- @rendition
att.global.responsibility
- @resp
att.global.source
- @source

xml:id

(identifier) provides a unique identifier for the element bearing the attribute.

Status	Optional
Datatype	ID
Note	The xml:id attribute may be used to specify a canonical reference for an element; see section 3.11. Reference Systems.

n

(number) gives a number (or other label) for an element, which is not necessarily unique within the document.

Status	Optional
Datatype	teidata.text
Note	The value of this attribute is always understood to be a single token, even if it contains space or other punctuation characters, and need not be composed of numbers only. It is typically used to specify the numbering of chapters, sections, list items, etc.; it may also be used in the specification of a standard reference system for the text.
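A short sketch contrasting xml:id with n: the former must be unique within the document, while the latter merely records a label such as an agenda item number. The identifier, the type value and the heading are hypothetical.

```xml
<div type="debateSection" xml:id="sitting12.div3" n="3">
  <head>3. Questions to the Government</head>
</div>
```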

xml:lang

(language) indicates the language of the element content using a ‘tag’ generated according to BCP 47.

Status	Optional
Datatype	teidata.language
<p> … The consequences of this rapid depopulation were the loss of the last <foreign xml:lang="rap">ariki</foreign> or chief (Routledge 1920:205,210) and their connections to ancestral territorial organization.</p>
Note	The xml:lang value will be inherited from the immediately enclosing element, or from its parent, and so on up the document hierarchy. It is generally good practice to specify xml:lang at the highest appropriate level, noticing that a different default may be needed for the <teiHeader> from that needed for the associated resource element or elements, and that a single TEI document may contain texts in many languages. Only attributes with free text values (rare in these guidelines) will be in the scope of xml:lang. The authoritative list of registered language subtags is maintained by IANA and is available at https://www.iana.org/assignments/language-subtag-registry. For a good general overview of the construction of language tags, see https://www.w3.org/International/articles/language-tags/, and for a practical step-by-step guide, see https://www.w3.org/International/questions/qa-choosing-language-tags.en.php. The value used must conform with BCP 47. If the value is a private use code (i.e., starts with x- or contains -x-), a <language> element with a matching value for its ident attribute should be supplied in the TEI header to document this value. Such documentation may also optionally be supplied for non-private-use codes, though these must remain consistent with their IETF (Internet Engineering Task Force) definitions.
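The inheritance and private-use rules in this note can be sketched as follows. The language codes, the private-use tag x-mixed and the speaker reference are hypothetical, and the header is abridged.

```xml
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader xml:lang="en">
    <!-- fileDesc omitted -->
    <profileDesc>
      <langUsage>
        <!-- documents the private-use tag used in the text below -->
        <language ident="x-mixed">Passages with code-switching (hypothetical label)</language>
      </langUsage>
    </profileDesc>
  </teiHeader>
  <text xml:lang="sl">
    <body>
      <div>
        <u who="#SpeakerA">
          <seg>This segment inherits sl from the text element.</seg>
          <seg xml:lang="en">This segment overrides the inherited value.</seg>
          <seg xml:lang="x-mixed">This segment uses the documented private-use tag.</seg>
        </u>
      </div>
    </body>
  </text>
</TEI>
```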

xml:base

provides a base URI reference with which applications can resolve relative URI references into absolute URI references.

Status	Optional
Datatype	teidata.pointer
<div type="bibl"> <head>Selections from <title level="m">The Collected Letters of Robert Southey. Part 1: 1791-1797</title> </head> <listBibl xml:base="https://romantic-circles.org/sites/default/files/imported/editions/southey_letters/XML/"> <bibl> <ref target="letterEEd.26.3.xml"> <title>Robert Southey to Grosvenor Charles Bedford</title>, <date when="1792-04-03">3 April 1792</date>. </ref> </bibl> <bibl> <ref target="letterEEd.26.57.xml"> <title>Robert Southey to Anna Seward</title>, <date when="1793-09-18">18 September 1793</date>. </ref> </bibl> <bibl> <ref target="letterEEd.26.85.xml"> <title>Robert Southey to Robert Lovell</title>, <date from="1794-04-05" to="1794-04-06">5-6 April, 1794</date>. </ref> </bibl> </listBibl> </div>

xml:space

signals an intention about how white space should be managed by applications.

Status	Optional
Datatype	teidata.enumerated
Legal values are:
default	signals that the application's default white-space processing modes are acceptable
preserve	indicates the intent that applications preserve all white space
Note	The XML specification provides further guidance on the use of this attribute. Note that many parsers may not handle xml:space correctly.

Appendix A.3.13 att.global.analytic

att.global.analytic provides additional global attributes for associating specific analyses or interpretations with appropriate portions of a text. [18.2. Global Attributes for Simple Analyses 18.3. Spans and Interpretations]

Module

analysis — Formal specification

Members

att.global[TEI addName affiliation appInfo application availability bibl birth body catDesc catRef category change classDecl correction date death desc div edition editionStmt editorialDecl education email encodingDesc equipment equipment event extent figure fileDesc forename funder gap graphic head hyphenation idno incident kinesic label langUsage language licence link linkGrp listEvent listOrg listPerson listPrefixDef listRelation measure media meeting name nameLink namespace normalization note num occupation org orgName p particDesc pb pc persName person phr placeName prefixDef profileDesc projectDesc pubPlace publicationStmt publisher quotation recording recordingStmt ref relation resp respStmt revisionDesc roleName s seg segmentation setting settingDesc sex sourceDesc state surname tagUsage tagsDecl taxonomy teiCorpus teiHeader term text textClass time title titleStmt u unit vocal w]

Attributes

ana

(analysis) indicates one or more elements containing interpretations of the element on which the ana attribute appears.

Status	Optional
Datatype	1–∞ occurrences of teidata.pointer separated by whitespace
Note	When multiple values are given, they may reflect either multiple divergent interpretations of an ambiguous text, or multiple mutually consistent interpretations of the same passage in different contexts.
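A sketch of ana pointing at locally declared categories. The taxonomy, its identifiers and the speaker references are hypothetical and only illustrate the pointer mechanism.

```xml
<classDecl>
  <taxonomy xml:id="speaker_types">
    <category xml:id="chair">
      <catDesc><term>Chairperson</term></catDesc>
    </category>
    <category xml:id="regular">
      <catDesc><term>Regular speaker</term></catDesc>
    </category>
  </taxonomy>
</classDecl>

<!-- elsewhere, in the text: each utterance is classified through ana -->
<u who="#SpeakerA" ana="#regular">
  <seg>Thank you for giving me the floor.</seg>
</u>
<u who="#SpeakerB" ana="#chair">
  <seg>The sitting is adjourned.</seg>
</u>
```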

Appendix A.3.14 att.global.linking

att.global.linking provides a set of attributes for hypertextual linking. [17. Linking, Segmentation, and Alignment]

Module

linking — Formal specification

Members

Attributes

corresp

(corresponds) points to elements that correspond to the current element in some way.

Status	Optional
Datatype	1–∞ occurrences of teidata.pointer separated by whitespace
<group> <text xml:id="t1-g1-t1" xml:lang="mi"> <body xml:id="t1-g1-t1-body1"> <div type="chapter"> <head>He Whakamaramatanga mo te Ture Hoko, Riihi hoki, i nga Whenua Maori, 1876.</head> <p>…</p> </div> </body> </text> <text xml:id="t1-g1-t2" xml:lang="en"> <body xml:id="t1-g1-t2-body1" corresp="#t1-g1-t1-body1"> <div type="chapter"> <head>An Act to regulate the Sale, Letting, and Disposal of Native Lands, 1876.</head> <p>…</p> </div> </body> </text> </group>

In this example a <group> contains two <text>s, each containing the same document in a different language. The correspondence is indicated using corresp. The language is indicated using xml:lang, whose value is inherited; both the tag with the corresp and the tag pointed to by the corresp inherit the value from their immediate parent.
<!-- In a placeography called "places.xml" --><place xml:id="LOND1" corresp="people.xml#LOND2 people.xml#GENI1"> <placeName>London</placeName> <desc>The city of London...</desc> </place> <!-- In a literary personography called "people.xml" --> <person xml:id="LOND2" corresp="places.xml#LOND1 #GENI1"> <persName type="lit">London</persName> <note> <p>Allegorical character representing the city of <placeName ref="places.xml#LOND1">London</placeName>.</p> </note> </person> <person xml:id="GENI1" corresp="places.xml#LOND1 #LOND2"> <persName type="lit">London’s Genius</persName> <note> <p>Personification of London’s genius. Appears as an allegorical character in mayoral shows. </p> </note> </person>

In this example, a <place> element containing information about the city of London is linked with two <person> elements in a literary personography. This correspondence represents a slightly looser relationship than the one in the preceding example; there is no sense in which an allegorical character could be substituted for the physical city, or vice versa, but there is obviously a correspondence between them.

synch

(synchronous) points to elements that are synchronous with the current element.

Status	Optional
Datatype	1–∞ occurrences of teidata.pointer separated by whitespace

next

(next) points to the next element of a virtual aggregate of which the current element is part.

Status	Optional
Datatype	teidata.pointer
Note	It is recommended that the element indicated be of the same type as the element bearing this attribute.

prev

(previous) points to the previous element of a virtual aggregate of which the current element is part.

Status	Optional
Datatype	teidata.pointer
Note	It is recommended that the element indicated be of the same type as the element bearing this attribute.
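A sketch of next and prev joining two utterances of one speaker that are separated by an interruption. Identifiers, speaker references and wording are hypothetical.

```xml
<div type="debateSection">
  <!-- first part of the speech; next points forward to its continuation -->
  <u xml:id="u1" who="#SpeakerA" next="#u3">
    <seg>I would like to make three points.</seg>
  </u>
  <u xml:id="u2" who="#Chair">
    <seg>Order, please.</seg>
  </u>
  <!-- continuation; prev points back to the first part -->
  <u xml:id="u3" who="#SpeakerA" prev="#u1">
    <seg>As I was saying, the first point concerns the budget.</seg>
  </u>
</div>
```

Both linked elements are <u> elements, in line with the recommendation that the element pointed to be of the same type as the one bearing the attribute.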

Appendix A.3.15 att.global.rendition

att.global.rendition provides rendering attributes common to all elements in the TEI encoding scheme. [1.3.1.1.3. Rendition Indicators]

Module

tei — Formal specification

Members

Attributes

rend

(rendition) indicates how the element in question was rendered or presented in the source text.

Status	Optional
Datatype	1–∞ occurrences of teidata.word separated by whitespace
<head rend="align(center) case(allcaps)"> <lb/>To The <lb/>Duchesse <lb/>of <lb/>Newcastle, <lb/>On Her <lb/> <hi rend="case(mixed)">New Blazing-World</hi>. </head>
Note	These Guidelines make no binding recommendations for the values of the rend attribute; the characteristics of visual presentation vary too much from text to text and the decision to record or ignore individual characteristics varies too much from project to project. Some potentially useful conventions are noted from time to time at appropriate points in the Guidelines. The values of the rend attribute are a set of sequence-indeterminate individual tokens separated by whitespace.

style

contains an expression in some formal style definition language which defines the rendering or presentation used for this element in the source text.

Status	Optional
Datatype	teidata.text
<head style="text-align: center; font-variant: small-caps"> <lb/>To The <lb/>Duchesse <lb/>of <lb/>Newcastle, <lb/>On Her <lb/> <hi style="font-variant: normal">New Blazing-World</hi>. </head>
Note	Unlike the attribute values of rend, which uses whitespace as a separator, the style attribute may contain whitespace. This attribute is intended for recording inline stylistic information concerning the source, not any particular output. The formal language in which values for this attribute are expressed may be specified using the <styleDefDecl> element in the TEI header. If style and rendition are both present on an element, then style overrides or complements rendition. style should not be used in conjunction with rend, because the latter does not employ a formal style definition language.

rendition

points to a description of the rendering or presentation used for this element in the source text.

Status	Optional
Datatype	1–∞ occurrences of teidata.pointer separated by whitespace
<head rendition="#ac #sc"> <lb/>To The <lb/>Duchesse <lb/>of <lb/>Newcastle, <lb/>On Her <lb/> <hi rendition="#normal">New Blazing-World</hi>. </head> <!-- elsewhere... --> <rendition xml:id="sc" scheme="css">font-variant: small-caps</rendition> <rendition xml:id="normal" scheme="css">font-variant: normal</rendition> <rendition xml:id="ac" scheme="css">text-align: center</rendition>
Note	The rendition attribute is used in a very similar way to the class attribute defined for XHTML but with the important distinction that its function is to describe the appearance of the source text, not necessarily to determine how that text should be presented on screen or paper. If rendition is used to refer to a style definition in a formal language like CSS, it is recommended that it not be used in conjunction with rend. Where both rendition and rend are supplied, the latter is understood to override or complement the former. Each URI provided should indicate a <rendition> element defining the intended rendition in terms of some appropriate style language, as indicated by the scheme attribute.

Appendix A.3.16 att.global.responsibility

att.global.responsibility provides attributes indicating the agent responsible for some aspect of the text, the markup or something asserted by the markup, and the degree of certainty associated with it. [1.3.1.1.4. Sources, certainty, and responsibility 3.5. Simple Editorial Changes 12.3.2.2. Hand, Responsibility, and Certainty Attributes 18.3. Spans and Interpretations 14.1.1. Linking Names and Their Referents]

Module

tei — Formal specification

Members

Attributes

resp

(responsible party) indicates the agency responsible for the intervention or interpretation, for example an editor or transcriber.

Status	Optional
Datatype	1–∞ occurrences of teidata.pointer separated by whitespace
Note	To reduce the ambiguity of a resp pointing directly to a person or organization, we recommend that resp be used to point not to an agent (<person> or <org>) but to a <respStmt>, <author>, <editor> or similar element which clarifies the exact role played by the agent. Pointing to multiple <respStmt>s allows the encoder to specify clearly each of the roles played in part of a TEI file (creating, transcribing, encoding, editing, proofing etc.).

Example

Blessed are the <choice> <sic>cheesemakers</sic> <corr resp="#editor" cert="high">peacemakers</corr> </choice>: for they shall be called the children of God.

Example

<lg>  <l>Punkes, Panders, baſe extortionizing sla<choice> <sic>n</sic> <corr resp="#JENS1_transcriber">u</corr> </choice>es,</l>  </lg>   <respStmt xml:id="JENS1_transcriber"> <resp when="2014">Transcriber</resp> <name>Janelle Jenstad</name> </respStmt>
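The pointer mechanism in the example above can be checked mechanically. The following is a minimal sketch, assuming un-namespaced sample markup and a hypothetical helper name `resolve_resp`; it looks up each local `#id` pointer in resp and reports the kind of element it resolves to.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:id under the predefined XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def resolve_resp(root):
    """Return (element, pointer, target element) triples for every @resp pointer."""
    by_id = {el.get(XML_ID): el for el in root.iter() if el.get(XML_ID)}
    resolved = []
    for el in root.iter():
        resp = el.get("resp")
        if resp is None:
            continue
        # resp holds one or more whitespace-separated pointers.
        for pointer in resp.split():
            # Only local "#id" pointers are handled in this sketch.
            target = by_id.get(pointer[1:]) if pointer.startswith("#") else None
            resolved.append((el.tag, pointer, None if target is None else target.tag))
    return resolved


# Shortened version of the example above (no namespace, for brevity).
SAMPLE = """<div>
  <l>sla<choice><sic>n</sic><corr resp="#JENS1_transcriber">u</corr></choice>es</l>
  <respStmt xml:id="JENS1_transcriber">
    <resp when="2014">Transcriber</resp>
    <name>Janelle Jenstad</name>
  </respStmt>
</div>"""

print(resolve_resp(ET.fromstring(SAMPLE)))
```

A pointer that resolves to `None`, or to a <person> or <org> rather than a <respStmt>, could then be flagged for review in line with the recommendation in the note.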

Appendix A.3.17 att.global.source

att.global.source provides attributes used by elements to point to an external source. [1.3.1.1.4. Sources, certainty, and responsibility 3.3.3. Quotation 8.3.4. Writing]

Module tei — Formal specification

Members att.global[TEI addName affiliation appInfo application availability bibl birth body catDesc catRef category change classDecl correction date death desc div edition editionStmt editorialDecl education email encodingDesc equipment event extent figure fileDesc forename funder gap graphic head hyphenation idno incident kinesic label langUsage language licence link linkGrp listEvent listOrg listPerson listPrefixDef listRelation measure media meeting name nameLink namespace normalization note num occupation org orgName p particDesc pb pc persName person phr placeName prefixDef profileDesc projectDesc pubPlace publicationStmt publisher quotation recording recordingStmt ref relation resp respStmt revisionDesc roleName s seg segmentation setting settingDesc sex sourceDesc state surname tagUsage tagsDecl taxonomy teiCorpus teiHeader term text textClass time title titleStmt u unit vocal w]

Attributes

source

specifies the source from which some aspect of this element is drawn.

Status	Optional
Datatype	1–∞ occurrences of teidata.pointer separated by whitespace
Schematron	<sch:rule context="tei:*[@source]"> <sch:let name="srcs" value="tokenize( normalize-space(@source),' ')"/> <sch:report test="( self::tei:classRef | self::tei:dataRef | self::tei:elementRef | self::tei:macroRef | self::tei:moduleRef | self::tei:schemaSpec ) and $srcs[2]"> When used on a schema description element (like <sch:value-of select="name(.)"/>), the @source attribute should have only 1 value. (This one has <sch:value-of select="count($srcs)"/>.) </sch:report> </sch:rule>
Note	The source attribute points to an external source. When used on an element describing a schema component (<classRef>, <dataRef>, <elementRef>, <macroRef>, <moduleRef>, or <schemaSpec>), it identifies the source from which declarations for the components should be obtained. On other elements it provides a pointer to the bibliographical source from which a quotation or citation is drawn. In either case, the location may be provided using any form of URI, for example an absolute URI, a relative URI, a private scheme URI of the form `tei:x.y.z`, where `x.y.z` indicates the version number, e.g. `tei:4.3.2` for TEI P5 release 4.3.2 or (as a special case) `tei:current` for whatever is the latest release, or a private scheme URI that is expanded to an absolute URI as documented in a <prefixDef>. When used on elements describing schema components, source should have only one value; when used on other elements multiple values are permitted.

Example

<p>  As Willard McCarty (<bibl xml:id="mcc_2012">2012, p.2</bibl>) tells us, <quote source="#mcc_2012">‘Collaboration’ is a problematic and should be a contested term.</quote>  </p>

Example

<p>  <quote source="#chicago_15_ed">Grammatical theories are in flux, and the more we learn, the less we seem to know.</quote>  </p>  <bibl xml:id="chicago_15_ed"> <title level="m">The Chicago Manual of Style</title>, <edition>15th edition</edition>. <pubPlace>Chicago</pubPlace>: <publisher>University of Chicago Press</publisher> (<date>2003</date>), <biblScope unit="page">p.147</biblScope>. </bibl>

Example

Include in the schema an element named <p> available from the TEI P5 2.0.1 release. <elementRef key="p" source="tei:2.0.1"/>

Example

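Create a schema using components taken from the file mycompiledODD.xml. <schemaSpec ident="myODD" source="mycompiledODD.xml"> <!-- further declarations specifying the components required --> </schemaSpec>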
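The single-value constraint expressed by the Schematron rule for source can also be approximated outside a Schematron processor. The sketch below is illustrative only: the function name `check_source` and the sample fragment are assumptions, and the logic simply mirrors the rule's test (a schema component element in the TEI namespace whose source has a second token).

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
# Elements on which @source may carry only one value.
SCHEMA_COMPONENTS = {"classRef", "dataRef", "elementRef",
                     "macroRef", "moduleRef", "schemaSpec"}


def check_source(root):
    """Return (element name, value count) for schema components with several @source values."""
    problems = []
    for el in root.iter():
        if not el.tag.startswith(TEI) or el.get("source") is None:
            continue
        name = el.tag[len(TEI):]
        # str.split() collapses whitespace runs, like tokenize(normalize-space(...), ' ').
        values = el.get("source").split()
        if name in SCHEMA_COMPONENTS and len(values) > 1:
            problems.append((name, len(values)))
    return problems


# Hypothetical fragment: the <quote> may have two sources, the <moduleRef> may not.
SAMPLE = """<body xmlns="http://www.tei-c.org/ns/1.0">
  <quote source="#mcc_2012 #chicago_15_ed">Two sources are fine here.</quote>
  <elementRef key="p" source="tei:2.0.1"/>
  <moduleRef key="core" source="tei:2.0.1 tei:current"/>
</body>"""

print(check_source(ET.fromstring(SAMPLE)))
```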

Appendix A.3.18 att.internetMedia

att.internetMedia provides attributes for specifying the type of a computer resource using a standard taxonomy.

Module

tei — Formal specification

Members

att.media[graphic media] ref

Attributes

mimeType

(MIME media type) specifies the applicable multimedia internet mail extension (MIME) media type.

Status	Optional
Datatype	1–∞ occurrences of teidata.word separated by whitespace

Example

In this example mimeType is used to indicate that the URL points to a TEI XML file encoded in UTF-8.

Note

This attribute class provides an attribute for describing a computer resource, typically available over the internet, using a value taken from a standard taxonomy. At present only a single taxonomy is supported, the Multipurpose Internet Mail Extensions (MIME) Media Type system. This typology of media types is defined by the Internet Engineering Task Force in RFC 2046. The list of types is maintained by the Internet Assigned Numbers Authority (IANA). The mimeType attribute must have a value taken from this list.

Appendix A.3.19 att.lexicographic.normalized

att.lexicographic.normalized provides attributes for usage within word-level elements in the analysis module and within lexicographic microstructure in the dictionaries module.

Module

analysis — Formal specification

Members

att.linguistic[pc w]

Attributes

norm

(normalized) provides the normalized/standardized form of information present in the source text in a non-normalized form.

Status	Optional
Datatype	teidata.text
Normalization of part-of-speech information within a dictionary entry. <gramGrp> <pos norm="noun">n</pos> </gramGrp>
Normalization of a source form in a tokenized historical corpus. <s> <w>for</w> <w norm="virtue's">vertues</w> <w>sake</w> </s>
<s> <w norm="persuasion">perswasion</w> <w>of</w> <w norm="Unity">Vnitie</w> </s>
Example of normalization from Aviso. Relation oder Zeitung. Wolfenbüttel, 1609. In: Deutsches Textarchiv. <s> <w norm="freiwillig">freywillig</w> <pc norm="," join="left">/</pc> <w norm="unbedrängt">vnbedraͤngt</w> <w norm="und">vnd</w> <w norm="unverhindert">vnuerhindert</w> </s>
<w norm="Teil">Theyll</w>
<w norm="Freude">Frewde</w>

Note

It needs to be stressed that the two attributes in this class are meant for strictly lexicographic and linguistic uses, and not for editorial interventions. For the latter, the mechanism based on <choice>, <orig>, and <reg> needs to be employed.

Appendix A.3.20 att.linguistic

att.linguistic provides a set of attributes concerning linguistic features of tokens, for usage within token-level elements, specifically <w> and <pc> in the analysis module. [18.4.2. Lightweight Linguistic Annotation]

Module

analysis — Formal specification

Members

pc w

Attributes

att.lexicographic.normalized
- @norm

lemma

provides a lemma (base form) for the word, typically uninflected and serving both as an identifier (e.g. in dictionary contexts, as a headword), and as a basis for potential inflections.

Status	Optional
Datatype	teidata.text
<w lemma="wife">wives</w>
<w lemma="Arznei">Artzeneyen</w>

pos

(part of speech) indicates the part of speech assigned to a token (i.e. information on whether it is a noun, adjective, or verb), usually according to some official reference vocabulary (e.g. for German: STTS, for English: CLAWS, for Polish: NKJP, etc.).

Status	Optional
Datatype	teidata.text
The German sentence ‘Wir fahren in den Urlaub.’ tagged with the Stuttgart-Tuebingen-Tagset (STTS). <s> <w pos="PPER">Wir</w> <w pos="VVFIN">fahren</w> <w pos="APPR">in</w> <w pos="ART">den</w> <w pos="NN">Urlaub</w> <w pos="$.">.</w> </s>
The English sentence ‘We're going to Brazil.’ tagged with the CLAWS-5 tagset, arranged inline (with significant whitespace). <p><w pos="PNP">We</w><w pos="VBB">'re</w> <w pos="VVG">going</w> <w pos="PRP">to</w> <w pos="NP0">Brazil</w><pc pos="PUN">.</pc></p>
The English sentence ‘We're going on vacation to Brazil for a month!’ tagged with the CLAWS-7 tagset and arranged sequentially. <p> <w pos="PPIS2">We</w> <w pos="VBR">'re</w> <w pos="VVG">going</w> <w pos="II">on</w> <w pos="NN1">vacation</w> <w pos="II">to</w> <w pos="NP1">Brazil</w> <w pos="IF">for</w> <w pos="AT1">a</w> <w pos="NNT1">month</w> <pc pos="!">!</pc> </p>

msd

(morphosyntactic description) supplies morphosyntactic information for a token, usually according to some official reference vocabulary (e.g. for German: STTS-large tagset; for a feature description system designed as (pragmatically) universal, see Universal Features).

Status	Optional
Datatype	teidata.text
<ab> <w pos="PPER" msd="1.Pl.*.Nom">Wir</w> <w pos="VVFIN" msd="1.Pl.Pres.Ind">fahren</w> <w pos="APPR" msd="--">in</w> <w pos="ART" msd="Def.Masc.Akk.Sg">den</w> <w pos="NN" msd="Masc.Akk.Sg">Urlaub</w> <pc pos="$." msd="--">.</pc> </ab>

join

when present, provides information on whether the token in question is adjacent to another, and if so, on which side.

Status	Optional
Datatype	teidata.text
Legal values are:
- no: the token is not adjacent to another
- left: there is no whitespace on the left side of the token
- right: there is no whitespace on the right side of the token
- both: there is no whitespace on either side of the token
- overlap: the token overlaps with another; other devices (specifying the extent and the area of overlap) are needed to more precisely locate this token in the character stream
The example below assumes that the lack of whitespace is marked redundantly, by using the appropriate values of join. <s> <pc join="right">"</pc> <w join="left">Friends</w> <w>will</w> <w>be</w> <w join="right">friends</w> <pc join="both">.</pc> <pc join="left">"</pc> </s> Note that a project may make a decision to only indicate lack of whitespace in one direction, or do that non-redundantly. The existing proposal is the broadest possible, on the assumption that we adopt the "streamable view", where all the information on the current element needs to be represented locally.
The English sentence ‘We're going on vacation.’ tagged with the CLAWS-5 tagset, arranged sequentially, tagged on the assumption that only the lack of the preceding whitespace is indicated. <p> <w pos="PNP">We</w> <w pos="VBB" join="left">'re</w> <w pos="VVG">going</w> <w pos="PRP">on</w> <w pos="NN1">vacation</w> <pc pos="PUN" join="left">.</pc> </p>
Note	The definition of this attribute is adapted from ISO MAF (Morpho-syntactic Annotation Framework), ISO 24611:2012.

Note

These attributes make it possible to encode simple language corpora and to add a layer of linguistic information to any tokenized resource. See section 18.4.2. Lightweight Linguistic Annotation for discussion.
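As an illustration of how join can be consumed, the following sketch rebuilds the running text of a tokenized sentence. It is a simplified reading of the attribute, not a prescribed algorithm: the function name `detokenize` is hypothetical, the sample markup is un-namespaced, and the value overlap is not handled. A space is suppressed between two tokens when the first declares right or both, or the second declares left or both, so redundant and one-directional marking give the same result.

```python
import xml.etree.ElementTree as ET


def detokenize(sentence):
    """Rebuild the text of a sentence from its <w>/<pc> tokens and their @join values."""
    tokens = [(el.text or "", el.get("join", "no"))
              for el in sentence.iter() if el.tag in ("w", "pc")]
    pieces = []
    for i, (text, join) in enumerate(tokens):
        if i > 0:
            previous_join = tokens[i - 1][1]
            glued = previous_join in ("right", "both") or join in ("left", "both")
            if not glued:
                pieces.append(" ")
        pieces.append(text)
    return "".join(pieces)


# The two join examples given above, without namespace for brevity.
REDUNDANT = ET.fromstring(
    '<s> <pc join="right">"</pc> <w join="left">Friends</w> <w>will</w> <w>be</w> '
    '<w join="right">friends</w> <pc join="both">.</pc> <pc join="left">"</pc> </s>')
LEFT_ONLY = ET.fromstring(
    '<p> <w pos="PNP">We</w> <w pos="VBB" join="left">\'re</w> <w pos="VVG">going</w> '
    '<w pos="PRP">on</w> <w pos="NN1">vacation</w> <pc pos="PUN" join="left">.</pc> </p>')

print(detokenize(REDUNDANT))
print(detokenize(LEFT_ONLY))
```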

Appendix A.3.21 att.naming

att.naming provides attributes common to elements which refer to named persons, places, organizations etc. [3.6.1. Referring Strings 14.3.7. Names and Nyms]

Module

tei — Formal specification

Members

att.personal[addName forename name orgName persName placeName roleName surname] affiliation birth death education event occupation pubPlace state

Attributes

att.canonical
- @key
- @ref

role

may be used to specify further information about the entity referenced by this name in the form of a set of whitespace-separated values, for example the occupation of a person, or the status of a place.

Status	Optional
Datatype	1–∞ occurrences of teidata.enumerated separated by whitespace

Appendix A.3.22 att.pointing

att.pointing provides a set of attributes used by all elements which point to other elements by means of one or more URI references. [1.3.1.1.2. Language Indicators 3.7. Simple Links and Cross-References]

Module tei — Formal specification

Members att.pointing.group[linkGrp] catRef licence link note ref term

Attributes

target

specifies the destination of the reference by supplying one or more URI References.

Status	Optional
Datatype	1–∞ occurrences of teidata.pointer separated by whitespace
Note	One or more syntactically valid URI references, separated by whitespace. Because whitespace is used to separate URIs, no whitespace is permitted inside a single URI. If a whitespace character is required in a URI, it should be escaped with the normal mechanism, e.g. `TEI%20Consortium`.
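A small sketch of how a target value might be assembled and split again, assuming the hypothetical helper names `build_target` and `parse_target`. It uses the standard `urllib.parse.quote` to percent-escape whitespace inside each URI reference before the references are joined with spaces.

```python
from urllib.parse import quote

# Characters left untouched: URI delimiters plus "%" so existing escapes survive.
SAFE = ":/?#[]@!$&'()*+,;=%"


def build_target(uris):
    """Join URI references into one attribute value, escaping embedded whitespace."""
    return " ".join(quote(uri, safe=SAFE) for uri in uris)


def parse_target(value):
    """Split an attribute value into its whitespace-separated URI references."""
    return value.split()


value = build_target(["#ch1", "docs/TEI Consortium.xml"])
print(value)
print(parse_target(value))
```

Note that `quote` also percent-encodes non-ASCII characters, which may or may not be wanted in a given project.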

Appendix A.3.23 att.ranging

att.ranging provides attributes for describing numerical ranges.

Module

tei — Formal specification

Members

att.dimensions[birth date death gap state time] measure num

Attributes

atLeast

gives a minimum estimated value for the approximate measurement.

Status	Optional
Datatype	teidata.numeric

atMost

gives a maximum estimated value for the approximate measurement.

Status	Optional
Datatype	teidata.numeric

min

where the measurement summarizes more than one observation or a range, supplies the minimum value observed.

Status	Optional
Datatype	teidata.numeric

max

where the measurement summarizes more than one observation or a range, supplies the maximum value observed.

Status	Optional
Datatype	teidata.numeric

confidence

specifies the degree of statistical confidence (between zero and one) that a value falls within the range specified by min and max, or the proportion of observed values that fall within that range.

Status	Optional
Datatype	teidata.probability

Example

The MS. was lost in transmission by mail from <del rend="overstrike"> <gap reason="illegible" extent="one or two letters" atLeast="1" atMost="2" unit="chars"/> </del> Philadelphia to the Graphic office, New York.

Example

Americares has been supporting the health sector in Eastern Europe since 1986, and since 1992 has provided <measure atLeast="120000000" unit="USD" commodity="currency">more than $120m</measure> in aid to Ukrainians.

Appendix A.3.24 att.resourced

att.resourced provides attributes by which a resource (such as an externally held media file) may be located.

Module

tei — Formal specification

Members

graphic media

Attributes

url

(uniform resource locator) specifies the URL from which the media concerned may be obtained.

Status	Required
Datatype	teidata.pointer

Appendix A.3.25 att.typed

att.typed provides attributes that can be used to classify or subclassify elements in any way. [1.3.1. Attribute Classes 18.1.1. Words and Above 3.6.1. Referring Strings 3.7. Simple Links and Cross-References 3.6.5. Abbreviations and Their Expansions 3.13.1. Core Tags for Verse 7.2.5. Speech Contents 4.1.1. Un-numbered Divisions 4.1.2. Numbered Divisions 4.2.1. Headings and Trailers 4.4. Virtual Divisions 14.3.2.3. Personal Relationships 12.3.1.1. Core Elements for Transcriptional Work 17.1.1. Pointers and Links 17.3. Blocks, Segments, and Anchors 13.2. Linking the Apparatus to the Text 23.5.1.2. Defining Content Models: RELAX NG 8.3. Elements Unique to Spoken Texts 24.3.1.3. Modification of Attribute and Attribute Value Lists]

Module

tei — Formal specification

Members

att.pointing.group[linkGrp] TEI addName affiliation application bibl birth change date death desc div education event figure forename graphic head idno incident kinesic label link listEvent listOrg listPerson listRelation measure media name nameLink note num occupation org orgName pb pc persName phr placeName recording ref relation roleName s seg sex state surname teiCorpus term text time title unit vocal w

Attributes

type

characterizes the element in some sense, using any convenient classification scheme or typology.

Status	Optional
Datatype	teidata.enumerated
<div type="verse"> <head>Night in Tarras</head> <lg type="stanza"> <l>At evening tramping on the hot white road</l> <l>…</l> </lg> <lg type="stanza"> <l>A wind sprang up from nowhere as the sky</l> <l>…</l> </lg> </div>
Note	The type attribute is present on a number of elements, not all of which are members of att.typed, usually because these elements restrict the possible values for the attribute in a specific way.

subtype

(subtype) provides a sub-categorization of the element, if needed.

Status	Optional
Datatype	teidata.enumerated
Note	The subtype attribute may be used to provide any sub-classification for the element additional to that provided by its type attribute.

Schematron

<sch:rule context="tei:*[@subtype]"> <sch:assert test="@type">The <sch:name/> element should not be categorized in detail with @subtype unless also categorized in general with @type</sch:assert> </sch:rule>

Note

When appropriate, values from an established typology should be used. Alternatively a typology may be defined in the associated TEI header. If values are to be taken from a project-specific list, this should be defined using the <valList> element in the project-specific schema description, as described in 24.3.1.3. Modification of Attribute and Attribute Value Lists.
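The Schematron constraint on subtype (no subtype without type) can be approximated with a few lines of code. This is a sketch under stated assumptions: the function name `subtype_without_type` and the attribute values in the sample are hypothetical.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"


def subtype_without_type(root):
    """List TEI elements that carry @subtype but no @type."""
    return [el.tag[len(TEI):] for el in root.iter()
            if el.tag.startswith(TEI)
            and el.get("subtype") is not None
            and el.get("type") is None]


# Hypothetical fragment: the <seg> violates the constraint, the <div> does not.
SAMPLE = """<body xmlns="http://www.tei-c.org/ns/1.0">
  <div type="verse" subtype="sonnet"/>
  <seg subtype="aside"/>
</body>"""

print(subtype_without_type(ET.fromstring(SAMPLE)))
```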

Appendix A.4 Datatypes

Appendix A.4.1 teidata.certainty

teidata.certainty defines the range of attribute values expressing a degree of certainty.
Module	tei — Formal specification
Used by	teidata.probCert
Content model	<content> <valList type="closed"> <valItem ident="high"/> <valItem ident="medium"/> <valItem ident="low"/> <valItem ident="unknown"/> </valList> </content>
Declaration	tei_teidata.certainty = "high" | "medium" | "low" | "unknown"
Note	Certainty may be expressed by one of the predefined symbolic values high, medium, or low. The value unknown should be used in cases where the encoder does not wish to assert an opinion about the matter.

Appendix A.4.2 teidata.count

teidata.count defines the range of attribute values used for a non-negative integer value used as a count.
Module	tei — Formal specification
Used by	Element: tagUsage/@occurs
Content model	<content> <dataRef name="nonNegativeInteger"/> </content>
Declaration	tei_teidata.count = xsd:nonNegativeInteger
Note	Any positive integer value or zero is permitted

Appendix A.4.3 teidata.duration.iso

teidata.duration.iso defines the range of attribute values available for representation of a duration in time using ISO 8601 standard formats.
Module	tei — Formal specification
Used by
Content model	<content> <dataRef name="token" restriction="[0-9.,DHMPRSTWYZ/:+\-]+"/> </content>
Declaration	tei_teidata.duration.iso = token { pattern = "[0-9.,DHMPRSTWYZ/:+\-]+" }
Example	<time dur-iso="PT0,75H">three-quarters of an hour</time>
Example	<date dur-iso="P1,5D">a day and a half</date>
Example	<date dur-iso="P14D">a fortnight</date>
Example	<time dur-iso="PT0.02S">20 ms</time>
Note	A duration is expressed as a sequence of number-letter pairs, preceded by the letter P; the letter gives the unit and may be Y (year), M (month), D (day), H (hour), M (minute), or S (second), in that order. The numbers are all unsigned integers, except for the last, which may have a decimal component (using either `.` or `,` as the decimal point; the latter is preferred). If any number is 0, then that number-letter pair may be omitted. If any of the H (hour), M (minute), or S (second) number-letter pairs are present, then the separator `T` must precede the first ‘time’ number-letter pair. For complete details, see ISO 8601 Data elements and interchange formats — Information interchange — Representation of dates and times.
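The declared pattern only restricts the characters that may appear; it does not check that the value is a well-formed ISO 8601 duration. The sketch below, with the hypothetical function name `matches_duration_iso`, applies the pattern as a whole-string match (schema patterns are implicitly anchored) to make that distinction visible.

```python
import re

# Character-class pattern copied from the teidata.duration.iso declaration.
DURATION_ISO = re.compile(r"[0-9.,DHMPRSTWYZ/:+\-]+")


def matches_duration_iso(value):
    """True if the whole value consists only of characters allowed by the pattern."""
    return DURATION_ISO.fullmatch(value) is not None


for candidate in ["PT0,75H", "P1,5D", "P14D", "PT0.02S", "a fortnight", "HHH"]:
    print(candidate, matches_duration_iso(candidate))
```

The last candidate, `HHH`, passes the pattern although it is not a meaningful duration, so a project that needs stricter guarantees would have to add its own check.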

Appendix A.4.4 teidata.duration.w3c

teidata.duration.w3c defines the range of attribute values available for representation of a duration in time using W3C datatypes.
Module	tei — Formal specification
Used by
Content model	<content> <dataRef name="duration"/> </content>
Declaration	tei_teidata.duration.w3c = xsd:duration
Example	<time dur="PT45M">forty-five minutes</time>
Example	<date dur="P1DT12H">a day and a half</date>
Example	<date dur="P7D">a week</date>
Example	<time dur="PT0.02S">20 ms</time>
Note	A duration is expressed as a sequence of number-letter pairs, preceded by the letter P; the letter gives the unit and may be Y (year), M (month), D (day), H (hour), M (minute), or S (second), in that order. The numbers are all unsigned integers, except for the `S` number, which may have a decimal component (using `.` as the decimal point). If any number is 0, then that number-letter pair may be omitted. If any of the H (hour), M (minute), or S (second) number-letter pairs are present, then the separator `T` must precede the first ‘time’ number-letter pair. For complete details, see the W3C specification.
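The number-letter structure described in the note can be sketched as a regular expression. This is a simplified illustration, not a complete xsd:duration implementation: the name `parse_duration` is hypothetical and negative durations are deliberately not handled.

```python
import re

# Date part (Y, M, D), then an optional time part introduced by "T" (H, M, S).
XSD_DURATION = re.compile(
    r"P(?:(?P<years>[0-9]+)Y)?(?:(?P<months>[0-9]+)M)?(?:(?P<days>[0-9]+)D)?"
    r"(?:T(?:(?P<hours>[0-9]+)H)?(?:(?P<minutes>[0-9]+)M)?"
    r"(?:(?P<seconds>[0-9]+(?:\.[0-9]+)?)S)?)?"
)


def parse_duration(value):
    """Return the components of a duration as a dict, or None if it is not well formed."""
    match = XSD_DURATION.fullmatch(value)
    # A trailing "T" with no time component is not allowed.
    if match is None or value.endswith("T"):
        return None
    parts = {name: text for name, text in match.groupdict().items() if text is not None}
    if not parts:  # a bare "P" has no components
        return None
    # Only the seconds component may have a decimal part.
    return {name: float(text) if name == "seconds" else int(text)
            for name, text in parts.items()}


for candidate in ["PT45M", "P1DT12H", "P7D", "PT0.02S", "PT0,75H"]:
    print(candidate, parse_duration(candidate))
```

The last candidate uses a comma as the decimal point, which the ISO variant of the datatype accepts but the W3C variant does not.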

Appendix A.4.5 teidata.enumerated

teidata.enumerated defines the range of attribute values expressed as a single XML name taken from a list of documented possibilities.
Module	tei — Formal specification
Used by	Element: measure/@type num/@type phr/@function recording/@type
Content model	<content> <dataRef key="teidata.word"/> </content>
Declaration	tei_teidata.enumerated = teidata.word
Note	Attributes using this datatype must contain a single ‘word’ which contains only letters, digits, punctuation characters, or symbols: thus it cannot include whitespace. Typically, the list of documented possibilities will be provided (or exemplified) by a value list in the associated attribute specification, expressed with a <valList> element.

Appendix A.4.6 teidata.language

teidata.language defines the range of attribute values used to identify a particular combination of human language and writing system. [6.1. Language Identification]
Module	tei — Formal specification
Used by	Element: TEI/@xml:lang language/@ident teiCorpus/@xml:lang
Content model	<content> <alternate> <dataRef name="language"/> <valList> <valItem ident=""/> </valList> </alternate> </content>
Declaration	tei_teidata.language = xsd:language | ( "" )
Note	The values for this attribute are language ‘tags’ as defined in BCP 47. Currently BCP 47 comprises RFC 5646 and RFC 4647; over time, other IETF documents may succeed these as the best current practice. A ‘language tag’, per BCP 47, is assembled from a sequence of components or subtags separated by the hyphen character (-, U+002D). The tag is made of the following subtags, in the following order. Every subtag except the first is optional. If present, each occurs only once, except the fourth and fifth components (variant and extension), which are repeatable.
- language: The IANA-registered code for the language. This is almost always the same as the ISO 639 2-letter language code if there is one. The list of available registered language subtags can be found at https://www.iana.org/assignments/language-subtag-registry. It is recommended that this code be written in lower case.
- script: The ISO 15924 code for the script. These codes consist of 4 letters, and it is recommended they be written with an initial capital, the other three letters in lower case. The canonical list of codes is maintained by the Unicode Consortium, and is available at https://unicode.org/iso15924/iso15924-codes.html. The IETF recommends this code be omitted unless it is necessary to make a distinction you need.
- region: Either an ISO 3166 country code or a UN M.49 region code that is registered with IANA (not all such codes are registered, e.g. UN codes for economic groupings or codes for countries for which there is already an ISO 3166 2-letter code are not registered). The former consist of 2 letters, and it is recommended they be written in upper case; the list of codes can be searched or browsed at https://www.iso.org/obp/ui/#search/code/. The latter consist of 3 digits; the list of codes can be found at http://unstats.un.org/unsd/methods/m49/m49.htm.
- variant: An IANA-registered variation. These codes ‘are used to indicate additional, well-recognized variations that define a language or its dialects that are not covered by other available subtags’.
- extension: An extension has the format of a single letter followed by a hyphen followed by additional subtags. There are currently only two extensions in use. Extension `T` indicates that the content was transformed. For example en-t-it could be used for content in English that was translated from Italian. Extension T is described in the informational RFC 6497. Extension `U` can be used to embed a variety of locale attributes. It is described in the informational RFC 6067.
- private use: An extension that uses the initial subtag of the single letter x (i.e., starts with `x-`) has no meaning except as negotiated among the parties involved. These should be used with great care, since they interfere with the interoperability that use of RFC 4646 is intended to promote. In order for a document that makes use of these subtags to be TEI-conformant, a corresponding <language> element must be present in the TEI header.

There are two exceptions to the above format. First, there are language tags in the IANA registry that do not match the above syntax, but are present because they have been ‘grandfathered’ from previous specifications. Second, an entire language tag can consist of only a private use subtag. These tags start with `x-`, and do not need to follow any further rules established by the IETF and endorsed by these Guidelines. Like all language tags that make use of private use subtags, the language in question must be documented in a corresponding <language> element in the TEI header.

Examples include:
- sn: Shona
- zh-TW: Taiwanese
- zh-Hant-HK: Chinese written in traditional script as used in Hong Kong
- en-SL: English as spoken in Sierra Leone
- pl: Polish
- es-MX: Spanish as spoken in Mexico
- es-419: Spanish as spoken in Latin America

The W3C Internationalization Activity has published a useful introduction to BCP 47, Language tags in HTML and XML.
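To make the subtag order concrete, here is a deliberately reduced sketch that splits a tag into its language, script and region subtags. It is not a BCP 47 parser: the name `split_tag` is hypothetical, variants, extensions, private-use and grandfathered tags are rejected, and the recommended casing (lower-case language, title-case script, upper-case region) is assumed even though BCP 47 itself is case-insensitive.

```python
import re

# language, then optional script, then optional region (2 letters or 3 digits).
LANG_TAG = re.compile(
    r"(?P<language>[a-z]{2,3})"
    r"(?:-(?P<script>[A-Z][a-z]{3}))?"
    r"(?:-(?P<region>[A-Z]{2}|[0-9]{3}))?"
)


def split_tag(tag):
    """Return (language, script, region), or None for tags outside this reduced grammar."""
    match = LANG_TAG.fullmatch(tag)
    if match is None:
        return None
    return (match["language"], match["script"], match["region"])


for tag in ["sn", "zh-TW", "zh-Hant-HK", "es-419", "en-t-it"]:
    print(tag, split_tag(tag))
```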

Appendix A.4.7 teidata.name

teidata.name defines the range of attribute values expressed as an XML Name.
Module	tei — Formal specification
Used by	Element: application/@ident tagUsage/@gi
Content model	<content> <dataRef name="Name"/> </content>
Declaration	tei_teidata.name = xsd:Name
Note	Attributes using this datatype must contain a single word which follows the rules defining a legal XML name (see https://www.w3.org/TR/REC-xml/#dt-name): for example they cannot include whitespace or begin with digits.

Appendix A.4.8 teidata.numeric

teidata.numeric defines the range of attribute values used for numeric values.
Module	tei — Formal specification
Used by	Element: measure/@quantity
Content model	<content> <alternate> <dataRef name="double"/> <dataRef name="token" restriction="(\-?[\d]+/\-?[\d]+)"/> <dataRef name="decimal"/> </alternate> </content>
Declaration	tei_teidata.numeric = xsd:double | token { pattern = "(\-?[\d]+/\-?[\d]+)" } | xsd:decimal
Note	Any numeric value, represented as a decimal number, in floating point format, or as a ratio. To represent a floating point number, expressed in scientific notation, ‘E notation’, a variant of ‘exponential notation’, may be used. In this format, the value is expressed as two numbers separated by the letter E. The first number, the significand (sometimes called the mantissa) is given in decimal format, while the second is an integer. The value is obtained by multiplying the mantissa by 10 the number of times indicated by the integer. Thus the value represented in decimal notation as 1000.0 might be represented in scientific notation as 10E2. A value expressed as a ratio is represented by two integer values separated by a solidus (/) character. Thus, the value represented in decimal notation as 0.5 might be represented as a ratio by the string 1/2.
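A minimal sketch of reading such values, assuming the hypothetical function name `parse_numeric`. Ratios are kept exact with `fractions.Fraction`; everything else is handed to Python's `float`, which is slightly more lenient than xsd:double and xsd:decimal (it also accepts, for example, `1_000`), so this is an approximation rather than a validator.

```python
import re
from fractions import Fraction

# Ratio form from the declaration: two integers separated by a solidus.
RATIO = re.compile(r"(-?\d+)/(-?\d+)")


def parse_numeric(value):
    """Parse a decimal, E-notation or ratio value; raises ValueError on other input."""
    match = RATIO.fullmatch(value)
    if match:
        # A zero denominator raises ZeroDivisionError.
        return Fraction(int(match.group(1)), int(match.group(2)))
    return float(value)


for candidate in ["0.5", "1/2", "1.5E2", "-3/4"]:
    print(candidate, parse_numeric(candidate))
```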

Appendix A.4.9 teidata.outputMeasurement

teidata.outputMeasurement defines a range of values for use in specifying the size of an object that is intended for display.
Module	tei — Formal specification
Used by
Content model	<content> <dataRef name="token" restriction="[\-+]?\d+(\.\d+)?(%|cm|mm|in|pt|pc|px|em|ex|ch|rem|vw|vh|vmin|vmax)"/> </content>
Declaration	tei_teidata.outputMeasurement = token { pattern = "[\-+]?\d+(\.\d+)?(%|cm|mm|in|pt|pc|px|em|ex|ch|rem|vw|vh|vmin|vmax)" }
Example	<figure> <head>The TEI Logo</head> <figDesc>Stylized yellow angle brackets with the letters <mentioned>TEI</mentioned> in between and <mentioned>text encoding initiative</mentioned> underneath, all on a white background.</figDesc> <graphic height="600px" width="600px" url="http://www.tei-c.org/logos/TEI-600.jpg"/> </figure>
Note	These values map directly onto the values used by XSL-FO and CSS. For definitions of the units see those specifications; at the time of this writing the most complete list is in the CSS3 working draft.
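The pattern for this datatype requires an optionally signed number followed directly by one of the listed units. The sketch below (function name `is_output_measurement` is an assumption) applies it as a whole-string match to a few candidate values.

```python
import re

# Number, optional decimal part, then a mandatory unit with no intervening space.
OUTPUT_MEASUREMENT = re.compile(
    r"[\-+]?\d+(\.\d+)?(%|cm|mm|in|pt|pc|px|em|ex|ch|rem|vw|vh|vmin|vmax)"
)


def is_output_measurement(value):
    """True if the whole value matches the teidata.outputMeasurement pattern."""
    return OUTPUT_MEASUREMENT.fullmatch(value) is not None


for candidate in ["600px", "50%", "-1.5em", "2rem", "600", "10 px"]:
    print(candidate, is_output_measurement(candidate))
```

A bare number such as `600` is rejected because the unit is not optional.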

Appendix A.4.10 teidata.pattern

teidata.pattern defines attribute values which are expressed as a regular expression.
Module	tei — Formal specification
Used by	Element: prefixDef/@matchPattern
Content model	<content> <dataRef name="token"/> </content>
Declaration	tei_teidata.pattern = token
Note	A regular expression, often called a pattern, is an expression that describes a set of strings. They are usually used to give a concise description of a set, without having to list all elements. For example, the set containing the three strings Handel, Händel, and Haendel can be described by the pattern `H(ä|ae?)ndel` (or alternatively, it is said that the pattern `H(ä|ae?)ndel` matches each of the three strings) (Wikipedia). This TEI datatype is mapped to the XSD token datatype, and may therefore contain any string of characters. However, it is recommended that the value used conform to the particular flavour of regular expression syntax supported by XSD Schema.

Appendix A.4.11 teidata.pointer

teidata.pointer defines the range of attribute values used to provide a single URI, absolute or relative, pointing to some other resource, either within the current document or elsewhere.
Module	tei — Formal specification
Used by	Element: TEI/@ana catRef/@target catRef/@scheme category/@ana include/@href link/@ana link/@target phr/@ana ref/@target relation/@active relation/@mutual relation/@passive state/@ana
Content model	<content> <dataRef restriction="\S+" name="anyURI"/> </content>
Declaration	tei_teidata.pointer = xsd:anyURI { pattern = "\S+" }
Note	The range of syntactically valid values is defined by RFC 3986 Uniform Resource Identifier (URI): Generic Syntax. Note that the values themselves are encoded using RFC 3987 Internationalized Resource Identifiers (IRIs) mapping to URIs. For example, `https://secure.wikimedia.org/wikipedia/en/wiki/%` is encoded as `https://secure.wikimedia.org/wikipedia/en/wiki/%25` while `http://موقع.وزارة-الاتصالات.مصر/` is encoded as `http://xn--4gbrim.xn----rmckbbajlc6dj7bxne2c.xn--wgbh1c/`

Appendix A.4.12 teidata.prefix

teidata.prefix defines a range of values that may function as a URI scheme name.
Module	tei — Formal specification
Used by	Element: prefixDef/@ident
Content model	<content> <dataRef name="token" restriction="[a-z][a-z0-9\+\.\-]*"/> </content>
Declaration	tei_teidata.prefix = token { pattern = "[a-z][a-z0-9\+\.\-]*" }
Note	This datatype is used to constrain a string of characters to one that can be used as a URI scheme name according to RFC 3986, section 3.1. Thus only the 26 lowercase letters a–z, the 10 digits 0–9, the plus sign, the period, and the hyphen are permitted, and the value must start with a letter.
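As a quick illustration of the constraint, the following sketch checks candidate prefixes against the declared pattern; the function name `is_prefix` is hypothetical.

```python
import re

# Lowercase letter first, then lowercase letters, digits, "+", "." or "-".
PREFIX = re.compile(r"[a-z][a-z0-9\+\.\-]*")


def is_prefix(value):
    """True if the value could serve as a URI scheme name under teidata.prefix."""
    return PREFIX.fullmatch(value) is not None


for candidate in ["tei", "http", "my-scheme.v2", "2tei", "TEI"]:
    print(candidate, is_prefix(candidate))
```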

Appendix A.4.13 teidata.probCert

teidata.probCert defines a range of attribute values which can be expressed either as a numeric probability or as a coded certainty value.
Module	tei — Formal specification
Used by
Content model	<content> <alternate> <dataRef key="teidata.probability"/> <dataRef key="teidata.certainty"/> </alternate> </content>
Declaration	tei_teidata.probCert = teidata.probability | teidata.certainty

Appendix A.4.14 teidata.probability

teidata.probability defines the range of attribute values expressing a probability.
Module	tei — Formal specification
Used by	teidata.probCert
Content model	<content> <dataRef name="double"> <dataFacet name="minInclusive" value="0"/> <dataFacet name="maxInclusive" value="1"/> </dataRef> </content>
Declaration	tei_teidata.probability = xsd:double
Note	Probability is expressed as a real number between 0 and 1; 0 representing certainly false and 1 representing certainly true.
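
The range restriction in the content model can be sketched as follows. This is an approximation: Python's `float` accepts a few spellings that `xsd:double` does not (for example surrounding whitespace or underscores between digits), so a real validator should be used for conformance checking. The function name is an assumption for this example.

```python
def is_probability(value):
    """Return True if value parses as a number in the closed interval [0, 1]."""
    try:
        number = float(value)
    except ValueError:
        return False
    # NaN fails both comparisons, so it is rejected as well.
    return 0.0 <= number <= 1.0


if __name__ == "__main__":
    for candidate in ("0", "0.5", "1", "1.5", "-0.1", "NaN", "likely"):
        print(candidate, is_probability(candidate))
```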

Appendix A.4.15 teidata.replacement

teidata.replacement defines attribute values which contain a replacement template.
Module	tei — Formal specification
Used by	Element: prefixDef/@replacementPattern
Content model	<content> <textNode/> </content>
Declaration	tei_teidata.replacement = text
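
To show how a replacement template is typically applied, the sketch below expands a prefixed pointer. It assumes a prefix definition with a match pattern of `(.+)` and a replacement template of `#$1`; these values, the prefix `ud-syn`, and the function name `expand_pointer` are illustrative assumptions rather than values prescribed by this datatype. Only single-digit group references (`$1` to `$9`) are handled.

```python
import re


def expand_pointer(pointer, ident, match_pattern, replacement_pattern):
    """Expand 'ident:rest' using the match pattern and the replacement template.

    Pointers that do not use the given prefix, or whose remainder does not
    match the pattern, are returned unchanged.
    """
    prefix, separator, rest = pointer.partition(":")
    if not separator or prefix != ident:
        return pointer
    match = re.fullmatch(match_pattern, rest)
    if match is None:
        return pointer
    # Replace each $N in the template with the text captured by group N.
    return re.sub(r"\$(\d)", lambda ref: match.group(int(ref.group(1))), replacement_pattern)


if __name__ == "__main__":
    print(expand_pointer("ud-syn:nsubj", "ud-syn", "(.+)", "#$1"))
    print(expand_pointer("https://example.org/", "ud-syn", "(.+)", "#$1"))
```

With these assumed values the first call yields a local reference, `#nsubj`, while the second pointer is left as it is because its scheme is not the defined prefix.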

Appendix A.4.16 teidata.temporal.iso

teidata.temporal.iso defines the range of attribute values expressing a temporal expression such as a date, a time, or a combination of them, that conform to the international standard Data elements and interchange formats – Information interchange – Representation of dates and times.
Module	tei — Formal specification
Used by
Content model	<content> <alternate> <dataRef name="date"/> <dataRef name="gYear"/> <dataRef name="gMonth"/> <dataRef name="gDay"/> <dataRef name="gYearMonth"/> <dataRef name="gMonthDay"/> <dataRef name="time"/> <dataRef name="dateTime"/> <dataRef name="token" restriction="[0-9.,DHMPRSTWYZ/:+\-]+"/> </alternate> </content>
Declaration	tei_teidata.temporal.iso = xsd:date | xsd:gYear | xsd:gMonth | xsd:gDay | xsd:gYearMonth | xsd:gMonthDay | xsd:time | xsd:dateTime | token { pattern = "[0-9.,DHMPRSTWYZ/:+\-]+" }
Note	If it is likely that the value used is to be compared with another, then a time zone indicator should always be included, and only the dateTime representation should be used. For all representations for which ISO 8601:2004 describes both a basic and an extended format, these Guidelines recommend use of the extended format.

Appendix A.4.17 teidata.temporal.w3c

teidata.temporal.w3c defines the range of attribute values expressing a temporal expression such as a date, a time, or a combination of them, that conform to the W3C XML Schema Part 2: Datatypes Second Edition specification.
Module	tei — Formal specification
Used by	Element: birth/@when death/@when orgName/@from orgName/@to
Content model	<content> <alternate> <dataRef name="date"/> <dataRef name="gYear"/> <dataRef name="gMonth"/> <dataRef name="gDay"/> <dataRef name="gYearMonth"/> <dataRef name="gMonthDay"/> <dataRef name="time"/> <dataRef name="dateTime"/> </alternate> </content>
Declaration	tei_teidata.temporal.w3c = xsd:date | xsd:gYear | xsd:gMonth | xsd:gDay | xsd:gYearMonth | xsd:gMonthDay | xsd:time | xsd:dateTime
Note	If it is likely that the value used is to be compared with another, then a time zone indicator should always be included, and only the dateTime representation should be used.
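
The most common alternatives can be told apart with a purely lexical sketch such as the one below. It is deliberately simplified: it covers only four of the eight listed types, assumes four-digit non-negative years, and does not check that a date actually exists (so `2021-02-30` would pass). The names `PATTERNS` and `w3c_kind` are assumptions for this example.

```python
import re

# Optional time zone indicator: "Z" or an offset such as "+01:00".
TZ = r"(Z|[+-][0-9]{2}:[0-9]{2})?"

# Simplified lexical forms, tried from the most to the least specific.
PATTERNS = {
    "dateTime": r"[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}(\.[0-9]+)?" + TZ,
    "date": r"[0-9]{4}-[0-9]{2}-[0-9]{2}" + TZ,
    "gYearMonth": r"[0-9]{4}-[0-9]{2}" + TZ,
    "gYear": r"[0-9]{4}" + TZ,
}


def w3c_kind(value):
    """Return the name of the first simplified form the value matches, or None."""
    for name, pattern in PATTERNS.items():
        if re.fullmatch(pattern, value):
            return name
    return None


if __name__ == "__main__":
    for value in ("2021", "2021-03", "2021-03-15", "2021-03-15T09:30:00+01:00", "15.3.2021"):
        print(value, w3c_kind(value))
```

Only the last of the sample values carries a time zone indicator and uses the dateTime form, so only it follows the recommendation in the note for values that are to be compared.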

Appendix A.4.18 teidata.text

teidata.text defines the range of attribute values used to express some kind of identifying string as a single sequence of Unicode characters possibly including whitespace.
Module	tei — Formal specification
Used by	Element: pc/@msd
Content model	<content> <dataRef name="string"/> </content>
Declaration	tei_teidata.text = string
Note	Attributes using this datatype must contain a single ‘token’ in which whitespace and other punctuation characters are permitted.

Appendix A.4.19 teidata.truthValue

teidata.truthValue defines the range of attribute values used to express a truth value.
Module	tei — Formal specification
Used by
Content model	<content> <dataRef name="boolean"/> </content>
Declaration	tei_teidata.truthValue = xsd:boolean
Note	The possible values of this datatype are 1 or true, or 0 or false. This datatype applies only for cases where uncertainty is inappropriate; if the attribute concerned may have a value other than true or false, e.g. unknown, or inapplicable, it should have the extended version of this datatype: teidata.xTruthValue.

Appendix A.4.20 teidata.versionNumber

teidata.versionNumber defines the range of attribute values used for version numbers.
Module	tei — Formal specification
Used by	Element: application/@version
Content model	<content> <dataRef name="token" restriction="[\d]+[a-z]*[\d]*(\.[\d]+[a-z]*[\d]*){0,3}"/> </content>
Declaration	tei_teidata.versionNumber = token { pattern = "[\d]+[a-z]*[\d]*(\.[\d]+[a-z]*[\d]*){0,3}" }
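
The following sketch tests a few version strings. It assumes the pattern as given in the TEI Guidelines, `[\d]+[a-z]*[\d]*(\.[\d]+[a-z]*[\d]*){0,3}`, that is, up to four dot-separated components, each consisting of digits optionally followed by lowercase letters and further digits. The function name and sample values are illustrative.

```python
import re

# One component: digits, optional lowercase letters, optional digits.
# A version number has one component followed by at most three more, separated by periods.
VERSION = re.compile(r"[\d]+[a-z]*[\d]*(\.[\d]+[a-z]*[\d]*){0,3}")


def is_version_number(value):
    """Return True if the whole value matches the version number pattern."""
    return VERSION.fullmatch(value) is not None


if __name__ == "__main__":
    for candidate in ("1.2.3", "4.3.0a1", "2", "1.2.3.4.5", "v1.0"):
        print(candidate, is_version_number(candidate))
```

Under this pattern a value with five components or with a leading letter is rejected.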

Appendix A.4.21 teidata.word

teidata.word defines the range of attribute values expressed as a single word or token.
Module	tei — Formal specification
Used by	teidata.enumerated; Element: media/@mimeType
Content model	<content> <dataRef name="token" restriction="[^\p{C}\p{Z}]+"/> </content>
Declaration	tei_teidata.word = token { pattern = "[^\p{C}\p{Z}]+" }
Note	Attributes using this datatype must contain a single ‘word’ which contains only letters, digits, punctuation characters, or symbols: thus it cannot include whitespace.
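
Python's built-in `re` module does not support the `\p{C}` and `\p{Z}` property escapes used in the pattern, so the sketch below checks the same condition with `unicodedata`: a value qualifies if it is non-empty and contains no character from the "Other" (control, format, unassigned and similar) or "Separator" (space, line, paragraph) categories. The function name is an assumption for this example.

```python
import unicodedata


def is_word(value):
    """Return True if value is non-empty and has no character in Unicode categories C* or Z*."""
    if not value:
        return False
    return all(unicodedata.category(char)[0] not in ("C", "Z") for char in value)


if __name__ == "__main__":
    for candidate in ("text/xml", "two words", "no\u00a0break", "tab\there", ""):
        print(repr(candidate), is_word(candidate))
```

A MIME type such as `text/xml` passes, while values containing a space, a no-break space or a tab do not.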

Appendix A.4.22 teidata.xTruthValue

teidata.xTruthValue (extended truth value) defines the range of attribute values used to express a truth value which may be unknown.
Module	tei — Formal specification
Used by
Content model	<content> <alternate> <dataRef name="boolean"/> <valList> <valItem ident="unknown"/> <valItem ident="inapplicable"/> </valList> </alternate> </content>
Declaration	tei_teidata.xTruthValue = xsd:boolean | ( "unknown" | "inapplicable" )
Note	In cases where uncertainty is inappropriate, use the datatype teidata.truthValue.
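
A reader of such attributes has to deal with four boolean spellings plus the two extra values. One possible way of normalising them is sketched below; returning the strings `unknown` and `inapplicable` unchanged is a design choice made for this example, not something the datatype prescribes, and the function name is hypothetical.

```python
def parse_x_truth_value(value):
    """Map an extended truth value to True, False, 'unknown' or 'inapplicable'."""
    if value in ("true", "1"):
        return True
    if value in ("false", "0"):
        return False
    if value in ("unknown", "inapplicable"):
        return value
    raise ValueError("not an extended truth value: " + repr(value))


if __name__ == "__main__":
    for candidate in ("true", "0", "unknown", "inapplicable"):
        print(candidate, parse_x_truth_value(candidate))
```

Any other spelling, including capitalised forms such as `True`, raises a `ValueError`.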

Appendix A.4.23 teidata.xpath

teidata.xpath defines attribute values which contain an XPath expression.
Module	tei — Formal specification
Used by
Content model	<content> <textNode/> </content>
Declaration	tei_teidata.xpath = text
Note	Any XPath expression using the syntax defined in 6.2. When writing programs that evaluate XPath expressions, programmers should be mindful of the possibility of malicious code injection attacks. For further information about XPath injection attacks, see the article at OWASP.

Table of contents

2.1. XML structure

2.2. Use of XInclude

2.3. File names and directory structure

3.1. Characters

3.2. Standard values

3.3. Attributes of top-level elements

3.4. Pointing attributes

3.5. Temporal attributes

4.1. File description

4.1.1. Title statement

4.1.2. Edition statement

4.1.3. Extents

4.1.4. Publication statement

4.1.5. Source description

4.2. Encoding description

4.2.1. Project description

4.2.2. Editorial declaration

4.2.3. Tags declaration

4.2.4. Class declaration and taxonomies

4.3. Profile description

4.3.1. Setting description

4.3.2. Text class

4.3.3. Participant description

4.3.4. Language usage

4.4. Revision description

5.1. Speakers

5.1.1. Speaker affiliations

5.2. Organisations

5.2.1. The government organisation

5.2.2. The parliament organisations

5.2.3. Political parties and parliamentary groups

5.2.3.1. Encoding political orientation

5.2.3.2. Encoding CHES variables

5.2.3.3. Relations between organisations

6.1. Divisions

6.2. Utterances

6.3. Transcriber comments

6.3.1. Notes

6.3.2. Incidents

6.4. Gaps

6.5. Interrupted utterances

7.1. Linguistic markup

7.1.1. Word-level annotation

7.1.2. Syntactic words

7.1.3. Named entities

7.1.4. Syntactic parses

7.1.5. Semantic annotation

7.2. Metadata for linguistic annotation

7.2.1. Application information for linguistic processing

7.2.2. Linguistic taxonomies

7.2.3. Prefix definitions

9.1. Validating ParlaMint corpora

9.2. Finalisation of corpora

9.3. Conversions

Appendix A.1 Elements

Appendix A.1.1 <TEI>

Appendix A.1.2 <addName>

Appendix A.1.3 <affiliation>

Appendix A.1.4 <appInfo>

Appendix A.1.5 <application>

Appendix A.1.6 <availability>

Appendix A.1.7 <bibl>

Appendix A.1.8 <birth>

Appendix A.1.9 <body>

Appendix A.1.10 <catDesc>

Appendix A.1.11 <catRef>

Appendix A.1.12 <category>

Appendix A.1.13 <change>

Appendix A.1.14 <classDecl>

Appendix A.1.15 <correction>

Appendix A.1.16 <date>

Appendix A.1.17 <death>

Appendix A.1.18 <desc>

Appendix A.1.19 <div>

Appendix A.1.20 <edition>

Appendix A.1.21 <editionStmt>

Appendix A.1.22 <editorialDecl>

Appendix A.1.23 <education>

Appendix A.1.24 <email>