This is a preliminary set of guidelines for using TEI to encode textual variation for Indic texts. The TEI recommendations on textual criticism are found in Chapter 12 of the TEI P5 Guidelines. In addition, there is a cheatsheet for TEI critical apparatuses written by Marjorie Burghart. If you are new to TEI encoding, we recommend that you take a few minutes to read Getting Started with TEI, and then Marjorie Burghart's cheatsheet for a quick familiarization with encoding textual variation in particular. The motivation for this document is that the general TEI guidelines are not specific enough to guarantee consistent encoding within a project, and other sets of guidelines are too specialized.
The immediate goal of this document is to inventory the basic types of text-critical markup that will be included in SARIT texts. SARIT is a project that collects TEI versions of Sanskrit and Prakrit texts, promotes standards for their encoding, and develops applications for their use. At present, most texts included in SARIT have very minimal text-critical markup, or none at all. But we would like to include text-critical markup in future phases of the project and add related functionality to the website.
This document will therefore allow us
to render this markup correctly and consistently in any SARIT-related applications;
to design applications that produce and interact with this kind of markup on behalf of the end-user, and especially an online editor for TEI-based texts; and
to document and promote standards for text-critical markup in the field of Indology.
Encoding an existing critical edition in TEI and producing a new critical edition from scratch are very different tasks. This document is primarily concerned with encoding editions that already exist in print. This means, inter alia, that we produce digital facsimiles of print editions, which may differ radically from the editions that we would produce if we were starting from scratch. Although these texts may be improved subsequently - indeed such digital editions are intended as a starting point for future editorial work - we want to ensure that the digital edition presents all of the information that is presented in the printed edition.
This document provides guidelines for an analytic apparatus, in which the information of the textual apparatus is highly structured, using the text-critical module of the TEI guidelines. This is not the only way to encode textual variation. In many editions, text-critical information is presented in unstructured footnotes or endnotes. There are benefits to such unstructured apparatus entries: people can easily read them and understand them. However, there are also benefits to using a structured and analytic apparatus. For example, the analytic approach permits the reconstitution of entire witnesses, and also allows the readings of particular witnesses (or editors) to be aggregated and systematically compared.
Please contribute ideas and suggestions based on your experience, and please identify yourself in any contributions to this document. Thank you!
Text-critical markup will generally take the form of entries in a critical apparatus. Most of the traditional means of representing textual variation, such as the “critical apparatus” itself at the bottom of a printed page, relate to variation at the level of the letter, word, and phrase. Higher-level variation (such as the addition, omission or rearrangement of whole chapters or sections) is usually discussed in text-critical notes, which may be printed with or in the critical apparatus. Typically, variation at the level of the letter, word and phrase will be encoded in a structured apparatus entry (<app> with one or more <rdg> elements), and variation at or above the level of a structural unit (a verse or prose paragraph) will be signalled in a text-critical note (<app> with a <note> element).
An apparatus entry consists of two obligatory parts: the reading, and the authority on which the reading is based. In this context, the lemma is another word for the reading of the base text, which may or may not be repeated explicitly in the apparatus entry, depending on the encoding strategy used.
The reading, as noted above, should generally be no longer than a few words. The reading must not be shorter than a syllable. The reason for this restriction is that we want to avoid TEI elements that begin with combining letters in Indic scripts ( े, ो, ा, ृ, etc.). In cases where the variation is “really” at the level of the diacritical mark, or mātrā, we recommend encoding the rest of the syllable anyway (e.g., when there is variation between ke and ko, encode the whole syllables ke and ko as the readings, and not k followed by a choice between e and o).
One of the two main parts of the apparatus entry is the authority on which the reading is based. If we confine ourselves to digital transcriptions of printed editions, the authority might be one of the following:
a manuscript (in the case of most variant readings);
a modern edition of the work, or some other piece of modern scholarship (in the case of emendations);
a parallel text (e.g., a quotation of the text in another work, in the case of restorations);
the editor or reviewer of the digital text himself/herself (in the case of corrections).
Generally we refer to manuscript readings in the apparatus by a siglum that uniquely identifies the manuscript (e.g., “म”). For modern editions, the most common practice is to also use a siglum (e.g., “KM”). For books and articles, one common practice is to use the scholar's name in the apparatus (e.g., “Kangle”). For parallel texts (including testimonia), apparatus entries may refer to simply the author of the parallel text (e.g., “Hemacandra”), its title (e.g., “Kāvyānuśāsana” or “KA”), or they might employ a more specific citation (“KA p. 200 l. 5”). Some editions include a separate apparatus for parallel texts.
We want all of the references in the critical apparatus to be clear and unambiguous, and we want to make it easy for readers to look up the full bibliographic entry that the reference corresponds to. For this reason, all authorities referenced in the apparatus must be defined elsewhere in the document, or alternatively in an external authority file. At the moment, we suggest referring to these authorities by xml:id. That is, if you want to report the readings of a manuscript “K,” then your apparatus entry must refer to an XML element identified by the ID “K”, as follows:
In the text:
In the header:
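As a minimal sketch of this convention: only the siglum K comes from the discussion above, and the bracketed reading and witness description are placeholders. The apparatus entry points to the witness with #K, and the header defines an element carrying that ID.

```xml
<!-- In the text: the reading is attributed to witness K -->
<app>
  <rdg wit="#K">[reading of K]</rdg>
</app>

<!-- In the header: K is defined inside the source description -->
<sourceDesc>
  <listWit>
    <witness xml:id="K">[description of manuscript K]</witness>
  </listWit>
</sourceDesc>
```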
In this system, every reference to an “authority” in the apparatus should be interpreted as a reference to an element that defines the authority and tells readers where and how it might be consulted. Those elements will be contained in the source description (<sourceDesc>); see the appendix, Describing Witnesses, for more information.
Most of the time, a reference to the authority itself - a manuscript, say, or a printed edition - is sufficient; sometimes, more information, such as page, folio, or line numbers, will have to be added. In those cases, the additional information should be supplied within a <ref> element inside the apparatus entry:
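One way such an entry might look, as a sketch only: the reading is a placeholder, the page and line numbers are invented for illustration, and K is assumed to be defined as a printed edition in the header.

```xml
<app>
  <!-- source (not wit) because K is a printed edition -->
  <rdg source="#K">[reading of K] <ref target="#K">p. 12, l. 3</ref></rdg>
</app>
```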
This entry refers to a “source” K, which is a printed edition of a text rather than a manuscript, and to the precise location in K where the quoted reading is found. See restorations below.
Printed texts almost always have an external apparatus, which is printed either at the bottom of the page or in an appendix. SARIT recommends the use of an internal apparatus, in which the apparatus entries are embedded directly into the text. Adding TEI encoding to a text will therefore mean converting an external apparatus to an internal apparatus.
The TEI P5 guidelines distinguish three strategies for linking the critical apparatus to the text:
the location-referenced method, in which the apparatus entry is embedded in the text at a certain point;
the double-end-point-attached method, which embeds the apparatus entry in the text at a certain point but also refers to another point of reference where the “lemma” of the apparatus begins or ends; and
the parallel-segmentation method, where the “lemma” is placed inside the apparatus entry.
Because we will generally be converting footnotes into apparatus entries, the location-referenced method will generally be most appropriate. For some projects, the parallel-segmentation method might be desirable and feasible, because it encodes all of the relevant information into the apparatus entry itself. However, it is often difficult to determine the lemma to which a reading given in the apparatus of a printed text corresponds (and impossible to isolate the lemma programmatically); moreover, the location-referenced and double-end-point-attached methods allow for the possibility of conflicting apparatus entries, and the parallel-segmentation method generally does not. Examples will be given in the following cases.
This is perhaps the most common situation. The apparatus (or a text-critical footnote) refers to the reading of another witness. In such a case, the text might read:
न2 तदेकमुखप्रेक्षितामतिवर्तते ।
And the apparatus or footnote reads:
2. म.भ. नटभेदमुखप्रेक्षकमति ।
The apparatus entry is embedded in the text, and refers to the witnesses by their xml:id, which will have been assigned already in the source description (see below):
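A sketch of such an entry, assuming that the two witnesses were given the IDs म and भ in the header. The <app> element sits where the footnote marker stood in the printed line.

```xml
न<app>
  <rdg wit="#म #भ">नटभेदमुखप्रेक्षकमति</rdg>
</app> तदेकमुखप्रेक्षितामतिवर्तते ।
```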
This example uses the location-referenced method. The parallel-segmentation method would be:
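The following is only a sketch. The extent of the lemma is an editorial guess, since the printed apparatus does not say where the variant begins and ends, and the IDs म and भ are assumed to be defined in the header.

```xml
<app>
  <!-- lemma boundaries are an assumption made for this illustration -->
  <lem>न तदेकमुखप्रेक्षितामति</lem>
  <rdg wit="#म #भ">नटभेदमुखप्रेक्षकमति</rdg>
</app>वर्तते ।
```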
Because projects often need to convert hundreds or thousands of apparatus entries, this method is generally not feasible: it requires a human editor to determine where each apparatus entry begins and ends.
In both cases, we are using an <app> element to contain the entire apparatus entry, and each distinct reading within the apparatus entry is given by the <rdg> element. The witness of the reading is referred to in the value of the wit attribute, if it is a manuscript; if the reading is based on a printed text, that text is referred to in the value of the source attribute. The <lem> element is required in the parallel-segmentation method, but optional in the location-referenced method; its primary purpose in the location-referenced method is to supply information (such as witnesses or responsibility) about the reading of the base text.
Here is another example. The text and apparatus read:
1. B अवराहो वि.
The encoding would be (again using the location-referenced method):
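A sketch of the entry on its own, assuming B is a manuscript defined in the header. The comments stand in for the surrounding base text.

```xml
<!-- base text up to the footnote marker -->
<app>
  <rdg wit="#B">अवराहो वि</rdg>
</app>
<!-- base text continues -->
```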
Now suppose we have two different readings. The text and apparatus read:
7. अ. चापि । क्ष.म. चाभिधत्स्व नः
Here we have to use the location-referenced method, because the scope of the given readings is different:
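A sketch of this entry, assuming the three sigla are used as xml:id values in the header:

```xml
<app>
  <rdg wit="#अ">चापि</rdg>
  <!-- two witnesses share one reading, so they share one rdg -->
  <rdg wit="#क्ष #म">चाभिधत्स्व नः</rdg>
</app>
```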
Note in this case that the different witnesses to the same reading, क्ष and म, are referred to in the same wit attribute of the same <rdg> element, since they have an identical reading.
If two scholars read the same witness (say, a manuscript) differently, we may use the source attribute on the <rdg> element together with the wit attribute. The value of source, although typically the name of the scholar in apparatus entries, should point to the bibliographic citation in which the scholar presented the reading (usually an edition).
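A hypothetical sketch of this situation: the manuscript K and the two editions EdA and EdB are invented IDs, and the readings are placeholders.

```xml
<app>
  <!-- same manuscript, two different published readings of it -->
  <rdg wit="#K" source="#EdA">[reading of K as reported in edition A]</rdg>
  <rdg wit="#K" source="#EdB">[reading of K as reported in edition B]</rdg>
</app>
```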
This is exactly parallel to the case of reporting the reading of a manuscript. The only difference is that we refer to a printed text rather than to a manuscript. And to refer to printed texts, we use the attribute source rather than wit. Hence for a text and apparatus that read:
2. Laber विज्जुप्फुरिओ.
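A sketch of the encoding, assuming that #Laber points to a bibliographic entry for Laber's edition in the header:

```xml
<app>
  <rdg source="#Laber">विज्जुप्फुरिओ</rdg>
</app>
```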
Suppose we have in the text and apparatus:
4. म.भ. अयं भागो नास्ति ।
In this case, we have the option of either reproducing the exact wording of the editor of the printed text by placing it in a text-critical note, as discussed below, or “translating” his short comment in Sanskrit into TEI. We generally recommend the latter strategy, when the comment is unambiguous. Here, the editor is clearly referring to an omission, and we encode omissions with an empty <rdg> element:
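A sketch of the omission, assuming the IDs म and भ are defined in the header:

```xml
<app>
  <!-- empty rdg: these witnesses lack the passage -->
  <rdg wit="#म #भ"/>
</app>
```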
Note that when we use the location-referenced method, an empty <lem> element is not interpreted in the same way as an empty <rdg> element. The latter means that the witness referred to lacks the reading of the base text; the former simply gives us additional information about the reading of the base text (its witnesses, sources, etc.). If we want to represent the fact that one or more witnesses include material that is not found in the base text, we do not present it as an “omission” in the base text but as an “interpolation” in the witnesses (see below).
Lacunae are not omissions, but places in which the manuscript (or its exemplar) has been damaged. Encoding lacunae using the location-referenced method of apparatus attachment is difficult. If the exact locations of the lacunae are known, the beginnings and ends can be marked with <lacunaStart> and <lacunaEnd> elements, which must be contained within a <lem> or <rdg> element. Further details might be provided in a <note> element contained within the <lem> or <rdg> element. Here is one example that uses the type attribute to show that the given readings represent lacunae:
[तव]8 लोलं मनः
8 A: ••• (lacuna marked by 3 points); B: (lacuna marked by a blank of two spaces); C: ••••• (lacuna marked by 5 points): तव supplied from D.
According to the location-referenced method:
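A sketch under two assumptions: the value lacuna for the type attribute is illustrative rather than a fixed SARIT value, and the witnesses A, B, C and D are defined in the header.

```xml
तव<app>
  <!-- empty lem: D is the witness for the reading of the base text -->
  <lem wit="#D"/>
  <rdg wit="#A" type="lacuna"><note>lacuna marked by 3 points</note></rdg>
  <rdg wit="#B" type="lacuna"><note>lacuna marked by a blank of two spaces</note></rdg>
  <rdg wit="#C" type="lacuna"><note>lacuna marked by 5 points</note></rdg>
</app> लोलं मनः
```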
According to the parallel-segmentation method:
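The same information might be sketched as follows, with the lemma now inside the entry. As before, the type value lacuna and the witness IDs are assumptions.

```xml
<app>
  <lem wit="#D">तव</lem>
  <rdg wit="#A" type="lacuna"><note>lacuna marked by 3 points</note></rdg>
  <rdg wit="#B" type="lacuna"><note>lacuna marked by a blank of two spaces</note></rdg>
  <rdg wit="#C" type="lacuna"><note>lacuna marked by 5 points</note></rdg>
</app> लोलं मनः
```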
An interpolation, in this context, is a reading that is found in one or more manuscript witnesses but has been excluded from the base text because the editors have judged it to be not original.
Examples - the Raghupañjikā edition?
Emendations are traditionally referred to the name of the scholar who proposed the reading. However, it is generally more helpful to readers to refer to a specific bibliographic item (a book or article) in which the scholar has proposed the reading. Thus we refer to emendations with the source attribute, and we make the value of this attribute point towards a bibliographic citation (in cases where a bibliographic citation is not available, we must refer to a <name> element somewhere in the header). Note that we use source rather than wit, because the scholars are not relying on any particular “witnesses” (the case of restorations from parallel texts is different and will be discussed below). Hence the following text and apparatus entry
3. उभयोऽपि - De
would be encoded as follows:
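A sketch of this entry, assuming #De points to a bibliographic entry for De's publication:

```xml
<app>
  <rdg source="#De">उभयोऽपि</rdg>
</app>
```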
1. ते - De
would be encoded as follows:
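A sketch of this second entry, with the same assumption about #De. The entry records the proposed reading at the position of the footnote marker without identifying a lemma.

```xml
<app>
  <rdg source="#De">ते</rdg>
</app>
```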
This example shows how the readings reported in the editions often do not make sense without further information, and hence require us to use the location-referenced method rather than the parallel-segmentation method.
Another example includes some additional information about the lemma as well as a reading ascribed to “RK,” or Ramakrishna Kavi. We are not told whether this is Kavi's reading of the manuscript, or his emendation, but he will appear in the source attribute in any case. We are also not told where Ramakrishna Kavi proposed this reading, and thus what bibliographic citation #RK should point to, but these questions need to be settled on a project-by-project basis.
2. H.Ms.; भवत्यनुपचितः RK.
The encoding would be:
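A sketch of this entry, assuming #HMs and #RK are defined in the header:

```xml
<app>
  <!-- empty lem: the base text follows the Hoshiarpur manuscript -->
  <lem wit="#HMs"/>
  <rdg source="#RK">भवत्यनुपचितः</rdg>
</app>
```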
In this case, the attribution of the lemma to the Hoshiarpur manuscript is given through the empty element <lem> with the attribute wit set to #HMs. This simply provides information about the reading of the base text. We do not, however, know exactly where the reading of the Hoshiarpur manuscript begins and ends.
Sometimes the exact nature of the scholar's responsibility is not clear from the editor's note or the apparatus entry. We should generally try to translate these into TEI-compliant forms. Hence in the following example, where the reading of the base text is “supported” by De but no alternative readings are given, we make De a “source” for the lemma:
8. supported by De.
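A sketch of such an entry, assuming #De points to a bibliographic entry in the header. It contains only an empty <lem>, since no alternative reading is reported.

```xml
<app>
  <lem source="#De"/>
</app>
```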
One more example involves manuscript readings and emendations reported both in the text and in the apparatus:
चक्कस(ना)8मवहू का(क)9न्तसहाइणिआ ।
8. Kavi - सना(णा) 9. क-not read by Kavi;
This is an unfortunately typical example. We are told in an earlier note that corrections to the text, as well as the footnotes, are the responsibility of V.M. Kulkarni, the editor (of the second edition of this part of the Nāṭyaśāstra). We must figure out that the readings that Kulkarni regards as correct are those inside the parentheses, and the incorrect readings of the manuscript(s) are outside of the parentheses; we must also be aware that parentheses are also used for Kulkarni's additions to the text (readings not found in M. Ramakrishna Kavi's earlier edition), as well as Kavi's additions to the text. The apparatus entry might look like this, assuming we simply want to encode, and not correct, Kulkarni's text:
The first apparatus entry shows that Kavi added णा after स and Kulkarni changed it to ना (incorrectly); the second shows that Kavi failed to read the का of the manuscript(s), which Kulkarni has emended to क. None of the manuscripts are actually referred to in the apparatus entry, so we have to use the dummy-referent #mss. In the parallel-segmentation method this line would read:
The editor or reviewer will sometimes want to communicate that the witness reads X, where X is clearly a mistake or corruption for Y. In such cases, the editor's reading is not an emendation (especially when the editor does not accept the corrected reading as original), but simply a correction. For such cases we use the <choice> element, which has two children: <sic> reporting the reading as given, and <corr> reporting the corrected reading. <corr> should have a resp (responsibility) attribute that identifies the person who made the correction. Here is an example where the correction occurs in a reading that the editor does not accept:
सुयणा वि दुज्जणा इह विणिम्मिया भुयणे1
1. B सुयणा वि णिम्मिआ दुज्जणाइ इह सुवणे [ = दुज्जणा इह भुवणे ]
This would be encoded as:
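A sketch of this entry. The ID #ed, standing for the person responsible for the correction, is hypothetical, and the extent of the corrected span follows the bracketed gloss in the apparatus.

```xml
सुयणा वि दुज्जणा इह विणिम्मिया भुयणे<app>
  <rdg wit="#B">सुयणा वि णिम्मिआ <choice>
      <sic>दुज्जणाइ इह सुवणे</sic>
      <corr resp="#ed">दुज्जणा इह भुवणे</corr>
    </choice></rdg>
</app>
```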
The term “parallel texts” includes anything that is not in the direct manuscript tradition of the text under consideration. Parallel texts might be:
quotations of the text in a later text, either anonymously or attributed (e.g., a verse from Dharmakīrti's Pramāṇavārttika in Śālikanāthamiśra's Prakaraṇapañcikā );
anthologizations of the text (e.g., a verse from Rājaśekhara's Bālarāmāyaṇa in the Saduktikarṇāmṛta )
reworkings or adaptations of the text (e.g., Hemacandra's adaptation of Abhinavagupta's discussion of rasa in his Kāvyānuśāsana ).
In these cases, the reading is referred to another text, and preferably with a pointer to a location in that text (either a page number or a canonical reference such as chapter and verse). The parallel text thus serves as a source, and should be referred to in the source attribute. (In some cases, the parallel text is read from a manuscript, in which case the wit attribute should be used.) References to locations within the parallel text should be encoded using a <ref> element inside of the reading element.
Hence the apparatus entry:
6. याता for गता KA., p. 89.
should be encoded as follows:
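A sketch using the location-referenced method, assuming #KA points to a bibliographic entry for the edition of the Kāvyānuśāsana. The entry follows the word गता in the base text.

```xml
गता<app>
  <rdg source="#KA">याता <ref target="#KA">p. 89</ref></rdg>
</app>
```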
In this case, there is also the possibility of using the parallel-segmentation method:
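A sketch of the parallel-segmentation version, which is possible here because the printed apparatus names the lemma explicitly. The ID #KA is assumed to be defined in the header.

```xml
<app>
  <lem>गता</lem>
  <rdg source="#KA">याता <ref target="#KA">p. 89</ref></rdg>
</app>
```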
Here is another example, where the parallel text was a source of the reading of the base text (lemma):
The person who is responsible for identifying the parallel might also be referred to in the resp attribute. Thus, if we want to say that V. M. Kulkarni (VMK) found a parallel passage in the Kāvyakalpalatāviveka (KLV), we might make #VMK point to the bibliographic entry for Kulkarni's article, and #KLV to the bibliographic entry for the edition of the KLV, and encode as follows:
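A sketch of the attribute combination only, with a placeholder in place of the reading:

```xml
<app>
  <!-- source: where the reading is found; resp: who identified the parallel -->
  <rdg source="#KLV" resp="#VMK">[reading found in the parallel passage]</rdg>
</app>
```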
Sometimes a parallel text is adduced in favor of a reading, although the parallel text itself is not a “witness” to the reading. Here is one example:
अनुकार्येऽनुकर्तर्यपि चानु5सन्धानबलात् - इति ।
5. विचारा - De; but रामादिरूपता Hemacandro,KA,p. 89
In this case, Hemacandra's Kāvyānuśāsana has a passage that corroborates the reading अनुसन्धान॰ rather than De's विचारा॰ (although it is difficult to see this, and indeed difficult to see exactly what De's reading corresponds to in the edition). This information should be put into a <note> element that qualifies the reading proposed by De. When we convert this entry to TEI, and correct some of the typographic mistakes, we get:
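A sketch of the converted entry. The IDs #De and #KA, and the use of <foreign> and xml:lang inside the note, are assumptions made for illustration.

```xml
अनुकार्येऽनुकर्तर्यपि चानु<app>
  <rdg source="#De">विचारा<note xml:lang="en">but <foreign xml:lang="sa-Deva">रामादिरूपता</foreign> Hemacandra, <ref target="#KA">KA, p. 89</ref></note></rdg>
</app>सन्धानबलात् - इति ।
```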
An editor might refer to the reading of a commentary - either the reading that the commentator accepts, or a reading that he mentions. [No examples of an apparatus entry that features such readings.]
It will often make sense to classify the readings (whether they are manuscript readings, conjectures, emendations, etc.) based on the kind of variation that they index. For example, there may be a group of variants that are orthographic in nature: one group of manuscripts consistently reads अ where another group consistently reads य. Depending on the needs of the project, this variation can be “typed” as orthographic with an attribute of the <app> element.
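A hypothetical sketch: the type value orthographic, the witness IDs A and B, and the bracketed readings are all placeholders, not values fixed by SARIT.

```xml
<app type="orthographic">
  <rdg wit="#A">[form written with अ]</rdg>
  <rdg wit="#B">[form written with य]</rdg>
</app>
```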
[SARIT will probably need a constrained list of apparatus “types” if this attribute is to be of any use.]
Some editions provide references in the body of the text, for example in parentheses, and other editions provide references in footnotes that are found together with variant readings at the bottom of the page. Since these references are not strictly speaking apparatus entries, we do not use the TEI element <app> for them. Instead, we encode them as notes, with the <note> element. Further recommendations are beyond the scope of this document.
Text-critical notes such as are found in the critical apparatus of a printed text should also be encoded as apparatus entries. A <note> element within an <app> element will be interpreted as such a text-critical note.
The language of the apparatus may be the same as or different from the language of the encoded text. In the latter case, it is necessary to specify in the TEI Header that all text-critical notes (app/note) should have a certain xml:lang attribute: en for English, sa-Deva for Sanskrit in the Devanāgarī script, etc. (See the SARIT Guidelines for a discussion of language attributes.) Sometimes, for example when a text has been edited several times, there might be text-critical notes in two languages; in such cases, one will be the “default” language defined in the TEI Header, and the other must be noted with an xml:lang attribute on each <note> element. Take care, however, to enclose any Sanskrit text or readings in the <note> element within an element with the appropriate xml:lang attribute. (The default TEI element for foreign text is <foreign>.)
The <note> element may contain any of the elements allowed in the TEI P5 DTD. Typically, notes found in the apparatus will not have paragraph-level formatting, so they should contain text rather than elements such as <p>. Here is an example:
1 सदाचारं प्रमाणयता ...
1 The following verse is found in the beginning of the Avaloka only in the Adyar Telugu Manuscript:
vācakaḥ praṇavo yasya krīḍāvastv akhilaṃ jagat |
śrutir ājñā, vapur jñānaṃ, taṃ vande devakīsutam ||
F. Hall noted in his Preface (p. 5, fn.) to his edition of the Daśarūpaka that in one of the manuscripts of the Avaloka the following verse, which is different from the one now seen, was found:
praṇipatya śivaṃ sāmbam ācāryaṃ bharataṃ tathā |
kriyate daśarūpasya vyākhyānaṃ dhanikena vai ||
This was however rejected by F. Hall as spurious.
I would encode this note as follows (the xml:lang attribute of the <note> element may be omitted if it is specified elsewhere):
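A sketch of one possible encoding. The language tag sa-Latn for the romanized verses is an assumption, and the verses are wrapped in <foreign> so that the note contains no paragraph-level elements.

```xml
<app>
  <note xml:lang="en">The following verse is found in the beginning of the Avaloka only in the Adyar Telugu Manuscript:
    <foreign xml:lang="sa-Latn">vācakaḥ praṇavo yasya krīḍāvastv akhilaṃ jagat | śrutir ājñā, vapur jñānaṃ, taṃ vande devakīsutam ||</foreign>
    F. Hall noted in his Preface (p. 5, fn.) to his edition of the Daśarūpaka that in one of the manuscripts of the Avaloka the following verse, which is different from the one now seen, was found:
    <foreign xml:lang="sa-Latn">praṇipatya śivaṃ sāmbam ācāryaṃ bharataṃ tathā | kriyate daśarūpasya vyākhyānaṃ dhanikena vai ||</foreign>
    This was however rejected by F. Hall as spurious.</note>
</app>सदाचारं प्रमाणयता ...
```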
By “witnesses” we mean any source that attests to a given reading supplied in the apparatus. Thus “witnesses” includes manuscripts as well as printed editions, scholarly books and articles, and so on. All of these witnesses should be described in the source description (<sourceDesc>) in the TEI Header, but manuscript sources and printed sources will generally be described differently. Each witness must be assigned a unique xml:id by which it is referred to in the apparatus entries.
Manuscripts may be defined and referred to in a number of ways according to the TEI guidelines. We recommend the use of the <listWit> element within the source description, for several reasons: this can include <witness> elements that refer to manuscripts as well as groups of manuscripts, whereas the alternatives (e.g., defining manuscripts in <listBibl> with <msDesc>) do not allow for the possibility of defining a group of manuscripts. The children of the <listWit> element might be manuscripts (defined either through <msDesc> or <witness>) or groups of manuscripts (defined through <witness>). In any case, the xml:id of the manuscript(s) should serve as the siglum by which the witness is referred to in the apparatus. Here is an example:
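A minimal sketch with hypothetical sigla A, B and AB, and with placeholder descriptions. The third <witness> shows how a group of manuscripts might be given its own siglum.

```xml
<sourceDesc>
  <listWit>
    <witness xml:id="A">[description of manuscript A]</witness>
    <witness xml:id="B">[description of manuscript B]</witness>
    <!-- a group of manuscripts, defined through witness -->
    <witness xml:id="AB">The agreement of <ref target="#A">A</ref> and <ref target="#B">B</ref></witness>
  </listWit>
</sourceDesc>
```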
The TEI guidelines recommend that manuscripts defined in the list of witnesses be supplied with a pointer to a complete description of the manuscript elsewhere, either within the same project (using the <msDesc> element) or in an external resource.
Printed sources should be defined in a <listBibl> element in the source description. For the content and structure of the <listBibl> element, please consult the TEI P5 Guidelines on this topic. Additionally, we recommend the type attribute of <bibl> or <biblStruct>, with the values of article, book, incollection, phdthesis, etc., so the TEI bibliography entry can easily be converted into other formats. Here are a few examples:
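A sketch of the structure only. The IDs KM and Kangle echo sigla mentioned earlier in this document, and every bracketed value is a placeholder rather than real bibliographic data.

```xml
<sourceDesc>
  <listBibl>
    <bibl type="book" xml:id="KM">
      <editor>[editor]</editor>
      <title level="m">[title of the edition]</title>
      <pubPlace>[place]</pubPlace>
      <publisher>[publisher]</publisher>
      <date>[year]</date>
    </bibl>
    <bibl type="article" xml:id="Kangle">
      <author>[author]</author>
      <title level="a">[title of the article]</title>
      <title level="j">[title of the journal]</title>
      <date>[year]</date>
    </bibl>
  </listBibl>
</sourceDesc>
```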
Normally, the “people” referred to in the critical apparatus are scholars who have published their readings, corrections, emendations etc. in a book or article. These “people” are therefore generally treated as “sources” in the apparatus, and their names should point to a bibliographic entry as exemplified above.
When a person has to be defined independently of any publication, the “default” place for that information is a list of persons, which is contained in the <particDesc> (the description of participants), which is contained in the <profileDesc> (profile description) of the TEI Header. For instance, a reading which the editor attributes to S. K. De but which is not presented in any publication (since it was communicated to the editor personally, for example) should refer to an element with the xml:id De, which would be defined in the Header as follows:
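A minimal sketch of such a definition, giving only the name that the text supplies:

```xml
<profileDesc>
  <particDesc>
    <listPerson>
      <person xml:id="De">
        <persName>S. K. De</persName>
      </person>
    </listPerson>
  </particDesc>
</profileDesc>
```

A reading attributed to this person would then carry source="#De" in the apparatus.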
Discussions on TEI-L:
Chapter 12 of the TEI P5 Guidelines:
Marjorie Burghart's Cheat Sheet:
SARIT Encoding Guidelines: