
Newsflash

May 2013. SARIT receives major funding from the NEH and the DFG as a joint project of Columbia and Heidelberg Universities.

Quick start

Click Basic search to search now. The Search term help explains the syntax of search terms.

Overview

Welcome to the SARIT website. Here you will find electronic editions of Sanskrit and other Indian-language texts. These are documented, dated and have embedded notes about their change history, so that they can be publicly cited and used with confidence as scholarly sources. The editions in the SARIT library currently include these works.

This website also currently offers tools for text search, retrieval and analysis of the works in the SARIT library. You can search for a single word, phrase, words that occur in the same paragraph, and so forth. You can generate an index of terms, a KWIC index, and word-frequency lists.
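To make the terms above concrete, here is a minimal Python sketch of what a KWIC (keyword in context) index and a word-frequency list are. It is an illustration only and not the code this website runs: the function names, the whitespace tokenizer and the sample string are all assumptions made for the example, and the sample is not taken from the SARIT library.

```python
from collections import Counter


def tokenize(text):
    """Split on whitespace. Assumption: real Sanskrit text would need
    more careful handling (sandhi, compounds, punctuation)."""
    return text.split()


def word_frequencies(tokens):
    """Return a Counter mapping each token to its number of occurrences."""
    return Counter(tokens)


def kwic(tokens, keyword, width=1):
    """Return (left context, keyword, right context) for every hit.

    `width` is the number of tokens of context kept on each side.
    """
    hits = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, token, right))
    return hits


# Hypothetical sample string, used only to exercise the functions.
sample = "dharma artha kāma mokṣa dharma śāstra"
tokens = tokenize(sample)
frequencies = word_frequencies(tokens)
concordance = kwic(tokens, "dharma", width=1)
```

In this sketch `frequencies.most_common()` gives the word-frequency list in descending order, and each entry of `concordance` is one line of the KWIC index.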

You can download all the texts at SARIT. They are licensed under a Creative Commons license. Once you have downloaded the files, you can use services such as OxGarage to convert them to a format that is useful to you, for example PDF or HTML, or a format for an e-book reader such as the Kindle.

History

The Thesaurus Linguae Graecae was founded in 1972. Its subsequent growth as a sophisticated research tool, and the projects derived from it, such as PERSEUS, demonstrated compellingly that a corpus of classical texts in machine-readable format could be vitally important for the study of language, art, history and culture. The history of SARIT looks back to a meeting convened by Prof. Richard Lariviere at the University of Texas in 1988 that brought together many specialists interested in creating a "TLG for Indic," i.e., a corpus of machine-readable texts in Indian languages. The minutes of that meeting still make interesting reading today, and lay out some of the rationales underlying the present project. After the Texas meeting, the INDOLOGY website began to make electronic texts of Sanskrit works available in an ad-hoc manner. This effort was highly successful, but there remained problems of format normalization, non-standard fonts and character encodings. Many of these problems were insoluble at the time. Things have moved forward since 1988, especially in corpus linguistics, computer networking, and encoding standards. SARIT takes advantage of all these new developments, especially Unicode, XML and the Text Encoding Initiative.

What makes SARIT different?

The main difference between this collection of Sanskrit and other Indic-language texts and other collections is that these texts are marked up ("tagged") using the rich Text Encoding Initiative ("TEI") system. TEI is a way of tagging files that was designed by humanities scholars for humanities scholars. It is capable of building a great deal of intelligence into the file itself, so that a TEI file can later serve many purposes and remains flexible and useful in the long term. TEI files contain their own history, so you can always find out where the file came from, how and when it has been updated, and how it relates to printed editions, manuscripts or other sources. For a gentle introduction to the TEI, see the excellent TEI by Example website. The files in this language corpus can be considered as electronic editions, and can be cited as such in scholarly research publications. SARIT texts respect copyright and are licensed appropriately to allow free academic use. Furthermore, SARIT makes its TEI editions of Indian texts freely available to all.
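As an illustration of how a TEI file carries its own history, the following Python sketch reads the change log and the source description from a TEI header. The `revisionDesc`, `change` and `sourceDesc` elements and the TEI namespace belong to the TEI Guidelines; everything else is an assumption made for the example, including the sample document, the editor identifiers and the function names. The sample is not an actual SARIT file, and real files may record their history in more detail.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the TEI Guidelines (P5).
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical, heavily abbreviated TEI document for illustration only.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Hypothetical sample text</title></titleStmt>
      <publicationStmt><p>Illustration only</p></publicationStmt>
      <sourceDesc><p>Transcribed from a hypothetical printed edition</p></sourceDesc>
    </fileDesc>
    <revisionDesc>
      <change when="2013-05-01" who="#editorA">Corrected transcription errors</change>
      <change when="2012-11-15" who="#editorB">Initial TEI encoding</change>
    </revisionDesc>
  </teiHeader>
  <text><body><p>...</p></body></text>
</TEI>"""


def change_history(tei_xml):
    """Return the <change> entries of the header as a list of dicts."""
    root = ET.fromstring(tei_xml)
    entries = []
    for change in root.findall(".//tei:revisionDesc/tei:change", TEI_NS):
        entries.append({
            "when": change.get("when"),
            "who": change.get("who"),
            "note": (change.text or "").strip(),
        })
    return entries


def source_description(tei_xml):
    """Return the text of the first paragraph of <sourceDesc>, if any."""
    root = ET.fromstring(tei_xml)
    paragraph = root.find(".//tei:sourceDesc/tei:p", TEI_NS)
    return paragraph.text.strip() if paragraph is not None and paragraph.text else None


history = change_history(SAMPLE_TEI)
source = source_description(SAMPLE_TEI)
```

Here `history` answers "how and when has the file been updated", and `source` answers "where did the file come from".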

Electronic texts are not static: we edit them, improve them, alter them, and - crucially - correct errors in them. SARIT solves the problem of how to manage the natural evolution of electronic texts, documenting changes and allowing for the divergence and comparison of versions, merging of versions and rolling-back to previous versions. SARIT achieves this through the use of sophisticated version-control software (see the technical section below).
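The idea of comparing two versions of a text can be sketched in a few lines of Python. This example uses the standard `difflib` module rather than the version-control software SARIT relies on, so it shows the principle only; the function name and the two sample versions are hypothetical.

```python
import difflib


def compare_versions(old_lines, new_lines, old_label="previous", new_label="current"):
    """Return a unified diff of two versions as a list of lines."""
    return list(
        difflib.unified_diff(
            old_lines,
            new_lines,
            fromfile=old_label,
            tofile=new_label,
            lineterm="",  # the input lines carry no trailing newlines
        )
    )


# Hypothetical versions: the second restores diacritics in the first line.
previous = ["atha yoganusasanam", "second line unchanged"]
current = ["atha yogānuśāsanam", "second line unchanged"]

diff = compare_versions(previous, current)
```

Lines of `diff` that begin with `-` appear only in the earlier version, and lines that begin with `+` appear only in the later one.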

Technical description

SARIT displays Indological texts marked up according to Text Encoding Initiative (TEI) guidelines.

SARIT currently uses a modified version of PhiloLogic™ to display and search its library of texts. PhiloLogic™ is a platform developed by the ARTFL Project and Digital Library Development Center at the University of Chicago. PhiloLogic™ is widely deployed in the digital humanities as a full-text search, retrieval and analysis tool for large TEI document collections. Notable installations include Perseus Project Texts Loaded under PhiloLogic™ and the Digital Dictionaries of South Asia.

The standardized TEI format used for encoding the files in the SARIT library means that other interfaces and toolsets are possible and may be implemented in future. The SARIT developers are currently considering Lucene as a possible future platform.

Get involved!

The master copies of the SARIT e-texts are maintained at GitHub, where a complete history of all edits to the files is tracked. If you wish to report errors in the SARIT text editions or make suggestions about their content, for example errors in transcription or encoding, please go to the issues page and report in as much detail as you can. Experienced Git users are encouraged to participate in maintaining the files themselves through the GitHub system.

Christian Wittern offers a detailed rationale for using Git in the way SARIT does in his paper "Beyond TEI: Returning the Text to the Reader," Journal of the Text Encoding Initiative [Online], Issue 4, March 2013, online since 25 February 2013, accessed 21 June 2013. URL: http://jtei.revues.org/691; DOI: 10.4000/jtei.691.

Acknowledgements

SARIT was initially developed through the individual unpaid effort of Dominik Wujastyk, Patrick Mc Allister and several colleagues. It has also received important financial support from the British Association for South Asian Studies (which enabled Richard Mahoney to develop the site and add texts), the Deutsche Forschungsgemeinschaft and the National Endowment for the Humanities. We wish to acknowledge support from the NEH/DFG Bilateral Digital Humanities Program (grant no. 11232164).