SARIT: Search and Retrieval of Indic Texts (New)

Overview

Welcome to the SARIT website. Here you will find electronic editions of texts in Sanskrit and other Indian languages. These are documented, dated and have embedded notes about their change history, so that they can be publicly cited and used with confidence as scholarly sources. The editions in the SARIT library currently include these works.

This website also currently offers tools for text search, retrieval and analysis of the works in the SARIT library. You can search for words and phrases, and have your search results displayed as keywords-in-context.
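As a rough illustration of what a keywords-in-context display involves, here is a minimal Python sketch. It is not the SARIT search implementation; the function name `kwic`, the `width` parameter and the sample string are assumptions made for this example only.

```python
import re

def kwic(text, keyword, width=20):
    """Return (left, match, right) triples for every occurrence of keyword.

    Hypothetical helper for illustration; `width` is the number of
    characters of context kept on each side of the match.
    """
    rows = []
    for m in re.finditer(re.escape(keyword), text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        rows.append((left, m.group(0), right))
    return rows

# Illustrative sample line in romanized Sanskrit.
sample = "dharmakṣetre kurukṣetre samavetā yuyutsavaḥ"

for left, hit, right in kwic(sample, "kṣetre", width=8):
    # Right-align the left context so that the keywords line up in a column.
    print(f"{left:>8} [{hit}] {right}")
```

Each match is shown with a fixed window of surrounding characters, which is what lets a reader compare the usages of a word at a glance.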

All the texts at SARIT are licensed under a Creative Commons license. You can download all the texts in the following formats: XML, EPUB and PDF. You can also open the XML file online.

History

The Thesaurus Linguae Graecae was founded in 1972. Its subsequent growth as a sophisticated research tool, and the projects derived from it, such as PERSEUS, demonstrated compellingly that a corpus of classical texts in machine-readable format could be vitally important for the study of language, art, history and culture.

The history of SARIT looks back to a meeting convened by Prof. Richard Lariviere at the University of Texas in 1988 that brought together many specialists interested in creating a "TLG for Indic," i.e., a corpus of machine-readable texts in Indian languages. The minutes of that meeting still make interesting reading today, and lay out some of the rationales underlying the present project. After the Texas meeting, the INDOLOGY website began to make electronic texts of Sanskrit works available in an ad-hoc manner. This effort was highly successful, but there remained problems of format normalization, non-standard fonts and character encodings. Many of these problems were insoluble at the time.

Things have moved forward since 1988, especially in corpus linguistics, computer networking, and encoding standards. SARIT takes advantage of all these new developments, especially Unicode, XML and the Text Encoding Initiative (TEI). SARIT was initially developed during 2008 by Dominik Wujastyk and Richard Mahoney, and was announced publicly in February 2009.

What makes SARIT different?

What chiefly distinguishes SARIT from other collections of texts in Sanskrit and other Indian languages is that its texts are marked up ("tagged") using the rich Text Encoding Initiative (TEI) system. TEI is a way of tagging files that was designed by humanities scholars for humanities scholars. It is capable of building a great deal of intelligence into the file itself, so that a TEI file can later serve many purposes and remains flexible and useful in the long term. TEI files contain their own history, so you can always find out where the file came from, how and when it has been updated, and how it relates to printed editions, manuscripts or other sources. For a gentle introduction to the TEI, see the excellent TEI by Example website. The files in this language corpus can be considered as electronic editions, and can be cited as such in scholarly research publications. SARIT texts respect copyright and are licensed appropriately to allow free academic use. Furthermore, SARIT makes its TEI editions of Indian texts freely available to all.
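To make the idea of a file that carries its own history more concrete, here is a small Python sketch that reads the change log from a TEI header. The sample document, its dates, editor identifiers and the function name `change_history` are invented for illustration and are not taken from any SARIT file. The `revisionDesc` and `change` elements and the TEI namespace are standard TEI, but individual SARIT files may record their history in more detail.

```python
import xml.etree.ElementTree as ET

# Standard TEI P5 namespace.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical, heavily simplified TEI document (not a real SARIT file).
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Sample text</title></titleStmt>
      <publicationStmt><p>Hypothetical example.</p></publicationStmt>
      <sourceDesc><p>Transcribed from a hypothetical printed edition.</p></sourceDesc>
    </fileDesc>
    <revisionDesc>
      <change when="2015-03-01" who="#ed1">Corrected transcription errors.</change>
      <change when="2014-07-15" who="#ed2">Initial TEI encoding.</change>
    </revisionDesc>
  </teiHeader>
  <text><body><p>Text goes here.</p></body></text>
</TEI>"""

def change_history(tei_xml):
    """Return (date, editor, note) tuples from the revisionDesc of a TEI file."""
    root = ET.fromstring(tei_xml)
    changes = root.findall(".//tei:revisionDesc/tei:change", TEI_NS)
    return [(c.get("when"), c.get("who"), (c.text or "").strip()) for c in changes]

for when, who, note in change_history(SAMPLE):
    print(when, who, note)
```

The same header also holds the `sourceDesc` element, which is where a TEI file states how it relates to printed editions, manuscripts or other sources.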

Electronic texts are not static: we edit them, improve them, alter them, and, crucially, correct errors in them. SARIT solves the problem of how to manage the natural evolution of electronic texts by documenting changes and allowing versions to diverge, be compared, be merged, and be rolled back to earlier states. SARIT achieves this through the use of sophisticated version-control software (see the technical section below).

Technical description

SARIT displays Indological texts marked up according to the Text Encoding Initiative (TEI) guidelines. The standardized TEI format used for encoding the files in the SARIT library means that other interfaces and toolsets are possible and may be implemented in future.

SARIT uses the Open Source XML database eXist-db for search and display. The SARIT application has been developed by the HRA at the Cluster of Excellence 'Asia and Europe in a Global Context' (University of Heidelberg) in collaboration with eXist Solutions GmbH.

Get involved!

The master copies of the SARIT e-texts are maintained at GitHub, where a complete history of all edits to the files is tracked. If you wish to report errors or make suggestions about the content of the SARIT text editions, for example errors in transcription or encoding, please go to the Issues page and describe the problem in as much detail as you can. Experienced Git users are encouraged to participate in maintaining the files themselves through the GitHub system.

Acknowledgements

SARIT was initially developed through individual unpaid effort by Dominik Wujastyk, Patrick McAllister and several colleagues, who produced and maintained an earlier version of the SARIT web platform until July 2014. It also received important financial support from the British Association for South Asian Studies, which enabled Richard Mahoney to develop the site and add texts.

SARIT received major funding from the NEH/DFG Bilateral Digital Humanities Program for the development of the current platform by the HRA and eXist Solutions, and the addition of new texts by a project team consisting of Liudmila Olalde (University of Heidelberg) and Andrew Ollett (Columbia University), within a project directed by Prof. Sheldon Pollock (Columbia University) and Prof. Birgit Kellner (University of Heidelberg), between 2013 and 2017.

Additional financial support was offered by the Cluster of Excellence 'Asia and Europe in a Global Context', and by the Institute for the Cultural and Intellectual History of Asia at the Austrian Academy of Sciences in Vienna.

We would like to thank Mihail Bayaryn, the developer of the Siddhanta font software, which is used as a web font in SARIT. Siddhanta is published under the Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License.

We would like to thank Omkarananda Ashram Himalayas for use of their freeware Sanskrit 2003 font to generate PDF files.