The SARIT Interface

The interface of the SARIT application consists of four main pages:

  • a start page for listing the works contained in SARIT and performing searches in them
  • a page for displaying the SARIT project's description of the editorial processes behind a SARIT text edition and for listing the text's table of contents, reached after clicking the title of the text in the list of works
  • a page for displaying the list of hits resulting from a search
  • a page for viewing part of a text selected by clicking on a reference in the table of contents or on a hit in the hitlist.

Each of these pages is described below.

List of Works and Search Interface

The start page of the SARIT web platform displays a list of the texts in the SARIT corpus and the search interface.

For each text, the list gives the title, notes the putative author(s) or commentator(s), and indicates whether the text uses Devanagari script or romanisation. Most texts exist in both Devanagari script and romanisation, one converted from the other.

There are also links for downloading the texts in various formats: epub, pdf and xml (zipped).

The last link gives the possibility of opening the XML containing the text in an application called eXide. eXide is an application in the eXist-db database that has been used to develop the SARIT application. eXide also allows you to perform searches, using XPath and XQuery, the main programming languages used in the SARIT application, but that is another (rather long) story ….

The search interface, located to the right of the list of works, offers a number of options, including searching on the basis of the information displayed in the list.

These options are described below. The names of the dropdown menus are displayed when hovering over them.

Search For What?

The search term field is where you enter the words and phrases whose occurrences you are looking for. All the other choices you make in the search interface restrict or modify the search term(s) you input here.

Search With Which Scripts?

Your search term is automatically converted to make your search proceed in both IAST romanisation and in Devanagari. If you wish to limit this, select which script to search with.

Use Which Index?

This is where it gets a little technical ….

All searches in SARIT go via a database index. A database index is basically like the index of a book, only it contains references for each and every word or string of characters in the text.

In SARIT, you can search using two different indexes, an NGram Index and a Lucene Full-Text Index.

The NGram Index analyses the text in terms of strings, regardless of any word divisions. The index consists of overlapping chunks of text three characters long (tri-grams).
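As an illustration of what "overlapping chunks of text three characters long" means, here is a minimal Python sketch. It is not the code SARIT or eXist-db uses; the function name trigrams is hypothetical and the sketch only shows the principle.

```python
def trigrams(text: str) -> list[str]:
    """Return every overlapping three-character chunk of `text`, in order."""
    return [text[i:i + 3] for i in range(len(text) - 2)]


# Word boundaries are ignored: a space is treated like any other character.
for chunk in trigrams("atha yoga"):
    print(repr(chunk))
```

Because the chunks run straight across the space, a search for a string that spans two words, or that sits inside a longer compound, can still be answered from such an index.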

The Lucene Full-Text Index analyses the text in terms of words, that is, strings separated by spaces, etc. The index consists of each and every word in the text.

When you search with the NGram Index, you will find all occurrences of your search term, regardless of whether it occurs as a whole word (that is, space-separated string) or as only part of a word.

When you search with the Lucene Full-Text Index, you will only find occurrences of your search term that are identical with whole words (that is, strings separated by spaces, etc.). You will not find occurrences that are only part of a word, unless you use Regular Expressions.
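The difference between the two kinds of search can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the behaviour of the actual indexes: the function names are hypothetical, and the word-based variant splits on whitespace only, whereas a real full-text analyser also handles punctuation and the like.

```python
import re


def ngram_style_hits(term: str, text: str) -> list[int]:
    """Offsets of every occurrence of `term` as a plain substring."""
    return [m.start() for m in re.finditer(re.escape(term), text)]


def fulltext_style_hits(term: str, text: str) -> list[int]:
    """Offsets of whitespace-separated tokens that are identical to `term`."""
    return [m.start() for m in re.finditer(r"\S+", text) if m.group() == term]


sample = "pramāṇa pramāṇalakṣaṇa lakṣaṇa"

# Substring search also finds the term inside the compound.
print(len(ngram_style_hits("lakṣaṇa", sample)))
# Word search finds only the free-standing word.
print(len(fulltext_style_hits("lakṣaṇa", sample)))
```

In this sample the substring search finds the term twice (once inside the compound), while the word search finds it only once.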

The Lucene Full-Text Index is thus useful for texts segmented into words, that is, texts divided into chunks that resemble words in European languages.

Word division is a matter of degree. Among the SARIT texts, only the Arthaśāstra can be said to be divided into words in a way that makes the Lucene Full-Text Index immediately suitable. For information about the remaining texts, see Word Division. With these texts, you should as a rule use the NGram Index.

Note that this does not mean that the NGram Index cannot be used with texts that are divided into words – the two indexes make possible different searches, one for words (the Lucene Full-Text Index) and one for strings of characters (the NGram Index). You will almost always get more hits with the NGram Index, but the Lucene Full-Text Index gives you more search options (and is slower).

There are further explanations in the Indexes and Searching Help.

Search Which Document Part?

Each SARIT text consists of

  • a header which provides information about the text edition and
  • the text proper.

In this dropdown menu you can select to search in one or both of these document parts.

Search Narrowly Or Broadly?

SARIT marks up texts according to the TEI Guidelines, which treat a text as a hierarchy. A book may thus consist of a number of chapters, which contain a number of sections, which contain a number of paragraphs – which then contain the text itself. A narrow search is one which searches in the levels that directly contain the text (in the example, the paragraphs), whereas a broad search is one which searches in the levels immediately above these (in the example, the sections).

This is primarily of use if you wish to search for the co-occurrence of several words, for whereas they may co-occur in a single section, it is not necessarily the case that they co-occur in a single paragraph.
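The effect on co-occurrence searches can be illustrated with a small Python sketch. The sample document, the element names div and p without a namespace, and the function units_containing_all are all assumptions made for the illustration; real SARIT texts are namespaced TEI and considerably more complex.

```python
import xml.etree.ElementTree as ET

# A tiny, simplified stand-in for a TEI text: two sections with paragraphs.
SAMPLE = """
<body>
  <div n="1">
    <p>dharma is discussed here</p>
    <p>artha is discussed here</p>
  </div>
  <div n="2">
    <p>dharma and artha together</p>
  </div>
</body>
""".strip()


def units_containing_all(root, tag, terms):
    """Return the elements named `tag` whose full text contains every term."""
    hits = []
    for element in root.iter(tag):
        text = "".join(element.itertext())
        if all(term in text for term in terms):
            hits.append(element)
    return hits


root = ET.fromstring(SAMPLE)
terms = ["dharma", "artha"]
narrow = units_containing_all(root, "p", terms)    # paragraph level
broad = units_containing_all(root, "div", terms)   # section level
print(len(narrow), len(broad))
```

Here the narrow search finds only the one paragraph that contains both words, while the broad search also finds the first section, where the two words occur in different paragraphs.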

In this explanation, sections and paragraphs have been used to illustrate this concept, but SARIT texts are more complicated than this. For a fuller explanation, see the Indexes and Searching Help.

Search In Texts By Which Authors?

You can choose which texts to target in your search in two different ways.

If you wish to make selections among the texts based on their author, you can use this dropdown menu. Note that you can select several authors.

In order to select individual texts to search in, you can click the checkboxes to the left of their titles in the list of works.

Note that if you do not make any choice, you will search in all available texts – there is no need to click all checkboxes in order to perform a search throughout all texts.

Perform New Search or Continue Last Search?

This gives you the option to refine your previous search, by restricting or broadening the search result.

The default value is to perform a new search. If you perform a new search, your search will not relate to your previous search.

If you choose "AND Last Search", your search will restrict your last search by requiring that the search criteria for both the last and the present search are satisfied. You typically use this to search for the co-occurrence of two words.

If you choose "OR Last Search", your search will broaden your last search by including both the results from the last and the present search in the search results. You typically use this to search for alternative forms of a word.

If you choose "NOT Last Search", your search will restrict your last search by excluding the results from the present search. You use this to remove specific hits from the search results.
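The four choices behave like set operations on the two result sets. The following Python sketch shows this under the assumption that each hit can be represented by an identifier; the function combine, the mode names and the identifiers are hypothetical and do not reflect how SARIT stores its results.

```python
def combine(last: set[str], present: set[str], mode: str) -> set[str]:
    """Combine the previous and the present result sets.

    mode is one of "new", "and", "or", "not" (hypothetical names for the
    four choices in the dropdown menu).
    """
    if mode == "new":
        return set(present)      # ignore the last search entirely
    if mode == "and":
        return last & present    # hits satisfying both searches
    if mode == "or":
        return last | present    # hits satisfying either search
    if mode == "not":
        return last - present    # last hits minus the present hits
    raise ValueError(f"unknown mode: {mode}")


last_hits = {"text1/p3", "text1/p7", "text2/p1"}
present_hits = {"text1/p7", "text3/p4"}

for mode in ("new", "and", "or", "not"):
    print(mode, sorted(combine(last_hits, present_hits, mode)))
```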

The TEI Header and Table of Contents

Clicking a title in the list of works will display the TEI header of the text and its table of contents.

The TEI header describes the text and the editorial processes that have resulted in the SARIT edition. Since the header can be quite long, by default only the first part, which gives the title and author(s) and lists the persons and institutions that have contributed to the SARIT version, is shown. If you wish to access the whole TEI header, click the Toggle Full Header button.

The table of contents of the SARIT text is shown at the bottom of this page. It may be divided into three parts: front matter, the text itself, and back matter.

The table of contents can be expanded by clicking on the plus signs and collapsed again by clicking on minus signs. The table of contents descends three levels, i.e. divisions below the third level are not shown (for reasons of speed).

Clicking on one of the links in the table of contents will take you to the page view for the part of the text in question.

The Hitlist

After performing a search for e.g. "lakṣaṇa", you will be presented with a hitlist.

The results of your search are presented in the form of a keywords-in-context (KWIC) display, which centres the strings that match your search term and shows the context in which they occur to the left and right. Clicking the centre string (here, "lakṣaṇa") will display the text in which the hit occurs, with all matches highlighted.
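The principle of a KWIC display can be sketched in a few lines of Python. This is only an illustration; the function kwic, the context width and the bracket notation are assumptions and not the way the SARIT application renders its hitlist.

```python
def kwic(text: str, term: str, width: int = 20) -> list[str]:
    """Return one line per match, with `width` characters of context each side."""
    if not term:
        return []
    lines = []
    start = text.find(term)
    while start != -1:
        end = start + len(term)
        left = text[max(0, start - width):start]
        right = text[end:end + width]
        # Right-align the left context so that the matches line up in a column.
        lines.append(f"{left:>{width}} [{term}] {right}")
        start = text.find(term, end)
    return lines


passage = "tatra lakṣaṇa ucyate, na ca lakṣaṇa vinā siddhiḥ"
for line in kwic(passage, "lakṣaṇa", width=12):
    print(line)
```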

The order in which the hits occur differs according to the index used for the search. With the NGram Index, the hits are simply ordered by the title of the texts in which they occur. With the Lucene Full-Text Index, the hits are ordered by their score. The score is a measure of how relevant a hit is to the search. What this means is a little technical but, generally speaking, the more of your search terms occur in a hit, the more often they occur, and the rarer they are within the texts searched, the higher the score will be.
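The general idea behind such a score can be sketched with a simple frequency-times-rarity weighting. This is explicitly not Lucene's actual scoring formula, which is more elaborate; the function toy_score and the sample data are assumptions made for the illustration.

```python
import math


def toy_score(hit: list[str], all_units: list[list[str]], terms: list[str]) -> float:
    """Toy relevance score: frequency of each term in the hit, weighted by rarity."""
    n = len(all_units)
    total = 0.0
    for term in terms:
        frequency = hit.count(term)                               # how often in this hit
        containing = sum(1 for unit in all_units if term in unit)  # how widespread
        rarity = math.log((n + 1) / (containing + 1)) + 1.0        # rarer -> larger
        total += frequency * rarity
    return total


units = [
    ["dharma", "artha"],
    ["dharma", "dharma", "kāma"],
    ["kāma"],
]

# Rank the units for a one-word search, best score first.
ranked = sorted(units, key=lambda u: toy_score(u, units, ["dharma"]), reverse=True)
print(ranked[0])
```

A unit in which the search term occurs twice is ranked above one in which it occurs once, and a term that occurs in few units contributes more to the score than one that occurs in many.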

The search term matches are grouped in hits, i.e. according to the levels of the document targeted by your search, with clickable references to the text as a whole (showing the header information and table of contents) and the part of the text in which the hit occurred (displaying the text itself). One hit (such as the second here) may contain several matches.

The hits are shown twenty at a time on each page. At the top, you have the total number of hits and a means of navigating from hit group to hit group.
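Paging through the hits twenty at a time amounts to simple slicing, as in this Python sketch. The function names are hypothetical; only the page size of twenty is taken from the description above.

```python
def page_count(total_hits: int, per_page: int = 20) -> int:
    """Number of pages needed to show `total_hits` hits."""
    return -(-total_hits // per_page)  # ceiling division


def page_of_hits(hits: list, page: int, per_page: int = 20) -> list:
    """Return the hits shown on the given page (pages are numbered from 1)."""
    start = (page - 1) * per_page
    return hits[start:start + per_page]


hits = list(range(1, 46))  # 45 hits, numbered 1 to 45
print(page_count(len(hits)))
print(page_of_hits(hits, 3))
```

With 45 hits there are three pages, and the last page holds the remaining five hits.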

The Page View

After clicking on one of the links that leads to a direct display of a page, you are shown the page view.

If you have clicked on a search term link in the hitlist, all matches of the search term(s) will be highlighted if you searched with the Lucene Full-Text Index, but only one will be highlighted if you searched with the NGram Index.

It is not possible to view a whole text, due to the heavy load on the browser this would entail in case of very long texts, but as a substitute for this you can download the whole text in a number of formats from the list of works.

You have the option of moving back and forth in the text. You may encounter a "blank layer", with no text displayed, in which case you should just move on one click. Such "blank layers" occur because the text moves from one division down to a lower division which does not have any text and does not have a header.

From the button on the left side of the screen, you have the option of accessing the table of contents, allowing you to move from chapter to chapter or section to section.

Clicking the title of the text will take you to the page displaying the text's header and table of contents.