Search Term Help

Last modified: Tue Nov 9 00:38:24 CET 2010
SARIT: Search Term Help. [ *]


Please see the current PhiloLogic user manual for general information. The following points are specific to SARIT.

Search terms are entered into the Term(s) input field in either the basic or advanced search form. Search terms must use the correct UTF-8 accented characters (diacritics): e.g., enter ahiṁsā, not ahimsa. Alongside or below the main search form is a key listing the most frequently used diacritics for Romanised Sanskrit; these characters may be copied and pasted into the search input field. Search terms may also include wildcard characters to match patterns: e.g., akṣar.* retrieves akṣaro, akṣarāṇām and so on.


Diacritics and Special Characters
    Diacritics for Romanised Sanskrit
    Special Characters
Wildcard Characters and Boolean Operators
    Full Text Searching: Wildcard Characters; Boolean (Logical) Operators
    Searching for Titles and Authors &c.
Punctuation Marks and Searching
    Full Text Searching: Apostrophe; Hyphen; Period
    Searching for Titles and Authors &c.

-Diacritics and Special Characters -

SARIT digital texts are encoded in UTF-8. All search terms must therefore use UTF-8 characters in every search input field, including the Term(s) field and the bibliographic fields (i.e., the Title and Author fields &c.).

If your computer is not configured to input diacritical marks directly, please copy and paste the characters into the search field from the list of characters shown on the search pages.
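Why an unaccented query fails can be shown with a short Python sketch. It assumes only that search terms are compared with the indexed words character by character; whether SARIT normalises composed and decomposed forms of the same letter is not stated on this page, which is one more reason to copy the characters from the list on the search pages rather than build them from combining marks.

```python
import unicodedata

# Precomposed (NFC) form of ahiṁsā, written with escapes so the code points are explicit:
# U+1E41 is ṁ (m with dot above) and U+0101 is ā (a with macron).
stored = "ahi\u1e41s\u0101"
query = "ahimsa"

# Plain ASCII m and a are different code points, so the comparison fails.
print(stored == query)  # False

# The same word built from base letters plus combining marks (NFD) looks
# identical on screen but is a different code-point sequence.
decomposed = unicodedata.normalize("NFD", stored)
print(len(stored), len(decomposed))  # 6 8
print(decomposed == stored)  # False
print(unicodedata.normalize("NFC", decomposed) == stored)  # True

# In UTF-8, ṁ occupies three bytes and ā occupies two.
print(len(stored.encode("utf-8")))  # 9
```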

--Special Characters --

ampersands (&) and many other punctuation marks are not searchable characters
mathematical symbols:
    the equal sign (=) and minus sign (-) will produce a ``No words matching specified search term(s)'' message
    the plus sign (+) is not a searchable character; if entered, it will be ignored

-Wildcard Characters and Boolean Operators -

One can include wildcard characters in search terms. This enables one to search for terms that match a pattern. Wildcard characters are available for both full text and bibliographic searching.

--Full Text Searching --

SARIT supports wildcard characters and Boolean (logical) operators, modeled on UNIX regular expressions, to perform ``pattern matching'' in full text searching. Pattern matching retrieves, in a single search, all the words that correspond to a defined pattern. Wildcard characters are useful, for example, for identifying cognates obscured by affixes and vowel weakening, inconsistencies due to irregular orthography, and variants arising from inflection, as well as for discovering potential emendations for uncertain readings. The most commonly used regular expression operators (wildcard and Boolean) are listed below.

---Wildcard Characters ---
. (period) :: matches any single character:
akṣar. :: matches akṣara and akṣaro
.* (period asterisk "dot-star") :: matches any string of characters:
.*māṇo :: matches mantrayamāṇo, sadharmāṇo, śiṣyamāṇo, karmāṇo, pramāṇo, &c.
a.*yataḥ :: matches anyataḥ, akārayataḥ, amokṣayataḥ, adūṣyataḥ, &c.
kauṭil.* :: matches kauṭilyaḥ, kauṭilyena, kauṭilīya, &c.
.? (period question mark) :: matches zero or one instance of any character, i.e., the characters entered with or without one additional character:
kañci.? :: matches kañcid and kañcit
[a-z] (brackets) :: matches a single character within a range:
kuryā[a-o] :: matches kuryād and kuryān but not kuryāt
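The wildcard examples above behave like regular expressions matched against individual words. The following Python sketch reproduces them with the standard `re` module under two assumptions: a pattern must match the whole word (`re.fullmatch`), and the word list is a small hypothetical sample, not SARIT's index. Python's regex dialect is not identical to PhiloLogic's, so treat this only as an illustration of the matching rules.

```python
import re

# Hypothetical sample standing in for the words indexed from the texts.
WORDS = ["akṣara", "akṣaro", "akṣarāṇām", "kañcid", "kañcit",
         "kuryād", "kuryān", "kuryāt", "kauṭilyaḥ", "kauṭilyena", "kauṭilīya"]

def matching_words(pattern, words=WORDS):
    """Return the words that the pattern matches in full (assumed anchored matching)."""
    regex = re.compile(pattern)
    return [w for w in words if regex.fullmatch(w)]

print(matching_words("akṣar."))      # ['akṣara', 'akṣaro']
print(matching_words("akṣar.*"))     # ['akṣara', 'akṣaro', 'akṣarāṇām']
print(matching_words("kañci.?"))     # ['kañcid', 'kañcit']
print(matching_words("kuryā[a-o]"))  # ['kuryād', 'kuryān']
print(matching_words("kauṭil.*"))    # ['kauṭilyaḥ', 'kauṭilyena', 'kauṭilīya']
```

Anchoring is what keeps akṣar. from matching akṣarāṇām: the single-character wildcard must be the last character of the word.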

N.B., for a full list of words matching a wildcard search term, go to the advanced search interface, enter the wildcard search term, select the Refined Search Results tab, select the Frequency by Title radio button, and then press `Search'.

---Boolean (Logical) Operators ---
| (vertical bar) :: the OR operator:
kañcid | kañcit :: matches kañcid OR kañcit
Space :: the AND operator in sentence and paragraph proximity searching:
teṣām apy :: matches teṣām AND apy in sentence and paragraph proximity searching

N.B., wildcard characters and Boolean operators can be combined within the same search: e.g., mantriṇ.* | kañci.? retrieves words beginning with mantriṇ together with kañcid and kañcit.
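The two Boolean operators, and their combination with wildcards, can be modeled in a few lines of Python. This is a hypothetical sketch, not PhiloLogic's query parser: it treats | (with or without surrounding spaces) as OR, treats any remaining space as AND, approximates a sentence by its whitespace-separated words, and requires each term to match a whole word. The sample sentences are invented for illustration.

```python
import re

def parse_query(query):
    """Hypothetical parser: '|' (with or without spaces) is OR; other spaces are AND."""
    query = re.sub(r"\s*\|\s*", "|", query.strip())
    return [re.compile(term) for term in query.split()]

def sentence_matches(query, sentence):
    """True if every AND term matches at least one whole word of the sentence."""
    words = sentence.split()
    return all(any(term.fullmatch(w) for w in words) for term in parse_query(query))

print(sentence_matches("teṣām apy", "teṣām apy ekam"))            # True
print(sentence_matches("teṣām apy", "apy eva"))                   # False
print(sentence_matches("kañcid | kañcit", "na kañcit api"))       # True
print(sentence_matches("mantriṇ.* | kañci.?", "mantriṇaḥ ūcuḥ"))  # True
```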

--Searching for Titles and Authors &c. (Bibliographic Searching) --

Bibliographic searching (searching for titles, authors and so on) needs only limited support for wildcard characters and Boolean operators: in general, it should be enough to enter an uncommon term from the title or the author's name. Please note that only the Boolean operator OR (|) can be used, not AND (space); that the wildcard operator (.*) is unnecessary; and that a title or author's name containing diacritics must be entered with UTF-8 characters, not postfix modifiers.
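The advice that .* is unnecessary makes sense if a bibliographic entry may match anywhere within a field rather than only a whole word. The Python sketch below illustrates that reading; the unanchored matching (`re.search`), the function name `find_titles` and the sample records are assumptions for illustration, not SARIT's implementation.

```python
import re

# Hypothetical bibliographic records: (title, author).
RECORDS = [
    ("Arthaśāstra", "Kauṭilya"),
    ("Pramāṇavārttika", "Dharmakīrti"),
    ("Nyāyabindu", "Dharmakīrti"),
]

def find_titles(title_query, records=RECORDS):
    """Return titles containing a match for the query anywhere (assumed unanchored search)."""
    regex = re.compile(title_query)
    return [title for title, _author in records if regex.search(title)]

print(find_titles("vārttika"))      # ['Pramāṇavārttika']
print(find_titles(".*vārttika.*"))  # ['Pramāṇavārttika']  (the .* adds nothing)
print(find_titles("bindu|śāstra"))  # ['Arthaśāstra', 'Nyāyabindu']  (OR)
print(find_titles("varttika"))      # []  (diacritics are required)
```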

-Punctuation Marks and Searching -

In general, it is advisable to avoid using punctuation marks when engaged in full text or bibliographic searching.

--Full Text Searching --

Punctuation marks must be avoided in full text searching, because many of the symbols commonly used for punctuation serve in SARIT as postfix modifiers or wildcard characters.

The punctuation that must be eschewed includes: the comma (,), question mark (?), exclamation mark (!), vertical bar (|), forward (/) and backward (\) slashes, parentheses (( )), braces ({ }), brackets ([ ]), angle brackets (< >), colons (:), and semi-colons (;), as well as quotation marks (` ' "), ampersands (&), the asterisk (*), percent sign (%), dollar sign ($), and number sign (#).
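Several characters in this list (?, *, |, the brackets and the parentheses) are also regular-expression operators, so a mark typed as punctuation is silently reinterpreted rather than searched for. The Python sketch below shows this with the standard `re` module, assuming whole-word matching; the word list is hypothetical, and PhiloLogic's own regex dialect may differ in detail.

```python
import re

# Hypothetical indexed words.
WORDS = ["ki", "kim", "it", "iti", "itii"]

def reading_of(term, words):
    """Return the words a term matches when it is read as a regular expression."""
    return [w for w in words if re.fullmatch(term, w)]

print(reading_of("kim?", WORDS))  # ['ki', 'kim']  the ? makes the final m optional
print(reading_of("iti*", WORDS))  # ['it', 'iti', 'itii']  the * repeats the last i

try:
    reading_of("iti)", WORDS)     # an unmatched parenthesis is a syntax error
except re.error as err:
    print("not a valid pattern:", err)
```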

Some punctuation marks are especially problematic and deserve further comment:

---Apostrophe ---

The Tibetan 'a chuṅ is represented by an apostrophe ('). When searching for a term including an apostrophe one should substitute a wildcard character: e.g., search for tha snyad .*dogs pa or tha snyad .?dogs pa rather than tha snyad 'dogs pa.
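The substitution works because both .* and .? can stand for the 'a chuṅ at the start of 'dogs. The Python sketch below checks this with the standard `re` module, assuming that each space-separated term must match one whole word and that the terms must match consecutive words (a phrase search); the token list is a hypothetical Wylie transliteration of the phrase.

```python
import re

# Hypothetical tokens of the phrase as stored in a text (Wylie transliteration).
TOKENS = ["tha", "snyad", "'dogs", "pa"]

def phrase_found(query, tokens):
    """True if the query's terms match consecutive whole tokens (assumed phrase semantics)."""
    terms = [re.compile(t) for t in query.split()]
    n = len(terms)
    return any(
        all(term.fullmatch(tok) for term, tok in zip(terms, tokens[i:i + n]))
        for i in range(len(tokens) - n + 1)
    )

print(phrase_found("tha snyad .*dogs pa", TOKENS))  # True
print(phrase_found("tha snyad .?dogs pa", TOKENS))  # True
print(phrase_found("tha snyad dogs pa", TOKENS))    # False: dogs alone does not match 'dogs
```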

---Hyphen ---

Some texts use hyphens to separate words within compounds. When searching for words within a hyphenated compound one should omit the hyphen: e.g., search for apāya hetu rather than apāya-hetu.
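The advice to omit the hyphen is consistent with an index that treats the hyphen as a word boundary, so that apāya-hetu is stored as the two words apāya and hetu. The Python sketch below illustrates that assumption; the tokenizer, the helper `contains_phrase` and the sample phrase are all hypothetical.

```python
import re

def tokenize(text):
    """Hypothetical tokenizer: whitespace and hyphens both end a word."""
    return [tok for tok in re.split(r"[\s-]+", text) if tok]

def contains_phrase(tokens, phrase):
    """True if the space-separated words of phrase occur consecutively in tokens."""
    words = phrase.split()
    return any(tokens[i:i + len(words)] == words for i in range(len(tokens)))

tokens = tokenize("sa apāya-hetu iti")
print(tokens)                                 # ['sa', 'apāya', 'hetu', 'iti']
print(contains_phrase(tokens, "apāya-hetu"))  # False: no token contains a hyphen
print(contains_phrase(tokens, "apāya hetu"))  # True
```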

---Period ---

The period (.) cannot be searched for as a punctuation mark: it serves as the single-character wildcard.
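A full stop typed at the end of a word is therefore read as a wildcard, and the term no longer matches the bare word. The Python sketch below assumes that a term must match a whole word; the word list is hypothetical.

```python
import re

# Hypothetical indexed words.
WORDS = ["iti", "itiḥ", "itī"]

# "iti." demands exactly one more character after "iti", so iti itself is missed.
matches = [w for w in WORDS if re.fullmatch("iti.", w)]
print(matches)  # ['itiḥ']
```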

N.B., as few digital texts are tagged for sentence termination, PhiloLogic™ relies on punctuation marks in combination with capitalisation to identify sentence termination. This is especially problematic for Indological and Buddhological texts, since romanised Sanskrit and Tibetan texts often do not capitalise the first word of a sentence.
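Why such a heuristic fails can be seen with a deliberately simplified version of it. The Python sketch below is hypothetical, not PhiloLogic's actual algorithm: it ends a sentence only where ., ! or ? is followed by whitespace and an ASCII capital letter. The romanised sample is invented and marks a daṇḍa with |.

```python
import re

def naive_sentences(text):
    """Split after . ! or ? when followed by whitespace and an ASCII capital letter."""
    return re.split(r"(?<=[.!?])\s+(?=[A-Z])", text)

print(naive_sentences("He came. She left."))
# ['He came.', 'She left.']
print(naive_sentences("iti. atha kim ucyate |"))
# ['iti. atha kim ucyate |']  lower-case text: no boundary is found
```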

--Searching for Titles and Authors &c. (Bibliographic Searching) --

The following punctuation marks produce a ``No documents found matching specified bibliographic criteria'' message when used in bibliographic search input fields: parentheses (( )), semi-colons (;), colons (:), ampersands (&), apostrophes ('), quotation marks (` ' "), braces ({ }), brackets ([ ]), angle brackets (< >) and the forward slash (/), as well as the dollar sign ($).

The following punctuation marks have no adverse effect on a bibliographic search and, when they occur within the string being searched for, must be entered: period (.), hyphen (-), question mark (?), exclamation mark (!), and comma (,).
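The first of these two lists can be encoded as a simple check to run before submitting a bibliographic search. The Python sketch below is hypothetical (SARIT provides no such function); it flags only characters reported above as causing a failed search, since those in the second list are reported as harmless.

```python
# Characters reported above as producing "No documents found" in bibliographic fields.
BIBLIO_FAILS = set("();:&'`\"{}[]<>/$")

def check_bibliographic(entry):
    """Return, sorted, the characters in an entry that would make the search fail."""
    return sorted({ch for ch in entry if ch in BIBLIO_FAILS})

print(check_bibliographic("Nyāyabindu (ed.)"))  # ['(', ')']
print(check_bibliographic("Pramāṇa-vārttika"))  # []
```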

N.B., on the whole, it is perhaps most convenient simply to avoid using punctuation marks when making bibliographic searches. All that is usually needed to find what one wants is an uncommon term from either the title or the author's name.

[ *]This page is a modified version of the PhiloLogic™ User Manual: 3. Character Representation for Search Terms.