Search/Browse Help - Searching/Displaying Non-Latin Characters
About Non-Latin Characters in Catalog Records
Records in the LC Catalog use the Unicode standard for MARC 21 for search and display. This ISO standard is based on Unicode (UTF-8 encoding). The MARC 21 Unicode standard currently supports the following non-Latin languages/scripts: Chinese, Japanese, Korean; Cyrillic-based scripts; Greek; Hebrew (e.g., Hebrew, Yiddish, Ladino, and Judeo-Arabic); and Perso-Arabic script (e.g., Arabic, Persian, Pushto, Sindhi, Urdu).
Although non-Latin metadata has been present in LC Catalog records for several years, records may contain a mix of non-Latin and transliterated data, or transliterated data only. Library staff use the ALA-LC Romanization Tables as the basis for transliteration.
When viewing records that contain non-Latin characters, make sure your browser is configured to display Unicode fonts, with automatic character encoding activated. If you encounter display problems, you may need to reconfigure your browser settings.
Tips for Searching Non-Latin Data
- Non-Latin characters can be searched in names, titles, series, notes, and many subjects. All topical subject headings, however, use English-language terms.
- Wildcards can be especially helpful when searching non-Latin data if you are unsure of the correct MARC 21 Unicode values. Use a percent sign (%) as a single-character wildcard and a question mark (?) for truncation or to substitute for multiple characters.
- Most marks of punctuation in your search query are converted to spaces. Some punctuation and diacritic marks are removed entirely: apostrophes, alifs, ayns, middle dots, primes, and double primes. A few special characters, however, are retained in searches: ampersands (&), plus signs (+), at signs (@), number signs (#), and musical flat (♭) and natural (♮) signs (musical sharps are converted to spaces). Other special characters are generally converted to their nearest alphabetic equivalent (for example, the digraph æ to ae and the thorn þ to th).
- Chinese, Japanese, and Korean (CJK) characters in LC Catalog records are limited to the ideographs found in the Unicode standard for MARC 21, so your searches must use those ideographs too. If your CJK search returns fewer results than you expect, consult the CJK Compatibility Database to identify the appropriate MARC 21 Unicode equivalents for non-MARC 21 CJK Unicode characters. If no MARC 21 equivalent exists for the character you need, use a single-character or multiple-character wildcard or the "missing character" symbol (〓) in your query.
- Language-specific search tips are available for Chinese, Japanese, Korean, and Hebrew.
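Wildcard matching happens inside the catalog, so the following Python sketch only models the semantics described in the tips above: % stands for exactly one character and ? for any run of characters. Treating ? as also matching zero characters (so that a truncated stem matches itself) is an assumption, and the function names are hypothetical.

```python
import re


def wildcard_to_regex(query):
    """Translate a catalog-style wildcard query into a compiled regex (sketch)."""
    parts = []
    for ch in query:
        if ch == "%":
            parts.append(".")    # single-character wildcard
        elif ch == "?":
            parts.append(".*")   # truncation / multiple characters (assumed to allow none)
        else:
            parts.append(re.escape(ch))  # every other character matches literally
    return re.compile("".join(parts))


def matches(query, term):
    """True if the whole term matches the wildcard query."""
    return wildcard_to_regex(query).fullmatch(term) is not None


if __name__ == "__main__":
    print(matches("日%", "日本"))      # one character after 日
    print(matches("東?", "東京都"))    # any run of characters after 東
```

Because % consumes exactly one character, "日%" matches 日本 but not 日本語; use ? when the number of unknown characters is uncertain.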
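The punctuation rules listed above can be expressed as a single normalization pass. LC does not publish its normalization code, so the following Python sketch is illustrative only. It assumes the usual ALA-LC code points for alif (U+02BC), ayn (U+02BB), prime (U+02B9), and double prime (U+02BA), and it assumes that combining diacritics, which the rules above do not mention, are kept. The uppercase Æ and Þ entries are likewise an assumption.

```python
import unicodedata

# Dropped outright, leaving no space behind (code points assumed per ALA-LC usage).
REMOVED = {
    "'",       # apostrophe
    "\u02BC",  # modifier letter apostrophe: alif
    "\u02BB",  # modifier letter turned comma: ayn
    "\u00B7",  # middle dot
    "\u02B9",  # modifier letter prime
    "\u02BA",  # modifier letter double prime
}
# Special characters that survive into the search.
RETAINED = {"&", "+", "@", "#", "\u266D", "\u266E"}  # ♭ flat, ♮ natural
# Nearest alphabetic equivalents (examples from the rules above).
EQUIVALENTS = {"æ": "ae", "Æ": "AE", "þ": "th", "Þ": "TH"}


def normalize_query(query: str) -> str:
    """Apply the documented punctuation rules to a search query (sketch)."""
    out = []
    for ch in query:
        if ch in REMOVED:  # checked first: several of these count as letters
            continue
        if ch in EQUIVALENTS:
            out.append(EQUIVALENTS[ch])
        elif ch in RETAINED or ch.isalnum() or unicodedata.category(ch).startswith("M"):
            out.append(ch)  # letters, digits, retained symbols, combining marks
        else:
            out.append(" ")  # other punctuation (including ♯ sharp) becomes a space
    return " ".join("".join(out).split())  # collapse runs of spaces


if __name__ == "__main__":
    print(normalize_query("Qur\u02BC\u0101n"))  # alif removed
    print(normalize_query("Sonata in C\u266F minor"))
```

The REMOVED check comes first because characters such as U+02BC are classified as letters by Unicode and would otherwise pass the `isalnum()` test.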
Browsing Non-Latin Headings
Headings Browse Lists for authors/creators, subjects, names/titles, and series/uniform titles are arranged in the following order: Latin script entries are listed first, followed by entries in Greek; Cyrillic-based scripts; Hebrew (e.g., Hebrew, Yiddish, Ladino, and Judeo-Arabic); Perso-Arabic script (e.g., Arabic, Persian, Pushto, Sindhi, Urdu); then Korean, Japanese, and Chinese. Headings Browse List entries in Latin and most non-Latin languages/scripts are sorted alphanumerically. Non-Latin headings for Chinese and Japanese, however, are sorted by Unicode code point values.
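The two-level ordering described above (script group first, then alphanumeric or code-point order within the group) can be modeled as a sort key. In this Python sketch, each heading's script is assumed to be known already (the `Heading` class and its `script` field are hypothetical), and `str.casefold()` is only a crude stand-in for the catalog's actual alphanumeric filing rules.

```python
from dataclasses import dataclass

# Script groups in Headings Browse List order.
SCRIPT_ORDER = ["Latin", "Greek", "Cyrillic", "Hebrew", "Perso-Arabic",
                "Korean", "Japanese", "Chinese"]
# Groups sorted by raw Unicode code point rather than alphanumerically.
CODE_POINT_GROUPS = {"Japanese", "Chinese"}


@dataclass
class Heading:
    text: str
    script: str  # hypothetical field: one of SCRIPT_ORDER


def browse_key(heading: Heading):
    """Sort key: script group rank, then the within-group ordering."""
    rank = SCRIPT_ORDER.index(heading.script)
    if heading.script in CODE_POINT_GROUPS:
        # Python compares str values code point by code point.
        return (rank, heading.text)
    # Crude stand-in for alphanumeric filing: case-insensitive comparison.
    return (rank, heading.text.casefold())


if __name__ == "__main__":
    headings = [Heading("北京", "Chinese"), Heading("Москва", "Cyrillic"),
                Heading("東京", "Japanese"), Heading("tokyo", "Latin"),
                Heading("Athens", "Latin"), Heading("中国", "Chinese")]
    for h in sorted(headings, key=browse_key):
        print(h.script, h.text)
```

Sorting by code point puts 中国 (U+4E2D) before 北京 (U+5317), which is why Chinese and Japanese browse lists can look unordered to readers who expect pinyin, stroke, or kana order.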
Titles Lists
In Titles Lists, the brief entry for each search result displays Latin characters only. Titles Lists are ranked by relevance or sorted by transliterated name/title metadata.
CJK Compatibility Database
When searching for Chinese, Japanese, and Korean ideographs, you must use characters found in the MARC 21 Unicode standard. Searching non-MARC 21 Unicode characters in the LC Catalog will not return results.
To help you quickly determine MARC 21 equivalents for non-MARC 21 CJK characters, the Library maintains a CJK Compatibility Database of more than 450 non-MARC 21 CJK characters matched with their MARC 21 equivalents. You can search this database by character or browse all entries.
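If you search CJK data often, the look-up-then-substitute routine described above can be scripted before you paste a query into the catalog. The following Python helper is hypothetical, not an LC tool: `equivalents` holds pairs you have looked up yourself in the CJK Compatibility Database, and `no_equivalent` holds characters you found to have no MARC 21 equivalent, which are replaced by the missing character symbol. As a swapped-in extra step, it also applies Unicode NFC normalization, which folds most characters in the CJK Compatibility Ideographs block (U+F900 to U+FAFF) into their unified forms. Whether that unified form is the MARC 21 equivalent is not guaranteed, so check the database.

```python
import unicodedata
from typing import AbstractSet, Mapping, Optional

MISSING_CHARACTER = "\u3013"  # 〓, the "missing character" symbol


def prepare_cjk_query(
    query: str,
    equivalents: Optional[Mapping[str, str]] = None,
    no_equivalent: AbstractSet[str] = frozenset(),
) -> str:
    """Hypothetical helper: substitute MARC 21 equivalents in a CJK query.

    equivalents   -- non-MARC 21 -> MARC 21 pairs looked up manually
    no_equivalent -- characters with no MARC 21 equivalent
    """
    equivalents = equivalents or {}
    out = []
    for ch in query:
        if ch in equivalents:
            out.append(equivalents[ch])     # equivalent from your own look-ups
        elif ch in no_equivalent:
            out.append(MISSING_CHARACTER)   # or swap in a wildcard (% or ?)
        else:
            # NFC maps most CJK compatibility ideographs to unified ideographs;
            # ordinary unified ideographs are left as they are.
            out.append(unicodedata.normalize("NFC", ch))
    return "".join(out)


if __name__ == "__main__":
    print(prepare_cjk_query("\uF900"))  # compatibility ideograph folded by NFC
```

For example, `prepare_cjk_query("\uF900")` returns "\u8C48" (豈), the unified ideograph to which U+F900 canonically decomposes.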
Input Method Editors (IMEs)
To retrieve records containing non-Latin data, you must be able to type the alphanumeric, phonetic, or ideographic characters you need into the LC Catalog search boxes. This requires input method editors (IMEs), software applications that are generally part of the language and internationalization components of your device's operating system. Some IMEs create on-screen keyboards, while others support keyboard-free text-entry methods that convert Latin to non-Latin characters. Multi-script input tools for various devices are also available from Google.