Technologies used to support languages falling outside the range of languages traditionally supported by language technology tools.

OCR and input systems for Indian languages

Voice recognition and optical character recognition (OCR) are extremely language-specific technologies. So a tool like the Optical Character Recognition for Indian Languages (Tamil, Malayalam, Hindi, Kannada, Bangla, Assamese, and Punjabi) — http://ocr.tdil-dc.gov.in/(S(e03gd4fcqnsqknnvadt4edha))/default.aspx — must feel like a summer rain if you work in those languages. Of course, the question is whether it works adequately.

Another tool, this one for all languages written in Devanagari, has shown up: Sanskrit CR (https://ocr.sanskritdictionary.com/). The anonymous person behind it wants us to pronounce the name, cleverly, "Sanskrit Seer." The application is really just a gateway to either Google's OCR engine or the above-mentioned open-source Tesseract OCR engine; you select one of the two in the menu after you upload your image file.
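For readers who would rather call Tesseract directly than go through a web gateway, here is a minimal sketch of how a command line for the Tesseract CLI could be assembled. It assumes Tesseract is installed locally together with the language data for the codes used (`hin` for Hindi and `san` for Sanskrit here). The helper `build_tesseract_command` and the file names `page.png` and `page_text` are hypothetical and have nothing to do with how the Sanskrit CR site itself works.

```python
import shlex


def build_tesseract_command(image_path, output_base, languages):
    """Return the argument list for a Tesseract CLI call.

    languages: Tesseract language codes such as ["hin", "san"]. Each code
    needs its traineddata file installed (an assumption of this sketch).
    """
    if not languages:
        raise ValueError("at least one language code is required")
    # Tesseract accepts several languages joined with "+".
    return ["tesseract", image_path, output_base, "-l", "+".join(languages)]


# Hypothetical file names, for illustration only.
command = build_tesseract_command("page.png", "page_text", ["hin", "san"])
print(shlex.join(command))
```

Passing the resulting list to `subprocess.run(command, check=True)` would run the recognition and write the text to `page_text.txt`, provided Tesseract and the language data are actually present. Whether the output is usable depends heavily on the scan quality and the script.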

Comment by Subhashri D V:

Of course, Google is playing savior by incorporating most Indian languages in its latest tools and initiatives. But compared with the variety of tools, databases, etc., available or supported for many other languages, our scene is dismal. The Indian government and private companies are naturally also to blame for this situation, for not investing enough in developing resources for our languages. I believe the government does have a head start in this matter (https://www.cdac.in/index.aspx?id=mlingual), but in spite of the recent efforts to propagate these multilingual computing resources (http://tdil-dc.in/index.php?option=com_vertical&parentid=18&Itemid=554&lang=en), they are not readily accessible or good enough to be used by professionals.

That aside, I would also like to highlight the glaring inadequacy of typing tools for Indian languages. The most commonly used tools for typing in Indian languages right now are the Indic language transliteration tools (MS, Google, etc.). InScript keyboards/layouts (https://en.wikipedia.org/wiki/InScript_keyboard) are available but rarely used, as most people start off with an English keyboard and it is difficult to 'learn' a new one for 'each' language (most Indians are multilingual). As you may already know, there is a world of difference between the way English is written and the way Indian scripts (Devanagari and the South Indian scripts) are written. As a simple example, there are no half letters (https://en.wikipedia.org/wiki/Conjunct_consonant) in English, while they abound in Indic scripts. The transliteration method is ineffective for professionals because it greatly reduces typing speed and leads to a number of errors. Indian words are typically long, and transliterating them into English requires many doubled vowels (aa, oo, uu, ii) or a mix of capital and small letters if written phonetically, and then one still has to 'select' the correct word from the many options displayed by the software. This is a cumbersome process, even though most of us are now used to it for want of another option.
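To make the half-letter point concrete, the short sketch below looks at how one Devanagari conjunct is stored in Unicode. It uses only Python's standard `unicodedata` module; the helper name `describe` and the choice of the conjunct "ksha" are illustrative assumptions, not something taken from the tools discussed here.

```python
import unicodedata


def describe(text):
    """Return the Unicode name of every code point in text."""
    return [unicodedata.name(ch) for ch in text]


# The Devanagari conjunct "ksha" is displayed as a single glyph but is
# stored as three code points: KA, then VIRAMA, then SSA.
ksha = "\u0915\u094d\u0937"

for name in describe(ksha):
    print(name)

# Number of code points behind the single visible glyph.
print(len(ksha))
```

The virama in the middle is what turns the first consonant into a half letter. Any input method, whether a fixed layout or a transliteration scheme, has to give the typist some way of producing that sequence, which is part of why typing Indic scripts on a keyboard designed for English takes extra keystrokes.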

On the other hand, I think Indic typing tools for mobiles have seen greater development (http://reverieinc.com/blog/comparison-indic-language-keypads/), but sadly almost none have been adapted to the PC/laptop. For a professional translator/writer, it is next to impossible to work on a mobile.

Of all the existing options so far, the handwriting mode (provided in Google Translate — https://support.google.com/translate/answer/6142469) looks to be the best, as this is the fastest and most efficient way of writing Indian languages. But again, it is not really useful to translators unless it becomes a common tool (touch-screen laptops are expensive).

I'm also a fan of the speech-to-text option offered by Google (supported for most Indic languages: https://support.google.com/docs/answer/4492226), although it is still a bit shaky and cannot be used in any tool other than Google Docs.

Coming to Indic OCR tools, they are still in a very nascent stage and don't work most of the time, but I believe this could be a cost-effective and easier option for translators (to handwrite a translation and OCR it, because at the current speed of transliterating, writing seems faster).

[And] I forgot to add that there are no decent spelling/grammar checkers for Indian languages, not even for Hindi. Given that the Indian market is now increasingly calling for localization, especially in e-commerce and content, it makes sense to build/integrate suitable tools for translators.
