
Research Efforts

 

1. Machine Aided Translation (MAT)

In machine translation, text in one natural language is translated into another language by computational applications, with little or no real-time human intervention. The software developed under the Machine Translation project is as follows:

A) Development of English to Indian Languages Machine Translation System (Anuvadaksh)
Since the majority of the Indian population cannot read or write English, while most of the information available on the web or in electronic media is in English, an automatic language translator is important for reaching the common man across various sections. To begin with, two specific domains, Tourism and Health, have been identified for machine translation. The project is being implemented in consortium mode, and ten institutions are participating to build the system. An experimental Machine Translation System has been made available as a technology demonstrator for the following language pairs:

i) English to Hindi
ii) English to Marathi
iii) English to Bangla
iv) English to Oriya
v) English to Tamil
vi) English to Urdu

B) Development of English to Indian Languages Machine Translation (MT) System with Angla-Bharti Technology:
Angla-Bharti is a machine-aided translation methodology designed specifically for translating English into Indian languages. It uses a pattern-directed approach with context-free-grammar-like structures. It analyses the English input only once and creates an intermediate structure called PLIL (Pseudo Lingua for Indian Languages). The PLIL structure is then converted to each Indian language through a process of text generation. The system provides automatic pre-editing and paraphrasing as well as named-entity recognition, and it incorporates an error-analysis module and a statistical language model for automated post-editing. The purpose of the automatic pre-editing module is to transform or paraphrase the input sentence into a form that is more easily translatable. The project is being implemented in consortium mode, with four institutions participating to build the system. The language pairs being targeted are English to Hindi, Marathi, Bengali, Oriya, Tamil and Urdu. An experimental Machine Translation System has been made available as a technology demonstrator for the following language pairs:

i) English to Bangla
ii) English to Punjabi
iii) English to Malayalam
iv) English to Urdu
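
The "analyse once, generate many times" idea behind Angla-Bharti can be illustrated with a small sketch. This is only a toy under stated assumptions: the `Intermediate` class is a hypothetical stand-in and not the real PLIL format, the romanised lexicons are invented for illustration, and the analyser handles only three-word subject-verb-object sentences.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Intermediate:
    """Toy language-neutral structure (hypothetical, NOT the real PLIL format)."""
    subject: str
    verb: str
    obj: str


def analyse_english(sentence: str) -> Intermediate:
    """Analyse a three-word subject-verb-object English sentence."""
    subject, verb, obj = sentence.rstrip(".").split()
    return Intermediate(subject=subject, verb=verb.lower(), obj=obj.lower())


# Hypothetical romanised lexicons, one per target language (illustration only).
LEXICONS = {
    "hindi": {"eats": "khata hai", "rice": "chawal"},
    "bangla": {"eats": "khay", "rice": "bhat"},
}


def generate(structure: Intermediate, language: str) -> str:
    """Generate target text from the shared intermediate structure."""
    lexicon = LEXICONS[language]
    # Both toy target languages place the verb last (subject-object-verb).
    return " ".join([structure.subject, lexicon[structure.obj], lexicon[structure.verb]])


def translate_to_all(sentence: str) -> dict:
    """Analyse the English input a single time, then generate every target."""
    structure = analyse_english(sentence)
    return {language: generate(structure, language) for language in LEXICONS}
```

Calling `translate_to_all("Ram eats rice.")` runs the English analysis a single time and reuses the result for each target language, so supporting a further language in this sketch means adding a lexicon rather than a new analyser.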


C) Development of Indian Language to Indian Language Machine Translation System (Sampark):
As India has 22 constitutionally recognised languages, an Indian Language to Indian Language Machine Translation system (IL-ILMT) is an important application for converting text written in one Indian language into another. The project is being implemented in consortium mode, and eleven institutions are participating to build the system. An experimental Machine Translation System has been made available as a technology demonstrator for the following language pairs:
i) Punjabi to Hindi
ii) Hindi to Punjabi
iii) Urdu to Hindi
iv) Telugu to Tamil

2. Development of Cross-lingual Information Access (CLIA)

Cross-Language Information Access is an extension of the Cross-Language Information Retrieval paradigm. It enables a user to enter queries in languages they are familiar with, and uses language translation methods to retrieve documents originally created in other languages.
The objective of Cross-Language Information Access is to introduce additional post-retrieval processing that helps users make sense of the retrieved documents. This additional processing may take the form of machine translation of snippets, summarization followed by translation of the summaries, and/or information extraction. The project is being implemented in consortium mode, and eleven institutions are participating to build the system. At present, the following languages are being targeted under the Tourism and Health domains:

i) Assamese
ii) Bengali
iii) Gujarati
iv) Hindi
v) Marathi
vi) Oriya
vii) Punjabi
viii) Tamil
ix) Telugu
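
The flow described above (a query in the user's language, translation of the query, retrieval of documents written in another language, then post-retrieval processing) can be sketched roughly as follows. Everything here is an assumption made for illustration: the romanised Hindi dictionary, the three English documents and the term-overlap scoring are hypothetical and do not describe the consortium's actual system.

```python
# Hypothetical bilingual dictionary (romanised Hindi -> English), illustration only.
QUERY_DICTIONARY = {"mandir": "temple", "aspatal": "hospital", "samay": "timings"}

# Hypothetical English document collection in the Tourism and Health domains.
DOCUMENTS = {
    "doc1": "The temple opens at six in the morning. Visitors should remove shoes.",
    "doc2": "The district hospital offers free vaccination on Mondays.",
    "doc3": "Temple timings change during festivals.",
}


def translate_query(query: str) -> list:
    """Translate each known query term into the document language."""
    return [QUERY_DICTIONARY[t] for t in query.lower().split() if t in QUERY_DICTIONARY]


def retrieve(terms: list) -> list:
    """Rank documents by how many translated query terms they contain."""
    scored = []
    for doc_id, text in DOCUMENTS.items():
        words = set(text.lower().replace(".", " ").split())
        score = sum(1 for term in terms if term in words)
        if score:
            scored.append((score, doc_id))
    # Highest score first; ties broken by document id for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored]


def snippet(doc_id: str, terms: list) -> str:
    """Post-retrieval step: return the first sentence that mentions a query term."""
    for sentence in DOCUMENTS[doc_id].split("."):
        if any(term in sentence.lower().split() for term in terms):
            return sentence.strip()
    return ""
```

In a fuller pipeline the snippet returned by the last step would be passed to a machine translation component so that the user can read it in the query language; that step is left out of this sketch.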

3. Development of Robust Document Analysis & Recognition System for Indian Languages (OCR)

Optical Character Recognition (OCR) is a tool for digitizing printed content and is essential for developing knowledge networks such as digital libraries. OCR technology makes it possible to scan and store printed text. It has three basic elements: scanning, recognition and then reading the text. The project is being implemented in consortium mode. The scripts/languages being targeted are:
i) Assamese
ii) Bengali
iii) Bodo
iv) Devanagari
v) Gujarati
vi) Gurumukhi
vii) Kannada
viii) Malayalam
ix) Manipuri
x) Marathi
xi) Oriya
xii) Tamil
xiii) Telugu
xiv) Tibetan
xv) Urdu

4. Development of On-line handwriting recognition system (OHWR)

An on-line handwriting recognition (OHWR) system is a useful tool that converts an individual's written strokes into editable text, bypassing the need for a keyboard for text entry. Seven institutions are participating to build the On-line Handwriting Recognition System. The scripts being targeted are:
i) Assamese
ii) Bengali
iii) Devanagari
iv) Gurumukhi
v) Kannada
vi) Malayalam
vii) Tamil
viii) Telugu

5. Development of Text to Speech System for Indian Languages (TTS)

A consortium-mode project has been initiated to develop a Text-to-Speech system in six Indian languages: Hindi, Bengali, Marathi, Malayalam, Tamil and Telugu. The objective of the project is to develop and deploy a Text-to-Speech system for visually challenged persons, with functionality like that of the JAWS product (for English), as an application serving a social cause.

6. Development of Automatic Speech Recognition in Indian Languages (ASR)

A consortium-mode project has been initiated to develop an Automatic Speech Recognition system for accessing the prices of agricultural commodities over a telephone channel. It serves as an interface to the NIC website, which is multilingual and provides information on agricultural commodities.

7. Development of Sanskrit Machine Translation System

In India, there have been several efforts to develop computational tools for Sanskrit. A consortium-mode project has been initiated under the leadership of the University of Hyderabad, with the objective of developing Sanskrit computational tools and using them to develop machine translation technology from Sanskrit to Hindi.
