IISc researchers receive USD 2 million to democratise voice technology development in nine Indian languages
With this goal in mind, researchers at IISc have embarked on a project to create and make available datasets for developing voice technologies in nine Indian languages: Bhojpuri, Maithili, Magadhi, Hindi, Chhattisgarhi, Bengali, Kannada, Telugu and Marathi. Most of the existing training datasets needed to build such voice technologies in Indian languages are not in the public domain and lack local innovation. They also focus on the languages and accents used in highly profitable, economically developed markets, and are biased towards urban and educated users. The collection of open voice data, particularly for less literate and marginalised populations, will strengthen local AI ecosystems and enable millions of people to access services they cannot yet use, be it in agriculture, education, health, or other sectors.
The initiative targets open voice datasets that can be freely used to train machine learning algorithms and that enable the creation of open-source AI-based solutions. The work will also support the Indian Natural Language Translation Mission (NLTM) under the Ministry of Electronics and Information Technology (MeitY) and help unlock the potential of voice technology, which has so far remained largely untapped.
The project aims to collect more than 11,000 hours of high-quality, gender-balanced voice data for automatic speech recognition in nine Indian languages, in the domains of agriculture and finance, which are highly relevant for poor farmers and women. The investment from FAIR Forward will support the collection of nearly 1,000 hours of high-quality, gender-balanced speech recordings from voice artists for the development of text-to-speech applications in the same nine Indian languages. The datasets will be made openly and freely available to Indian academics, start-ups, researchers, and developers to spur innovation and academic activity in the development of regional voice