Anoop Kunchukuttan comments

Results: 33 comments of Anoop Kunchukuttan

CSTD-Telugu ASR Corpus

There is also a 2,000-hour Telugu dataset from the same group at IIIT-H. Do you know the details of that dataset? @GokulNC

MuRIL

@maninuthi: please contact the MuRIL developers for details. I don't know the details of the architecture.

CEnTam- Corpus

Thanks, Sanjanasri. I have a few questions:
- What are the sources for the corpus? Does it include books and literary sources? A list of sources will be valuable documentation....

Sanskrit-English MT data

Some of this data might be machine translated.

Transliteration not proper for few characters in Tamil

Thanks for pointing this out. The extended ITRANS standard we defined probably does not have a mapping for this character. I will check this over the weekend.

Placement of Anuswara

Thanks for your input. I will take a look at the issue you mentioned in a couple of days.

Issue in Romanization

What is the version of pandas that you are using?
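One way to answer the version question is to read the installed package metadata. The snippet below is a minimal sketch, not part of the original thread; the helper name `installed_version` is made up for illustration, and it assumes Python 3.8 or later, where `importlib.metadata` is in the standard library.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is not installed."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Report the pandas version so it can be pasted into the issue.
print("pandas:", installed_version("pandas"))
```

If pandas imports successfully, `pandas.__version__` gives the same information.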

get_normalizer() takes 2 positional arguments but 3 were given

You need to pass keyword arguments after the first argument:
    normalizer = factory.get_normalizer('hi', remove_nuktas=True)
Thanks for pointing this out. I need to clean up the documentation.
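The error in the issue title is what Python reports when a method accepts its options only as keyword arguments. The sketch below reproduces it with a stand-in class. `DemoNormalizerFactory` is hypothetical and is not the library's code. Its signature, `get_normalizer(self, language, **kwargs)`, is an assumption inferred from the error message and the fix above.

```python
class DemoNormalizerFactory:
    """Hypothetical stand-in used only to illustrate the error; not the library's class."""

    def get_normalizer(self, language, **kwargs):
        # Only `language` may be passed positionally; options arrive through **kwargs.
        return {"language": language, **kwargs}


factory = DemoNormalizerFactory()

# Correct: the option is passed by keyword.
normalizer = factory.get_normalizer("hi", remove_nuktas=True)

# Incorrect: a second positional argument raises the TypeError from the issue title.
try:
    factory.get_normalizer("hi", True)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(normalizer)
print(error_message)
```

The count in the message includes `self`, which is why two explicit arguments are reported as three.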

Is translate function available?

The translate function is not available as part of this library. However, we recently released translation models for Indian languages, called IndicTrans. You can use those: https://github.com/AI4Bharat/indicTrans

name 'geomm_utils' is not defined

Let me check and I will get back to you.