Anoop Kunchukuttan

33 comments by Anoop Kunchukuttan

There is also a 2k hr Telugu dataset from the same group in IIIT-H. Do you know the details for that? @GokulNC

@maninuthi: please contact the MuRIL developers for details. I don't know the details of the architecture.

Thanks, Sanjanasri. I have a few questions:
- What are the sources for the corpus? Does it include books and literary sources? A list of sources would be valuable documentation....

Some of this data might be machine translated.

Thanks for pointing this out. The extended ITRANS standard we defined probably does not have a mapping for this character. I will check this over the weekend.

Thanks for your inputs. Let me take a look at the issue you mention in a couple of days.

What is the version of pandas that you are using?

You need to pass keyword arguments after the first argument:
normalizer = factory.get_normalizer('hi', remove_nuktas=True)
Thanks for pointing this out. I need to clean up the documentation.
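To illustrate the point about keyword arguments, here is a minimal sketch. It does not import the library: `StandInNormalizerFactory` is a hypothetical stand-in, and its signature (one positional language code, then options collected as keyword arguments) is an assumption made only to show the call shape.

```python
# Hypothetical stand-in for the library's factory; it only mimics the assumed
# call shape (language first, then keyword-only options) and does no normalization.
class StandInNormalizerFactory:
    def get_normalizer(self, language, **kwargs):
        # Options are accepted only by name, never by position.
        return {
            "language": language,
            "remove_nuktas": kwargs.get("remove_nuktas", False),
        }


factory = StandInNormalizerFactory()

# Works: the option is passed by name after the first argument.
normalizer = factory.get_normalizer("hi", remove_nuktas=True)

# Fails: passing the option positionally raises TypeError under this signature.
try:
    factory.get_normalizer("hi", True)
    positional_call_failed = False
except TypeError:
    positional_call_failed = True
```

Under this assumed signature, the second call fails because only the language code may be given positionally.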

The translate function is not available as part of this library. However, we recently released translation models for Indian languages, called IndicTrans. You can use those: https://github.com/AI4Bharat/indicTrans

Let me check and I will get back.