Gaurav Arora comments

Results: 35 comments of Gaurav Arora

Progress bar on async downloads

Thanks @shaz13 for raising this. Do you think adding instructions to the README for downloading the models manually and placing them in the appropriate directory would be helpful?

fix requirements

Thanks @Ishan-Kumar2 for your contribution. Can you share a notebook showing the issue that this PR tries to solve? I was not able to reproduce the issue that you seem...

Lemmatization for Tamil language

Hi @DeepikaSharma5, iNLTK doesn't specifically support lemmatization, but you can do it using [stanza](https://stanfordnlp.github.io/stanza/installation_usage.html) if you want. With respect to usage on Windows, I've seen people use it but...

Integrating with HuggingFace Transformer

@octalpixel, @parmarsuraj99 Thanks for reaching out. Currently, it isn't straightforward, or perhaps even possible, to integrate it with the transformers library. I'd be happy to have contributions from the community to help with it.

Integrating with HuggingFace Transformer

@parmarsuraj99 Yes, you can use sentencepiece or Hugging Face's tokenizers library (https://github.com/huggingface/tokenizers). I've been working on training a Hindi BERT model using the tokenizers and transformers libraries from Hugging Face.

POS tagging

@TviNet Thanks for reaching out! I glanced over the [LM-LSTM-CRF repo](https://github.com/LiyuanLucasLiu/LM-LSTM-CRF) and saw that they're considering every space-separated word as a token. I think you can do that for Indic...
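
As a rough illustration of treating every space-separated word as a token, here is a minimal sketch in plain Python. It is not code from the LM-LSTM-CRF repo or from iNLTK; the function name `whitespace_tokenize` and the sample Hindi sentence are hypothetical and used only for illustration.

```python
from typing import List


def whitespace_tokenize(sentence: str) -> List[str]:
    """Split a sentence into tokens on runs of whitespace.

    Hypothetical helper for illustration only: every space-separated
    word becomes one token, with no language-specific handling.
    """
    # str.split() with no arguments splits on any run of whitespace
    # and drops leading/trailing whitespace.
    return sentence.split()


# Sample Hindi sentence (an assumption, not taken from the thread).
tokens = whitespace_tokenize("मैं घर जा रहा हूँ")
print(tokens)
```

Each resulting token could then be paired with one tag per word, which is the input shape a word-level sequence tagger typically expects.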

POS tagging

Yes, that's why I think using transfer learning is important here, especially for low resource languages.

add code-mixed language identifier

Thanks a lot @tathagata-raha for your contribution. Your work looks great; I just have a few comments: 1. It would be great if you could also add documentation for this functionality in...

identify languages doesn't work with Telugu in v0.9

@Shubhamjain27 Will you be able to take this up?

identify languages doesn't work with Telugu in v0.9

@lordzuko That'll be great! Feel free to raise a PR for this.