Gaurav Arora

Results: 35 comments by Gaurav Arora

Thanks @shaz13 for raising this. Do you think adding instructions to the README for downloading the models manually and placing them in the appropriate directory would be helpful?

Thanks @Ishan-Kumar2 for your contribution. Can you share a notebook showing the issue which this PR tries to solve? I was not able to reproduce the issue which you seem...

Hi @DeepikaSharma5, iNLTK doesn't specifically support lemmatization, but you can do it using [stanza](https://stanfordnlp.github.io/stanza/installation_usage.html) if you want. With respect to usage on Windows, I've seen people use it but...

@octalpixel, @parmarsuraj99 Thanks for reaching out. Currently, it isn't straightforward/possible to integrate it with the transformers library. I'll be happy to have contributions from the community to help with it.

@parmarsuraj99 Yes, you can use sentencepiece or Huggingface's tokenizers (https://github.com/huggingface/tokenizers) library. I've been working on training a BERT Hindi model using the tokenizers and transformers libraries from Huggingface.

@TviNet Thanks for reaching out! I glanced over the [LM-LSTM-CRF repo](https://github.com/LiyuanLucasLiu/LM-LSTM-CRF), and saw that they're considering every space-separated word as a token. I think you can do that for Indic...
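
For illustration, here is a minimal sketch of the space-separated approach mentioned above, applied to a Hindi sentence. The function name `whitespace_tokenize` and the sample sentence are assumptions made for this example. They are not taken from the LM-LSTM-CRF repo or from iNLTK.

```python
from typing import List


def whitespace_tokenize(text: str) -> List[str]:
    """Treat every whitespace-separated word as one token (hypothetical helper)."""
    # str.split() with no arguments splits on runs of whitespace and
    # drops leading/trailing whitespace, so no empty tokens are produced.
    return text.split()


# Illustrative Hindi sentence (an assumption, not from the original thread).
sentence = "मैं घर जा रहा हूँ"
tokens = whitespace_tokenize(sentence)
print(tokens)
```

Note that punctuation written without a preceding space (for example a danda, `।`) stays attached to the neighbouring word under this scheme, so a separate punctuation-splitting step may be needed depending on the data.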

Yes, that's why I think using transfer learning is important here, especially for low-resource languages.

Thanks a lot @tathagata-raha for your contribution. Your work looks great, I just had a few comments: 1. It'll be great if you can also add documentation for this functionality in...

@Shubhamjain27 Will you be able to take this up?

@lordzuko That'll be great! Feel free to raise a PR for this.