
Interest in implementing your models

Open kylepjohnson opened this issue 6 years ago • 4 comments

Dear @OliverHellwig, I am one of the maintainers of the CLTK (https://github.com/cltk/cltk), an NLP framework for dead languages. Recently @sainimohit23 alerted us to your sandhi splitting project. Well done, this is an important advancement for Sanskrit!

There is no open source license in this repo. Would you consider sharing your code and models with the CLTK? If you would like to implement it yourself, or to mentor a student in implementing it, you are most welcome to join us!

Please reach out by email if you want to talk 1:1 (my address is on my GH homepage). Thank you!

kylepjohnson, Feb 11 '19 17:02

@kylepjohnson @OliverHellwig I have started working on it. Right now I'm experimenting and doing some testing on the code.

I would also like to ask @OliverHellwig to help me out if I ever get stuck. It would be a great learning experience for a student like me who is interested in NLP.

sainimohit23, Feb 12 '19 19:02

@sainimohit23 any luck so far?

gasyoun, Mar 27 '19 18:03

@gasyoun I implemented the whole pipeline the very next day after I commented on this issue. Here's the code: https://github.com/sainimohit23/cltk/tree/master/cltk/stem/sanskrit/code

sainimohit23, Mar 27 '19 18:03

@gasyoun @sainimohit23 Just reaching out to let you know that a CLTK person has alerted us to the CoNLL treebanks here. I don't recall them being available when I last checked, which according to this ticket was two years ago!

We are thinking of using these to train models with spaCy. If you or anyone you know would be interested in helping, you're welcome to join us. I have only studied a little Sanskrit, so we'll need knowledgeable people to evaluate the models.

kylepjohnson, Feb 08 '21 07:02