Support Swahili

Open tbm opened this issue 5 years ago • 4 comments

Is Swahili a language you're working on?

Based on the requests for other languages, it sounds like the first step is to create a Universal Dependencies description.

Kenneth Steimel (@ksteimel) is working on Swahili (see this presentation) but I don't know what the status is. Are you working with him?

tbm, Dec 23 '20 14:12

I noticed that https://github.com/UniversalDependencies/UD_Swahili-OPUSGV/blob/dev/README.md says 2021-05-15. I'm not sure if that's a typo or the projected release date of the corpus.

tbm, Dec 23 '20 14:12

It's good to hear that there's additional interest in Swahili NLP!

I am working on a treebank as one part of my PhD thesis. The projected release date is May of next year because that was my target for completing my thesis. However, things have come up, so it may be delayed a month or two.

I'll work to keep the repo more up to date so you can see my progress more clearly. Sorry about that.

ksteimel, Dec 23 '20 15:12

@tbm It's very likely that the date is the projected release date for UD v2.8. We will definitely consider training a Swahili model should the data become available as part of UD!

qipeng, Dec 29 '20 17:12