Add Abkhaz

Open • Bachstelze opened this issue 4 years ago • 4 comments

How can we add the Abkhazian language? There are a few existing resources (https://gitlab.com/Bachstelze/alp and https://github.com/danielinux7/Multilingual-Parallel-Corpus). Can we port those models to Stanza, or do we have to retrain them?

Bachstelze, Oct 13 '20 09:10

@Bachstelze, it looks like the first step might be to annotate a corpus in Universal Dependencies. I'd be interested in working on that; please feel free to contact me if you are too.

ftyers, Oct 13 '20 14:10
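For context on what "annotating a corpus in Universal Dependencies" produces: UD treebanks are distributed in the CoNLL-U format, one token per line with ten tab-separated columns (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC), with `#` comment lines such as `# text = ...` above each sentence. The sketch below is a minimal reader for a single sentence block, written only to show that structure; `parse_conllu_sentence` is a hypothetical helper, not part of Stanza or any UD tool, and it ignores details such as multiword-token ranges and empty nodes.

```python
from typing import Dict, List

# The ten tab-separated CoNLL-U columns defined by Universal Dependencies.
CONLLU_FIELDS = ["id", "form", "lemma", "upos", "xpos",
                 "feats", "head", "deprel", "deps", "misc"]


def parse_conllu_sentence(block: str) -> List[Dict[str, str]]:
    """Parse one CoNLL-U sentence block into a list of column dicts.

    Hypothetical helper for illustration; it does not handle multiword
    tokens (IDs like "1-2") or empty nodes (IDs like "1.1") specially.
    """
    tokens = []
    for line in block.strip().splitlines():
        if not line or line.startswith("#"):
            continue  # skip sentence-level comments such as "# text = ..."
        cols = line.split("\t")
        if len(cols) != len(CONLLU_FIELDS):
            raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
        tokens.append(dict(zip(CONLLU_FIELDS, cols)))
    return tokens
```

A toy example with placeholder tokens (not real Abkhaz): `tok2` is the root, and `tok1` attaches to it as `nsubj`.

```python
sample = (
    "# text = tok1 tok2\n"
    "1\ttok1\t_\tPRON\t_\t_\t2\tnsubj\t_\t_\n"
    "2\ttok2\t_\tVERB\t_\t_\t0\troot\t_\t_\n"
)
for tok in parse_conllu_sentence(sample):
    print(tok["id"], tok["form"], tok["upos"], tok["head"], tok["deprel"])
```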

Are there proven, well-known ways to generate treebanks from scratch for post-editing? Is it possible to start with POS tagging and then pre-parse into UD?

Bachstelze, Oct 13 '20 22:10

Maybe, but it would take you longer and you would end up with a worse result. It's easier to just annotate from scratch. If there is glossed or tagged text, it can be used to bootstrap a conversion. You could, for example, use UD Annotatrix (with apologies for the orthography):

[Screen recording: Peek 2020-10-13 23-49]

You can skip some of these steps if you have a decent part-of-speech tagger or a glossed corpus. I'm guessing that for Abkhaz, morphological analysis would also be needed if you want to fill out the FEATS column. Anyway, I think it would make a nice project.

ftyers, Oct 13 '20 22:10
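The bootstrapping idea above (reuse glossed or tagged text, then annotate the rest by hand in a tool such as UD Annotatrix) could start from something like the following sketch. It assumes, hypothetically, that each sentence is already available as `(form, upos, gloss)` triples from an existing tagger or glossed corpus, and writes a CoNLL-U skeleton with UPOS filled in, the gloss stored in MISC as `Gloss=...` (one convention some UD treebanks use), and HEAD, DEPREL, LEMMA and FEATS left as `_` placeholders for the annotator. `to_conllu_skeleton` is an illustrative name, not an existing API, and the output will not pass the UD validator until the placeholders are filled.

```python
from typing import List, Optional, Tuple

# Hypothetical input shape: one sentence as (form, upos, gloss) triples,
# e.g. exported from an existing POS tagger or a glossed corpus.
Token = Tuple[str, str, Optional[str]]


def to_conllu_skeleton(sent_id: str, tokens: List[Token]) -> str:
    """Emit a CoNLL-U sentence with HEAD/DEPREL left as '_' for manual annotation."""
    lines = [
        f"# sent_id = {sent_id}",
        # Assumes tokens are separated by single spaces in the original text.
        "# text = " + " ".join(form for form, _, _ in tokens),
    ]
    for i, (form, upos, gloss) in enumerate(tokens, start=1):
        # Store the gloss in MISC if we have one; assumes glosses contain no spaces.
        misc = f"Gloss={gloss}" if gloss else "_"
        # Columns: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
        cols = [str(i), form, "_", upos, "_", "_", "_", "_", "_", misc]
        lines.append("\t".join(cols))
    return "\n".join(lines) + "\n"
```

With placeholder tokens (not real Abkhaz data), `to_conllu_skeleton("s1", [("tok1", "PRON", None), ("tok2", "VERB", "go.PST")])` yields a two-token sentence whose dependency columns are still empty, ready to be loaded into an annotation tool. Leaving FEATS empty reflects the point above: filling it properly for Abkhaz would likely need a morphological analyser.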

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot], Dec 29 '20 18:12