HanTa
Where can I see the entire set of possible POS tags?
I cannot find a way to list all the tags your tagger knows. There are definitely some that are not listed in your tutorial. Where can I find the entire list, ideally with an explanation and examples for each tag?
Thanks!
I think the tags are based on the Tiger annotation scheme, available here:
https://www.linguistik.hu-berlin.de/de/institut/professuren/korpuslinguistik/mitarbeiter-innen/hagen/STTS_Tagset_Tiger
Sorry, I replied by mail instead of using GitHub. Indeed, HanTa is trained mainly on the Tiger corpus and thus uses the Tiger annotation scheme: https://www.ims.uni-stuttgart.de/documents/ressourcen/korpora/tiger-corpus/annotation/tiger_scheme-morph.pdf (esp. pp. 26-27)
A general description is available e.g. here: https://www.ims.uni-stuttgart.de/forschung/ressourcen/lexika/germantagsets/#id-cfcbf0a7-0 or here: https://homepage.ruhr-uni-bochum.de/stephen.berman/Korpuslinguistik/Tagsets-STTS.html
These are the POS tags that are used. Most tags used to annotate morphemes should be quite clear given the POS tags. I am working on documentation that includes those as well.
In the latest version I have added two methods:
- list_postags()
- list_mtags()
The first one gives a list of all POS tags, the second a list of all tags used for morphemes. For each tag, some random examples are generated. Have a look at the demo notebook in the German section for an example.
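For anyone who wants to try this outside the notebook, here is a minimal sketch of how the two methods might be called. The helper names `show_tagsets` and `load_german_tagger` are made up for illustration and are not part of HanTa. The sketch also assumes that the German model file is named `morphmodel_ger.pgz` and that the methods take no required arguments. Since it is not stated above whether they return the listing or print it directly, the helper handles both cases.

```python
def show_tagsets(tagger):
    """Show the POS-tag and morpheme-tag listings of a loaded tagger.

    `tagger` is expected to offer list_postags() and list_mtags(),
    the two methods mentioned above.
    """
    for name in ("list_postags", "list_mtags"):
        print(f"--- {name}() ---")
        result = getattr(tagger, name)()
        # Assumption: the method may print its listing itself and return
        # None; if it returns something instead, print that value.
        if result is not None:
            print(result)


def load_german_tagger():
    """Load the German model (hypothetical helper, model name assumed)."""
    # Imported lazily so show_tagsets() can be used without HanTa installed.
    from HanTa import HanoverTagger as ht
    return ht.HanoverTagger("morphmodel_ger.pgz")
```

With HanTa installed, `show_tagsets(load_german_tagger())` should output both listings. Because the examples for each tag are generated randomly, the output will probably differ between runs.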
Besides that, I am still working on comprehensive documentation.