
Where can I see the entire set of possible POS tags?

Open IngoStorm opened this issue 2 years ago • 2 comments

I cannot find a way to list all the tags your tagger knows. There are definitely some that are not listed in your tutorial. Where can I find the entire list, ideally with an explanation and examples for each tag?

Thanks!

IngoStorm, May 30 '22 18:05

I think the tags are based on the Tiger annotation scheme, available here:

https://www.linguistik.hu-berlin.de/de/institut/professuren/korpuslinguistik/mitarbeiter-innen/hagen/STTS_Tagset_Tiger

RicKrauel, Jun 28 '22 10:06

Sorry, I replied by mail instead of using GitHub. Indeed, HanTa is trained mainly on the Tiger corpus and thus uses the Tiger annotation scheme: https://www.ims.uni-stuttgart.de/documents/ressourcen/korpora/tiger-corpus/annotation/tiger_scheme-morph.pdf (esp. pp. 26-27)

A general description is available e.g. here: https://www.ims.uni-stuttgart.de/forschung/ressourcen/lexika/germantagsets/#id-cfcbf0a7-0 or here: https://homepage.ruhr-uni-bochum.de/stephen.berman/Korpuslinguistik/Tagsets-STTS.html

These are the POS tags that are used. Most of the tags used to annotate morphemes should be quite clear given the POS tags. I am working on documentation that covers those as well.

wartaal, Jun 28 '22 10:06

In the latest version I have added two methods:

  • list_postags()
  • list_mtags()

The first one gives a list of all POS tags, the second a list of all tags used for morphemes. For each tag, some random examples are generated. Have a look at the demo notebook in the German section for an example.
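For anyone who wants to try the two methods outside the notebook, here is a minimal sketch. The helper names `collect_tag_listings` and `load_german_tagger` are made up for illustration, and the model file name `morphmodel_ger.pgz` is an assumption based on the usual HanTa setup. The thread does not say whether the methods return their listing or only print it, so the helper tolerates both.

```python
def collect_tag_listings(tagger):
    """Call list_postags() and list_mtags() on a loaded tagger.

    Assumption: each method either returns its listing or prints it
    and returns None. Versions without these methods yield None too.
    """
    listings = {}
    for name in ("list_postags", "list_mtags"):
        method = getattr(tagger, name, None)
        # Older versions may not have the method at all.
        listings[name] = method() if callable(method) else None
    return listings


def load_german_tagger(model="morphmodel_ger.pgz"):
    """Load the German model (assumed file name; requires HanTa installed)."""
    # Imported here so the helper above can be used without HanTa present.
    from HanTa import HanoverTagger as ht

    return ht.HanoverTagger(model)
```

With HanTa installed, `collect_tag_listings(load_german_tagger())` should run both listings. Since the examples for each tag are generated randomly, the output will differ between runs.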

Besides that, I am still working on comprehensive documentation.

wartaal, Jan 10 '23 10:01