dkpro-core
Odd PRT input POS tag in MaltParser English models
The English models for MaltParser use an input POS tag PRT, which does not exist as a POS tag in the Penn Treebank (PTB) tagset. PRT is actually a chunk tag (particle); the PTB POS tag for particles is RP. Maybe there was a tagging error in the training data used for the models.
Affected upstream models:
- engmalt.poly-1.7.mco (MD5: 6f81de28b4c3f1f309a578d7d21fcb6e)
- engmalt.linear-1.7.mco (MD5: ca24797d5470763d41e142c977f54321)
Affected DKPro models:
- de.tudarmstadt.ukp.dkpro.core.maltparser-model-parser-en-linear (v 20120312.X)
- de.tudarmstadt.ukp.dkpro.core.maltparser-model-parser-en-poly (v 20120312.X)
The models also contain a tag $, which is not part of the PTB tagset.
They also use regular brackets, ( and ), as tags instead of PTB tags such as -LRB- and -RRB-.
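One possible workaround for the bracket mismatch is to remap the tagger output before it reaches the parser. The snippet below is only a sketch of that idea: `BRACKET_TAG_MAP` and `remap_tags` are hypothetical names, not part of MaltParser or DKPro Core, and it assumes the models expect exactly `(` and `)` as tags.

```python
# Hypothetical mapping from PTB-style bracket tags to the regular
# brackets that the models appear to expect (an assumption).
BRACKET_TAG_MAP = {"-LRB-": "(", "-RRB-": ")"}


def remap_tags(tags):
    """Return a new list with bracket tags replaced; other tags are kept as-is."""
    return [BRACKET_TAG_MAP.get(tag, tag) for tag in tags]


if __name__ == "__main__":
    print(remap_tags(["-LRB-", "NN", "-RRB-"]))
```

PRT and $ are deliberately left out of the mapping, since it is not clear from the above which PTB tags they should correspond to in these models.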
Some lists of PTB tags seem to include $ as a tag:
- http://www.comp.leeds.ac.uk/ccalas/tagsets/upenn.html
... others do not:
- http://faculty.washington.edu/dillon/GramResources/penntable.html
- http://www.clips.ua.ac.be/pages/mbsp-tags
The PTB Tagging Guidelines (3rd edition) list 36 tags: http://repository.upenn.edu/cis_reports/570/
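To see which of a model's input tags fall outside those 36 tags, a quick check along the following lines could help. This is a minimal sketch: the tag names follow the commonly used naming (NNP, PRP, ...), `unknown_tags` is a hypothetical helper, and the sample list is made up for illustration rather than extracted from the .mco files.

```python
# The 36 word-level PTB POS tags, in the commonly used naming.
PTB_WORD_TAGS = frozenset(
    """CC CD DT EX FW IN JJ JJR JJS LS MD NN NNS NNP NNPS PDT POS PRP PRP$
    RB RBR RBS RP SYM TO UH VB VBD VBG VBN VBP VBZ WDT WP WP$ WRB""".split()
)


def unknown_tags(model_tags):
    """Return the tags that are not among the 36 word-level PTB tags, sorted."""
    return sorted(set(model_tags) - PTB_WORD_TAGS)


if __name__ == "__main__":
    # Illustrative sample only, not the actual tag inventory of the models.
    sample = ["NN", "VBZ", "RP", "PRT", "$", "(", ")"]
    print(unknown_tags(sample))
```

Note that this check also flags punctuation and symbol tags, so a hit for $ or a bracket only means the tag is not one of the 36 word-level tags.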