Odd PRT input POS tag in MaltParser English models

Open • reckart opened this issue 9 years ago • 2 comments

The English models for MaltParser use an input POS tag PRT, which does not exist in the Penn Treebank (PTB) tagset. PRT is actually a chunk tag (particle); the PTB POS tag for particles is RP. The training data used for the models may have contained a tagging error.

Affected upstream models:

  • engmalt.poly-1.7.mco (MD5: 6f81de28b4c3f1f309a578d7d21fcb6e)
  • engmalt.linear-1.7.mco (MD5: ca24797d5470763d41e142c977f54321)
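
To check whether a local model file is one of the two upstream files listed above, its MD5 digest can be compared against those values. The following is only a minimal sketch: the helper names `md5_of_file` and `affected_model_name` are hypothetical, and the file path is whatever location the model was downloaded to.

```python
import hashlib

# MD5 digests of the affected upstream models, as listed in this issue.
AFFECTED_MD5 = {
    "6f81de28b4c3f1f309a578d7d21fcb6e": "engmalt.poly-1.7.mco",
    "ca24797d5470763d41e142c977f54321": "engmalt.linear-1.7.mco",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Read fixed-size chunks so large model files need not fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def affected_model_name(path):
    """Return the affected upstream model name for this file, or None."""
    return AFFECTED_MD5.get(md5_of_file(path))
```

Calling `affected_model_name` with the path to a downloaded `.mco` file would return the matching upstream file name, or `None` if the digest is not one of the two above.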

Affected DKPro Models:

  • de.tudarmstadt.ukp.dkpro.core.maltparser-model-parser-en-linear (v 20120312.X)
  • de.tudarmstadt.ukp.dkpro.core.maltparser-model-parser-en-poly (v 20120312.X)

The models also contain a tag $, which is not part of the PTB tagset. In addition, they use regular brackets instead of the PTB tags -LRB- and -RRB-.
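
A rough way to spot such tags is to compare a model's input tags against a reference list. The sketch below assumes the 36 word-level PTB POS tags as they are commonly listed (worth double-checking against the tagging guidelines); punctuation and symbol tags are deliberately left out of the set, so they are reported as well. The name `non_ptb_tags` and the sample tag list are hypothetical and not taken from the models themselves.

```python
# The 36 word-level POS tags commonly listed for the Penn Treebank.
# Assumption: punctuation/symbol tags are not part of this set.
PTB_WORD_TAGS = frozenset(
    """
    CC CD DT EX FW IN JJ JJR JJS LS MD NN NNS NNP NNPS PDT POS PRP PRP$
    RB RBR RBS RP SYM TO UH VB VBD VBG VBN VBP VBZ WDT WP WP$ WRB
    """.split()
)


def non_ptb_tags(tags):
    """Return the sorted, de-duplicated tags that are not in PTB_WORD_TAGS."""
    return sorted(set(tags) - PTB_WORD_TAGS)


# Hypothetical sample of input tags, for illustration only.
sample = ["NN", "VBZ", "PRT", "$", "(", ")", "RP"]
print(non_ptb_tags(sample))  # → ['$', '(', ')', 'PRT']
```

Whether a reported tag such as $ is a real problem depends on which tag list is taken as the reference.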

reckart, May 08 '16 19:05

Some lists of PTB tags seem to include $ as a tag:

  • http://www.comp.leeds.ac.uk/ccalas/tagsets/upenn.html

... others do not:

  • http://faculty.washington.edu/dillon/GramResources/penntable.html
  • http://www.clips.ua.ac.be/pages/mbsp-tags

reckart, May 09 '16 13:05

The PTB Tagging Guidelines (3rd edition) contain 36 tags: http://repository.upenn.edu/cis_reports/570/

reckart, May 10 '16 21:05