
Trivial "Do Do" false-positives

Open mubaldino opened this issue 4 years ago • 3 comments

Describe the bug "Do. Do", "do. Do", "in Do", etc. are still common false positives.

To Reproduce Xponents 3.3

Expected behavior Better filtering of these. Likely use a spaCy model to supply POS tags and eliminate obvious errors.
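A minimal sketch of the POS-based filtering idea, not Xponents code. It assumes each candidate match arrives as a list of (token, POS) pairs, where the POS values are Universal POS tags such as those spaCy exposes as `token.pos_`. The `NON_PLACE_POS` set and the `is_plausible_place` helper are hypothetical names chosen for illustration.

```python
from typing import List, Tuple

# Assumed set of Universal POS tags that a real place name should not consist of.
NON_PLACE_POS = {"AUX", "VERB", "PRON", "ADP", "DET", "PART"}


def is_plausible_place(tagged_span: List[Tuple[str, str]]) -> bool:
    """Return False when every alphabetic token in the span has a non-place POS tag."""
    # Ignore punctuation such as the "." inside "Do. Do".
    words = [(text, pos) for text, pos in tagged_span if text.isalpha()]
    if not words:
        return False
    return not all(pos in NON_PLACE_POS for _, pos in words)


# "Do. Do" tagged as auxiliary verbs is rejected; a proper noun is kept.
print(is_plausible_place([("Do", "AUX"), (".", "PUNCT"), ("Do", "AUX")]))
print(is_plausible_place([("Do", "PROPN")]))
```

A filter like this only helps when the tagger gets short, capitalized tokens right, so it would likely complement rather than replace gazetteer-side fixes.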

mubaldino, Jun 30 '20 10:06

Add "text_norm" to the indexer to review common false positives that still appear.

mubaldino, Jun 30 '20 10:06

Addressed in part by NonSenseFilter -- removing lowercase matches.
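To illustrate why removing lowercase matches only addresses this in part, here is a rough sketch. It is not the actual NonSenseFilter implementation; `drop_lowercase_matches` is a hypothetical helper that works on plain matched strings.

```python
from typing import List


def drop_lowercase_matches(matches: List[str]) -> List[str]:
    """Keep only matches that contain at least one uppercase letter."""
    # str.islower() is True when all cased characters are lowercase
    # and the string has at least one cased character.
    return [m for m in matches if not m.islower()]


candidates = ["do", "Do. Do", "do. Do", "paris", "Paris"]
print(drop_lowercase_matches(candidates))
```

Fully lowercase matches such as "do" are dropped, but "Do. Do" and "do. Do" still pass because they contain a capital letter, which is consistent with the issue being only partly resolved.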

mubaldino, Oct 14 '20 21:10

Seems more like a set of gazetteer ETL fixes than a pattern generalization. If such trivial gazetteer entries should never be tagged, then we mark them search_only=1.
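A sketch of what such an ETL step might look like, under loud assumptions: gazetteer entries are modeled as plain dicts with a `name` field, and `TRIVIAL_NAMES` is a hypothetical stop list. Only the `search_only` flag comes from the thread; the real Xponents schema and ETL scripts may differ.

```python
from typing import Dict, List

# Hypothetical list of names judged too trivial to tag in free text.
TRIVIAL_NAMES = {"do", "in", "to"}


def mark_search_only(entries: List[Dict]) -> List[Dict]:
    """Return copies of entries, setting search_only=1 on trivial names."""
    marked = []
    for entry in entries:
        row = dict(entry)  # copy so the input is not mutated
        if row["name"].strip().lower() in TRIVIAL_NAMES:
            row["search_only"] = 1
        else:
            row.setdefault("search_only", 0)
        marked.append(row)
    return marked


rows = mark_search_only([{"name": "Do"}, {"name": "Paris"}])
print(rows)
```

Entries flagged this way would stay available for lookup but would be skipped by the tagger, so no pattern-based filter is needed for them.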

mubaldino, Feb 07 '22 15:02