Xponents
Trivial "Do Do" false-positives
Describe the bug
"Do. Do", "do. Do", "in Do", etc. are still common false positives.
To Reproduce Xponents 3.3
Expected behavior Better filtering of these. Likely use a spaCy NER model to offer POS tags and eliminate obvious errors.
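A minimal sketch of the POS-based filtering suggested above. It assumes each candidate match arrives with the coarse POS tags of its tokens (for example the `token.pos_` values from a spaCy pipeline, using Universal POS labels). The names `looks_like_place` and `NON_NAME_POS` are hypothetical and not part of Xponents.

```python
# Hypothetical helper, not part of Xponents.
# POS tags are assumed to come from a tagger such as spaCy (token.pos_),
# using Universal POS labels.
NON_NAME_POS = {
    "AUX", "VERB", "PRON", "DET", "ADP",
    "CCONJ", "SCONJ", "PART", "INTJ",
}


def looks_like_place(pos_tags):
    """Return False when every token in the match is a verb or function word."""
    if not pos_tags:
        # No tags available: treat the match as unsupported.
        return False
    return not all(tag in NON_NAME_POS for tag in pos_tags)
```

If the tagger labels the "Do" in "Do. Do" as AUX or VERB, `looks_like_place(["AUX"])` rejects it, while a match tagged `["PROPN"]` is kept. Matches with mixed tags such as `["ADP", "PROPN"]` pass, so this only removes the most obvious errors.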
Add "text_norm" to the indexer to review the common false positives that still appear.
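One way to use such a field for review, sketched under assumptions: each indexed match is taken to be a dict with a "text_norm" key holding the normalized match text. The function name `top_normalized_matches` and the sample records are illustrative only.

```python
from collections import Counter


def top_normalized_matches(matches, n=20):
    """Count matches by normalized text so frequent suspects surface first."""
    counts = Counter(m["text_norm"] for m in matches if m.get("text_norm"))
    return counts.most_common(n)


# Illustrative records only; real values would come from the index.
sample = [
    {"text": "Do", "text_norm": "do"},
    {"text": "do", "text_norm": "do"},
    {"text": "Paris", "text_norm": "paris"},
]
print(top_normalized_matches(sample))  # [('do', 2), ('paris', 1)]
```

Sorting by frequency puts repeated trivial matches such as "do" at the top of the list, where they are easy to spot and triage.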
Addressed in part by NonSenseFilter -- removing lowercase matches.
This seems more like a gazetteer ETL fix than a pattern generalization. If such trivial gazetteer entries should never be tagged, then we mark them search_only=1.
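A rough sketch of what that ETL step could look like. Only the `search_only` flag comes from this thread; the row layout (a dict with a "name" key), the `TRIVIAL_NAMES` list, and the function `mark_search_only` are assumptions for illustration, not the actual Xponents gazetteer code.

```python
# Hypothetical ETL step; the stop list and row layout are assumptions.
TRIVIAL_NAMES = {"do", "in", "to", "of"}


def mark_search_only(row, trivial=TRIVIAL_NAMES):
    """Return a copy of a gazetteer row, flagged search_only=1 if its name is trivial."""
    marked = dict(row)  # leave the caller's row untouched
    name = marked.get("name", "").strip().lower()
    if name in trivial:
        marked["search_only"] = 1
    else:
        marked.setdefault("search_only", 0)
    return marked
```

Flagging at load time keeps the entry searchable while the tagger skips it, so no per-document filtering is needed for these names.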