Some entities get dropped from the result
What is the reason for this check?
https://github.com/ahalterman/mordecai3/blob/main/mordecai3/geoparse.py#L449
Because of this, some places get dropped.
Here is an example from the test_moredecai3 tests
Shouldn't this return 2 geo-locations instead of just one (Oxford)?
"Ole Miss is located in Oxford."
- Ole Miss - University of Mississippi as per Geonames ( missed )
- Oxford
Similarly, the example below should also return 2 geo-locations, right?
"Oxford is home to Oxford University, one of the best universities in the world."
- `Oxford`
- `Oxford University` ( missed )
Hi! Thanks for the question and examples. You are right, both look like they should return two locations each. Just adding some notes and context for now.
The check in question at the time of this comment (the lines have moved since then):
https://github.com/ahalterman/mordecai3/blob/366591fd793edf46106ddfd74a9c303eb0c740b1/mordecai3/geoparse.py#L449-L451
This check may affect the first example, where "Ole Miss" is picked up as a search name because spaCy tags it as GPE.
In the 2nd example, spaCy tags "Oxford University" as an ORG, so it never gets picked up as a search name in the first place.
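
To illustrate the second case, here is a minimal sketch of label-based filtering. It is not mordecai3's actual code: `PLACE_LABELS` and `select_search_names` are hypothetical names, the label set is an assumption, and the GPE label for "Oxford" is assumed (only the ORG label for "Oxford University" is reported above).

```python
# Hypothetical set of NER labels treated as place-like.
# mordecai3's real list may differ; this is only for illustration.
PLACE_LABELS = {"GPE", "LOC", "FAC"}


def select_search_names(entities, allowed_labels=PLACE_LABELS):
    """Keep the text of each (text, label) pair whose label is allowed."""
    return [text for text, label in entities if label in allowed_labels]


# (text, label) pairs for the second example sentence.
# "Oxford" as GPE is an assumption; "Oxford University" as ORG is from the thread.
example_2 = [("Oxford", "GPE"), ("Oxford University", "ORG")]

# With place-like labels only, the ORG entity never becomes a search name.
kept = select_search_names(example_2)

# Widening the label set would let it through.
widened = select_search_names(example_2, PLACE_LABELS | {"ORG"})

print(kept)
print(widened)
```

Widening the label set like this would presumably also pull in many organizations that are not places, so it may not be the right fix on its own.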