
Some entities get dropped from the result

Open • chirag4semandex opened this issue 1 year ago • 1 comment

What is the reason for this check?

https://github.com/ahalterman/mordecai3/blob/main/mordecai3/geoparse.py#L449

Because of this, some places get dropped.

Here is an example from the test_mordecai3 tests:

Shouldn't this return 2 geo-locations instead of just one (Oxford)?

"Ole Miss is located in Oxford."

- Ole Miss: University of Mississippi, as per GeoNames (missed)
- Oxford

Similarly, the example below should also return 2 geo-locations, right?

"Oxford is home to Oxford University, one of the best universities in the world."

- `Oxford`
- `Oxford University` (missed)

chirag4semandex • Oct 08 '24 18:10

Hi! Thanks for the question and examples. You are right: both look like they should return two locations each. I'm just adding some notes and context for now.

The check in question at the time of this comment (the lines have moved since then):

https://github.com/ahalterman/mordecai3/blob/366591fd793edf46106ddfd74a9c303eb0c740b1/mordecai3/geoparse.py#L449-L451

This check may be what affects the first example, where "Ole Miss" is picked up as a search name (spaCy tags it as GPE) but then dropped.

In the second example, spaCy tags "Oxford University" as ORG, so it is never picked up as a search name in the first place.
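To make the difference between the two failure modes concrete, here is a minimal sketch of label-based search-name selection. It is not mordecai3's code: the NER output is hard-coded instead of coming from spaCy, and `Entity`, `SEARCH_LABELS`, and `search_names` are hypothetical names. The label set is an assumption, and the labels for "Oxford" are assumed to be GPE. Under these assumptions, "Oxford University" (ORG) is filtered out at the labeling stage. "Ole Miss" (GPE) passes that stage, so if it disappears, it must be dropped later, for example by the check linked above.

```python
from typing import NamedTuple


class Entity(NamedTuple):
    """Stand-in for a spaCy entity span: its text and NER label."""
    text: str
    label: str


# Hypothetical set of NER labels that qualify an entity as a search name.
# The set actually used by mordecai3 may differ.
SEARCH_LABELS = {"GPE", "LOC", "FAC"}


def search_names(entities):
    """Return the texts of entities whose label qualifies them as search names."""
    return [ent.text for ent in entities if ent.label in SEARCH_LABELS]


# Example 1: labels as described in the thread ("Oxford" assumed to be GPE).
example_1 = [Entity("Ole Miss", "GPE"), Entity("Oxford", "GPE")]

# Example 2: spaCy tags "Oxford University" as ORG.
example_2 = [Entity("Oxford", "GPE"), Entity("Oxford University", "ORG")]

print(search_names(example_1))  # ['Ole Miss', 'Oxford']
print(search_names(example_2))  # ['Oxford']
```

In the first case, both names survive this stage, so any loss happens afterwards. In the second case, the ORG entity never becomes a candidate at all, so no later change to the check would recover it without also changing which labels are searched.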

andybega • Sep 19 '25 11:09