openrefine-wikibase

Reconcile by coordinates

Open wetneb opened this issue 4 years ago • 5 comments

Posted by @VojtechDostal at https://github.com/OpenRefine/OpenRefine/issues/3663:

Reconciliation by string matching is useful in many cases, but it is currently (to my knowledge) impossible to find the items closest to the matched object.

Proposed solution

Use case: I have a list of buildings with coordinates (lat, lon). I'd like to find the item(s) closest to those coordinates. Additionally, I'd like to be able to filter results by class (subclass of: building) and suggest only those. High-confidence matches (very close and with corresponding names) could be auto-matched.

Alternatives considered

I don't know of any alternative way/hack to load the closest item to given coordinates. However, the Wikidata SPARQL service has a distance service and I think there is also a special API call for exactly this.
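For reference, a minimal sketch of what a query against the distance service mentioned above might look like, written as a Python helper that only builds the SPARQL text. It assumes the `wikibase:around` service of the Wikidata Query Service; the function name `nearby_items_query`, the default class `Q41176` (believed to be "building"), and the default radius are illustrative choices, not something specified in this issue.

```python
def nearby_items_query(lat, lon, radius_km=1.0, class_qid="Q41176", limit=10):
    """Build a SPARQL query listing items near (lat, lon), closest first.

    Assumptions (not from the issue): the class filter defaults to Q41176,
    and the radius is given in kilometres as wikibase:around expects.
    """
    # WKT points are written as "Point(longitude latitude)", in that order.
    return f"""
SELECT ?item ?itemLabel ?location ?dist WHERE {{
  SERVICE wikibase:around {{
    ?item wdt:P625 ?location .
    bd:serviceParam wikibase:center "Point({lon} {lat})"^^geo:wktLiteral .
    bd:serviceParam wikibase:radius "{radius_km}" .
    bd:serviceParam wikibase:distance ?dist .
  }}
  # Keep only instances of the class or of any of its subclasses.
  ?item wdt:P31/wdt:P279* wd:{class_qid} .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en" . }}
}}
ORDER BY ?dist
LIMIT {limit}
""".strip()


if __name__ == "__main__":
    # Example coordinates are arbitrary; paste the output into the query service.
    print(nearby_items_query(50.0875, 14.4213))
```

The helper makes no network calls; the generated text can be pasted into the query service UI or sent with any HTTP client.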

wetneb • Mar 03 '21 10:03

FWIW, if you don't mind running your own reconciliation service, I've just written a geo scoring plugin for csv-reconcile.

With this you could, say, run a SPARQL query to get the coordinate locations of the items you want to match against, export the results as a TSV file, and use that file to run csv-reconcile.

You can get the service up and running as simply as the following:

$ python -m venv serverenv
$ source serverenv/bin/activate
$ python -m pip install csv-reconcile
$ python -m pip install csv-reconcile-geo
$ csv-reconcile --init-db query.tsv item coord --scorer geo 

Here item is the name of the column containing the QIDs, and coord is the name of the coordinate column in well-known text (WKT) format, the default export format for coordinates.
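To make the coordinate format concrete, here is a rough sketch of reading a WKT point such as `Point(14.4213 50.0875)` and measuring the great-circle distance between two points. This is not the plugin's actual scoring code; the names `parse_wkt_point` and `haversine_km` and the use of the haversine formula are assumptions made for illustration.

```python
import math
import re

# Matches "Point(<lon> <lat>)" with plain decimal numbers (case-insensitive).
_WKT_POINT = re.compile(
    r"^\s*POINT\s*\(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)\s*$",
    re.IGNORECASE,
)


def parse_wkt_point(text):
    """Return (lon, lat) from a WKT point; note that longitude comes first."""
    match = _WKT_POINT.match(text)
    if match is None:
        raise ValueError(f"not a WKT point: {text!r}")
    return float(match.group(1)), float(match.group(2))


def haversine_km(lon1, lat1, lon2, lat2, radius_km=6371.0):
    """Great-circle distance in km on a sphere of the given radius."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


if __name__ == "__main__":
    a = parse_wkt_point("Point(14.4213 50.0875)")
    b = parse_wkt_point("Point(14.4000 50.0900)")
    print(haversine_km(*a, *b))
```

A distance like this could be turned into a match score (closer means higher), but how the plugin does that is not described in this thread.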

This was just my first pass at it. There's certainly room for improvement, but it may suit your immediate needs.

gitonthescene • Apr 07 '21 00:04

@gitonthescene Sounds great! I'll give it a shot at the first opportunity.

VojtechDostal • Apr 07 '21 05:04

@gitonthescene Could you please assist me with this? I am a bit disoriented and not sure I understand the overall idea of 'my own' reconciliation service correctly. Am I right in assuming that I need to load File number 1 into OpenRefine, load File number 2 on the command line via the commands above, add the reconciliation service "http://127.0.0.1:5000/reconcile" to OpenRefine, and then reconcile?

I think I was able to start virtualenv on my system (I am on Windows and "source" did not work, but I think I was able to find a solution at https://stackoverflow.com/questions/8921188/issue-with-virtualenv-cannot-activate) and then I was able to install csv-reconcile and csv-reconcile-geo. However, this is what I get when I run the program:

(venv) C:\Users\vojte\Downloads>csv-reconcile --init-db query.tsv item coord --scorer geo
c:\users\vojte\venv\lib\site-packages\normality\__init__.py:72: ICUWarning: Install 'pyicu' for better text transliteration.
  text = ascii_text(text)
Traceback (most recent call last):
  File "C:\Users\vojte\AppData\Local\Programs\Python\Python37-32\Lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "C:\Users\vojte\AppData\Local\Programs\Python\Python37-32\Lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\vojte\venv\Scripts\csv-reconcile.exe\__main__.py", line 7, in <module>
  File "c:\users\vojte\venv\lib\site-packages\click\core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "c:\users\vojte\venv\lib\site-packages\click\core.py", line 782, in main
    rv = self.invoke(ctx)
  File "c:\users\vojte\venv\lib\site-packages\click\core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "c:\users\vojte\venv\lib\site-packages\click\core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "c:\users\vojte\venv\lib\site-packages\csv_reconcile\__init__.py", line 210, in main
    initdb.init_db()
  File "c:\users\vojte\venv\lib\site-packages\csv_reconcile\initdb.py", line 76, in init_db
    (mid, word) + tuple(matchFields))
sqlite3.IntegrityError: UNIQUE constraint failed: reconcile.id

My query.tsv is from https://w.wiki/3BV9

What do you think is happening? Sorry to spam the issue with my questions.
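One thing that could be checked, purely as a guess based on the error message: the traceback says only that a UNIQUE constraint on reconcile.id failed, which would happen if the same value appears more than once in the id column (for example, an item exported once per coordinate value). The cause is not confirmed in this thread. A small sketch for listing repeated ids in the TSV, where the function name `duplicate_ids` is hypothetical and the column name `item` follows the command above:

```python
import csv
from collections import Counter


def duplicate_ids(tsv_path, id_column="item"):
    """Return {id: count} for every id that occurs more than once in the TSV."""
    with open(tsv_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        counts = Counter(row[id_column] for row in reader)
    return {value: n for value, n in counts.items() if n > 1}


if __name__ == "__main__":
    # "query.tsv" is the file name used in the command above.
    for value, n in sorted(duplicate_ids("query.tsv").items()):
        print(f"{value} appears {n} times")
```

If this prints anything, deduplicating those rows (or adjusting the query so each item appears once) would be a reasonable next thing to try.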

VojtechDostal • Apr 13 '21 10:04

Perhaps this discussion could be moved to the csv-reconcile project? Unrelated discussions might put people off :)

wetneb • Apr 13 '21 12:04

Created as a new issue here: https://github.com/gitonthescene/csv-reconcile/issues/3. Sorry for this, @wetneb :)

VojtechDostal • Apr 13 '21 14:04