openrefine-wikibase
504 Server Error: Connection Timed Out for large data set
Hi. I keep getting timeouts for large data sets, even on what seems like a relatively simple query.
Essentially I'm looking for humans (Q5) by their full name (label), where each row may optionally also have:
- Academic degree (P512).
- Polish scientist ID (P3124).
I'm not sure whether this is a bug on the Wikidata interface side, a bug in OpenRefine, or a mistake on my part. The data set has 209,700 rows in total.
Before running the query, I had already matched the P512 values against existing Wikidata items.
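For readers following along, the batch sent to the reconciliation service has the shape visible in the `queries` argument of the log below. Here is a minimal Python sketch of how such a batch could be assembled. The `build_queries` helper and the row field names (`name`, `scientist_id`, `degree_qid`) are hypothetical; only the payload keys (`query`, `type`, `properties`, `pid`, `v`, `type_strict`) are taken from the log.

```python
import json


def build_queries(rows):
    """Build a reconciliation batch keyed q0, q1, ... from row dicts.

    The row field names are illustrative assumptions, not OpenRefine's own.
    """
    queries = {}
    for i, row in enumerate(rows):
        properties = []
        # Polish scientist ID (P3124) is optional.
        if row.get("scientist_id") is not None:
            properties.append({"pid": "P3124", "v": row["scientist_id"]})
        # Academic degree (P512) is optional and already matched to an item.
        if row.get("degree_qid"):
            properties.append({"pid": "P512", "v": {"id": row["degree_qid"]}})
        queries[f"q{i}"] = {
            "query": row["name"],
            "type": "Q5",
            "properties": properties,
            "type_strict": "should",
        }
    return queries


rows = [
    {"name": "Ewa Sawicka", "scientist_id": 117644, "degree_qid": "Q849697"},
    # Degree left out here only to show the optional case.
    {"name": "Adam Majka", "scientist_id": 16210, "degree_qid": None},
]
print(json.dumps(build_queries(rows), ensure_ascii=False, indent=2))
```

Judging by the log, OpenRefine sends ten such queries (`q0` to `q9`) per batch.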
A part of OpenRefine log:
05:06:08.486 [ refine-standard-recon] {"status": "error", "details": "504 Server Error: Connection Timed Out for url: https://www.wikidata.org/w/api.php?action=wbgetentities&format=json&ids=Q48902896%7CQ58214583%7CQ46325648%7CQ58214579%7CQ83164176%7CQ58214631%7CQ58214613%7CQ57120391%7CQ62565914%7CQ41479223%7CQ51863373%7CQ67085019%7CQ52734983%7CQ38189278%7CQ51767739%7CQ81774233%7CQ58379340%7CQ57663971%7CQ36969826%7CQ38431076%7CQ62565907%7CQ26799961%7CQ40989963%7CQ36880823%7CQ38446475%7CQ58214632%7CQ42758151%7CQ62415447%7CQ84235332%7CQ34573865%7CQ62565923%7CQ53183692%7CQ37682437%7CQ53619567%7CQ49981811%7CQ82752518%7CQ46952292%7CQ38633661%7CQ37615952%7CQ46442382%7CQ62588554%7CQ52747628%7CQ46449071%7CQ58426113%7CQ39595327%7CQ58426115%7CQ58214605%7CQ26746031%7CQ58379352%7CQ41452562&props=aliases%7Clabels%7Cdescriptions%7Cclaims%7Csitelinks", "message": "invalid query", "arguments": {"lang": "pl", "queries": "{\"q0\":{\"query\":\"Chandra Pareek\",\"type\":\"Q5\",\"properties\":[{\"pid\":\"P3124\",\"v\":84798},{\"pid\":\"P512\",\"v\":{\"name\":\"profesor zwyczajny\",\"id\":\"Q11827483\"}}],\"type_strict\":\"should\"},\"q1\":{\"query\":\"Ewa Sawicka\",\"type\":\"Q5\",\"properties\":[{\"pid\":\"P3124\",\"v\":117644},{\"pid\":\"P512\",\"v\":{\"name\":\"doktor\",\"id\":\"Q849697\"}}],\"type_strict\":\"should\"},\"q2\":{\"query\":\"Andrzej Kaliszak\",\"type\":\"Q5\",\"properties\":[{\"pid\":\"P3124\",\"v\":51621},{\"pid\":\"P512\",\"v\":{\"name\":\"habilitacja\",\"id\":\"Q308678\"}}],\"type_strict\":\"should\"},\"q3\":{\"query\":\"Bogdan Biela\",\"type\":\"Q5\",\"properties\":[{\"pid\":\"P3124\",\"v\":96675},{\"pid\":\"P512\",\"v\":{\"name\":\"habilitacja\",\"id\":\"Q308678\"}}],\"type_strict\":\"should\"},\"q4\":{\"query\":\"Hubert Sauermann\",\"type\":\"Q5\",\"properties\":[{\"pid\":\"P3124\",\"v\":273075},{\"pid\":\"P512\",\"v\":{\"name\":\"magister\",\"id\":\"Q183816\"}}],\"type_strict\":\"should\"},\"q5\":{\"query\":\"Anna 
Kaliszewska-Suchodo\u0142a\",\"type\":\"Q5\",\"properties\":[{\"pid\":\"P3124\",\"v\":272817},{\"pid\":\"P512\",\"v\":{\"name\":\"doktor\",\"id\":\"Q849697\"}}],\"type_strict\":\"should\"},\"q6\":{\"query\":\"Iwona K\u0142oszewska\",\"type\":\"Q5\",\"properties\":[{\"pid\":\"P3124\",\"v\":44004},{\"pid\":\"P512\",\"v\":{\"name\":\"profesor zwyczajny\",\"id\":\"Q11827483\"}}],\"type_strict\":\"should\"},\"q7\":{\"query\":\"Tadeusz Brukwicki\",\"type\":\"Q5\",\"properties\":[{\"pid\":\"P3124\",\"v\":24382},{\"pid\":\"P512\",\"v\":{\"name\":\"doktor\",\"id\":\"Q849697\"}}],\"type_strict\":\"should\"},\"q8\":{\"query\":\"Iwona Jasser\",\"type\":\"Q5\",\"properties\":[{\"pid\":\"P3124\",\"v\":84472},{\"pid\":\"P512\",\"v\":{\"name\":\"habilitacja\",\"id\":\"Q308678\"}}],\"type_strict\":\"should\"},\"q9\":{\"query\":\"Ewa Marzec-Lewenstein\",\"type\":\"Q5\",\"properties\":[{\"pid\":\"P3124\",\"v\":26539},{\"pid\":\"P512\",\"v\":{\"name\":\"doktor\",\"id\":\"Q849697\"}}],\"type_strict\":\"should\"}}"}} (731930ms)
05:10:49.313 [ refine-standard-recon] {"status": "error", "details": "504 Server Error: Connection Timed Out for url: https://www.wikidata.org/w/api.php?action=wbgetentities&format=json&ids=Q60446334%7CQ58472343%7CQ58214932%7CQ58214844%7CQ57718112%7CQ35654940%7CQ57718068%7CQ57718065%7CQ59805405%7CQ58214942%7CQ57490441%7CQ57718073%7CQ58472354%7CQ81545319%7CQ50992303%7CQ37786688%7CQ58472348%7CQ57718257%7CQ57718217%7CQ64882166%7CQ58214761%7CQ35209529%7CQ50665639%7CQ58214838%7CQ21185288%7CQ50625541%7CQ57718099%7CQ57718126%7CQ58214870%7CQ41039610%7CQ58214764%7CQ58472331%7CQ58214770%7CQ57718118%7CQ57718077%7CQ58214744%7CQ58214780%7CQ58214913%7CQ58214842%7CQ50992285%7CQ57718093%7CQ58472346%7CQ57718227%7CQ57718155%7CQ58214782%7CQ46822023%7CQ57718161%7CQ39468972%7CQ16562916%7CQ81547415&props=aliases%7Clabels%7Cdescriptions%7Cclaims%7Csitelinks", "message": "invalid query", "arguments": {"lang": "pl", "queries": "{\"q0\":{\"query\":\"Piotr Hawel\",\"type\":\"Q5\",\"properties\":[{\"pid\":\"P3124\",\"v\":99790},{\"pid\":\"P512\",\"v\":{\"name\":\"doktor\",\"id\":\"Q849697\"}}],\"type_strict\":\"should\"},\"q1\":{\"query\":\"Krystyna Ostrowska-Cichocka\",\"type\":\"Q5\",\"properties\":[{\"pid\":\"P3124\",\"v\":222221},{\"pid\":\"P512\",\"v\":{\"name\":\"doktor\",\"id\":\"Q849697\"}}],\"type_strict\":\"should\"},\"q2\":{\"query\":\"Danuta \u0141ugowska\",\"type\":\"Q5\",\"properties\":[{\"pid\":\"P3124\",\"v\":108769},{\"pid\":\"P512\",\"v\":{\"name\":\"doktor\",\"id\":\"Q849697\"}}],\"type_strict\":\"should\"},\"q3\":{\"query\":\"Anna Mazurowska-Domeracka\",\"type\":\"Q5\",\"properties\":[{\"pid\":\"P3124\",\"v\":281397},{\"pid\":\"P512\",\"v\":{\"name\":\"doktor\",\"id\":\"Q849697\"}}],\"type_strict\":\"should\"},\"q4\":{\"query\":\"Izabela Krejtz\",\"type\":\"Q5\",\"properties\":[{\"pid\":\"P3124\",\"v\":110489},{\"pid\":\"P512\",\"v\":{\"name\":\"habilitacja\",\"id\":\"Q308678\"}}],\"type_strict\":\"should\"},\"q5\":{\"query\":\"Jerzy 
Skrobecki\",\"type\":\"Q5\",\"properties\":[{\"pid\":\"P3124\",\"v\":130286},{\"pid\":\"P512\",\"v\":{\"name\":\"doktor\",\"id\":\"Q849697\"}}],\"type_strict\":\"should\"},\"q6\":{\"query\":\"Joanna Brzeszcz\",\"type\":\"Q5\",\"properties\":[{\"pid\":\"P3124\",\"v\":224035},{\"pid\":\"P512\",\"v\":{\"name\":\"doktor\",\"id\":\"Q849697\"}}],\"type_strict\":\"should\"},\"q7\":{\"query\":\"Laura Belowska\",\"type\":\"Q5\",\"properties\":[{\"pid\":\"P3124\",\"v\":58714},{\"pid\":\"P512\",\"v\":{\"name\":\"doktor\",\"id\":\"Q849697\"}}],\"type_strict\":\"should\"},\"q8\":{\"query\":\"Mieczys\u0142aw Plich\",\"type\":\"Q5\",\"properties\":[{\"pid\":\"P3124\",\"v\":103389},{\"pid\":\"P512\",\"v\":{\"name\":\"doktor\",\"id\":\"Q849697\"}}],\"type_strict\":\"should\"},\"q9\":{\"query\":\"Adam Majka\",\"type\":\"Q5\",\"properties\":[{\"pid\":\"P3124\",\"v\":16210},{\"pid\":\"P512\",\"v\":{\"name\":\"doktor\",\"id\":\"Q849697\"}}],\"type_strict\":\"should\"}}"}} (280827ms)
I'm not even sure whether OpenRefine is still trying, because nothing new appears in the log.
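To see what each failing request actually asks Wikidata for, the URL in the `details` field can be decoded. This is a rough sketch using only the Python standard library; `describe_wbgetentities_url` is a hypothetical helper name, and the URL is shortened to the first three ids of the first log entry.

```python
from urllib.parse import urlsplit, parse_qs


def describe_wbgetentities_url(url):
    """Return (ids, props) encoded in a wbgetentities URL."""
    params = parse_qs(urlsplit(url).query)
    # parse_qs percent-decodes the values, so %7C becomes "|".
    ids = [x for x in params.get("ids", [""])[0].split("|") if x]
    props = [x for x in params.get("props", [""])[0].split("|") if x]
    return ids, props


url = (
    "https://www.wikidata.org/w/api.php?action=wbgetentities&format=json"
    "&ids=Q48902896%7CQ58214583%7CQ46325648"
    "&props=aliases%7Clabels%7Cdescriptions%7Cclaims%7Csitelinks"
)
ids, props = describe_wbgetentities_url(url)
print(len(ids), props)
```

Applied to the full URL of the first log entry, this gives 50 entity ids, each requested with aliases, labels, descriptions, claims, and sitelinks. So the request that times out is the candidate lookup on the Wikidata API, not the ten-query batch itself.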
My OpenRefine ini settings (in case they matter):
# Memory and max form size allocations
#REFINE_MAX_FORM_CONTENT_SIZE=1048576
REFINE_MEMORY=7000M
# Set initial java heap space (default: 256M) for better performance with large datasets
REFINE_MIN_MEMORY=2400M
This has been running for a few hours now and I'm at 4%, which is not that bad I guess, but there is almost no activity in the Windows Task Manager.

You are not doing anything wrong: sadly, the public interface is just not as stable as it should be. You could run your own instance of the reconciliation service on your machine by installing Docker and running the service in it: https://github.com/wetneb/openrefine-wikibase#running-with-docker
Closing since optimizing this service further is out of scope: a MediaWiki extension should reimplement this service instead.