sparql-dataframe
Adaptation of sparql_dataframe to Wikidata
Hello,
I am trying to load the results of Wikidata queries into DataFrames.
For instance, this code, taken from a Wikidata Query Service example, works for retrieving a list of countries:
```python
# pip install sparqlwrapper
# https://rdflib.github.io/sparqlwrapper/
import sys

import sparql_dataframe
from SPARQLWrapper import SPARQLWrapper, JSON

endpoint_url = "https://query.wikidata.org/sparql"

query = """#Countries
SELECT ?item ?itemLabel
WHERE {
  ?item wdt:P31 wd:Q6256.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}"""


def get_results(endpoint_url, query):
    user_agent = "WDQS-example Python/%s.%s" % (sys.version_info[0], sys.version_info[1])
    # TODO adjust user agent; see https://w.wiki/CX6
    sparql = SPARQLWrapper(endpoint_url, agent=user_agent)
    sparql.setQuery(query)
    sparql.setReturnFormat(JSON)
    return sparql.query().convert()


results = get_results(endpoint_url, query)

for result in results["results"]["bindings"]:
    print(result)
```
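Since the JSON request above already succeeds, one workaround is to flatten those bindings into a pandas DataFrame yourself. This is a minimal sketch, assuming pandas is installed; `bindings_to_dataframe` is a hypothetical helper, not part of sparql_dataframe or SPARQLWrapper. It relies on the SPARQL 1.1 JSON results layout, where `head.vars` lists the selected variables and each binding maps a variable to a dict holding its `value`.

```python
import pandas as pd


def bindings_to_dataframe(results):
    """Flatten a SPARQL 1.1 JSON results dict into a DataFrame.

    One column per variable in results["head"]["vars"], one row per binding;
    variables left unbound in a row (e.g. by OPTIONAL) become missing values.
    """
    columns = results["head"]["vars"]
    rows = [
        {var: binding[var]["value"] if var in binding else None for var in columns}
        for binding in results["results"]["bindings"]
    ]
    return pd.DataFrame(rows, columns=columns)


# Usage with the `results` dict returned by get_results(endpoint_url, query):
# df = bindings_to_dataframe(results)
```

Every value arrives as a string, exactly as in the JSON, so numeric or date columns need converting with pandas afterwards.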
But when I run the same query through sparql_dataframe:
```python
df = sparql_dataframe.get(endpoint_url, query)
```
I get this error:
```
C:\ProgramData\Anaconda3\lib\site-packages\SPARQLWrapper\Wrapper.py:1315: RuntimeWarning: Format requested was CSV, but XML (application/sparql-results+xml;charset=utf-8) has been returned by the endpoint
  warnings.warn(message % (requested.upper(), format_name, mime), RuntimeWarning)

AttributeError                            Traceback (most recent call last)
C:\ProgramData\Anaconda3\lib\site-packages\sparql_dataframe\sparql_dataframe.py in get_sparql_dataframe(endpoint, query, post)
     28     sparql.setReturnFormat(CSV)
     29     results = sparql.query().convert()
---> 30     _csv = StringIO(results.decode('utf-8'))
     31     return pd.read_csv(_csv, sep=",")

AttributeError: 'Document' object has no attribute 'decode'
```
Hello,
Try passing `post=True`, e.g.:
```python
sparql_dataframe.get(endpoint_url, query, post=True)
```
You can see in the unit tests that queries against Wikidata should work fine with `post=True`: https://github.com/lawlesst/sparql-dataframe/blob/master/tests/test_sparql_dataframe.py#L65
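The first traceback shows the steps involved: set the CSV return format, call `convert()`, decode the bytes, and pass them to `pd.read_csv`. Without `post=True` the request goes out as a GET (SPARQLWrapper's default method), and the warning shows the endpoint answered with XML, so `convert()` returned a parsed XML `Document` rather than bytes, hence the `AttributeError`. Below is a minimal sketch of the same steps done directly with SPARQLWrapper, using POST and failing with a clearer message when something other than CSV comes back. `fetch_csv_dataframe` and `csv_result_to_dataframe` are hypothetical names; this is not sparql_dataframe's actual source.

```python
import io

import pandas as pd


def csv_result_to_dataframe(result):
    """Parse the value returned by convert() for a CSV request into a DataFrame."""
    if isinstance(result, str):
        result = result.encode("utf-8")
    if not isinstance(result, bytes):
        # e.g. an xml.dom.minidom Document when the endpoint answered with XML
        raise TypeError(
            f"expected CSV bytes from the endpoint, got {type(result).__name__}"
        )
    return pd.read_csv(io.BytesIO(result))


def fetch_csv_dataframe(endpoint, query, agent):
    # Imported here so the parsing helper above works without SPARQLWrapper installed.
    from SPARQLWrapper import CSV, POST, SPARQLWrapper

    sparql = SPARQLWrapper(endpoint, agent=agent)
    sparql.setMethod(POST)  # send the query as POST instead of the default GET
    sparql.setQuery(query)
    sparql.setReturnFormat(CSV)
    return csv_result_to_dataframe(sparql.query().convert())
```

Usage, with a placeholder agent string you would replace with your own: `df = fetch_csv_dataframe(endpoint_url, query, agent="MyCountriesScript/0.1 (you@example.org)")`.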
```
HTTPError                                 Traceback (most recent call last)
C:\ProgramData\Anaconda3\lib\site-packages\sparql_dataframe\sparql_dataframe.py in get_sparql_dataframe(endpoint, query, post)
     27
     28     sparql.setReturnFormat(CSV)
---> 29     results = sparql.query().convert()
     30     _csv = StringIO(results.decode('utf-8'))
     31     return pd.read_csv(_csv, sep=",")

C:\ProgramData\Anaconda3\lib\site-packages\SPARQLWrapper\Wrapper.py in query(self)
   1105         :rtype: :class:`QueryResult` instance
   1106         """
-> 1107         return QueryResult(self._query())
   1108
   1109     def queryAndConvert(self):

C:\ProgramData\Anaconda3\lib\site-packages\SPARQLWrapper\Wrapper.py in _query(self)
   1085                 raise EndPointInternalError(e.read())
   1086             else:
-> 1087                 raise e
   1088
   1089     def query(self):

C:\ProgramData\Anaconda3\lib\site-packages\SPARQLWrapper\Wrapper.py in _query(self)
   1071                 response = urlopener(request, timeout=self.timeout)
   1072             else:
-> 1073                 response = urlopener(request)
   1074             return response, self.returnFormat
   1075         except urllib.error.HTTPError as e:

C:\ProgramData\Anaconda3\lib\urllib\request.py in urlopen(url, data, timeout, cafile, capath, cadefault, context)
    220     else:
    221         opener = _opener
--> 222     return opener.open(url, data, timeout)
    223
    224 def install_opener(opener):

C:\ProgramData\Anaconda3\lib\urllib\request.py in open(self, fullurl, data, timeout)
    529         for processor in self.process_response.get(protocol, []):
    530             meth = getattr(processor, meth_name)
--> 531             response = meth(req, response)
    532
    533         return response

C:\ProgramData\Anaconda3\lib\urllib\request.py in http_response(self, request, response)
    638         # request was successfully received, understood, and accepted.
    639         if not (200 <= code < 300):
--> 640             response = self.parent.error(
    641                 'http', request, response, code, msg, hdrs)
    642

C:\ProgramData\Anaconda3\lib\urllib\request.py in error(self, proto, *args)
    567         if http_err:
    568             args = (dict, 'default', 'http_error_default') + orig_args
--> 569             return self._call_chain(*args)
    570
    571         # XXX probably also want an abstract factory that knows when it makes

C:\ProgramData\Anaconda3\lib\urllib\request.py in _call_chain(self, chain, kind, meth_name, *args)
    500         for handler in handlers:
    501             func = getattr(handler, meth_name)
--> 502             result = func(*args)
    503             if result is not None:
    504                 return result

C:\ProgramData\Anaconda3\lib\urllib\request.py in http_error_default(self, req, fp, code, msg, hdrs)
    647 class HTTPDefaultErrorHandler(BaseHandler):
    648     def http_error_default(self, req, fp, code, msg, hdrs):
--> 649         raise HTTPError(req.full_url, code, msg, hdrs, fp)
    650
    651 class HTTPRedirectHandler(BaseHandler):

HTTPError: HTTP Error 403: Forbidden
```
I think that error comes from the Wikidata SPARQL endpoint itself, which rate-limits aggressively.
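If the 403 (or 429) responses really are rate limiting, waiting and retrying usually helps. Wikimedia also asks clients to send a descriptive User-Agent (the https://w.wiki/CX6 link in the TODO comment of the first snippet points to this guidance), and requests with generic agents may be rejected outright, in which case retrying will not help. Below is a minimal sketch of an exponential-backoff wrapper around any zero-argument query function. `with_backoff`, the retried status codes, and the delay values are assumptions, not documented WDQS limits. It catches `urllib.error.HTTPError` because, as the traceback shows, SPARQLWrapper re-raises that exception for a 403.

```python
import random
import time
import urllib.error


def with_backoff(run_query, attempts=4, base_delay=2.0, retry_codes=(403, 429), sleep=time.sleep):
    """Call run_query(), retrying on the given HTTP status codes with exponential backoff.

    Assumption: retrying a 403 only makes sense when it is caused by rate limiting;
    a rejected User-Agent keeps failing and should be fixed instead.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return run_query()
        except urllib.error.HTTPError as err:
            if err.code not in retry_codes or attempt == attempts - 1:
                raise
            # Honour Retry-After when the server gives it in seconds.
            retry_after = err.headers.get("Retry-After") if err.headers else None
            if retry_after and retry_after.isdigit():
                delay = float(retry_after)
            else:
                # 2 s, 4 s, 8 s, ... plus up to 1 s of random jitter.
                delay = base_delay * 2 ** attempt + random.uniform(0, 1)
            sleep(delay)
```

Usage: `df = with_backoff(lambda: sparql_dataframe.get(endpoint_url, query, post=True))`.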
How would you read a query saved in a separate file? Thanks for your help!
If your queries are saved in a text file, just read the file like any other text file in Python, store its contents in a `query` variable, and pass that to `sparql_dataframe.get`.
Here's a tutorial on reading and writing files in Python: https://realpython.com/read-write-files-python/#reading-and-writing-opened-files
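This works:
```python
import sparql_dataframe

endpoint_url = "https://query.wikidata.org/sparql"

# Read the SPARQL query from a separate file (UTF-8, so non-ASCII labels survive on Windows)
with open("query.rq", "r", encoding="utf-8") as file:
    query = file.read()

df = sparql_dataframe.get(endpoint_url, query, post=True)
df  # displays the DataFrame in a notebook
```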
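If you keep several queries, the same approach extends to a folder of `.rq` files. This is a minimal sketch using only the standard library; the `queries` folder and the `load_queries` name are hypothetical. Each query is keyed by its file name without the extension, and files are read as UTF-8 because the default encoding on Windows is often not UTF-8.

```python
from pathlib import Path


def load_queries(folder):
    """Return {file stem: query text} for every .rq file in folder, sorted by name."""
    return {
        path.stem: path.read_text(encoding="utf-8")
        for path in sorted(Path(folder).glob("*.rq"))
    }


# Usage, assuming a hypothetical ./queries folder:
# import sparql_dataframe
# endpoint_url = "https://query.wikidata.org/sparql"
# frames = {
#     name: sparql_dataframe.get(endpoint_url, query, post=True)
#     for name, query in load_queries("queries").items()
# }
```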
I just had the same issue querying Wikidata. My first thought was that it might be caused by a version change (I had SPARQLWrapper 2.0.0 installed). SPARQLWrapper now includes `get_sparql_dataframe` itself, so the code below worked.
Nevertheless, thanks for creating this library, which made it directly into the wrapper!
```python
from SPARQLWrapper import get_sparql_dataframe

endpoint = "https://query.wikidata.org/sparql"
query = """#Countries
SELECT ?item ?itemLabel
WHERE {
  ?item wdt:P31 wd:Q6256.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
"""

df = get_sparql_dataframe(endpoint, query)
```