sparqlwrapper
Convert query results into a pandas DataFrame
I need to convert the results of a SPARQL query into a pandas DataFrame. Here is the function I have written for this:
import pandas as pds
import itertools

def make_sparql_df(results):
    df = pds.DataFrame()  # create empty dataframe
    for var in itertools.chain(results.variables):
        # create a list of values
        # getValues returns a list of 'Value' objects
        # e.g., [Value(literal:'x'), Value(x:'y'), ...]
        temp = [val.value for val in results.getValues(var)]
        # convert temp into a Series; this is needed in case
        # temp is an empty list, as using a Series will fill
        # the column with NaN
        df[var] = pds.Series(temp)
    return df
Is there a more efficient way to do this?
There was a bug in my code above. Below is an updated version for those who are interested:
from SPARQLWrapper import SPARQLWrapper, SPARQLWrapper2, JSON, JSONLD, CSV, TSV, N3, RDF, RDFXML, TURTLE
import pandas as pds
import itertools

def get_sparql_variables(results, sparql_wrapper="SPARQLWrapper2"):
    return results.variables if ("sparqlwrapper2" == sparql_wrapper.lower()) else results['head']['vars']

def get_sparql_bindings(results, sparql_wrapper="SPARQLWrapper2"):
    return results.bindings \
        if ("sparqlwrapper2" == sparql_wrapper.lower()) else results['results']['bindings']

def get_sparql_binding_variable_value(binding, variable, sparql_wrapper="SPARQLWrapper2"):
    return binding[variable].value \
        if ("sparqlwrapper2" == sparql_wrapper.lower()) else binding[variable]['value']

def make_sparql_dict_list(bindings, variables, sparql_wrapper="SPARQLWrapper2"):
    def binding_value(binding, var):  # helper function for returning values
        return \
            get_sparql_binding_variable_value(binding, var, sparql_wrapper) if (var in binding) else None

    dict_list = []  # list to contain dictionaries
    for binding in itertools.chain(bindings):
        values = [binding_value(binding, var) for var in itertools.chain(variables)]
        dict_list.append(dict(zip(variables, values)))
    return dict_list

def make_sparql_df(results, sparql_wrapper="SPARQLWrapper2"):
    variables = get_sparql_variables(results, sparql_wrapper)
    bindings = get_sparql_bindings(results, sparql_wrapper)
    # create a list of dictionaries to use as data for dataframe
    data_list = make_sparql_dict_list(bindings, variables, sparql_wrapper)
    df = pds.DataFrame(data_list)  # create dataframe from data list
    return df[variables]  # return dataframe with columns reordered
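For anyone who wants to try this end to end, here is an untested usage sketch; the endpoint and query are placeholders, not something prescribed by the code above:

from SPARQLWrapper import SPARQLWrapper2

sparql = SPARQLWrapper2("https://dbpedia.org/sparql")  # placeholder endpoint
sparql.setQuery("""
    SELECT ?label
    WHERE { <http://dbpedia.org/resource/Asturias> rdfs:label ?label }
    LIMIT 5
""")
results = sparql.query()  # SPARQLWrapper2 result exposing .variables and .bindings

df = make_sparql_df(results)  # sparql_wrapper defaults to "SPARQLWrapper2"
print(df.head())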
Hi @wdduncan
Let me share some pointers to developments around pandas and SPARQLWrapper that might be interesting for you:
https://github.com/lawlesst/sparql-dataframe
https://lawlesst.github.io/notebook/sparql-dataframe.html
https://github.com/SuLab/sparql_to_pandas/blob/master/SPARQL_pandas.ipynb
https://github.com/RDFLib/sparqlwrapper/pull/112
@dayures Thanks for these pointers! I will make use of these!
@dayures I was looking at this notebook: https://github.com/SuLab/sparql_to_pandas/blob/master/SPARQL_pandas.ipynb
A straightforward approach for putting the SPARQL results into a data frame is:
df = pds.DataFrame(result['results']['bindings'])
df = df.applymap(lambda x: x['value'])
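Here result is assumed to be the parsed JSON returned by plain SPARQLWrapper; a minimal untested sketch with a placeholder endpoint would be:

from SPARQLWrapper import SPARQLWrapper, JSON
import pandas as pds

sparql = SPARQLWrapper("https://dbpedia.org/sparql")  # placeholder endpoint
sparql.setQuery("SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10")
sparql.setReturnFormat(JSON)
result = sparql.query().convert()  # plain dict parsed from the JSON response

df = pds.DataFrame(result['results']['bindings'])
# note: this raises on NaN cells if a variable is unbound in some rows,
# and newer pandas deprecates applymap in favour of DataFrame.map
df = df.applymap(lambda x: x['value'])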
@wdduncan what type is result in your answer? I am looking to convert an rdflib.plugins.sparql.processor.SPARQLResult object to a pandas DataFrame in rdflib version 5.0.0.
Here are some things I've tried that work for different use cases:
import pandas as pd
# renders properly in notebooks, but DataFrame values are rdflib objects rather than python primitives
pd.DataFrame(results.bindings)
# converts everything to strings including missing values
pd.DataFrame(results.bindings).applymap(str).rename(columns=str)
# serialize with json and then parse
import json
results_json = results.serialize(format="json")
bindings = json.loads(results_json)["results"]["bindings"]
bindings = [{k: v["value"] for k, v in result.items()} for result in bindings]
pd.DataFrame(bindings)
It seems like consuming results from SPARQL queries into a Python workflow with native Python types is a common operation for rdflib. Am I missing something in terms of an easier way to do this?
Also see this implementation in gastrodon.
@dhimmel Good question! I didn't consider that :)
I've been handling data type coercion on the client side. Do the results from SPARQLWrapper include data type information? I forget ...
You may be able to let Pandas infer the data types. I haven't tried this; see the sketch below.
As for RDFLib's native data types, does that only work when you use RDFLib's triple store engine?
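For example, something along these lines might work (an untested sketch; the sample frame is a stand-in for the string values extracted from the bindings):

import pandas as pd

# stand-in for a string-valued frame built from SPARQL bindings
df = pd.DataFrame({"count": ["1", "2"], "label": ["a", "b"]})

# try numeric coercion column by column; non-numeric columns pass through
# unchanged (note errors='ignore' is deprecated in recent pandas releases)
df = df.apply(pd.to_numeric, errors="ignore")
print(df.dtypes)  # count becomes int64, label stays object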
My simple solution so far:
from pandas import DataFrame
from rdflib import Graph, URIRef
graph = Graph()
…
result = graph.query("select * {?s ?p ?o} limit 10")
DataFrame(result, columns=result.vars)
I think this is not technically an issue for sparqlwrapper; it could be some additional PandasWrapper within the RDFLib collection of projects, if someone is willing to maintain it.
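Filled out so it can actually be run, the sketch might look like this; the single Turtle triple is a made-up stand-in for the elided loading step:

from pandas import DataFrame
from rdflib import Graph

graph = Graph()
# made-up data standing in for however the graph is actually populated
graph.parse(
    data="<http://example.org/a> <http://example.org/p> <http://example.org/b> .",
    format="turtle",
)

result = graph.query("select * {?s ?p ?o} limit 10")
df = DataFrame(result, columns=result.vars)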
Thanks @white-gecko
That is a simpler solution!
Regarding the simple Graph-to-DataFrame solution above: noting my reply at https://github.com/RDFLib/rdflib/issues/1179#issuecomment-704261776.
I realized the original issue here was about converting a SPARQLWrapper.Wrapper.QueryResult object to a DataFrame. Whereas my question is about converting a rdflib.plugins.sparql.processor.SPARQLResult to a DataFrame. Sorry about any confusion and let me know if these are actually the same task.
Note that the @white-gecko solution does not deal with the data types in literals in any way; it's not wrong, just something to note. In order to at least have the column names be strings, I had to do something like this:
columns = [str(v) for v in result.vars]
df = pd.DataFrame(result, columns=columns)
But this will not take care of the values in the cells (for which you may want to call .toPython()).
(Don't use str(v); better to use v.toPython().)
Thanks @matentzn. .toPython() was the missing piece. Here's a function that converts an rdflib.plugins.sparql.processor.SPARQLResult to a DataFrame, converting types along the way:
from pandas import DataFrame
from rdflib.plugins.sparql.processor import SPARQLResult
def sparql_results_to_df(results: SPARQLResult) -> DataFrame:
    """
    Export results from an rdflib SPARQL query into a `pandas.DataFrame`,
    using Python types. See https://github.com/RDFLib/rdflib/issues/1179.
    """
    return DataFrame(
        data=([None if x is None else x.toPython() for x in row] for row in results),
        columns=[str(x) for x in results.vars],
    )

sparql_results_to_df(results)
Another option would be to call .toPython() on the column names, but that prefixes them with ?, which I'm guessing is not what most users want.
Note that the @white-gecko solution does not deal with the data types in literals in any way; it's not wrong, just something to note.
Yes, I imagine almost all users will want to convert types when creating a DataFrame. Also noting that method is equivalent (besides column order preservation) to the first method in my comment above, which also explains why it's deficient:
# renders properly in notebooks, but DataFrame values are rdflib objects rather than python primitives
pd.DataFrame(results.bindings)
See https://github.com/WolfgangFahl/pyLoDStorage/issues/25 - just use pyLodStorage and you'd be all set.
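If I read the pyLodStorage API correctly (the module path and method name below are taken from its documentation and should be treated as assumptions), usage is roughly:

from lodstorage.sparql import SPARQL
import pandas as pd

# placeholder endpoint; queryAsListOfDicts returns rows as plain Python dicts
sparql = SPARQL("https://query.wikidata.org/sparql")
lod = sparql.queryAsListOfDicts("SELECT ?item WHERE { ?item wdt:P31 wd:Q146 } LIMIT 5")
df = pd.DataFrame(lod)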
How can I remove the namespace from the SPARQL result? The encoding method is not working.
@Gautamshahi - please give more details ... Did you use pyLodStorage, or are you still referring to the open issue in sparqlwrapper? What's your input, what is your query, and what did you try? Since you asked a question, Stack Overflow might be a better place for it: see https://stackoverflow.com/questions/tagged/sparqlwrapper