graph-notebook
Use properties for group-by for rdf in sparql magic
When using the sparql magic to access Neptune RDF, I was not able to use the group-by option to group nodes by a meaningful property. The only grouping I was able to apply was by specifying "value", which in this context is the IRI value, not a meaningful property value. None of the RDF-based (i.e. not Gremlin) examples I could find showed group-by working. This matched what I understood from looking at the source code (graph_magic.py, SPARQLNetwork.py).
I successfully added an 'elif' block to parse_node in SPARQLNetwork.py, permitting a lookup in the data 'properties' key when P.<prop_name> is used as the group-by option setting. This allows the user to add literals and use them to group nodes for visualizations. I could supply a pull request, but I have not yet managed a pip install from clean source outside of the notebook lifecycle, so my pull request would be against technically untested code. As a workaround I have been modifying the file in site-packages directly. I provide a diff below so it's clear what I'm doing that works.
diff --git a/src/graph_notebook/network/sparql/SPARQLNetwork.py b/src/graph_notebook/network/sparql/SPARQLNetwork.py
index 374ba9a..abd4fd5 100644
--- a/src/graph_notebook/network/sparql/SPARQLNetwork.py
+++ b/src/graph_notebook/network/sparql/SPARQLNetwork.py
@@ -185,6 +185,12 @@ class SPARQLNetwork(EventfulNetwork):
                 data['group'] = node_binding["type"]
             elif self.group_by_property in node_binding:
                 data['group'] = node_binding[self.group_by_property]
+            elif type(self.group_by_property) is str and self.group_by_property[:2] == "P.":
+                real_prop = self.group_by_property[2:]
+                if 'properties' in data and real_prop in data['properties']:
+                    data['group'] = str(data['properties'][real_prop])
+                else:
+                    data['group'] = "default"
             else:
                 data['group'] = node_binding["type"]
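To make the branch order in the diff easier to follow outside the class, here is a minimal standalone sketch of the same lookup. The function name resolve_group, the default_group parameter, and the sample node_binding and data dictionaries are all hypothetical and only illustrate the idea; this is not the library's actual code, and the real key names under 'properties' may differ.

```python
def resolve_group(node_binding, data, group_by_property, default_group="default"):
    """Pick a group label for one node (hypothetical helper, not library code).

    Mirrors the visible branches of the diff:
    1. a key found directly on the node binding wins,
    2. otherwise "P.<prop_name>" is looked up under data['properties'],
    3. otherwise fall back to the binding's "type".
    """
    if group_by_property in node_binding:
        return node_binding[group_by_property]
    if isinstance(group_by_property, str) and group_by_property.startswith("P."):
        real_prop = group_by_property[2:]  # strip the "P." prefix
        properties = data.get("properties", {})
        if real_prop in properties:
            return str(properties[real_prop])
        return default_group  # property requested but missing on this node
    return node_binding["type"]


# Assumed shapes, for illustration only.
node_binding = {"type": "uri", "value": "http://example.org/person/1"}
data = {"properties": {"role": "engineer"}}

print(resolve_group(node_binding, data, "P.role"))     # literal from 'properties'
print(resolve_group(node_binding, data, "value"))      # top-level key: the IRI
print(resolve_group(node_binding, data, "P.missing"))  # falls back to the default
```

Nodes that lack the requested property all land in one shared "default" group, which keeps them visible in the visualization instead of raising an error.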
If there's a way to get grouping to work with sparql that I'm missing, that would be great, but the absence of non-Gremlin examples, the nature of the data, and my review of the code lead me to think this is not possible.
I am working in a Jupyter Python 3 notebook (running Python 3.7), querying Neptune RDF with the sparql magic and using the included graph visualization. pip shows graph-notebook version 3.3.0. I can run the Jupyter lifecycle install.sh, which installs graph-notebook. I then modify the code in site-packages, remove the .pyc, kill the notebook and open it up again. I hope to get this working correctly from the SageMaker notebook lifecycle as time permits, at which point I could fork the repo and install my version while awaiting approval of a real pull request.
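For anyone repeating the site-packages workaround, the following sketch shows one way to find the installed source file and its cached bytecode path using only the standard library. The helper name locate_module_files is hypothetical; the dotted module name in the comment is inferred from the path in the diff and assumes graph-notebook is installed in the active environment.

```python
import importlib.util


def locate_module_files(module_name):
    """Return (source_path, cached_pyc_path) for an importable module.

    Hypothetical helper: it only reports paths and does not edit or delete anything.
    """
    spec = importlib.util.find_spec(module_name)
    if spec is None or spec.origin is None:
        raise ImportError(f"cannot locate source for {module_name!r}")
    source_path = spec.origin
    # Where the interpreter would store the compiled bytecode for this file.
    cached_path = importlib.util.cache_from_source(source_path)
    return source_path, cached_path


# Example with a standard-library module; with graph-notebook installed, the
# module name would presumably be "graph_notebook.network.sparql.SPARQLNetwork".
source, cached = locate_module_files("json.decoder")
print(source)
print(cached)
```

After editing the source file, removing the reported .pyc and restarting the notebook kernel should make the change take effect, as described above.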
Hi @louvasquez, thanks for bringing this gap in functionality to our attention.
As observed from the code, we are currently only able to group-by the values of properties available at the top level of each node binding (usually rdf:type, rdf:value). This was based off of our previous implementations of group-by for Gremlin and openCypher data, where Label and ID metadata are typically bundled all together in the same map representation of a returned node. Looking at your provided example, I now see that SPARQL returns RDF data a little differently, where the literals are stored under their own "properties" key and contained as a single map at the top level of the node binding.
Thank you for providing an initial example of a workaround. We can use this as a base for the fix and include it in our next release. We'll also make the literal properties accessible to a couple of other SPARQL vis customization options (--display-property, --tooltip_property), and also to the group-by-label implementation.
That's awesome. Thanks for getting back to me on this. If I get to it before you do, I'll try my hand at it and do a pull request for all 3 options. If I do, feel free to decline the merge and implement your best way, just want to help where I can.