
Use properties for group-by for rdf in sparql magic

louvasquez opened this issue 2 years ago · 2 comments

When using the SPARQL magic to query Neptune RDF data, I was not able to use the group-by option to group nodes by meaningful information. The only grouping I could apply was by specifying "value", which in this context is the IRI, not a meaningful property value. None of the RDF-based (i.e., non-Gremlin) examples I could find showed group-by working. This matched what I understood from reading the source code (graph_magic.py, SPARQLNetwork.py).

I successfully added an 'elif' block to SPARQLNetwork.py's parse_node, permitting a lookup in the data's 'properties' key when P.&lt;prop_name&gt; is used in the group-by option. This allows the user to surface literals and use them to group nodes for visualizations. I could supply a pull request, but I have not yet been able to do a pip install from clean source outside the notebook lifecycle, so my pull request would be against technically untested code. As a workaround I have been modifying site-packages directly; I provide a diff below so the change I am making is clear.

diff --git a/src/graph_notebook/network/sparql/SPARQLNetwork.py b/src/graph_notebook/network/sparql/SPARQLNetwork.py
index 374ba9a..abd4fd5 100644
--- a/src/graph_notebook/network/sparql/SPARQLNetwork.py
+++ b/src/graph_notebook/network/sparql/SPARQLNetwork.py
@@ -185,6 +185,12 @@ class SPARQLNetwork(EventfulNetwork):
                     data['group'] = node_binding["type"]
             elif self.group_by_property in node_binding:
                 data['group'] = node_binding[self.group_by_property]
+            elif isinstance(self.group_by_property, str) and self.group_by_property.startswith("P."):
+                real_prop = self.group_by_property[2:]
+                if 'properties' in data and real_prop in data['properties']:
+                    data['group'] = str(data['properties'][real_prop])
+                else:
+                    data['group'] = "default"
             else:
                 data['group'] = node_binding["type"]

If there's a way to get grouping to work with SPARQL that I'm missing, that would be great, but the absence of non-Gremlin examples, the nature of the data, and my review of the code lead me to think it is not currently possible.

I am working in a Jupyter Python 3 notebook (running Python 3.7), querying Neptune RDF with the SPARQL magic and using the included graph visualization; pip shows graph-notebook version 3.3.0. I can run the Jupyter lifecycle install.sh, which installs graph-notebook. I then modify the code in site-packages, remove the .pyc files, kill the notebook, and open it up again. I hope to get this working from the SageMaker notebook lifecycle as time permits, at which point I could fork and submit my version while awaiting a real pull-request review.
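For anyone following the same in-place workaround, the steps can be sketched roughly as below. The exact path is an assumption based on a typical pip site-packages layout and will vary by environment (SageMaker lifecycle installs may differ).

```shell
# Locate the installed copy of SPARQLNetwork.py (path layout is an assumption)
TARGET="$(python3 -c "import sysconfig; print(sysconfig.get_paths()['purelib'])")/graph_notebook/network/sparql/SPARQLNetwork.py"
echo "Edit this file in place, then clear caches: $TARGET"

# Remove stale bytecode so the edited source is re-compiled on next import;
# the directory may not exist if graph-notebook isn't installed here
find "$(dirname "$TARGET")" -name '*.pyc' -delete 2>/dev/null || true

# Finally, restart the notebook kernel so the modified module is re-imported
```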

louvasquez avatar Apr 25 '22 15:04 louvasquez

Hi @louvasquez, thanks for bringing this gap in functionality to our attention.

As observed from the code, we are currently only able to group by the values of properties available at the top level of each node binding (usually rdf:type, rdf:value). This was based on our previous implementations of group-by for Gremlin and openCypher data, where label and ID metadata are typically bundled together in the same map representation of a returned node. Looking at your example, I now see that SPARQL returns RDF data a little differently: the literals are stored under their own "properties" key, contained as a single map at the top level of the node binding.
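The shape difference described above can be illustrated with two hypothetical node representations (field names are simplified examples, not exact library output):

```python
# Property-graph style (Gremlin/openCypher): label, ID, and properties
# typically arrive bundled in one flat map per node
gremlin_style_node = {
    "id": "v1",
    "label": "airport",
    "code": "SEA",
}

# SPARQL node binding: top level carries the term type and IRI, while
# literal properties sit nested under a separate "properties" key
sparql_node_binding = {
    "type": "uri",
    "value": "http://example.com/resource#node1",
    "properties": {"code": "SEA"},
}

# A group-by that only inspects top-level keys can see "type"/"value",
# but never reaches the nested literals:
assert "code" not in sparql_node_binding
assert "code" in sparql_node_binding["properties"]
```

This is why a top-level-only lookup worked for the Gremlin and openCypher paths but misses SPARQL literals entirely.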

Thank you for providing an initial workaround; we can use it as a base for the fix and include it in our next release. We'll also make the literal properties accessible to a couple of other SPARQL visualization customization options (--display-property, --tooltip_property), and to the group-by-label implementation.

michaelnchin avatar Apr 26 '22 02:04 michaelnchin

That's awesome, thanks for getting back to me on this. If I get to it before you do, I'll try my hand at it and submit a pull request covering all three options. If I do, feel free to decline the merge and implement it your own way; I just want to help where I can.

louvasquez avatar Apr 26 '22 14:04 louvasquez