graph-notebook
[BUG] Graph visualization does not support multivalue properties
Steps to reproduce the behavior:
- Set up a graph where vertices have multivalue properties (i.e. have set cardinality)
- Query g.V().outE().inV().path().by(elementMap()). This works, but you cannot see all values of the multivalue properties.
- If you change the query to use valueMap, g.V().outE().inV().path().by(valueMap()), the visualization does not render properly. Some vertices are drawn but they do not represent the graph.
Expected behavior: The graph is visualized correctly even when valueMap() is used, and multivalue properties can be viewed in the visualization "Details" box.
Screenshots
This is how the graph looks when using elementMap()
This is how the graph looks when I use valueMap()
Desktop (please complete the following information):
- OS: macOS 12.6
- Browser: Chrome 105.0.5195.125
- Version: graph-notebook 3.6.0
Hi @FsecureSamiTikka, thank you for the bug report!
For my debugging purposes, could you also share the path data from both of the example queries?
Working query
%%gremlin
g.with("evaluationTimeout", 60000)
.V("Url:hXXps://khhhya2jh2jha45bh.test/jadaeghab3762")
.repeat(outE().inV()).times(4).emit().dedup().path().by(elementMap())
returns
path[{<T.id: 1>: 'Url:hXXps://khhhya2jh2jha45bh.test/jadaeghab3762', <T.label: 4>: 'Url', 'hostname': 'khhhya2jh2jha45bh.test', 'scheme': 'https', 'path': '/jadaeghab3762', 'url': 'hXXps://khhhya2jh2jha45bh.test/jadaeghab3762', 'tlp_class': 20, 'url_last_seen': '2022-09-22T12:50:48.074432Z', 'url_time': '2022-09-22T12:50:48.074432Z', 'url_categories': 'caa'}, {<T.id: 1>: 'hostedAt::Url:hXXps://khhhya2jh2jha45bh.test/jadaeghab3762::Host:khhhya2jh2jha45bh.test', <T.label: 4>: 'hostedAt', <Direction.IN: 2>: {<T.id: 1>: 'Host:khhhya2jh2jha45bh.test', <T.label: 4>: 'Host'}, <Direction.OUT: 3>: {<T.id: 1>: 'Url:hXXps://khhhya2jh2jha45bh.test/jadaeghab3762', <T.label: 4>: 'Url'}}, {<T.id: 1>: 'Host:khhhya2jh2jha45bh.test', <T.label: 4>: 'Host', 'hostname': 'khhhya2jh2jha45bh.test', 'subdomain': '', 'host_categories': 'cbb', 'host_categories_time': '2022-09-22T12:50:48.018855Z'}]
path[{<T.id: 1>: 'Url:hXXps://khhhya2jh2jha45bh.test/jadaeghab3762', <T.label: 4>: 'Url', 'hostname': 'khhhya2jh2jha45bh.test', 'scheme': 'https', 'path': '/jadaeghab3762', 'url': 'hXXps://khhhya2jh2jha45bh.test/jadaeghab3762', 'tlp_class': 20, 'url_last_seen': '2022-09-22T12:50:48.074432Z', 'url_time': '2022-09-22T12:50:48.074432Z', 'url_categories': 'caa'}, {<T.id: 1>: 'hostedAt::Url:hXXps://khhhya2jh2jha45bh.test/jadaeghab3762::Host:khhhya2jh2jha45bh.test', <T.label: 4>: 'hostedAt', <Direction.IN: 2>: {<T.id: 1>: 'Host:khhhya2jh2jha45bh.test', <T.label: 4>: 'Host'}, <Direction.OUT: 3>: {<T.id: 1>: 'Url:hXXps://khhhya2jh2jha45bh.test/jadaeghab3762', <T.label: 4>: 'Url'}}, {<T.id: 1>: 'Host:khhhya2jh2jha45bh.test', <T.label: 4>: 'Host', 'hostname': 'khhhya2jh2jha45bh.test', 'subdomain': '', 'host_categories': 'cbb', 'host_categories_time': '2022-09-22T12:50:48.018855Z'}, {<T.id: 1>: 'underDomain::Host:khhhya2jh2jha45bh.test::Domain:khhhya2jh2jha45bh.test', <T.label: 4>: 'underDomain', <Direction.IN: 2>: {<T.id: 1>: 'Domain:khhhya2jh2jha45bh.test', <T.label: 4>: 'Domain'}, <Direction.OUT: 3>: {<T.id: 1>: 'Host:khhhya2jh2jha45bh.test', <T.label: 4>: 'Host'}}, {<T.id: 1>: 'Domain:khhhya2jh2jha45bh.test', <T.label: 4>: 'Domain', 'domain_name': 'khhhya2jh2jha45bh.test', 'domain_wo_suffix': 'khhhya2jh2jha45bh', 'suffix': 'test', 'domain_categories_time': '2022-09-22T12:50:47.954472Z', 'domain_categories': 'ccc'}]
Broken query
%%gremlin
g.with("evaluationTimeout", 60000)
.V("Url:hXXps://khhhya2jh2jha45bh.test/jadaeghab3762")
.repeat(outE().inV()).times(4).emit().dedup().path().by(valueMap())
returns
path[{'path': ['/jadaeghab3762'], 'hostname': ['khhhya2jh2jha45bh.test'], 'url_categories': ['ca', 'caa'], 'scheme': ['https'], 'tlp_class': [20], 'url': ['hXXps://khhhya2jh2jha45bh.test/jadaeghab3762'], 'url_time': ['2022-09-22T12:50:48.074432Z'], 'url_last_seen': ['2022-09-22T12:50:48.074432Z']}, {}, {'hostname': ['khhhya2jh2jha45bh.test'], 'host_categories_time': ['2022-09-22T12:50:48.018855Z'], 'subdomain': [''], 'host_categories': ['cb', 'cbb']}]
path[{'path': ['/jadaeghab3762'], 'hostname': ['khhhya2jh2jha45bh.test'], 'url_categories': ['ca', 'caa'], 'scheme': ['https'], 'tlp_class': [20], 'url': ['hXXps://khhhya2jh2jha45bh.test/jadaeghab3762'], 'url_time': ['2022-09-22T12:50:48.074432Z'], 'url_last_seen': ['2022-09-22T12:50:48.074432Z']}, {}, {'hostname': ['khhhya2jh2jha45bh.test'], 'host_categories_time': ['2022-09-22T12:50:48.018855Z'], 'subdomain': [''], 'host_categories': ['cb', 'cbb']}, {}, {'domain_name': ['khhhya2jh2jha45bh.test'], 'domain_categories': ['cc', 'ccc'], 'domain_wo_suffix': ['khhhya2jh2jha45bh'], 'suffix': ['test'], 'domain_categories_time': ['2022-09-22T12:50:47.954472Z']}]
I only now see that path().by(valueMap()) returned no data for the edges. So maybe this is a Neptune bug after all.
Looking at the paths provided, Neptune seems to be returning the correct results. The discrepancies in the edge data returned are due to functional differences between the valueMap() and elementMap() steps.
By default, elementMap() returns the id and label properties in the node/edge mapping, as well as IN/OUT directional properties for edges, along with any user-provided properties.
On the other hand, valueMap() only returns user-provided properties. If none are present, an empty map is returned for the element. Specifying valueMap(true) inserts these metadata properties into the node/edge map, but may also require a few extra steps.
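To make the difference concrete, here is a minimal Python sketch that compares the two result shapes for the Url vertex from the paths above. The dictionaries are trimmed copies of that data. The plain "id" and "label" string keys are an assumption standing in for the T.id and T.label enum members in the real results, and hidden_values is a hypothetical helper, not part of graph-notebook.

```python
# Trimmed copies of the Url vertex from the two query results above.
# Assumption: "id" / "label" strings stand in for the T.id / T.label enum keys.
element_map = {
    "id": "Url:hXXps://khhhya2jh2jha45bh.test/jadaeghab3762",
    "label": "Url",
    "scheme": "https",
    "url_categories": "caa",
}
value_map = {
    "scheme": ["https"],
    "url_categories": ["ca", "caa"],
}


def hidden_values(element_map, value_map):
    """Return the property values that the valueMap result lists but the elementMap result omits."""
    hidden = {}
    for key, values in value_map.items():
        shown = element_map.get(key)
        # Keep every value that is not the single one elementMap returned.
        missing = [value for value in values if value != shown]
        if missing:
            hidden[key] = missing
    return hidden


print(hidden_values(element_map, value_map))  # {'url_categories': ['ca']}
```

In this data, elementMap() carries the id and label but only one value per property, while valueMap() carries every value as a list but no id or label.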
@krlawrence has also written an excellent rundown with examples on this topic: https://www.kelvinlawrence.net/book/PracticalGremlin.html#element-map
Going back to the original issue -
- query g.V().outE().inV().path().by(elementMap()). This works but you cannot see all values of the multivalue properties.
Based on the path data from the query using elementMap, the node and edge properties should all be present in the details view, for example:
Is the question here about how to display all of these properties as the visual label on a node (i.e. a concatenated list of all the property values, as shown by the valueMap query)? If so, the concatenated list label is only used as a fallback in cases where the usual default, the label property of the node/edge, is not present in the data. There is no other way to manually specify this display label option, and elementMap always returns the label as part of the data, so it is not possible when using this step. The only way to show the concatenated labels is via the valueMap() step with no parameters.
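The fallback rule described above can be sketched roughly as follows. This is an illustration only, not the actual graph-notebook code: display_label is a hypothetical name, the "label" string key stands in for T.label, and the ", " separator and the flattening of list values are assumptions.

```python
def display_label(element):
    """Use the element's label if present, otherwise join all property values."""
    if "label" in element:
        return str(element["label"])
    # Fallback: flatten list-valued properties and join everything into one string.
    parts = []
    for value in element.values():
        if isinstance(value, list):
            parts.extend(str(item) for item in value)
        else:
            parts.append(str(value))
    return ", ".join(parts)


# elementMap-style data always has a label, so the fallback is never reached.
print(display_label({"label": "Host", "host_categories": "cbb"}))
# valueMap-style data has no label, so the values are concatenated instead.
print(display_label({"subdomain": [""], "host_categories": ["cb", "cbb"]}))
```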
- If you change the query to use valueMap g.V().outE().inV().path().by(valueMap()), the visualization does not render properly. Some vertices are drawn but they do not represent the graph.
For most queries returning path data via valueMap, this is the expected result. There isn't a way to distinguish node maps from edge maps with complete certainty (unless you have an elementMap step that returns directional properties with the edges), so the visualizer initially assumes that any generic path element is a node, and draws blank edges between them.
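As a rough illustration of why the two result shapes behave differently, the sketch below guesses the element type from its keys alone. It is not the visualizer's real logic: classify is a hypothetical helper, and the "IN" and "OUT" strings are assumed stand-ins for the Direction.IN and Direction.OUT enum keys shown in the elementMap output.

```python
def classify(element):
    """Guess whether a path element is an edge or a node from its keys alone."""
    # Assumption: "IN" / "OUT" strings stand in for the Direction enum keys.
    if "IN" in element and "OUT" in element:
        return "edge"
    # Nothing marks this map as an edge, so fall back to treating it as a node.
    return "node"


# Trimmed Url -> hostedAt -> Host path in both result shapes.
element_map_path = [
    {"label": "Url", "scheme": "https"},
    {"label": "hostedAt", "IN": {"label": "Host"}, "OUT": {"label": "Url"}},
    {"label": "Host", "subdomain": ""},
]
value_map_path = [
    {"scheme": ["https"]},
    {},
    {"subdomain": [""]},
]

print([classify(element) for element in element_map_path])  # ['node', 'edge', 'node']
print([classify(element) for element in value_map_path])    # ['node', 'node', 'node']
```

The empty map that valueMap() returns for the edge carries nothing that marks it as an edge, so it ends up drawn as an extra node.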
To control how individual path steps are drawn, we can specify a pattern to use via the -p/--path-pattern option. For the query listed, our paths follow sequences of V->outE->inV, so we would specify -p v,oute,inv.
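Conceptually, the pattern tells the visualizer which role each position in the path plays. Here is a minimal sketch of that idea, assuming a path exactly as long as the pattern. apply_pattern is a hypothetical helper, it only handles the three roles used in this thread, and it makes no claim about how graph-notebook treats longer paths or other pattern values.

```python
def apply_pattern(path, pattern):
    """Split a path into nodes and edges using a comma-separated role pattern."""
    roles = [role.strip().lower() for role in pattern.split(",")]
    if len(roles) != len(path):
        raise ValueError("this sketch only handles paths as long as the pattern")
    nodes, edges = [], []
    for index, role in enumerate(roles):
        if role in ("v", "inv"):
            nodes.append(path[index])
        elif role == "oute":
            if index == 0 or index == len(path) - 1:
                raise ValueError("an edge needs a vertex on both sides")
            # An out-edge points from the previous path element to the next one.
            edges.append((index - 1, index + 1, path[index]))
        else:
            raise ValueError(f"unsupported role in this sketch: {role}")
    return nodes, edges


# Trimmed Url -> (edge) -> Host path in the valueMap() shape.
value_map_path = [{"scheme": ["https"]}, {}, {"subdomain": [""]}]
nodes, edges = apply_pattern(value_map_path, "v,oute,inv")
print(len(nodes), "nodes,", len(edges), "edge")  # 2 nodes, 1 edge
```

With the roles supplied, the empty map in the middle is treated as an edge between its neighbours rather than as a third node.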
Here's an example of an equivalent Gremlin query using the air-routes dataset, without the path pattern:
And the same query with the path pattern added, correcting the visualization of the edges:
Please do take a look at our tutorial notebooks, which provide excellent walkthroughs of how to visualize queries.
Closing due to inactivity. Please re-open if you have additional questions.