
[BUG] Graph visualization does not support multivalue properties

Open WithSecureSamiTikka opened this issue 2 years ago • 1 comments

Graph visualization does not support multivalue properties

Steps to reproduce the behavior:

  1. Set up a graph where vertices have multivalue properties (i.e. have set cardinality)
  2. query g.V().outE().inV().path().by(elementMap()). This works but you cannot see all values of the multivalue properties.
  3. If you change the query to use valueMap g.V().outE().inV().path().by(valueMap()), the visualization does not render properly. Some vertices are drawn but they do not represent the graph.

Expected behavior: The graph is visualized correctly even when valueMap() is used, and multivalue properties can be viewed in the visualization "Details" box.

Screenshots: This is how the graph looks when using elementMap() WORKS-using-elementMap

This is how the graph looks when I use valueMap() BROKEN-using-valueMap

Desktop (please complete the following information):

  • OS: macOS 12.6
  • Browser: Chrome 105.0.5195.125
  • Version: graph-notebook 3.6.0

WithSecureSamiTikka avatar Sep 21 '22 10:09 WithSecureSamiTikka

Hi @FsecureSamiTikka, thank you for the bug report!

For my debugging purposes, could you also share the path data from both of the example queries?

michaelnchin avatar Sep 21 '22 17:09 michaelnchin

Working query

%%gremlin
g.with("evaluationTimeout", 60000)
.V("Url:hXXps://khhhya2jh2jha45bh.test/jadaeghab3762")
.repeat(outE().inV()).times(4).emit().dedup().path().by(elementMap())

returns

path[{<T.id: 1>: 'Url:hXXps://khhhya2jh2jha45bh.test/jadaeghab3762', <T.label: 4>: 'Url', 'hostname': 'khhhya2jh2jha45bh.test', 'scheme': 'https', 'path': '/jadaeghab3762', 'url': 'hXXps://khhhya2jh2jha45bh.test/jadaeghab3762', 'tlp_class': 20, 'url_last_seen': '2022-09-22T12:50:48.074432Z', 'url_time': '2022-09-22T12:50:48.074432Z', 'url_categories': 'caa'}, {<T.id: 1>: 'hostedAt::Url:hXXps://khhhya2jh2jha45bh.test/jadaeghab3762::Host:khhhya2jh2jha45bh.test', <T.label: 4>: 'hostedAt', <Direction.IN: 2>: {<T.id: 1>: 'Host:khhhya2jh2jha45bh.test', <T.label: 4>: 'Host'}, <Direction.OUT: 3>: {<T.id: 1>: 'Url:hXXps://khhhya2jh2jha45bh.test/jadaeghab3762', <T.label: 4>: 'Url'}}, {<T.id: 1>: 'Host:khhhya2jh2jha45bh.test', <T.label: 4>: 'Host', 'hostname': 'khhhya2jh2jha45bh.test', 'subdomain': '', 'host_categories': 'cbb', 'host_categories_time': '2022-09-22T12:50:48.018855Z'}]
path[{<T.id: 1>: 'Url:hXXps://khhhya2jh2jha45bh.test/jadaeghab3762', <T.label: 4>: 'Url', 'hostname': 'khhhya2jh2jha45bh.test', 'scheme': 'https', 'path': '/jadaeghab3762', 'url': 'hXXps://khhhya2jh2jha45bh.test/jadaeghab3762', 'tlp_class': 20, 'url_last_seen': '2022-09-22T12:50:48.074432Z', 'url_time': '2022-09-22T12:50:48.074432Z', 'url_categories': 'caa'}, {<T.id: 1>: 'hostedAt::Url:hXXps://khhhya2jh2jha45bh.test/jadaeghab3762::Host:khhhya2jh2jha45bh.test', <T.label: 4>: 'hostedAt', <Direction.IN: 2>: {<T.id: 1>: 'Host:khhhya2jh2jha45bh.test', <T.label: 4>: 'Host'}, <Direction.OUT: 3>: {<T.id: 1>: 'Url:hXXps://khhhya2jh2jha45bh.test/jadaeghab3762', <T.label: 4>: 'Url'}}, {<T.id: 1>: 'Host:khhhya2jh2jha45bh.test', <T.label: 4>: 'Host', 'hostname': 'khhhya2jh2jha45bh.test', 'subdomain': '', 'host_categories': 'cbb', 'host_categories_time': '2022-09-22T12:50:48.018855Z'}, {<T.id: 1>: 'underDomain::Host:khhhya2jh2jha45bh.test::Domain:khhhya2jh2jha45bh.test', <T.label: 4>: 'underDomain', <Direction.IN: 2>: {<T.id: 1>: 'Domain:khhhya2jh2jha45bh.test', <T.label: 4>: 'Domain'}, <Direction.OUT: 3>: {<T.id: 1>: 'Host:khhhya2jh2jha45bh.test', <T.label: 4>: 'Host'}}, {<T.id: 1>: 'Domain:khhhya2jh2jha45bh.test', <T.label: 4>: 'Domain', 'domain_name': 'khhhya2jh2jha45bh.test', 'domain_wo_suffix': 'khhhya2jh2jha45bh', 'suffix': 'test', 'domain_categories_time': '2022-09-22T12:50:47.954472Z', 'domain_categories': 'ccc'}]

Broken query

%%gremlin
g.with("evaluationTimeout", 60000)
.V("Url:hXXps://khhhya2jh2jha45bh.test/jadaeghab3762")
.repeat(outE().inV()).times(4).emit().dedup().path().by(valueMap())

returns

path[{'path': ['/jadaeghab3762'], 'hostname': ['khhhya2jh2jha45bh.test'], 'url_categories': ['ca', 'caa'], 'scheme': ['https'], 'tlp_class': [20], 'url': ['hXXps://khhhya2jh2jha45bh.test/jadaeghab3762'], 'url_time': ['2022-09-22T12:50:48.074432Z'], 'url_last_seen': ['2022-09-22T12:50:48.074432Z']}, {}, {'hostname': ['khhhya2jh2jha45bh.test'], 'host_categories_time': ['2022-09-22T12:50:48.018855Z'], 'subdomain': [''], 'host_categories': ['cb', 'cbb']}]
path[{'path': ['/jadaeghab3762'], 'hostname': ['khhhya2jh2jha45bh.test'], 'url_categories': ['ca', 'caa'], 'scheme': ['https'], 'tlp_class': [20], 'url': ['hXXps://khhhya2jh2jha45bh.test/jadaeghab3762'], 'url_time': ['2022-09-22T12:50:48.074432Z'], 'url_last_seen': ['2022-09-22T12:50:48.074432Z']}, {}, {'hostname': ['khhhya2jh2jha45bh.test'], 'host_categories_time': ['2022-09-22T12:50:48.018855Z'], 'subdomain': [''], 'host_categories': ['cb', 'cbb']}, {}, {'domain_name': ['khhhya2jh2jha45bh.test'], 'domain_categories': ['cc', 'ccc'], 'domain_wo_suffix': ['khhhya2jh2jha45bh'], 'suffix': ['test'], 'domain_categories_time': ['2022-09-22T12:50:47.954472Z']}]

I only now see the path().by(valueMap()) returned no data for the edges. So maybe this is a Neptune bug after all.

WithSecureSamiTikka avatar Sep 23 '22 06:09 WithSecureSamiTikka

Looking at the paths provided, Neptune seems to be returning the correct results. The discrepancies in edge data returned are due to functional differences between the valueMap() and elementMap() steps.

By default, elementMap() will return the id and label properties in the node/edge mapping, as well as IN/OUT directional properties for edges, along with any user-provided properties.

On the other hand, valueMap() will only return user-provided properties. If none are present, then an empty map will be returned for the element. Specifying valueMap(true) will insert these metadata properties into the node/edge map but may also require a few extra steps.
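The path data earlier in the thread also shows a second difference that matters for multivalue properties: valueMap() wraps every property value in a list (for example 'url_categories': ['ca', 'caa']), while the elementMap() output carries a single value per key ('url_categories': 'caa'). As a rough sketch, assuming a valueMap() result has already been converted to a plain Python dict, a hypothetical helper (not part of graph-notebook) could list which properties actually hold more than one value:

```python
def multivalued_keys(value_map: dict) -> list:
    """Hypothetical helper: return the property keys holding more than one value.

    Assumes the input has the shape valueMap() produced in this thread,
    i.e. every property value is a list.
    """
    return [
        key
        for key, values in value_map.items()
        if isinstance(values, list) and len(values) > 1
    ]


# Trimmed copy of the Url vertex from the valueMap() output above.
url_vertex = {
    "path": ["/jadaeghab3762"],
    "url_categories": ["ca", "caa"],
    "scheme": ["https"],
    "tlp_class": [20],
}

print(multivalued_keys(url_vertex))
```

For the trimmed vertex above, only url_categories qualifies, which is the property whose second value is missing from the elementMap() output.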

@krlawrence has also written an excellent rundown with examples on this topic: https://www.kelvinlawrence.net/book/PracticalGremlin.html#element-map

michaelnchin avatar Sep 23 '22 22:09 michaelnchin

Going back to the original issue -

  2. query g.V().outE().inV().path().by(elementMap()). This works but you cannot see all values of the multivalue properties.

Based on the path data from the query using elementMap, the node and edge properties should all be present in the details view, for example:

Screen Shot 2022-09-23 at 4 40 36 PM

Is the question here about how to display all these properties as the visual label on a node (i.e. a concatenated list of all the property values, as shown by the valueMap query)? If so, the concatenated list label is only used as a fallback when the usual default, the label property of the node/edge, is not present in the data. There is no other way to manually select this display label option, and elementMap always returns the label as part of the data, so it is not possible when using this step. The only way to show the concatenated labels is via the valueMap() step with no parameters.
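The fallback described here can be pictured with a small sketch. This is an illustration only, not graph-notebook's actual code: the function name display_label, the plain string key "label", and the ", " separator are all assumptions made for the example.

```python
def display_label(element: dict) -> str:
    """Hypothetical sketch of the label fallback described above."""
    # Assumption: the label sits under the plain string key "label".
    # Real Gremlin results key it by the T.label token instead.
    if "label" in element:
        return str(element["label"])

    # No label in the data (the valueMap() case): fall back to a
    # concatenation of every property value. The separator is an assumption.
    parts = []
    for value in element.values():
        if isinstance(value, list):
            parts.extend(str(v) for v in value)
        else:
            parts.append(str(value))
    return ", ".join(parts)


# elementMap()-like element: the label is present, so it wins.
print(display_label({"label": "Url", "scheme": "https"}))

# valueMap()-like element: no label, so the values are concatenated.
print(display_label({"scheme": ["https"], "url_categories": ["ca", "caa"]}))
```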

michaelnchin avatar Sep 23 '22 23:09 michaelnchin

  3. If you change the query to use valueMap g.V().outE().inV().path().by(valueMap()), the visualization does not render properly. Some vertices are drawn but they do not represent the graph.

For most queries returning path data via valueMap, this is the expected result. There isn't a way to distinguish node and edge maps with 100% certainty (unless you have an elementMap step that returns directional properties with the edges), so the visualizer initially assumes that any generic path element is a node, and draws blank edges between them.
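To make the ambiguity concrete, here is a minimal sketch of the kind of check that is possible. It is not graph-notebook's implementation: the helper names are hypothetical, and the direction tokens are stood in for by plain strings, whereas the real results key them by Direction enum members (as the `<Direction.IN: 2>` entries in the path data show).

```python
# Assumption: plain strings stand in for the Direction.IN / Direction.OUT tokens.
IN_KEY = "Direction.IN"
OUT_KEY = "Direction.OUT"


def looks_like_edge(element: dict) -> bool:
    """An element can only be identified as an edge if it carries both directions."""
    return IN_KEY in element and OUT_KEY in element


def classify_path(path: list) -> list:
    """Label each path element; anything not provably an edge is treated as a node."""
    return ["edge" if looks_like_edge(e) else "node" for e in path]


# elementMap()-shaped path: the middle element has IN/OUT entries.
element_map_path = [
    {"id": "Url:1", "label": "Url"},
    {"id": "e1", "label": "hostedAt", IN_KEY: {"id": "Host:1"}, OUT_KEY: {"id": "Url:1"}},
    {"id": "Host:1", "label": "Host"},
]

# valueMap()-shaped path: the edge is just an empty map, as in the thread.
value_map_path = [{"scheme": ["https"]}, {}, {"hostname": ["example.test"]}]

print(classify_path(element_map_path))
print(classify_path(value_map_path))
```

With the valueMap()-shaped input every element, including the empty edge map, falls through to "node", which matches the rendering described above.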

To control how individual path steps are drawn, we can specify a pattern to use via the -p/--path-pattern option. For the query listed, our paths follow sequences of V->outE->inV, so we would specify -p v,oute,inv.
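Conceptually, a pattern such as v,oute,inv tells the visualizer which role each position in the path plays. The sketch below shows that idea for a single V->outE->inV hop. The function apply_pattern is hypothetical, it only recognizes the three step names mentioned in this thread, and it makes no claim about how graph-notebook handles longer or repeated paths.

```python
# Only the step names used in this thread are modelled here (an assumption,
# not the full set the -p option may accept).
VERTEX_STEPS = {"v", "inv"}
EDGE_STEPS = {"oute"}


def apply_pattern(path: list, pattern: str) -> list:
    """Pair each path element with the role named at the same position in the pattern."""
    steps = [s.strip().lower() for s in pattern.split(",")]
    if len(steps) != len(path):
        raise ValueError("this sketch expects one pattern step per path element")

    roles = []
    for step, element in zip(steps, path):
        if step in VERTEX_STEPS:
            roles.append(("node", element))
        elif step in EDGE_STEPS:
            roles.append(("edge", element))
        else:
            raise ValueError(f"unrecognized step in pattern: {step!r}")
    return roles


# valueMap()-shaped single-hop path: the edge is an empty map.
path = [{"hostname": ["a.test"]}, {}, {"hostname": ["b.test"]}]
kinds = [kind for kind, _ in apply_pattern(path, "v,oute,inv")]
print(kinds)
```

Because the role comes from the position rather than from the element's contents, the empty map in the middle is treated as an edge even though it carries no identifying data.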

Here's an example of an equivalent Gremlin query using the air-routes dataset, without the path pattern:

Screen Shot 2022-09-23 at 5 23 13 PM

And the same query with the path pattern added, correcting the visualization of the edges:

Screen Shot 2022-09-23 at 5 24 47 PM

Please do take a look at our tutorial notebooks, which provide excellent walkthroughs of how to visualize queries.

michaelnchin avatar Sep 24 '22 00:09 michaelnchin

Closing due to inactivity. Please re-open if you have additional questions.

michaelnchin avatar Jan 03 '23 22:01 michaelnchin