graph-explorer
[Bug] Gremlin Connector Timeout when Fetching Vertex Schema during Database Synchronization
Community Note
- Please use a 👍 reaction to provide a +1/vote. This helps the community and maintainers prioritize this request.
- If you are interested in working on this issue or have submitted a pull request, please leave a comment.
Describe the bug
When synchronizing the database from the Graph Explorer UI using the Gremlin connector, the synchronization fails with a timeout. The timeout occurs in the code that fetches vertex attributes (fetchVerticesAttributes), which makes an HTTP call to graph-explorer's proxy URL.
- Deployment of Graph Explorer: via SageMaker
- Browser: Google Chrome
- Graph Explorer Version: 1.4.0
- Graph Database & Version: Amazon Neptune 1.3.0.0
- Graph Connector: GP-Gremlin (default connection)
To Reproduce
Steps to reproduce the behavior:
1. Go to the Amazon Neptune Console
2. Click on Notebooks (https://us-west-2.console.aws.amazon.com/neptune/home?region=us-west-2#notebooks:)
3. Select the radio button for the relevant notebook
4. Click on the Actions button
5. Select Open Graph Explorer
6. In the graph-explorer UI, click on the GP-Gremlin default connection
7. Click on the Synchronize Database icon (top-right of the UI view)
See the screen capture of the graph-explorer UI, along with the log showing network activity after step 7 above. Note that the HTTP request times out after 2 minutes.
Expected behavior
Clicking the Synchronize Database icon should produce a success notification within a few seconds (i.e. less than 10 seconds).
Some Additional Notes
The Gremlin query executed via the proxy HTTP call mentioned above is generated by the verticesSchemaTemplate function.
In our case, verticesSchemaTemplate produces the following query (the label names have been changed in this example):
g.V().project("VertexType1","VertexType2","VertexType3","VertexType4","VertexType5","VertexType6","VertexType7","VertexType8")
.by(V().hasLabel("VertexType1").limit(1))
.by(V().hasLabel("VertexType2").limit(1))
.by(V().hasLabel("VertexType3").limit(1))
.by(V().hasLabel("VertexType4").limit(1))
.by(V().hasLabel("VertexType5").limit(1))
.by(V().hasLabel("VertexType6").limit(1))
.by(V().hasLabel("VertexType7").limit(1))
.by(V().hasLabel("VertexType8").limit(1))
.limit(1)
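To make the shape of this generated query easier to see, here is a minimal TypeScript sketch that reproduces it for an arbitrary list of labels. The function name `currentVerticesSchemaQuery` is hypothetical, and this is not graph-explorer's actual verticesSchemaTemplate source; it only assumes the output format shown above. Each label contributes its own `V().hasLabel(...).limit(1)` sub-traversal inside a `project()` that starts from `g.V()`.

```typescript
// Hypothetical sketch: reproduces the query shape shown above for any list of
// labels. This is NOT graph-explorer's actual verticesSchemaTemplate source.
function currentVerticesSchemaQuery(labels: string[]): string {
  // project() keys: one double-quoted key per label
  const keys = labels.map((label) => `"${label}"`).join(",");
  // One by() modulator per label, each a separate V().hasLabel(...).limit(1) lookup
  const bys = labels
    .map((label) => `.by(V().hasLabel("${label}").limit(1))`)
    .join("");
  // The whole projection hangs off g.V() and is capped with a final limit(1)
  return `g.V().project(${keys})${bys}.limit(1)`;
}

console.log(currentVerticesSchemaQuery(["VertexType1", "VertexType2"]));
```

With the eight labels from the example, this produces the single-line form of the query above.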
Upon further investigation, we found that this query works for databases smaller than the one we currently have deployed. To work around the size of our database, we ran the query with an extended timeout; it completed successfully, but took over 8 minutes. The default timeout is 2 minutes, so synchronization fails in the Graph Explorer UI.
The following proposed query, which should return an equivalent result, completes successfully in under 1 second on our graph database:
g.V().union(
__.hasLabel('VertexType1').limit(1),
__.hasLabel('VertexType2').limit(1),
__.hasLabel('VertexType3').limit(1),
__.hasLabel('VertexType4').limit(1),
__.hasLabel('VertexType5').limit(1),
__.hasLabel('VertexType6').limit(1),
__.hasLabel('VertexType7').limit(1),
__.hasLabel('VertexType8').limit(1)
)
.fold()
.project('VertexType1', 'VertexType2', 'VertexType3', 'VertexType4', 'VertexType5', 'VertexType6', 'VertexType7', 'VertexType8')
.by(unfold().hasLabel('VertexType1'))
.by(unfold().hasLabel('VertexType2'))
.by(unfold().hasLabel('VertexType3'))
.by(unfold().hasLabel('VertexType4'))
.by(unfold().hasLabel('VertexType5'))
.by(unfold().hasLabel('VertexType6'))
.by(unfold().hasLabel('VertexType7'))
.by(unfold().hasLabel('VertexType8'))
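A generator for the proposed query could look like the following sketch. `proposedVerticesSchemaQuery` is a hypothetical name used only for illustration, and the sketch assumes labels contain no single-quote characters (a real implementation would need to escape them). It emits the same union/fold/project shape as the query above, on a single line.

```typescript
// Hypothetical helper, assumed for illustration only: builds the proposed
// union/fold/project query for an arbitrary list of vertex labels.
// Assumes labels contain no single quotes (no escaping is done here).
function proposedVerticesSchemaQuery(labels: string[]): string {
  // union(): one anonymous traversal per label, each limited to a single vertex
  const branches = labels
    .map((label) => `__.hasLabel('${label}').limit(1)`)
    .join(", ");
  // project(): one key per label
  const keys = labels.map((label) => `'${label}'`).join(", ");
  // by(): pick the folded vertex whose label matches the key
  const bys = labels
    .map((label) => `.by(unfold().hasLabel('${label}'))`)
    .join("");
  return `g.V().union(${branches}).fold().project(${keys})${bys}`;
}

console.log(proposedVerticesSchemaQuery(["VertexType1", "VertexType2"]));
```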
Explanation of the proposed query above:
- Union the results of one anonymous traversal per vertex label, each limited to a single vertex (limit(1))
- Fold the results into a single list
- Project one key per label
- Fill each projected key with the folded vertex that has that label (unfold().hasLabel(...))
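The four steps can be mirrored over an in-memory array to show what each one contributes. This is an illustration only: the `Vertex` shape and `sampleOneVertexPerLabel` are hypothetical, and the loops scan an array rather than querying Neptune.

```typescript
// Hypothetical vertex shape, assumed for illustration only
interface Vertex {
  id: string;
  label: string;
}

// Mirrors union(...limit(1)) -> fold() -> project() -> by(unfold().hasLabel())
// over an in-memory array instead of a Neptune graph.
function sampleOneVertexPerLabel(
  vertices: Vertex[],
  labels: string[],
): Record<string, Vertex> {
  // Step 1 (union of per-label branches, each with limit(1))
  const unioned: Vertex[] = [];
  for (const label of labels) {
    for (const v of vertices) {
      if (v.label === label) {
        unioned.push(v);
        break; // limit(1): keep only the first match for this label
      }
    }
  }

  // Step 2 (fold): the branch results are gathered into a single list
  const folded: Vertex[] = unioned.slice();

  // Steps 3 and 4 (project + by): one key per label, filled with the folded
  // vertex that carries that label
  const projected: Record<string, Vertex> = {};
  for (const label of labels) {
    for (const v of folded) {
      if (v.label === label) {
        projected[label] = v;
        break;
      }
    }
    // Labels with no vertex are simply left out; this sketch does not model
    // how Gremlin's project() treats a by() that yields no result
  }
  return projected;
}

const demo = sampleOneVertexPerLabel(
  [
    { id: "v1", label: "VertexType1" },
    { id: "v2", label: "VertexType2" },
    { id: "v3", label: "VertexType1" },
  ],
  ["VertexType1", "VertexType2"],
);
console.log(demo);
```

In this sketch each inner loop exits at the first matching vertex, which is the array counterpart of limit(1) in each union branch.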
See the attached files for more details about our graph and about the execution of the current and proposed queries:
Some Cluster Status Info (see cluster_status.json for details):
- DB Engine Version: 1.3.0.0.R1
- Gremlin Version: tinkerpop-3.6.4
Graph Summary (see graph_summary.json for details):
- Nodes:
  - Number of nodes: 584713969
  - Number of node labels: 8
  - Number of node properties: 18
- Edges:
  - Number of edges: 762486650
  - Number of edge labels: 8
  - Number of edge properties: 4
Graph Statistics (see graph_statistics.json for details):
- Signature Count: 94
- Instance Count: 1347200742
- Predicate Count: 31
The Explain and Profile of the currently used query (verticesSchemaTemplate):
- ge-query-explain.txt
- ge-query-profile.txt
The Explain and Profile of the proposed query above:
- ge-query-modified-explain.txt
- ge-query-modified-profile.txt
[Bug] Unable to Synchronize Graph #219
Synchronization with high number of labels with long strings #206
Please consider the PR that fixes this issue: #226