Pipeline operations fail with "Expected positive long value, got -24"
Describe the bug
Link prediction pipeline operations (e.g., .create, .addNodeProperty) fail when using GDS 2.5.1 and 2.5.3.
To Reproduce
A. Execute either of these using the Python GDS client:
pipe = gds.lp_pipe("foo")
or
gds.run_cypher("""CALL gds.beta.pipeline.linkPrediction.create("foo")""")
B. Execute this in the Neo4j Browser:
CALL gds.beta.pipeline.linkPrediction.create("foo")
GDS version: 2.5.1 (also 2.5.3)
Neo4j version: standalone EE 5.13.0
Operating system: Amazon Linux (AMI ID: ami-0cd7323ab3e63805f)
Steps to reproduce the behavior (when using the Python GDS client):
from graphdatascience import GraphDataScience
gds = GraphDataScience(MY_URI, auth=MY_AUTH, database=MY_DB)
# Append one of the statements above to create an LP pipeline, e.g.:
pipe = gds.lp_pipe("foo")
Expected behavior
The pipeline is created AND results are returned.
Observed behavior
The call fails with the error message Expected positive long value, got -24. However, it appears the pipeline is actually created.
Here is the error when executing pipe = gds.lp_pipe("foo"):
---------------------------------------------------------------------------
DatabaseError Traceback (most recent call last)
Cell In[2], line 1
----> 1 pipe = gds.lp_pipe("foo")
File ~/pyenv/lib64/python3.9/site-packages/graphdatascience/pipeline/pipeline_endpoints.py:26, in PipelineEndpoints.lp_pipe(self, name)
16 """
17 Create a Link Prediction training pipeline, with all default settings.
18
(...)
23 A new instance of a Link Prediction pipeline object.
24 """
25 runner = PipelineBetaProcRunner(self._query_runner, f"{self._namespace}.beta.pipeline", self._server_version)
---> 26 p, _ = runner.linkPrediction.create(name)
27 return p
File ~/pyenv/lib64/python3.9/site-packages/graphdatascience/pipeline/lp_pipeline_create_runner.py:16, in LPPipelineCreateRunner.create(self, name)
14 query = f"CALL {self._namespace}($name)"
15 params = {"name": name}
---> 16 result = self._query_runner.run_query(query, params).squeeze()
18 return LPTrainingPipeline(name, self._query_runner, self._server_version), result
File ~/pyenv/lib64/python3.9/site-packages/graphdatascience/query_runner/neo4j_query_runner.py:70, in Neo4jQueryRunner.run_query(self, query, params, database, custom_error)
63 # Though pandas support may be experimental in the `neo4j` package, it should always
64 # be supported in the `graphdatascience` package.
65 warnings.filterwarnings(
66 "ignore",
67 message=r"^pandas support is experimental and might be changed or removed in future versions$",
68 )
---> 70 df = result.to_df()
72 if self._NEO4J_DRIVER_VERSION < ServerVersion(5, 0, 0):
73 self._last_bookmarks = [session.last_bookmark()]
File ~/pyenv/lib64/python3.9/site-packages/neo4j/_sync/work/result.py:748, in Result.to_df(self, expand, parse_dates)
745 import pandas as pd # type: ignore[import]
747 if not expand:
--> 748 df = pd.DataFrame(self.values(), columns=self._keys)
749 else:
750 df_keys = None
File ~/pyenv/lib64/python3.9/site-packages/neo4j/_sync/work/result.py:603, in Result.values(self, *keys)
585 def values(
586 self, *keys: _TResultKey
587 ) -> t.List[t.List[t.Any]]:
588 """Return the remainder of the result as a list of values lists.
589
590 :param keys: fields to return for each remaining record. Optionally filtering to include only certain values by index or key.
(...)
601 .. seealso:: :meth:`.Record.values`
602 """
--> 603 return [record.values(*keys) for record in self]
File ~/pyenv/lib64/python3.9/site-packages/neo4j/_sync/work/result.py:603, in <listcomp>(.0)
585 def values(
586 self, *keys: _TResultKey
587 ) -> t.List[t.List[t.Any]]:
588 """Return the remainder of the result as a list of values lists.
589
590 :param keys: fields to return for each remaining record. Optionally filtering to include only certain values by index or key.
(...)
601 .. seealso:: :meth:`.Record.values`
602 """
--> 603 return [record.values(*keys) for record in self]
File ~/pyenv/lib64/python3.9/site-packages/neo4j/_sync/work/result.py:266, in Result.__iter__(self)
264 yield self._record_buffer.popleft()
265 elif self._streaming:
--> 266 self._connection.fetch_message()
267 elif self._discarding:
268 self._discard()
File ~/pyenv/lib64/python3.9/site-packages/neo4j/_sync/io/_common.py:180, in ConnectionErrorHandler.__getattr__.<locals>.outer.<locals>.inner(*args, **kwargs)
178 def inner(*args, **kwargs):
179 try:
--> 180 func(*args, **kwargs)
181 except (Neo4jError, ServiceUnavailable, SessionExpired) as exc:
182 assert not asyncio.iscoroutinefunction(self.__on_error)
File ~/pyenv/lib64/python3.9/site-packages/neo4j/_sync/io/_bolt.py:851, in Bolt.fetch_message(self)
847 # Receive exactly one message
848 tag, fields = self.inbox.pop(
849 hydration_hooks=self.responses[0].hydration_hooks
850 )
--> 851 res = self._process_message(tag, fields)
852 self.idle_since = perf_counter()
853 return res
File ~/pyenv/lib64/python3.9/site-packages/neo4j/_sync/io/_bolt5.py:376, in Bolt5x0._process_message(self, tag, fields)
374 self._server_state_manager.state = self.bolt_states.FAILED
375 try:
--> 376 response.on_failure(summary_metadata or {})
377 except (ServiceUnavailable, DatabaseUnavailable):
378 if self.pool:
File ~/pyenv/lib64/python3.9/site-packages/neo4j/_sync/io/_common.py:247, in Response.on_failure(self, metadata)
245 handler = self.handlers.get("on_summary")
246 Util.callback(handler)
--> 247 raise Neo4jError.hydrate(**metadata)
DatabaseError: {code: Neo.DatabaseError.Statement.ExecutionFailed} {message: Expected positive long value, got -24}
Additional context
- This issue did not exist with GDS 2.5.0 and Neo4j 5.11.0.
- Since the link prediction pipeline is created by .create even when it throws the error, I tried to use the Cypher GDS API in the Browser to call .addNodeProperty on the pipeline, but that also gave me the "Expected positive long value, got -24" error.
Seems like a related bug was fixed in Neo4j 4.4.10 and 5.0.0:
Fix overflow in resource manager. Users could get errors like java.lang.IllegalArgumentException: Expected positive long value, got -8589934576 because of an overflow when trying to grow the number of tracked resources.
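For intuition, here is a purely illustrative Python sketch (the counter value and the doubling strategy are made up, not taken from Neo4j's code) of how growing a signed 64-bit counter can wrap around to a negative value and then fail a "must be positive" check:

def java_long(n: int) -> int:
    """Interpret n as a signed 64-bit integer, the way a Java long wraps on overflow."""
    n &= (1 << 64) - 1
    return n - (1 << 64) if n >= (1 << 63) else n

tracked = (1 << 62) + 12          # hypothetical internal counter near the limit
grown = java_long(tracked * 2)    # doubling exceeds the signed 64-bit range
print(grown)                      # negative (-9223372036854775784), so a "positive long" check fails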
We tried to reproduce this, and pipeline creation worked with GDS 2.5.3 and Neo4j 5.13. Creating a link prediction pipeline on a fresh db worked all right. It is surprising this caused a bug, as it's a basic operation that we have tests for. Are you able to run other GDS algorithms, such as, for example, PageRank?
Could you attach the Neo4j logs, including the debug log?
I have attached the debug and neo4j logs.
I also tried the following, but still got the same error:
- Using a fresh database (in the same DBMS instance).
- Removing all plugins except for bloom-plugin-5.x-2.10.0.jar and neo4j-graph-data-science-2.5.3.jar.
- Running CALL gds.beta.pipeline.nodeClassification.create("pipe-nc") in the Browser.
However, all the PageRank algorithm examples in the docs succeeded and produced the expected results.
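For reference, a minimal PageRank sanity check with the Python client might look like the following sketch; the projected graph name, node label ("Person"), and relationship type ("KNOWS") are placeholders, not taken from the actual database:

from graphdatascience import GraphDataScience

# Placeholder connection details; reuse the same URI/auth/database as above.
gds = GraphDataScience(MY_URI, auth=MY_AUTH, database=MY_DB)

# Project a small in-memory graph ("Person"/"KNOWS" are hypothetical labels
# standing in for whatever exists in the database).
G, _ = gds.graph.project("pagerank-check", "Person", "KNOWS")

# Stream PageRank scores; success here indicates plain algorithm execution
# works even though pipeline creation fails.
scores = gds.pageRank.stream(G)
print(scores.head())

# Drop the projection again.
G.drop()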
In case the OS and its environment are related to this issue: I am using a standalone EE instance running on AWS EC2 with an Amazon Linux OS (AMI ID: ami-0cd7323ab3e63805f).
I have created another AWS EC2 instance running Neo4j 5.11.0 and GDS 2.5.1, in which both CALL gds.beta.pipeline.linkPrediction.create("foo") and gds.lp_pipe("foo") succeed.
The same with Neo4j 5.11.0 and GDS 2.5.3.
However, they fail (in the same way as before) with Neo4j 5.12.0 and GDS 2.5.0. So this seems to imply that the root issue is actually in Neo4j 5.12.0 onwards.
Thanks for the info, @cybersam! What JVM are you using?
[ec2-user@ip-172-31-13-182 ~]$ java -version
openjdk version "17.0.9" 2023-10-17 LTS
OpenJDK Runtime Environment Corretto-17.0.9.8.1 (build 17.0.9+8-LTS)
OpenJDK 64-Bit Server VM Corretto-17.0.9.8.1 (build 17.0.9+8-LTS, mixed mode, sharing)
Great, thank you! We're looking into this.
Any updates or recommendations for this issue? Ideally not involving downgrading the Neo4j version. Thank you
@cybersam @lukepereira
The issue was with the Cypher runtime but should be fixed as of Neo4j DB version 5.16. Would you be able to try your code with this version?
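For anyone verifying the fix after upgrading, a minimal re-check with the Python client might look like this sketch (connection details are placeholders):

from graphdatascience import GraphDataScience

# Placeholder connection details for the upgraded Neo4j 5.16 instance.
gds = GraphDataScience(MY_URI, auth=MY_AUTH, database=MY_DB)

# Report the GDS library version the server is running.
print(gds.version())

# Re-run the originally failing call; on a fixed server this should return
# a pipeline object instead of raising "Expected positive long value, got -24".
pipe = gds.lp_pipe("foo")
print(pipe.name())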