neo4j-python-driver icon indicating copy to clipboard operation
neo4j-python-driver copied to clipboard

verify_connectivity() doesn't always catch errors

Open tomasonjo opened this issue 1 year ago • 4 comments

In the LangChain integration, we use the following block of code:

 try:
      self._driver.verify_connectivity()
except neo4j.exceptions.ServiceUnavailable:
    raise ValueError(
        "Could not connect to Neo4j database. "
        "Please ensure that the url is correct"
    )
except neo4j.exceptions.AuthError:
    raise ValueError(
        "Could not connect to Neo4j database. "
        "Please ensure that the username and password are correct"
    )
# Set schema
try:
    self.refresh_schema() # Does read transactions
except neo4j.exceptions.ClientError:
    raise ValueError(
        "Could not use APOC procedures. "
        "Please ensure the APOC plugin is installed in Neo4j and that "
        "'apoc.meta.data()' is allowed in Neo4j configuration "
    )

Every now and then it happens that the verify_connectivity doesn't catch any error, and the error only appears when we actually do read transactions in the refresh_schema method. This is a bit confusing for the users, but I can't give you a reproducible example when does this happen

tomasonjo avatar Feb 07 '24 13:02 tomasonjo

Here is one example of an error that isn't being caught: ClientError: {code: Neo.ClientError.Database.DatabaseNotFound} {message: Unable to get a routing table for database 'neo4j' because this database does not exist}

tomasonjo avatar Feb 07 '24 21:02 tomasonjo

Hi and thank you for reaching out!

As a primer: there's a discussion to be had about which exact connectivity this function should verify. Here are some options:

  • Just the TCP connection to the server/cluster
  • To the DBMS itself
  • To the user space (login and such)
  • To a database hosted on the DBMS. In this case: which one? The user's home databse? The always existing system database? Another one?

Currently, it does the latter option: it tries to connect to the user's home database.

Therefore, I'm actually surprised that Neo.ClientError.Database.DatabaseNotFound does not get raised. I couldn't reproduce it either. Using

  • neo4j v5.15 (dockerized)
  • Python v3.11
  • neo4j Python driver from 5.x (think dev) branch, pretty much identical to v5.17.

I spin up a fresh DBMS, run DROP DATABASE neo4j, and then

import asyncio

import neo4j
from neo4j.debug import watch


watch("neo4j")  # driver debug logging

URL = "neo4j://localhost:7687"
AUTH = ("neo4j", "pass")


async def main() -> None:
    async with neo4j.AsyncGraphDatabase.driver(URL, auth=AUTH) as driver:
        await driver.verify_connectivity()


if __name__ == '__main__':
    asyncio.run(main())

The fact that this is async doesn't change a thing regarding the handling of server errors.

And here's the output of that:
[DEBUG   ] [Thread 140403322895232] [Task 140403300631952] 2024-02-08 09:11:09,715  [#0000]  _: <POOL> created, routing address IPv4Address(('localhost', 7687))
[DEBUG   ] [Thread 140403322895232] [Task 140403300631952] 2024-02-08 09:11:09,715  [#0000]  _: <WORKSPACE> resolve home database
[DEBUG   ] [Thread 140403322895232] [Task 140403300631952] 2024-02-08 09:11:09,716  [#0000]  _: <POOL> attempting to update routing table from IPv4Address(('localhost', 7687))
[DEBUG   ] [Thread 140403322895232] [Task 140403300631952] 2024-02-08 09:11:09,716  [#0000]  _: <RESOLVE> in: localhost:7687
[DEBUG   ] [Thread 140403322895232] [Task 140403300631952] 2024-02-08 09:11:09,720  [#0000]  _: <RESOLVE> dns resolver out: 127.0.0.1:7687
[DEBUG   ] [Thread 140403322895232] [Task 140403300631952] 2024-02-08 09:11:09,720  [#0000]  _: <POOL> _acquire router connection, database=None, address=ResolvedIPv4Address(('127.0.0.1', 7687))
[DEBUG   ] [Thread 140403322895232] [Task 140403300631952] 2024-02-08 09:11:09,720  [#0000]  _: <POOL> trying to hand out new connection
[DEBUG   ] [Thread 140403322895232] [Task 140403300631952] 2024-02-08 09:11:09,720  [#0000]  C: <OPEN> 127.0.0.1:7687
[DEBUG   ] [Thread 140403322895232] [Task 140403300631952] 2024-02-08 09:11:09,726  [#D8E0]  C: <MAGIC> 0x6060B017
[DEBUG   ] [Thread 140403322895232] [Task 140403300631952] 2024-02-08 09:11:09,726  [#D8E0]  C: <HANDSHAKE> 0x00040405 0x00020404 0x00000104 0x00000003
[DEBUG   ] [Thread 140403322895232] [Task 140403300631952] 2024-02-08 09:11:09,728  [#D8E0]  S: <HANDSHAKE> 0x00000405
[DEBUG   ] [Thread 140403322895232] [Task 140403300631952] 2024-02-08 09:11:09,728  [#D8E0]  C: HELLO {'user_agent': 'neo4j-python/5.17.dev0 Python/3.11.6-final-0 (linux)', 'routing': {'address': 'localhost:7687'}, 'bolt_agent': {'product': 'neo4j-python/5.17.dev0', 'platform': 'Linux 5.15.0-94-generic; x86_64', 'language': 'Python/3.11.6-final-0', 'language_details': 'CPython; 3.11.6-final-0 (main, Oct  4 2023 09:29:57) [GCC 11.4.0]'}}
[DEBUG   ] [Thread 140403322895232] [Task 140403300631952] 2024-02-08 09:11:09,728  [#D8E0]  _: <CONNECTION> client state: CONNECTED > AUTHENTICATION
[DEBUG   ] [Thread 140403322895232] [Task 140403300631952] 2024-02-08 09:11:09,728  [#D8E0]  C: LOGON {'scheme': 'basic', 'principal': 'neo4j', 'credentials': '*******'}
[DEBUG   ] [Thread 140403322895232] [Task 140403300631952] 2024-02-08 09:11:09,728  [#D8E0]  _: <CONNECTION> client state: AUTHENTICATION > READY
[DEBUG   ] [Thread 140403322895232] [Task 140403300631952] 2024-02-08 09:11:09,731  [#D8E0]  S: SUCCESS {'server': 'Neo4j/5.15.0', 'connection_id': 'bolt-34', 'hints': {'connection.recv_timeout_seconds': 120, 'telemetry.enabled': True}}
[DEBUG   ] [Thread 140403322895232] [Task 140403300631952] 2024-02-08 09:11:09,731  [#D8E0]  _: <CONNECTION> server state: CONNECTED > AUTHENTICATION
[DEBUG   ] [Thread 140403322895232] [Task 140403300631952] 2024-02-08 09:11:09,733  [#D8E0]  S: SUCCESS {}
[DEBUG   ] [Thread 140403322895232] [Task 140403300631952] 2024-02-08 09:11:09,733  [#D8E0]  _: <CONNECTION> server state: AUTHENTICATION > READY
[DEBUG   ] [Thread 140403322895232] [Task 140403300631952] 2024-02-08 09:11:09,733  [#D8E0]  C: ROUTE {'address': 'localhost:7687'} () {}
[DEBUG   ] [Thread 140403322895232] [Task 140403300631952] 2024-02-08 09:11:09,735  [#D8E0]  S: FAILURE {'code': 'Neo.ClientError.Database.DatabaseNotFound', 'message': "Unable to get a routing table for database 'neo4j' because this database does not exist"}
[DEBUG   ] [Thread 140403322895232] [Task 140403300631952] 2024-02-08 09:11:09,735  [#D8E0]  C: RESET
[DEBUG   ] [Thread 140403322895232] [Task 140403300631952] 2024-02-08 09:11:09,737  [#D8E0]  S: SUCCESS {}
[DEBUG   ] [Thread 140403322895232] [Task 140403300631952] 2024-02-08 09:11:09,737  [#D8E0]  _: <CONNECTION> server state: FAILED > READY
[DEBUG   ] [Thread 140403322895232] [Task 140403300631952] 2024-02-08 09:11:09,737  [#D8E0]  _: <POOL> released bolt-34
[DEBUG   ] [Thread 140403322895232] [Task 140403300631952] 2024-02-08 09:11:09,738  [#0000]  _: <POOL> close
[DEBUG   ] [Thread 140403322895232] [Task 140403300631952] 2024-02-08 09:11:09,738  [#D8E0]  C: GOODBYE
[DEBUG   ] [Thread 140403322895232] [Task 140403300631952] 2024-02-08 09:11:09,739  [#D8E0]  C: <CLOSE>
Traceback (most recent call last):
  File "/home/rouven/.config/JetBrains/PyCharm2023.2/scratches/scratch_38.py", line 33, in <module>
    asyncio.run(main())
  File "/home/rouven/.pyenv/versions/3.11.6/lib/python3.11/asyncio/runners.py", line 190, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/home/rouven/.pyenv/versions/3.11.6/lib/python3.11/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rouven/.pyenv/versions/3.11.6/lib/python3.11/asyncio/base_events.py", line 653, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/home/rouven/.config/JetBrains/PyCharm2023.2/scratches/scratch_38.py", line 15, in main
    await driver.verify_connectivity()
  File "/home/rouven/Neo4j/code/neo4j-python-driver/src/neo4j/_async/driver.py", line 1008, in verify_connectivity
    await self._get_server_info(session_config)
  File "/home/rouven/Neo4j/code/neo4j-python-driver/src/neo4j/_async/driver.py", line 1219, in _get_server_info
    return await session._get_server_info()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rouven/Neo4j/code/neo4j-python-driver/src/neo4j/_async/work/session.py", line 172, in _get_server_info
    await self._connect(READ_ACCESS, liveness_check_timeout=0)
  File "/home/rouven/Neo4j/code/neo4j-python-driver/src/neo4j/_async/work/session.py", line 130, in _connect
    await super()._connect(
  File "/home/rouven/Neo4j/code/neo4j-python-driver/src/neo4j/_async/work/workspace.py", line 161, in _connect
    await self._pool.update_routing_table(
  File "/home/rouven/Neo4j/code/neo4j-python-driver/src/neo4j/_async/io/_pool.py", line 779, in update_routing_table
    if await self._update_routing_table_from(
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rouven/Neo4j/code/neo4j-python-driver/src/neo4j/_async/io/_pool.py", line 725, in _update_routing_table_from
    new_routing_table = await self.fetch_routing_table(
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rouven/Neo4j/code/neo4j-python-driver/src/neo4j/_async/io/_pool.py", line 662, in fetch_routing_table
    new_routing_info = await self.fetch_routing_info(
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rouven/Neo4j/code/neo4j-python-driver/src/neo4j/_async/io/_pool.py", line 632, in fetch_routing_info
    routing_table = await cx.route(
                    ^^^^^^^^^^^^^^^
  File "/home/rouven/Neo4j/code/neo4j-python-driver/src/neo4j/_async/io/_bolt5.py", line 196, in route
    await self.fetch_all()
  File "/home/rouven/Neo4j/code/neo4j-python-driver/src/neo4j/_async/io/_bolt.py", line 863, in fetch_all
    detail_delta, summary_delta = await self.fetch_message()
                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rouven/Neo4j/code/neo4j-python-driver/src/neo4j/_async/io/_bolt.py", line 849, in fetch_message
    res = await self._process_message(tag, fields)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rouven/Neo4j/code/neo4j-python-driver/src/neo4j/_async/io/_bolt5.py", line 369, in _process_message
    await response.on_failure(summary_metadata or {})
  File "/home/rouven/Neo4j/code/neo4j-python-driver/src/neo4j/_async/io/_common.py", line 245, in on_failure
    raise Neo4jError.hydrate(**metadata)
neo4j.exceptions.ClientError: {code: Neo.ClientError.Database.DatabaseNotFound} {message: Unable to get a routing table for database 'neo4j' because this database does not exist}

Process finished with exit code 1
The important bit: I do get
neo4j.exceptions.ClientError: {code: Neo.ClientError.Database.DatabaseNotFound} {message: Unable to get a routing table for database 'neo4j' because this database does not exist}

Could you please help me reproduce the issue where verify_connectivity() does not catch trying to connect to a database that doesn't exist? Or any other error you think it should catch.

robsdedude avatar Feb 08 '24 08:02 robsdedude

Here is the full traceback, not sure if it is helpful as that's the error from refresh_schema and not the connectivity method.

---------------------------------------------------------------------------
ClientError                               Traceback (most recent call last)
File ~/.local/lib/python3.10/site-packages/langchain/graphs/neo4j_graph.py:65, in Neo4jGraph.__init__(self, url, username, password, database)
     64 try:
---> 65     self.refresh_schema()
     66 except neo4j.exceptions.ClientError:

File ~/.local/lib/python3.10/site-packages/langchain/graphs/neo4j_graph.py:88, in Neo4jGraph.refresh_schema(self)
     85 """
     86 Refreshes the Neo4j graph schema information.
     87 """
---> 88 node_properties = [el["output"] for el in self.query(node_properties_query)]
     89 rel_properties = [el["output"] for el in self.query(rel_properties_query)]

File ~/.local/lib/python3.10/site-packages/langchain/graphs/neo4j_graph.py:79, in Neo4jGraph.query(self, query, params)
     78 try:
---> 79     data = session.run(query, params)
     80     return [r.data() for r in data]

File ~/anaconda3/envs/python3/lib/python3.10/site-packages/neo4j/_sync/work/session.py:302, in Session.run(self, query, parameters, **kwargs)
    301 if not self._connection:
--> 302     self._connect(self._config.default_access_mode)
    303     assert self._connection is not None

File ~/anaconda3/envs/python3/lib/python3.10/site-packages/neo4j/_sync/work/session.py:130, in Session._connect(self, access_mode, **acquire_kwargs)
    129 try:
--> 130     super()._connect(
    131         access_mode, auth=self._config.auth, **acquire_kwargs
    132     )
    133 except asyncio.CancelledError:

File ~/anaconda3/envs/python3/lib/python3.10/site-packages/neo4j/_sync/work/workspace.py:178, in Workspace._connect(self, access_mode, auth, **acquire_kwargs)
    177 acquire_kwargs_.update(acquire_kwargs)
--> 178 self._connection = self._pool.acquire(**acquire_kwargs_)
    179 self._connection_access_mode = access_mode

File ~/anaconda3/envs/python3/lib/python3.10/site-packages/neo4j/_sync/io/_pool.py:912, in Neo4jPool.acquire(self, access_mode, timeout, database, bookmarks, auth, liveness_check_timeout)
    910 log.debug("[#0000]  _: <POOL> acquire routing connection, "
    911           "access_mode=%r, database=%r", access_mode, database)
--> 912 self.ensure_routing_table_is_fresh(
    913     access_mode=access_mode, database=database,
    914     imp_user=None, bookmarks=bookmarks, auth=auth,
    915     acquisition_timeout=timeout
    916 )
    918 while True:

File ~/anaconda3/envs/python3/lib/python3.10/site-packages/neo4j/_sync/io/_pool.py:854, in Neo4jPool.ensure_routing_table_is_fresh(self, access_mode, database, imp_user, bookmarks, auth, acquisition_timeout, database_callback)
    852     return False
--> 854 self.update_routing_table(
    855     database=database, imp_user=imp_user, bookmarks=bookmarks,
    856     auth=auth, acquisition_timeout=acquisition_timeout,
    857     database_callback=database_callback
    858 )
    859 self.update_connection_pool(database=database)

File ~/anaconda3/envs/python3/lib/python3.10/site-packages/neo4j/_sync/io/_pool.py:776, in Neo4jPool.update_routing_table(self, database, imp_user, bookmarks, auth, acquisition_timeout, database_callback)
    774 if prefer_initial_routing_address:
    775     # TODO: Test this state
--> 776     if self._update_routing_table_from(
    777         self.address, database=database,
    778         imp_user=imp_user, bookmarks=bookmarks, auth=auth,
    779         acquisition_timeout=acquisition_timeout,
    780         database_callback=database_callback
    781     ):
    782         # Why is only the first initial routing address used?
    783         return

File ~/anaconda3/envs/python3/lib/python3.10/site-packages/neo4j/_sync/io/_pool.py:722, in Neo4jPool._update_routing_table_from(self, database, imp_user, bookmarks, auth, acquisition_timeout, database_callback, *routers)
    719 for address in NetworkUtil.resolve_address(
    720     router, resolver=self.pool_config.resolver
    721 ):
--> 722     new_routing_table = self.fetch_routing_table(
    723         address=address, acquisition_timeout=acquisition_timeout,
    724         database=database, imp_user=imp_user, bookmarks=bookmarks,
    725         auth=auth
    726     )
    727     if new_routing_table is not None:

File ~/anaconda3/envs/python3/lib/python3.10/site-packages/neo4j/_sync/io/_pool.py:659, in Neo4jPool.fetch_routing_table(self, address, acquisition_timeout, database, imp_user, bookmarks, auth)
    658 try:
--> 659     new_routing_info = self.fetch_routing_info(
    660         address, database, imp_user, bookmarks, auth,
    661         acquisition_timeout
    662     )
    663 except Neo4jError as e:
    664     # checks if the code is an error that is caused by the client. In
    665     # this case there is no sense in trying to fetch a RT from another
    666     # router. Hence, the driver should fail fast during discovery.

File ~/anaconda3/envs/python3/lib/python3.10/site-packages/neo4j/_sync/io/_pool.py:629, in Neo4jPool.fetch_routing_info(self, address, database, imp_user, bookmarks, auth, acquisition_timeout)
    628 try:
--> 629     routing_table = cx.route(
    630         database=database or self.workspace_config.database,
    631         imp_user=imp_user or self.workspace_config.impersonated_user,
    632         bookmarks=bookmarks
    633     )
    634 finally:

File ~/anaconda3/envs/python3/lib/python3.10/site-packages/neo4j/_sync/io/_bolt4.py:534, in Bolt4x4.route(self, database, imp_user, bookmarks, dehydration_hooks, hydration_hooks)
    533 self.send_all()
--> 534 self.fetch_all()
    535 return [metadata.get("rt")]

File ~/anaconda3/envs/python3/lib/python3.10/site-packages/neo4j/_sync/io/_bolt.py:863, in Bolt.fetch_all(self)
    862 while not response.complete:
--> 863     detail_delta, summary_delta = self.fetch_message()
    864     detail_count += detail_delta

File ~/anaconda3/envs/python3/lib/python3.10/site-packages/neo4j/_sync/io/_bolt.py:849, in Bolt.fetch_message(self)
    846 tag, fields = self.inbox.pop(
    847     hydration_hooks=self.responses[0].hydration_hooks
    848 )
--> 849 res = self._process_message(tag, fields)
    850 self.idle_since = monotonic()

File ~/anaconda3/envs/python3/lib/python3.10/site-packages/neo4j/_sync/io/_bolt4.py:368, in Bolt4x0._process_message(self, tag, fields)
    367 try:
--> 368     response.on_failure(summary_metadata or {})
    369 except (ServiceUnavailable, DatabaseUnavailable):

File ~/anaconda3/envs/python3/lib/python3.10/site-packages/neo4j/_sync/io/_common.py:245, in Response.on_failure(self, metadata)
    244 Util.callback(handler)
--> 245 raise Neo4jError.hydrate(**metadata)

ClientError: {code: Neo.ClientError.Database.DatabaseNotFound} {message: Unable to get a routing table for database 'neo4j' because this database does not exist}

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
Cell In[71], line 3
      1 from langchain.graphs import Neo4jGraph
----> 3 graph = Neo4jGraph(
      4     url=NEO4J_URI, 
      5     username=NEO4J_USERNAME, 
      6     password=NEO4J_PASSWORD
      7 )

File ~/.local/lib/python3.10/site-packages/langchain/graphs/neo4j_graph.py:67, in Neo4jGraph.__init__(self, url, username, password, database)
     65     self.refresh_schema()
     66 except neo4j.exceptions.ClientError:
---> 67     raise ValueError(
     68         "Could not use APOC procedures. "
     69         "Please ensure the APOC plugin is installed in Neo4j and that "
     70         "'apoc.meta.data()' is allowed in Neo4j configuration "
     71     )

ValueError: Could not use APOC procedures. Please ensure the APOC plugin is installed in Neo4j and that 'apoc.meta.data()' is allowed in Neo4j configuration

tomasonjo avatar Feb 12 '24 17:02 tomasonjo

This is quite interesting. It also seems you can reproduce the error. Could you please enable debug logging in the driver and post the full logs (including the passing verify_connectivity() call)? See the docs on how to enable it.

robsdedude avatar Feb 13 '24 07:02 robsdedude

@tomasonjo gentle reminder, can you please provide the driver debug logs to enable me to further investigate the issue? Or better yet, help we reproduce the issue. What exactly did you do to trigger it?

robsdedude avatar Feb 29 '24 09:02 robsdedude

Hey @robsdedude . I couldn't reproduce it locally, I just got a full debug from the user who reported the issue. Could it be something with firewall permissions, since they were at a customer with strict security?

tomasonjo avatar Feb 29 '24 09:02 tomasonjo

The only think I could imagine is something along the lines of the user running multiple DBMSs and having a single DNS entry point at both (or a disjoint/misconfigured cluster or something like it) so that it's up to chance whether the driver succeeds or fails. I definitely need more information to be able to make a call if this is really a driver bug or if this is expected behavior and the problem lies somewhere else.

robsdedude avatar Mar 04 '24 17:03 robsdedude

As written, I'm afraid I won't be able to dig further into the issue without extra info. Therefore, I'll close the issue. We can always reopen later if new information is available.

robsdedude avatar Mar 25 '24 11:03 robsdedude