
Milvus failed to connect to the default network defined in compose-rendered.yaml

Open liatamax opened this issue 9 months ago • 3 comments

Hello,

I have a local NVWB installation on an Ubuntu 22.04 Desktop machine with 1x L40S GPU, the latest CUDA 12.8, and GPU driver 570.124.06.

I cloned the latest nim-anywhere on 3/4 and could start all applications successfully.

However, after a server reboot, the Environment failed to start, even after a clean rebuild.

Below is the error from the Output/Compose window.

    Container nim-anywhere-nvwb-init-service-1  Created
    Container nim-anywhere-redis-1  Created
    Container nim-anywhere-milvus-1  Created
    Attaching to milvus-1, nvwb-init-service-1, redis-1
    nvwb-init-service-1 exited with code 0
    Gracefully stopping... (press Ctrl+C again to force)
    Error response from daemon: network 26e2d4ff4f36db70b6f2ac72991959c28a8b4df190d8e6dd7a345f0100a04c79 not found

From the Output/Chain Server window:

    INFO:     Will watch for changes in these directories: ['/project/code']
    INFO:     Uvicorn running on http://0.0.0.0:3030 (Press CTRL+C to quit)
    INFO:     Started reloader process [275] using WatchFiles
    2025-03-05 19:37:02,777 [ERROR][_create_connection]: Failed to create new connection using: cc45c2256b0e4b4dad5f6310ea8864e8 (milvus_client.py:918)
    Process SpawnProcess-1:
    Traceback (most recent call last):
      ...
        return _bootstrap._gcd_import(name[level:], package, level)
      File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
      File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
      File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
      File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
      File "<frozen importlib._bootstrap_external>", line 883, in exec_module
      File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
      File "/project/code/chain_server/server.py", line 27, in <module>
        from .chain import my_chain  # type: ignore
      File "/project/code/chain_server/chain.py", line 41, in <module>
        vector_store = Milvus(
      File "/home/workbench/.local/lib/python3.10/site-packages/langchain_milvus/vectorstores/milvus.py", line 384, in __init__
        self._milvus_client = MilvusClient(
      File "/home/workbench/.local/lib/python3.10/site-packages/pymilvus/milvus_client/milvus_client.py", line 66, in __init__
        self._using = self._create_connection(
      File "/home/workbench/.local/lib/python3.10/site-packages/pymilvus/milvus_client/milvus_client.py", line 919, in _create_connection
        raise ex from ex
      File "/home/workbench/.local/lib/python3.10/site-packages/pymilvus/milvus_client/milvus_client.py", line 916, in _create_connection
        connections.connect(using, user, password, db_name, token, uri=uri, **kwargs)
      File "/home/workbench/.local/lib/python3.10/site-packages/pymilvus/orm/connections.py", line 461, in connect
        connect_milvus(**kwargs, user=user, password=password, token=token, db_name=db_name)
      File "/home/workbench/.local/lib/python3.10/site-packages/pymilvus/orm/connections.py", line 411, in connect_milvus
        gh._wait_for_channel_ready(timeout=timeout)
      File "/home/workbench/.local/lib/python3.10/site-packages/pymilvus/client/grpc_handler.py", line 152, in _wait_for_channel_ready
        raise MilvusException(
    pymilvus.exceptions.MilvusException: <MilvusException: (code=2, message=Fail connecting to server on milvus:19530, illegal connection params or server unavailable)>

The error can be replicated using the upload-pdfs Jupyter notebook:

    from langchain_milvus.vectorstores.milvus import Milvus

    print(config.milvus)  # url='http://milvus:19530' collection_name='collection_1'

    vector_store = Milvus(
        embedding_function=embedding_model,
        connection_args={"uri": config.milvus.url},
        collection_name=config.milvus.collection_name,
        auto_id=True,
    )
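
A quicker probe than rerunning the notebook is to hit the Milvus health endpoint from a throwaway container on the same Docker network (a sketch; the "workbench" network name and the 9091 health port are taken from outputs elsewhere in this thread, and curlimages/curl is just a convenient image choice):

    # Sketch: probe Milvus health over the shared "workbench" network.
    # Assumes the milvus container is running and attached to that network.
    $ docker run --rm --network workbench curlimages/curl -sf http://milvus:9091/healthz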

liatamax avatar Mar 05 '25 20:03 liatamax

Just to verify, I can launch Milvus independently, i.e., outside the NVWB client.

    user@user-as-4125gs-tnrt:~/nvidia-workbench/NVIDIA-nim-anywhere$ docker compose up milvus

    user@user-as-4125gs-tnrt:~/nvidia-workbench/NVIDIA-nim-anywhere$ docker compose ps
    NAME                           IMAGE                    COMMAND                  SERVICE   CREATED         STATUS                   PORTS
    nvidia-nim-anywhere-milvus-1   milvusdb/milvus:v2.4.6   "/tini -- milvus run…"   milvus    9 minutes ago   Up 2 minutes (healthy)

    user@user-as-4125gs-tnrt:~/nvidia-workbench/NVIDIA-nim-anywhere$ docker compose exec milvus curl http://localhost:9091/healthz
    OK

liatamax avatar Mar 05 '25 22:03 liatamax

Hi there,

I'm happy to help get this issue resolved.

I'm unable to replicate this issue. It seems like the compose stack is shutting down, based on:

    Gracefully stopping... (press Ctrl+C again to force)
    Error response from daemon: network 26e2d4ff4f36db70b6f2ac72991959c28a8b4df190d8e6dd7a345f0100a04c79 not found

Are you using one of the docker compose profiles? If so, which one? Can you verify the exact steps you took, starting with cloning the project, to produce this issue?

MattFeinberg avatar Mar 06 '25 02:03 MattFeinberg

Hello - I have figured out the root cause.

1. To replicate, I navigate to the project folder and use docker compose to bring up the milvus service defined in the compose-rendered.yaml file:

    $ docker compose -f compose-rendered.yaml up milvus
    [+] Running 2/0
     ✔ Container nim-anywhere-nvwb-init-service-1  Created  0.0s
     ✔ Container nim-anywhere-milvus-1  Created  0.0s
    Attaching to milvus-1
    Gracefully stopping... (press Ctrl+C again to force)
    Error response from daemon: network 9c84bd78ebddea06d96faabc6a47a2070f85b8c80786ff984d29640ce86f7fb0 not found

There is no Docker network with the ID 9c84bd78eb:

    $ docker network ls
    NETWORK ID     NAME                          DRIVER    SCOPE
    d32aa2191925   bridge                        bridge    local
    495d39fe01a3   host                          host      local
    0f8e78f89479   nim-anywhere_default          bridge    local
    7402aefc60a2   none                          null      local
    284f7f3cba4a   nvidia-nim-anywhere_default   bridge    local
    766e591ebad5   workbench                     bridge    local
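
One way to see where a stale ID like that comes from is to inspect what the pre-created container recorded for its networks (a sketch; the container name is taken from the compose output above):

    # Sketch: print each network the container is attached to and the network ID it stored.
    $ docker inspect nim-anywhere-milvus-1 \
        --format '{{range $k, $v := .NetworkSettings.Networks}}{{$k}} -> {{$v.NetworkID}}{{println}}{{end}}'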

2. Upon inspecting the compose-rendered.yaml file, the networks section under the "milvus" service has the following definition:

    networks:
      default: null
      workbench: {}

I have attached my compose-rendered.yaml file.

3. I can change the networks setting to use workbench explicitly:

    networks:
      - workbench

Then "docker compose -f compose-rendered.yaml up milvus" works fine.

What is interesting is that the missing network with ID 9c84bd78ebddea06d96faabc6a47a2070f85b8c80786ff984d29640ce86f7fb0 is from a previous run of the NVWB app. So it seems NVWB is somehow holding on to an outdated "workbench" network, or the "workbench" network somehow got regenerated while the NVWB app was running. A plausible reading is that compose resolved the network name to an ID when the containers were first created, and once the network was recreated, those pre-created containers still pointed at the stale ID.

    $ docker compose -f compose-rendered.yaml up milvus
    [+] Running 2/0
     ✔ Container nim-anywhere-nvwb-init-service-1  Created  0.0s
     ✔ Container nim-anywhere-milvus-1  Created  0.0s
    Attaching to milvus-1
    Gracefully stopping... (press Ctrl+C again to force)
    Error response from daemon: network 26e2d4ff4f36db70b6f2ac72991959c28a8b4df190d8e6dd7a345f0100a04c79 not found

    (nvwb:local/nim-anywhere) (base) user@user-as-4125gs-tnrt:~/.nvwb/project-runtime-info/NVIDIA-nim-anywhere-f65c90dd741f9068b1fcf6ef1e89eed95a31af39$ less compose-rendered.yaml

    (nvwb:local/nim-anywhere) (base) user@user-as-4125gs-tnrt:~/.nvwb/project-runtime-info/NVIDIA-nim-anywhere-f65c90dd741f9068b1fcf6ef1e89eed95a31af39$ docker network ls
    NETWORK ID     NAME                          DRIVER    SCOPE
    ce873194bdeb   bridge                        bridge    local
    495d39fe01a3   host                          host      local
    0f8e78f89479   nim-anywhere_default          bridge    local
    7402aefc60a2   none                          null      local
    284f7f3cba4a   nvidia-nim-anywhere_default   bridge    local
    9c84bd78ebdd   workbench                     bridge    local
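
If the stale-reference theory holds, another workaround (a sketch, untested here) would be to remove the pre-created containers so compose re-resolves the network by its current name on the next run:

    # Sketch: drop the stale containers, then bring the service back up.
    $ docker compose -f compose-rendered.yaml down --remove-orphans
    $ docker compose -f compose-rendered.yaml up milvus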

My fix for now is to use the "Edit compose file" feature of the NVWB app to change this network setting for "milvus".

compose-rendered.yaml.txt

liatamax avatar Mar 06 '25 22:03 liatamax