
[BUG]: nv-ingest-nv-ingest-ms-runtime-1 container failed to connect yolox endpoints

Open angudadev opened this issue 1 year ago • 12 comments

Version

24.08

Which installation method(s) does this occur on?

Docker

Describe the bug.

The nv-ingest-nv-ingest-ms-runtime-1 container fails to start, exiting with an error that it could not connect to the yolox endpoints.

Docker Containers Status

ubuntu@vm:~/nv-ingest$ docker ps -a
CONTAINER ID   IMAGE                                                                      COMMAND                  CREATED              STATUS                    PORTS                                                                                                                                                                                                                                                                                    NAMES
df6d0820de49   nvcr.io/ohlfw0olaadg/ea-participants/nv-ingest:24.08                       "/opt/conda/bin/tini…"   11 seconds ago       Exited (1) 1 second ago                                                                                                                                                                                                                                                                                            nv-ingest-nv-ingest-ms-runtime-1
bbba107092e1   otel/opentelemetry-collector-contrib:0.91.0                                "/otelcol-contrib --…"   About a minute ago   Up 44 seconds             0.0.0.0:4317-4318->4317-4318/tcp, :::4317-4318->4317-4318/tcp, 0.0.0.0:8888-8889->8888-8889/tcp, :::8888-8889->8888-8889/tcp, 0.0.0.0:13133->13133/tcp, :::13133->13133/tcp, 55678/tcp, 0.0.0.0:32769->9411/tcp, [::]:32769->9411/tcp, 0.0.0.0:55680->55679/tcp, [::]:55680->55679/tcp   nv-ingest-otel-collector-1
f75619d3f126   nvcr.io/nim/nvidia/nv-embedqa-e5-v5:1.0.1                                  "/opt/nvidia/nvidia_…"   About a minute ago   Up 45 seconds             0.0.0.0:8012->8000/tcp, [::]:8012->8000/tcp, 0.0.0.0:8013->8001/tcp, [::]:8013->8001/tcp, 0.0.0.0:8014->8002/tcp, [::]:8014->8002/tcp                                                                                                                                                    nv-ingest-embedding-1
9418a2ab08e7   nvcr.io/ohlfw0olaadg/ea-participants/cached:0.1.0                          "/opt/nvidia/nvidia_…"   About a minute ago   Up 45 seconds             0.0.0.0:8006->8000/tcp, [::]:8006->8000/tcp, 0.0.0.0:8007->8001/tcp, [::]:8007->8001/tcp, 0.0.0.0:8008->8002/tcp, [::]:8008->8002/tcp                                                                                                                                                    nv-ingest-cached-1
40a89864219e   nvcr.io/ohlfw0olaadg/ea-participants/paddleocr:0.1.0                       "/opt/nvidia/nvidia_…"   About a minute ago   Up 45 seconds             0.0.0.0:8009->8000/tcp, [::]:8009->8000/tcp, 0.0.0.0:8010->8001/tcp, [::]:8010->8001/tcp, 0.0.0.0:8011->8002/tcp, [::]:8011->8002/tcp                                                                                                                                                    nv-ingest-paddle-1
3d98a8921104   nvcr.io/ohlfw0olaadg/ea-participants/nv-yolox-structured-images-v1:0.1.0   "/opt/nvidia/nvidia_…"   About a minute ago   Up 45 seconds             0.0.0.0:8000-8002->8000-8002/tcp, :::8000-8002->8000-8002/tcp                                                                                                                                                                                                                            nv-ingest-yolox-1
31012ffe11e4   nvcr.io/ohlfw0olaadg/ea-participants/deplot:1.0.0                          "/opt/nvidia/nvidia_…"   About a minute ago   Up 45 seconds             0.0.0.0:8003->8000/tcp, [::]:8003->8000/tcp, 0.0.0.0:8004->8001/tcp, [::]:8004->8001/tcp, 0.0.0.0:8005->8002/tcp, [::]:8005->8002/tcp                                                                                                                                                    nv-ingest-deplot-1
d27e876c2ec1   openzipkin/zipkin                                                          "start-zipkin"           About a minute ago   Up 46 seconds (healthy)   9410/tcp, 0.0.0.0:9411->9411/tcp, :::9411->9411/tcp                                                                                                                                                                                                                                      nv-ingest-zipkin-1
b0e7456e16ff   redis/redis-stack                                                          "/entrypoint.sh"         About a minute ago   Up 45 seconds             0.0.0.0:6379->6379/tcp, :::6379->6379/tcp, 8001/tcp                                                                                                                                                                                                                                      nv-ingest-redis-1
6764b4f11b6a   grafana/grafana                                                            "/run.sh"                About a minute ago   Up 46 seconds             0.0.0.0:3000->3000/tcp, :::3000->3000/tcp                                                                                                                                                                                                                                                grafana-service
bd969769645a   prom/prometheus:latest                                                     "/bin/prometheus --w…"   About a minute ago   Up 45 seconds             0.0.0.0:9090->9090/tcp, :::9090->9090/tcp                                                                                                                                                                                                                                                nv-ingest-prometheus-1

Minimum reproducible example

Followed the QuickStart instructions

Relevant log output

Docker Container Logs

ubuntu@vm:~/nv-ingest$ docker logs nv-ingest-nv-ingest-ms-runtime-1
/opt/conda/envs/morpheus/lib/python3.10/site-packages/huggingface_hub/file_download.py:1150: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
Configuration loaded and validated: image_caption_extraction_module={} image_storage_module={} metadata_injection_module={} otel_meter_module={} pdf_extractor_module={} redis_task_sink={} redis_task_source={} text_splitting_module={}
INFO:__main__:Starting pipeline setup
INFO:__main__:MESSAGE_CLIENT_HOST: redis
INFO:__main__:MESSAGE_CLIENT_PORT: 6379
INFO:__main__:TABLE_DETECTION_GRPC_TRITON:
INFO:__main__:TABLE_DETECTION_HTTP_TRITON:
INFO:__main__:PADDLE_GRPC_ENDPOINT: paddle:8001
INFO:__main__:PADDLE_HTTP_ENDPOINT: http://paddle:8000/v1/infer
INFO:__main__:DEPLOT_GRPC_ENDPOINT: ""
INFO:__main__:DEPLOT_HTTP_ENDPOINT: http://deplot:8000/v1/chat/completions
INFO:__main__:CACHED_GRPC_ENDPOINT: cached:8001
INFO:__main__:CACHED_HTTP_ENDPOINT: http://cached:8000/v1/infer
Traceback (most recent call last):
  File "/workspace/pipeline.py", line 710, in <module>
    cli()
  File "/opt/conda/envs/morpheus/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/opt/conda/envs/morpheus/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/opt/conda/envs/morpheus/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/conda/envs/morpheus/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/workspace/pipeline.py", line 706, in cli
    pipeline(morpheus_pipeline_config, final_ingest_config)
  File "/workspace/pipeline.py", line 584, in pipeline
    setup_ingestion_pipeline(pipe, morpheus_pipeline_config, ingest_config)
  File "/workspace/pipeline.py", line 532, in setup_ingestion_pipeline
    pdf_extractor_stage = add_pdf_extractor_stage(pipe, morpheus_pipeline_config, ingest_config, default_cpu_count)
  File "/workspace/pipeline.py", line 251, in add_pdf_extractor_stage
    generate_pdf_extractor_stage(
  File "/opt/conda/envs/morpheus/lib/python3.10/site-packages/nv_ingest/stages/pdf_extractor_stage.py", line 135, in generate_pdf_extractor_stage
    validated_config = PDFExtractorSchema(**extractor_config)
  File "pydantic/main.py", line 341, in pydantic.main.BaseModel.__init__
pydantic.error_wrappers.ValidationError: 1 validation error for PDFExtractorSchema
pdfium_config -> __root__
  Both gRPC and HTTP services cannot be empty for yolox_endpoints. (type=value_error)
ubuntu@vm:~/nv-ingest$ docker logs nv-ingest-yolox-1 | tail -10f
INFO:     Started server process [1]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
INFO:     Started server process [1]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
{"timestamp": "2024-10-15 19:04:20,486", "level": "ERROR", "message": "I1015 19:04:20.483885 123 infer_handler.h:1322] "Thread started for ModelInferHandler""}
{"timestamp": "2024-10-15 19:04:20,486", "level": "ERROR", "message": "I1015 19:04:20.484005 123 infer_handler.cc:680] "New request handler for ModelInferHandler, 0""}
{"timestamp": "2024-10-15 19:04:20,486", "level": "ERROR", "message": "I1015 19:04:20.484029 123 infer_handler.h:1322] "Thread started for ModelInferHandler""}
{"timestamp": "2024-10-15 19:04:20,486", "level": "ERROR", "message": "I1015 19:04:20.484210 123 stream_infer_handler.cc:128] "New request handler for ModelStreamInferHandler, 0""}
{"timestamp": "2024-10-15 19:04:20,486", "level": "ERROR", "message": "I1015 19:04:20.484275 123 infer_handler.h:1322] "Thread started for ModelStreamInferHandler""}
{"timestamp": "2024-10-15 19:04:20,486", "level": "ERROR", "message": "I1015 19:04:20.484283 123 grpc_server.cc:2463] "Started GRPCInferenceService at 0.0.0.0:8001""}
{"timestamp": "2024-10-15 19:04:20,486", "level": "ERROR", "message": "I1015 19:04:20.484439 123 http_server.cc:4692] "Started HTTPService at 0.0.0.0:8080""}
{"timestamp": "2024-10-15 19:04:20,528", "level": "ERROR", "message": "I1015 19:04:20.525731 123 http_server.cc:362] "Started Metrics Service at 0.0.0.0:8002""}
{"timestamp": "2024-10-15 19:04:21,659", "level": "INFO", "message": "Starting NIM inference server"}
{"timestamp": "2024-10-15 19:04:21,674", "level": "INFO", "message": "Serving HTTP server on port 8000"}

Other/Misc.

No response

angudadev avatar Oct 15 '24 19:10 angudadev

This regression started in https://github.com/NVIDIA/nv-ingest/pull/154.

ishandhanani avatar Oct 15 '24 22:10 ishandhanani

@angudadevops @ishandhanani Can you tell us what your docker-compose.yaml looks like? Specifically, these lines: https://github.com/NVIDIA/nv-ingest/blob/18a0b70e50f3a159dcd089618269c880546e4f6c/docker-compose.yaml#L157-L159

edknv avatar Oct 15 '24 22:10 edknv

I didn’t modify docker-compose.yaml; it looks exactly like what’s in the repo now.

Do we need to change that value?

angudadev avatar Oct 15 '24 22:10 angudadev

I also have not changed anything. When I'm on main:

- YOLOX_GRPC_ENDPOINT=yolox:8001
- YOLOX_HTTP_ENDPOINT=http://yolox:8000/v1/infer
- YOLOX_INFER_PROTOCOL=grpc

When I'm on the last working branch, c707a2b2bec26fcdde32a440820ecea62487f482:

- YOLOX_GRPC_ENDPOINT=yolox:8001
- YOLOX_HEALTH_ENDPOINT=yolox:8000
- YOLOX_HTTP_ENDPOINT=""

ishandhanani avatar Oct 15 '24 22:10 ishandhanani

Sorry about that. Can you try one of these two workarounds?

  1. As stated in the README, run docker compose build nv-ingest-ms-runtime.
     nv-ingest is in Early Access mode, meaning the codebase gets frequent updates. To build an updated nv-ingest service container with the latest changes, run:

     docker compose build

     After the image is built, run docker compose up per item 5 above.
  2. Manually add

      - TABLE_DETECTION_GRPC_TRITON=yolox:8001  # Kept for backward compatibility. Only used in legacy EA container.

     to docker-compose.yaml under nv-ingest-ms-runtime.
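Applied to the compose file, the second workaround would look roughly like this (illustrative excerpt; surrounding keys abbreviated, endpoint values taken from the defaults quoted earlier in this thread):

```yaml
services:
  nv-ingest-ms-runtime:
    environment:
      # Existing endpoint settings stay unchanged...
      - YOLOX_GRPC_ENDPOINT=yolox:8001
      - YOLOX_HTTP_ENDPOINT=http://yolox:8000/v1/infer
      # Workaround: legacy variable read by the 24.08 EA container.
      - TABLE_DETECTION_GRPC_TRITON=yolox:8001
```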

edknv avatar Oct 15 '24 23:10 edknv

@edknv when I tried the above suggestions, the nv-ingest container status shows as unhealthy with the logs below. It looks like it failed to connect to Milvus, even though we didn't enable Milvus in docker-compose.yaml.

docker logs -f nv-ingest-nv-ingest-ms-runtime-1
INFO:     Uvicorn running on http://0.0.0.0:7670 (Press CTRL+C to quit)
INFO:     Started parent process [20]
INFO:     Started server process [26]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Started server process [48]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Started server process [22]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Started server process [52]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Started server process [31]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Started server process [43]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Started server process [49]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Started server process [53]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Started server process [45]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Started server process [41]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Started server process [51]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Started server process [34]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Started server process [30]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Started server process [50]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Started server process [27]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Started server process [33]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Started server process [32]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Started server process [46]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Started server process [37]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Started server process [35]
INFO:     Started server process [25]
INFO:     Waiting for application startup.
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Application startup complete.
INFO:     Started server process [40]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Started server process [47]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Started server process [23]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Started server process [42]
INFO:     Waiting for application startup.
INFO:     Started server process [24]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Application startup complete.
INFO:     Started server process [44]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Started server process [39]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Started server process [28]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Started server process [36]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Started server process [38]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Started server process [29]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
/opt/conda/envs/nv_ingest/lib/python3.10/site-packages/huggingface_hub/file_download.py:1142: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
Configuration loaded and validated: image_caption_extraction_module={} image_storage_module={} metadata_injection_module={} otel_meter_module={} pdf_extractor_module={} redis_task_sink={} redis_task_source={} text_splitting_module={}
2024-10-16 15:29:29,056 - INFO - Starting pipeline setup
2024-10-16 15:29:29,056 - INFO - MESSAGE_CLIENT_HOST: redis
2024-10-16 15:29:29,056 - INFO - MESSAGE_CLIENT_PORT: 6379
2024-10-16 15:29:29,057 - INFO - YOLOX_GRPC_TRITON: yolox:8001
2024-10-16 15:29:29,057 - INFO - YOLOX_HTTP_TRITON: http://yolox:8000/v1/infer
2024-10-16 15:29:29,057 - INFO - YOLOX_INFER_PROTOCOL: grpc
2024-10-16 15:29:29,057 - INFO - PADDLE_GRPC_TRITON: paddle:8001
2024-10-16 15:29:29,057 - INFO - PADDLE_HTTP_TRITON: http://paddle:8000/v1/infer
2024-10-16 15:29:29,057 - INFO - PADDLE_INFER_PROTOCOL: grpc
2024-10-16 15:29:29,057 - INFO - DEPLOT_GRPC_TRITON: ""
2024-10-16 15:29:29,057 - INFO - DEPLOT_HTTP_TRITON: http://deplot:8000/v1/chat/completions
2024-10-16 15:29:29,057 - INFO - DEPLOT_INFER_PROTOCOL: http
2024-10-16 15:29:29,057 - INFO - CACHED_GRPC_TRITON: cached:8001
2024-10-16 15:29:29,057 - INFO - CACHED_HTTP_TRITON: http://cached:8000/v1/infer
2024-10-16 15:29:29,057 - INFO - CACHED_INFER_PROTOCOL: grpc
2024-10-16 15:29:30,467 - INFO - Pipeline setup completed in 1.41 seconds
2024-10-16 15:29:30,467 - INFO - Running pipeline
W20241016 15:29:30.577120 135446224779072 topology.cpp:96] skipping device: NVIDIA L40S with pcie: 00000000:9E:00.0; errmsg=invalid device ordinal
W20241016 15:29:30.577930 135446224779072 topology.cpp:96] skipping device: NVIDIA L40S with pcie: 00000000:A2:00.0; errmsg=invalid device ordinal
W20241016 15:29:30.578065 135446224779072 topology.cpp:96] skipping device: NVIDIA L40S with pcie: 00000000:A4:00.0; errmsg=invalid device ordinal
W20241016 15:29:30.578178 135446224779072 topology.cpp:96] skipping device: NVIDIA L40S with pcie: 00000000:C6:00.0; errmsg=invalid device ordinal
W20241016 15:29:30.578286 135446224779072 topology.cpp:96] skipping device: NVIDIA L40S with pcie: 00000000:C8:00.0; errmsg=invalid device ordinal
W20241016 15:29:30.578390 135446224779072 topology.cpp:96] skipping device: NVIDIA L40S with pcie: 00000000:CA:00.0; errmsg=invalid device ordinal
W20241016 15:29:30.578498 135446224779072 topology.cpp:96] skipping device: NVIDIA L40S with pcie: 00000000:CC:00.0; errmsg=invalid device ordinal
====Pipeline Pre-build====
====Pre-Building Segment: main====
====Pre-Building Segment Complete!====
====Pipeline Pre-build Complete!====
====Registering Pipeline====
====Building Pipeline====
====Building Pipeline Complete!====
====Registering Pipeline Complete!====
====Starting Pipeline====
====Pipeline Started====
====Building Segment: main====
Added source: <redis_listener-0; LinearModuleSourceStage(module_config=<morpheus.utils.module_utils.ModuleLoader object at 0x7b2a957c0790>, output_port_name=output, output_type=<class 'morpheus._lib.messages.ControlMessage'>)>
  └─> morpheus.ControlMessage
Added stage: <submitted_job_counter-1; LinearModulesStage(module_config=<morpheus.utils.module_utils.ModuleLoader object at 0x7b2a957c0910>, input_port_name=input, output_port_name=output, input_type=<class 'morpheus._lib.messages.ControlMessage'>, output_type=<class 'morpheus._lib.messages.ControlMessage'>)>
  └─ morpheus.ControlMessage -> morpheus.ControlMessage
Added stage: <metadata_injection-2; LinearModulesStage(module_config=<morpheus.utils.module_utils.ModuleLoader object at 0x7b2a957c0a60>, input_port_name=input, output_port_name=output, input_type=<class 'morpheus._lib.messages.ControlMessage'>, output_type=<class 'morpheus._lib.messages.ControlMessage'>)>
  └─ morpheus.ControlMessage -> morpheus.ControlMessage
Added stage: <extract499186039a1e4d49a83607e8aa535b27-3; MultiProcessingBaseStage(task=extract, task_desc=pdf_content_extractor, pe_count=8, process_fn=functools.partial(<function process_pdf_bytes at 0x7b2b64bbf7f0>, validated_config=PDFExtractorSchema(max_queue_size=1, n_workers=16, raise_on_failure=False, pdfium_config=PDFiumConfigSchema(auth_token='bjloZzM1OHNkbXFnOGllM2JiZ2oybjZtZHM6YjcxN2I0MzEtMzc2NC00NzUzLWJkY2MtYWU1NzI4ZWY3NWZm', cached_endpoints=('cached:8001', 'http://cached:8000/v1/infer'), deplot_endpoints=(None, 'http://deplot:8000/v1/chat/completions'), paddle_endpoints=('paddle:8001', 'http://paddle:8000/v1/infer'), yolox_endpoints=('yolox:8001', 'http://yolox:8000/v1/infer'), cached_infer_protocol='grpc', deplot_infer_protocol='http', paddle_infer_protocol='grpc', yolox_infer_protocol='grpc', identify_nearby_objects=False))), document_type=pdf, filter_properties=None)>
  └─ morpheus.ControlMessage -> morpheus.ControlMessage
Added stage: <extracta574366036fb4742b4d42ceb1e0b9000-4; MultiProcessingBaseStage(task=extract, task_desc=docx_content_extractor, pe_count=1, process_fn=<function _process_docx_bytes at 0x7b2b6522e830>, document_type=docx, filter_properties=None)>
  └─ morpheus.ControlMessage -> morpheus.ControlMessage
Added stage: <extractbfa6f242bbb74db8912b06059bc13dc5-5; MultiProcessingBaseStage(task=extract, task_desc=pptx_content_extractor, pe_count=1, process_fn=<function _process_pptx_bytes at 0x7b2a967e0a60>, document_type=pptx, filter_properties=None)>
  └─ morpheus.ControlMessage -> morpheus.ControlMessage
Added stage: <dedupf2f9f7b72c8a495c92fdccc6b6c4b664-6; MultiProcessingBaseStage(task=dedup, task_desc=dedup_images, pe_count=2, process_fn=functools.partial(<function dedup_image_stage at 0x7b2b64bbe440>, validated_config=ImageDedupSchema(raise_on_failure=False, cpu_only=False)), document_type=None, filter_properties={'content_type': 'image'})>
  └─ morpheus.ControlMessage -> morpheus.ControlMessage
Added stage: <filter3e87c85a70c94405bd2298c2701cd763-7; MultiProcessingBaseStage(task=filter, task_desc=filter_images, pe_count=2, process_fn=functools.partial(<function image_filter_stage at 0x7b2b64bbf520>, validated_config=ImageFilterSchema(raise_on_failure=False, cpu_only=False)), document_type=None, filter_properties={'content_type': 'image'})>
  └─ morpheus.ControlMessage -> morpheus.ControlMessage
Added stage: <nemo_doc_splitter-8; LinearModulesStage(module_config=<morpheus.utils.module_utils.ModuleLoader object at 0x7b2a9445faf0>, input_port_name=input, output_port_name=output, input_type=<class 'morpheus._lib.messages.ControlMessage'>, output_type=<class 'morpheus._lib.messages.ControlMessage'>)>
  └─ morpheus.ControlMessage -> morpheus.ControlMessage
Added stage: <embed_extractions-9; LinearModulesStage(module_config=<morpheus.utils.module_utils.ModuleLoader object at 0x7b2a9445fca0>, input_port_name=input, output_port_name=output, input_type=<class 'morpheus._lib.messages.ControlMessage'>, output_type=<class 'morpheus._lib.messages.ControlMessage'>)>
  └─ morpheus.ControlMessage -> morpheus.ControlMessage
Added stage: <image-storage-10; ImageStorageStage(module_config=None, raise_on_failure=False)>
  └─ morpheus.ControlMessage -> morpheus.ControlMessage
2024-10-16 15:29:40,804 - ERROR - Failed to create new connection using: 7fc9d34ed8214ce9940f8817237f4e6d
2024-10-16 15:29:40,805 - ERROR - Failed to connect to milvus: <MilvusException: (code=2, message=Fail connecting to server on milvus:19530. Timeout)>
Added stage: <vdb_task_sink-12; LinearModulesStage(module_config=<morpheus.utils.module_utils.ModuleLoader object at 0x7b2a9444c160>, input_port_name=input, output_port_name=output, input_type=<class 'morpheus._lib.messages.ControlMessage'>, output_type=<class 'morpheus._lib.messages.ControlMessage'>)>
  └─ morpheus.ControlMessage -> morpheus.ControlMessage
Added stage: <redis_task_sink-11; LinearModulesStage(module_config=<morpheus.utils.module_utils.ModuleLoader object at 0x7b2a9444c040>, input_port_name=input, output_port_name=output, input_type=typing.Any, output_type=<class 'morpheus._lib.messages.ControlMessage'>)>
  └─ morpheus.ControlMessage -> morpheus.ControlMessage
Added stage: <otel_meter-14; LinearModulesStage(module_config=<morpheus.utils.module_utils.ModuleLoader object at 0x7b2a9444c460>, input_port_name=input, output_port_name=output, input_type=<class 'morpheus._lib.messages.ControlMessage'>, output_type=<class 'morpheus._lib.messages.ControlMessage'>)>
  └─ morpheus.ControlMessage -> morpheus.ControlMessage
Added stage: <otel_tracer-13; LinearModulesStage(module_config=<morpheus.utils.module_utils.ModuleLoader object at 0x7b2a9444c2e0>, input_port_name=input, output_port_name=output, input_type=<class 'morpheus._lib.messages.ControlMessage'>, output_type=<class 'morpheus._lib.messages.ControlMessage'>)>
  └─ morpheus.ControlMessage -> morpheus.ControlMessage
Added stage: <completed_job_counter-15; LinearModulesStage(module_config=<morpheus.utils.module_utils.ModuleLoader object at 0x7b2a9444c5e0>, input_port_name=input, output_port_name=output, input_type=<class 'morpheus._lib.messages.ControlMessage'>, output_type=<class 'morpheus._lib.messages.ControlMessage'>)>
  └─ morpheus.ControlMessage -> morpheus.ControlMessage
====Building Segment Complete!====
docker ps -a | grep nv-ingest-nv-ingest-ms-runtime-1
b1e38ba10d05   nvcr.io/ohlfw0olaadg/ea-participants/nv-ingest:24.08                       "/opt/conda/envs/nv_…"   4 minutes ago    Up 4 minutes (unhealthy)   0.0.0.0:7670->7670/tcp, :::7670->7670/tcp                                                                                                                                                                                                                                                nv-ingest-nv-ingest-ms-runtime-1

angudadev avatar Oct 16 '24 15:10 angudadev

@angudadevops - check out https://github.com/ishandhanani/nv-ingest for a working example. Steps to fix:

  1. Reverted main back to c707a2b2bec26fcdde32a440820ecea62487f482
  2. For some reason, Uvicorn did not show up in 24.08. Not sure if this is a widespread bug. I fixed this by adding the installation to the Dockerfile.
  3. I removed the image from the docker-compose and instead build it from scratch.
  4. I enabled the Milvus container.

Here's a working e2e launchable for it - https://console.brev.dev/launchable/deploy?launchableID=env-2nTyhfMGFMPlRGqL9Ehp71TVI83

ishandhanani avatar Oct 16 '24 16:10 ishandhanani

@angudadevops Please ignore the "ERROR - Failed to connect to milvus" message; it is not an actual error. It is admittedly confusing, and we will improve it. We are also investigating why docker ps reports the service as unhealthy even when the container starts successfully. When the log shows ====Building Segment Complete!====, nv-ingest-ms-runtime is up and running. For a readiness check, please use curl http://localhost:7670/v1/health/ready for now.

edknv avatar Oct 16 '24 20:10 edknv

@edknv

Agreed, but I'm unable to connect to the nv-ingest service:

docker ps
CONTAINER ID   IMAGE                                                                      COMMAND                  CREATED       STATUS                 PORTS                                                                                                                                                                                                                                                                                    NAMES
ff39f4a12a19   nvcr.io/ohlfw0olaadg/ea-participants/nv-ingest:24.08                       "/opt/conda/bin/tini…"   5 hours ago   Up 5 hours             0.0.0.0:7670->7670/tcp, :::7670->7670/tcp                                                                                                                                                                                                                                                nv-ingest-nv-ingest-ms-runtime-1

ubuntu@ip-172-31-25-27:~$ curl http://localhost:7670/v1/health/ready
curl: (56) Recv failure: Connection reset by peer

angudadev avatar Oct 16 '24 22:10 angudadev

@angudadevops - can you exec into the container and check whether uvicorn is installed?

ishandhanani avatar Oct 16 '24 22:10 ishandhanani

yes @ishandhanani

it's installed

ubuntu@vm1:~/nv-ingest$ docker run -it --rm 496b46d92947 bash
(nv_ingest) root@5efee92b8cc3:/workspace# pip list | grep uvicorn
uvicorn                                  0.24.0.post1

angudadev avatar Oct 17 '24 20:10 angudadev

This has been solved in the newest release. Running

docker compose build
docker compose up

resolves the issue, @angudadevops.

ishandhanani avatar Oct 30 '24 18:10 ishandhanani