[ChatQnA] TEI 1.5 causes an error starting the tei-reranking-server
I was updating my Terraform/Ansible recipe this morning and I ran into this error while trying to start the example using the provided compose.yaml file.
Error: Could not create backend
Caused by:
Could not start backend: Failed to create ONNX Runtime session: Deserialize tensor onnx::MatMul_3121 failed.GetFileLength for /data/models--BAAI--bge-reranker-large/snapshots/55611d7bca2a7133960a6d3b71e083071bbfc312/onnx/model.onnx_data failed:Invalid fd was supplied: -1
The reranking container then exits.
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
ff5e425f0098 opea/chatqna-ui:latest "docker-entrypoint.s…" 11 minutes ago Up 11 minutes 0.0.0.0:5173->5173/tcp, :::5173->5173/tcp chatqna-xeon-ui-server
db9e4993a00c opea/chatqna:latest "python chatqna.py" 11 minutes ago Up 11 minutes 0.0.0.0:8888->8888/tcp, :::8888->8888/tcp chatqna-xeon-backend-server
57845af811cd opea/reranking-tei:latest "python reranking_te…" 11 minutes ago Up 11 minutes 0.0.0.0:8000->8000/tcp, :::8000->8000/tcp reranking-tei-xeon-server
82d83238b4fe opea/llm-tgi:latest "bash entrypoint.sh" 11 minutes ago Up 11 minutes 0.0.0.0:9000->9000/tcp, :::9000->9000/tcp llm-tgi-server
7347b2fb4a3b opea/embedding-tei:latest "python embedding_te…" 11 minutes ago Up 11 minutes 0.0.0.0:6000->6000/tcp, :::6000->6000/tcp embedding-tei-server
97ef7cfeb315 opea/dataprep-redis:latest "python prepare_doc_…" 11 minutes ago Up 11 minutes 0.0.0.0:6007-6009->6007-6009/tcp, :::6007-6009->6007-6009/tcp dataprep-redis-server
12c84f7e9667 opea/retriever-redis:latest "/home/user/comps/re…" 11 minutes ago Up 11 minutes 0.0.0.0:7000->7000/tcp, :::7000->7000/tcp retriever-redis-server
c573346d18c1 redis/redis-stack:7.2.0-v9 "/entrypoint.sh" 11 minutes ago Up 11 minutes 0.0.0.0:6379->6379/tcp, :::6379->6379/tcp, 0.0.0.0:8001->8001/tcp, :::8001->8001/tcp redis-vector-db
c7d1c7f0763d ghcr.io/huggingface/text-generation-inference:2.1.0 "/tgi-entrypoint.sh …" 11 minutes ago Up 11 minutes 0.0.0.0:9009->80/tcp, :::9009->80/tcp tgi-service
3ac952569089 ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 "text-embeddings-rou…" 11 minutes ago Exited (1) 10 minutes ago tei-reranking-server
47f803c58222 ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 "text-embeddings-rou…" 11 minutes ago Up 11 minutes 0.0.0.0:6006->80/tcp, :::6006->80/tcp tei-embedding-server
I found the following issue on the TEI repo: https://github.com/huggingface/text-embeddings-inference/issues/341
If I switch the reranking service to use 1.4, then it starts correctly and the example works.
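The switch is a one-line image-tag change in compose.yaml. A minimal sketch, assuming the service is named tei-reranking-service as it appears in the docker compose ps output later in this thread:

```yaml
# Workaround sketch: pin the reranking TEI image to 1.4 instead of 1.5.
# Only the image tag changes; everything else in the service stays as-is.
services:
  tei-reranking-service:
    image: ghcr.io/huggingface/text-embeddings-inference:cpu-1.4  # was cpu-1.5
    container_name: tei-reranking-server
```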
root@ip-172-31-29-103:/opt/GenAIExamples/ChatQnA/docker/xeon# docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
03154665e268 opea/chatqna-ui:latest "docker-entrypoint.s…" 2 minutes ago Up 2 minutes 0.0.0.0:5173->5173/tcp, :::5173->5173/tcp chatqna-xeon-ui-server
319d28eeb771 opea/chatqna:latest "python chatqna.py" 2 minutes ago Up 2 minutes 0.0.0.0:8888->8888/tcp, :::8888->8888/tcp chatqna-xeon-backend-server
3a0ca8da9838 opea/llm-tgi:latest "bash entrypoint.sh" 2 minutes ago Up 2 minutes 0.0.0.0:9000->9000/tcp, :::9000->9000/tcp llm-tgi-server
3a78c4595faa opea/reranking-tei:latest "python reranking_te…" 2 minutes ago Up 2 minutes 0.0.0.0:8000->8000/tcp, :::8000->8000/tcp reranking-tei-xeon-server
54dd50950c0d opea/dataprep-redis:latest "python prepare_doc_…" 2 minutes ago Up 2 minutes 0.0.0.0:6007-6009->6007-6009/tcp, :::6007-6009->6007-6009/tcp dataprep-redis-server
ec9fc6be5ddd opea/retriever-redis:latest "/home/user/comps/re…" 2 minutes ago Up 2 minutes 0.0.0.0:7000->7000/tcp, :::7000->7000/tcp retriever-redis-server
3432e3e8b6a4 opea/embedding-tei:latest "python embedding_te…" 2 minutes ago Up 2 minutes 0.0.0.0:6000->6000/tcp, :::6000->6000/tcp embedding-tei-server
d154ee44dce7 ghcr.io/huggingface/text-generation-inference:2.1.0 "/tgi-entrypoint.sh …" 2 minutes ago Up 2 minutes 0.0.0.0:9009->80/tcp, :::9009->80/tcp tgi-service
632f404a5be9 redis/redis-stack:7.2.0-v9 "/entrypoint.sh" 2 minutes ago Up 2 minutes 0.0.0.0:6379->6379/tcp, :::6379->6379/tcp, 0.0.0.0:8001->8001/tcp, :::8001->8001/tcp redis-vector-db
d11a7d3770e7 ghcr.io/huggingface/text-embeddings-inference:cpu-1.4 "text-embeddings-rou…" 2 minutes ago Up 2 minutes 0.0.0.0:8808->80/tcp, :::8808->80/tcp tei-reranking-server
ab513f0a9810 ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 "text-embeddings-rou…" 2 minutes ago Up 2 minutes 0.0.0.0:6006->80/tcp, :::6006->80/tcp tei-embedding-server
@kevinintel and @lvliang-intel, any chance we can revert to TEI 1.4 until the Hugging Face bug is resolved? This example otherwise fails.
Thanks @mkbhanda
I'm testing with 1.5 today and I'm not seeing this issue:
$ docker compose -f compose.yaml logs reranking -f
...
reranking-tei-xeon-server | [2024-07-31 23:49:43,238] [ INFO] - CORS is enabled.
reranking-tei-xeon-server | [2024-07-31 23:49:43,239] [ INFO] - Setting up HTTP server
reranking-tei-xeon-server | [2024-07-31 23:49:43,239] [ INFO] - Uvicorn server setup on port 8000
reranking-tei-xeon-server | INFO: Waiting for application startup.
reranking-tei-xeon-server | INFO: Application startup complete.
reranking-tei-xeon-server | INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
reranking-tei-xeon-server | [2024-07-31 23:49:43,255] [ INFO] - HTTP server setup successful
and:
$ docker compose -f compose.yaml ps | grep reranking
reranking-tei-xeon-server opea/reranking-tei:latest "python reranking_te…" reranking 10 minutes ago Up 9 minutes 0.0.0.0:8000->8000/tcp
tei-reranking-server ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 "text-embeddings-rou…" tei-reranking-service 10 minutes ago Up 9 minutes 0.0.0.0:8808->80/tcp
Would you please try again?
Thanks.
Is it possible that v0.7 containers are being mixed with latest code?
I'll test again, but the only mixing of code happening is that I'm using the latest compose.yaml file with the v0.7 containers, and I don't see major differences for the reranking service. Looking at the history, I don't see any major changes beyond the TEI version since v0.7.
I retested this morning and I'm getting the same error:
2024-08-01T13:00:54.638796Z INFO text_embeddings_router: router/src/main.rs:175: Args { model_id: "BAA*/***-********-**rge", revision: None, tokenization_workers: None, dtype: None, pooling: None, max_concurrent_requests: 512, max_batch_tokens: 16384, max_batch_requests: None, max_client_batch_size: 32, auto_truncate: true, default_prompt_name: None, default_prompt: None, hf_api_token: None, hostname: "e6c5ab9b2197", port: 80, uds_path: "/tmp/text-embeddings-inference-server", huggingface_hub_cache: Some("/data"), payload_limit: 2000000, api_key: None, json_output: false, otlp_endpoint: None, otlp_service_name: "text-embeddings-inference.server", cors_allow_origin: None }
2024-08-01T13:00:54.638872Z INFO hf_hub: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/hf-hub-0.3.2/src/lib.rs:55: Token file not found "/root/.cache/huggingface/token"
2024-08-01T13:00:54.681575Z INFO download_pool_config: text_embeddings_core::download: core/src/download.rs:38: Downloading `1_Pooling/config.json`
2024-08-01T13:00:54.863118Z INFO download_new_st_config: text_embeddings_core::download: core/src/download.rs:62: Downloading `config_sentence_transformers.json`
2024-08-01T13:00:54.878546Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:21: Starting download
2024-08-01T13:00:54.878558Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:23: Downloading `config.json`
2024-08-01T13:00:54.960966Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:26: Downloading `tokenizer.json`
2024-08-01T13:00:55.108601Z INFO download_artifacts: text_embeddings_backend: backends/src/lib.rs:313: Downloading `model.onnx`
2024-08-01T13:00:55.126338Z WARN download_artifacts: text_embeddings_backend: backends/src/lib.rs:317: Could not download `model.onnx`: request error: HTTP status client error (404 Not Found) for url (https://huggingface.co/BAAI/bge-reranker-large/resolve/main/model.onnx)
2024-08-01T13:00:55.126364Z INFO download_artifacts: text_embeddings_backend: backends/src/lib.rs:318: Downloading `onnx/model.onnx`
2024-08-01T13:00:55.166679Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:32: Model artifacts downloaded in 288.133574ms
2024-08-01T13:00:55.783229Z WARN text_embeddings_router: router/src/lib.rs:195: Could not find a Sentence Transformers config
2024-08-01T13:00:55.783257Z INFO text_embeddings_router: router/src/lib.rs:199: Maximum number of tokens per request: 512
2024-08-01T13:00:55.784582Z INFO text_embeddings_core::tokenization: core/src/tokenization.rs:28: Starting 32 tokenization workers
2024-08-01T13:01:02.836291Z INFO text_embeddings_router: router/src/lib.rs:241: Starting model backend
Error: Could not create backend
Caused by:
Could not start backend: Failed to create ONNX Runtime session: Deserialize tensor roberta.encoder.layer.13.intermediate.dense.bias failed.GetFileLength for /data/models--BAAI--bge-reranker-large/snapshots/55611d7bca2a7133960a6d3b71e083071bbfc312/onnx/model.onnx_data failed:Invalid fd was supplied: -1
Just tested with the v0.7 release of the repo, which takes the version of TEI back to 1.2, and things seem to be working. Let me see if things work when I build the v0.8 containers. Also, with the v0.7 release the conversational UI is deployed but not working.
Just tested with the v0.8 release of the repo, building the containers myself, and I'm running into the original problem with the TEI 1.5 container for reranking.
root@ip-172-31-19-70:/opt/GenAIExamples/ChatQnA/docker/xeon# docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
e233647b23ef opea/chatqna-ui:latest "docker-entrypoint.s…" 36 seconds ago Up 34 seconds 0.0.0.0:5173->5173/tcp, :::5173->5173/tcp chatqna-xeon-ui-server
e253767c1551 opea/chatqna:latest "python chatqna.py" 36 seconds ago Up 35 seconds 0.0.0.0:8888->8888/tcp, :::8888->8888/tcp chatqna-xeon-backend-server
ef38c496621f opea/embedding-tei:latest "python embedding_te…" 36 seconds ago Up 35 seconds 0.0.0.0:6000->6000/tcp, :::6000->6000/tcp embedding-tei-server
6b2488c0716c opea/dataprep-redis:latest "python prepare_doc_…" 36 seconds ago Up 35 seconds 0.0.0.0:6007-6009->6007-6009/tcp, :::6007-6009->6007-6009/tcp dataprep-redis-server
d51236335642 opea/retriever-redis:latest "python retriever_re…" 36 seconds ago Up 35 seconds 0.0.0.0:7000->7000/tcp, :::7000->7000/tcp retriever-redis-server
335c1b57fde2 opea/llm-tgi:latest "bash entrypoint.sh" 36 seconds ago Up 35 seconds 0.0.0.0:9000->9000/tcp, :::9000->9000/tcp llm-tgi-server
257f047b5fea opea/reranking-tei:latest "python reranking_te…" 36 seconds ago Up 35 seconds 0.0.0.0:8000->8000/tcp, :::8000->8000/tcp reranking-tei-xeon-server
6ad87af9aab0 ghcr.io/huggingface/text-generation-inference:2.1.0 "/tgi-entrypoint.sh …" 36 seconds ago Up 36 seconds 0.0.0.0:9009->80/tcp, :::9009->80/tcp tgi-service
2e22f0e71955 ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 "text-embeddings-rou…" 36 seconds ago Up 36 seconds 0.0.0.0:6006->80/tcp, :::6006->80/tcp tei-embedding-server
5b16201f676b ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 "text-embeddings-rou…" 36 seconds ago Exited (1) 27 seconds ago tei-reranking-server
ad47210bbf87 redis/redis-stack:7.2.0-v9 "/entrypoint.sh" 36 seconds ago Up 36 seconds 0.0.0.0:6379->6379/tcp, :::6379->6379/tcp, 0.0.0.0:8001->8001/tcp, :::8001->8001/tcp redis-vector-db
GenAIExamples Repo:
root@ip-172-31-19-70:/opt/GenAIExamples/ChatQnA/docker/xeon# git status
HEAD detached at a2437e8
GenAIComps Repo:
root@ip-172-31-19-70:/opt/GenAIComps# git status
HEAD detached at f37ed79
I did some local tests with v0.8 and didn't find any issue. Did you build the local images with --no-cache? Or you can run docker system prune --all to clean up the caches before building the images.
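In other words, something like this (the image name and Dockerfile path below are illustrative; use the exact build commands from the README):

```bash
# Drop all cached layers and old images first.
docker system prune --all --force
# Then rebuild from scratch, bypassing the build cache.
docker build --no-cache -t opea/reranking-tei:latest -f path/to/Dockerfile .
```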
@wsfowler will consider building a versioned Terraform script for ChatQnA. Indeed there was some version mixup -- the GitHub clone was at latest (v0.8 and beyond) while the Docker image repository contained v0.7. Given we are not releasing any fixes until v0.9, the versioned script seems the more practical approach so we have a working example sooner rather than later. Note that @wsfowler spotted an issue filed on the HF site for TEI 1.5.
I'll run some more tests using v0.8 with the commands you mention.
@mkbhanda One issue I will run into creating the versioned Terraform/Ansible script is that the docker compose files today specify latest for the OPEA containers. For instance, if I switch the repo to the v0.7 release after the v0.8 containers are published and then use the OPEA docker compose file, it will attempt to pull down the latest containers. So some thought may need to be given to using specific versions for the OPEA containers in the docker compose files and elsewhere.
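For example, pinning would look something like this in the compose file (v0.7 as an illustrative tag; service names abbreviated):

```yaml
# Sketch of pinned OPEA image tags instead of :latest.
services:
  chatqna-backend:
    image: opea/chatqna:v0.7        # instead of opea/chatqna:latest
  reranking:
    image: opea/reranking-tei:v0.7  # instead of opea/reranking-tei:latest
```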
@chensuyue Just did a clean build of the v0.8 containers again and I'm still seeing the same error. I ran docker system prune --all and made sure to build the containers with the --no-cache flag. I used the build commands straight from the README.
@kevinintel and @chensuyue - our use of latest tagged images in docker-compose and manifests is nice, but it causes issues when we have bugs in what is released, as in the case of ChatQnA thanks to a Hugging Face image issue. What if we could send in an argument to docker-compose that lets us pull images with a particular tag? Does such a feature exist?
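Something like compose's environment-variable interpolation might do it; a sketch, with TAG as a hypothetical variable name that falls back to latest when unset:

```yaml
# Sketch: parameterize the image tag so a release tag can be supplied at run time.
services:
  reranking:
    image: opea/reranking-tei:${TAG:-latest}
# usage: TAG=v0.8 docker compose -f compose.yaml up -d
```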
Alternatively, could we with each release expect images and their use to be in sync, e.g. OPEA v0.9 uses v0.9 images of the micro components explicitly (not latest)? Then we could at least use a tagged version of all files, and when we git clone we can pull using a tag.
Yes, we have proposed such a solution: freezing all the dependency versions and image tags in the release branch, starting from v0.9.