ChatQnA Gaudi Example - Multiple Issues
I'm trying to get the ChatQnA Gaudi Example to work and I'm running into a few issues.
First, in the docker_compose.yaml file, both the tei_embedding_service and the tgi_service have HABANA_VISIBLE_DEVICES set to all. I'm not sure this is the correct setting; should it be changed? Shouldn't each service specify which cards it will try to allocate?
The error message I get from these containers is:
RuntimeError: synStatus=8 [Device not found] Device acquire failed.
If I specify the specific cards to allocate to each container then I get past these errors.
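For reference, here is roughly the change I made in docker_compose.yaml to get past those errors. The specific device IDs are just the ones I happened to pick, and I'm only showing the environment keys I touched:

tei_embedding_service:
  # (other keys unchanged)
  environment:
    # pin the embedding service to one specific HPU instead of "all"
    HABANA_VISIBLE_DEVICES: "0"

tgi_service:
  # (other keys unchanged)
  environment:
    # give TGI its own card so the two services don't contend for the same device
    HABANA_VISIBLE_DEVICES: "1"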
Second, for the opea/gen-ai-comps:reranking-tei-server container I'm getting the following error:
python: can't open file '/home/user/comps/reranks/reranking_tei_xeon.py': [Errno 2] No such file or directory
(the same line repeats several times in the log)
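I haven't dug into the image yet, but something like the following should show whether that script was moved or renamed inside the container (just a sanity check; the path is copied from the error above):

# list what actually ships under /home/user/comps in the reranking image
docker run --rm --entrypoint ls opea/gen-ai-comps:reranking-tei-server -R /home/user/comps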
Third, for the ghcr.io/huggingface/tgi-gaudi:1.2.1 container, after modifying the docker_compose.yaml file so that HABANA_VISIBLE_DEVICES no longer uses the all value, I get the following error:
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1161, in convert
return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
File "/usr/local/lib/python3.10/dist-packages/habana_frameworks/torch/core/weight_sharing.py", line 53, in __torch_function__
return super().__torch_function__(func, types, new_args, kwargs)
RuntimeError: synStatus=8 [Device not found] Device acquire failed.
rank=0
2024-05-14T15:28:39.138627Z ERROR text_generation_launcher: Shard 0 failed to start
Error: ShardCannotStart
2024-05-14T15:28:39.138658Z INFO text_generation_launcher: Shutting down shards
Fourth, for the opea/tei-gaudi container I get the following error:
2024-05-14T15:28:28.575439Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:20: Starting download
2024-05-14T15:28:28.575494Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:37: Model artifacts downloaded in 56.935µs
2024-05-14T15:28:28.586601Z INFO text_embeddings_router: router/src/lib.rs:169: Maximum number of tokens per request: 512
2024-05-14T15:28:28.587789Z INFO text_embeddings_core::tokenization: core/src/tokenization.rs:23: Starting 48 tokenization workers
2024-05-14T15:28:28.762738Z INFO text_embeddings_router: router/src/lib.rs:194: Starting model backend
2024-05-14T15:28:28.762971Z INFO text_embeddings_backend_python::management: backends/python/src/management.rs:54: Starting Python backend
2024-05-14T15:28:32.405314Z WARN python-backend: text_embeddings_backend_python::logging: backends/python/src/logging.rs:39: Could not import Flash Attention enabled models: No module named 'dropout_layer_norm'
2024-05-14T15:28:33.508454Z ERROR python-backend: text_embeddings_backend_python::logging: backends/python/src/logging.rs:40: Error when initializing model
Traceback (most recent call last):
File "/usr/local/bin/python-text-embeddings-server", line 8, in <module>
sys.exit(app())
File "/usr/local/lib/python3.10/dist-packages/typer/main.py", line 311, in __call__
return get_command(self)(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1157, in __call__
return self.main(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/typer/core.py", line 716, in main
return _main(
File "/usr/local/lib/python3.10/dist-packages/typer/core.py", line 216, in _main
rv = self.invoke(ctx)
File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/typer/main.py", line 683, in wrapper
return callback(**use_params) # type: ignore
File "/usr/src/backends/python/server/text_embeddings_server/cli.py", line 50, in serve
server.serve(model_path, dtype, uds_path)
File "/usr/src/backends/python/server/text_embeddings_server/server.py", line 79, in serve
asyncio.run(serve_inner(model_path, dtype))
File "/usr/lib/python3.10/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/usr/lib/python3.10/asyncio/base_events.py", line 636, in run_until_complete
self.run_forever()
File "/usr/lib/python3.10/asyncio/base_events.py", line 603, in run_forever
self._run_once()
File "/usr/lib/python3.10/asyncio/base_events.py", line 1909, in _run_once
handle._run()
File "/usr/lib/python3.10/asyncio/events.py", line 80, in _run
self._context.run(self._callback, *self._args)
> File "/usr/src/backends/python/server/text_embeddings_server/server.py", line 48, in serve_inner
model = get_model(model_path, dtype)
File "/usr/src/backends/python/server/text_embeddings_server/models/__init__.py", line 51, in get_model
raise ValueError("CPU device only supports float32 dtype")
ValueError: CPU device only supports float32 dtype
Error: Could not create backend
Caused by:
Could not start backend: Python backend failed to start
@wsfowler,
Thank you for raising these issues. We're actively refactoring the GenAIExamples to adhere to a microservice-based architecture; please refer to the latest version of the README for updated instructions.
Setting HABANA_VISIBLE_DEVICES to "all" signifies that the system will allocate any available HPU device to the service. If you encounter a "Device acquire failed" error, it indicates that there are no free HPU devices available in the system.
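As a quick check, running hl-smi on the host before starting the stack will show whether any compute processes are already holding the cards. If some cards are busy, you can pin each service to free ones, for example:

# show per-card utilization and any compute processes currently holding a card
hl-smi
# then pin each service to free cards in docker_compose.yaml, e.g.
#   HABANA_VISIBLE_DEVICES: "0,1"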
@lvliang-intel
Understood on the refactoring; I'll try again as things get updated. I did find another issue after some of the refactoring: #153
Also, on the HPU device error, how would I go about troubleshooting it? I can load the Habana PyTorch container and run hl-smi and see the cards, but when I try to run it in the opea/tei-gaudi container I get an error about the driver not being loaded.
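For reference, this is roughly how I'm testing inside a container. It assumes the habana-container-runtime is installed and registered with Docker, and the image path/tag is just the Habana PyTorch image I had pulled, so adjust as needed:

# run hl-smi inside a container via the Habana container runtime
docker run --rm --runtime=habana -e HABANA_VISIBLE_DEVICES=all \
  vault.habana.ai/gaudi-docker/1.15.1/ubuntu22.04/habanalabs/pytorch-installer-2.2.0:latest \
  hl-smi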
I get the following if I run hl-smi on the host:
root@ip-172-31-88-161:/opt/GenAIExamples/ChatQnA/microservice/gaudi# hl-smi
+-----------------------------------------------------------------------------+
| HL-SMI Version: hl-1.15.1-fw-49.0.0.0 |
| Driver Version: 1.15.1-62f612b |
|-------------------------------+----------------------+----------------------+
| AIP Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | AIP-Util Compute M. |
|===============================+======================+======================|
| 0 HL-205 N/A | 0000:10:1d.0 N/A | 0 |
| N/A 46C N/A 101W / 350W | 512MiB / 32768MiB | 0% N/A |
|-------------------------------+----------------------+----------------------+
| 1 HL-205 N/A | 0000:90:1d.0 N/A | 0 |
| N/A 48C N/A 99W / 350W | 512MiB / 32768MiB | 0% N/A |
|-------------------------------+----------------------+----------------------+
| 2 HL-205 N/A | 0000:90:1e.0 N/A | 0 |
| N/A 49C N/A 100W / 350W | 512MiB / 32768MiB | 0% N/A |
|-------------------------------+----------------------+----------------------+
| 3 HL-205 N/A | 0000:a0:1d.0 N/A | 0 |
| N/A 47C N/A 108W / 350W | 512MiB / 32768MiB | 0% N/A |
|-------------------------------+----------------------+----------------------+
| 4 HL-205 N/A | 0000:a0:1e.0 N/A | 0 |
| N/A 46C N/A 100W / 350W | 512MiB / 32768MiB | 0% N/A |
|-------------------------------+----------------------+----------------------+
| 5 HL-205 N/A | 0000:10:1e.0 N/A | 0 |
| N/A 47C N/A 98W / 350W | 512MiB / 32768MiB | 0% N/A |
|-------------------------------+----------------------+----------------------+
| 6 HL-205 N/A | 0000:20:1e.0 N/A | 0 |
| N/A 47C N/A 103W / 350W | 512MiB / 32768MiB | 0% N/A |
|-------------------------------+----------------------+----------------------+
| 7 HL-205 N/A | 0000:20:1d.0 N/A | 0 |
| N/A 48C N/A 102W / 350W | 512MiB / 32768MiB | 0% N/A |
|-------------------------------+----------------------+----------------------+
| Compute Processes: AIP Memory |
| AIP PID Type Process name Usage |
|=============================================================================|
| 0 N/A N/A N/A N/A |
| 1 N/A N/A N/A N/A |
| 2 N/A N/A N/A N/A |
| 3 N/A N/A N/A N/A |
| 4 N/A N/A N/A N/A |
| 5 N/A N/A N/A N/A |
| 6 N/A N/A N/A N/A |
| 7 N/A N/A N/A N/A |
+=============================================================================+