ChatQnA Gaudi Example - Multiple Issues
I'm trying to get the ChatQnA Gaudi Example to work and I'm running into a few issues.
First, in the docker_compose.yaml file, both the tei_embedding_service and the tgi_service have HABANA_VISIBLE_DEVICES set to all. I'm not sure this is the correct setting; should it be changed? Shouldn't each service specify which cards it will try to allocate?
The error message I get from these containers is:
RuntimeError: synStatus=8 [Device not found] Device acquire failed.
If I specify the specific cards to allocate to each container then I get past these errors.
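For reference, here is roughly the change I made in docker_compose.yaml to get past those errors. The specific device IDs are just the ones I happened to pick, and I'm only showing the environment keys I touched:

tei_embedding_service:
  # (other keys unchanged)
  environment:
    # pin the embedding service to one specific HPU instead of "all"
    HABANA_VISIBLE_DEVICES: "0"

tgi_service:
  # (other keys unchanged)
  environment:
    # give TGI its own card so the two services don't contend for the same device
    HABANA_VISIBLE_DEVICES: "1"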
Second, for the opea/gen-ai-comps:reranking-tei-server container I'm getting the following error:
python: can't open file '/home/user/comps/reranks/reranking_tei_xeon.py': [Errno 2] No such file or directory
(the same line repeats several times in the log)
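I haven't dug into the image yet, but something like the following should show whether that script was moved or renamed inside the container (just a sanity check; the path is copied from the error above):

# list what actually ships under /home/user/comps in the reranking image
docker run --rm --entrypoint ls opea/gen-ai-comps:reranking-tei-server -R /home/user/comps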
Third, for the ghcr.io/huggingface/tgi-gaudi:1.2.1 container, after modifying the docker_compose.yaml file so that HABANA_VISIBLE_DEVICES no longer uses the all value, I get the following error:
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1161, in convert
return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
File "/usr/local/lib/python3.10/dist-packages/habana_frameworks/torch/core/weight_sharing.py", line 53, in __torch_function__
return super().__torch_function__(func, types, new_args, kwargs)
RuntimeError: synStatus=8 [Device not found] Device acquire failed.
rank=0
2024-05-14T15:28:39.138627Z ERROR text_generation_launcher: Shard 0 failed to start
Error: ShardCannotStart
2024-05-14T15:28:39.138658Z INFO text_generation_launcher: Shutting down shards
Fourth, for the opea/tei-gaudi container I get the following error:
2024-05-14T15:28:28.575439Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:20: Starting download
2024-05-14T15:28:28.575494Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:37: Model artifacts downloaded in 56.935µs
2024-05-14T15:28:28.586601Z INFO text_embeddings_router: router/src/lib.rs:169: Maximum number of tokens per request: 512
2024-05-14T15:28:28.587789Z INFO text_embeddings_core::tokenization: core/src/tokenization.rs:23: Starting 48 tokenization workers
2024-05-14T15:28:28.762738Z INFO text_embeddings_router: router/src/lib.rs:194: Starting model backend
2024-05-14T15:28:28.762971Z INFO text_embeddings_backend_python::management: backends/python/src/management.rs:54: Starting Python backend
2024-05-14T15:28:32.405314Z WARN python-backend: text_embeddings_backend_python::logging: backends/python/src/logging.rs:39: Could not import Flash Attention enabled models: No module named 'dropout_layer_norm'
2024-05-14T15:28:33.508454Z ERROR python-backend: text_embeddings_backend_python::logging: backends/python/src/logging.rs:40: Error when initializing model
Traceback (most recent call last):
File "/usr/local/bin/python-text-embeddings-server", line 8, in <module>
sys.exit(app())
File "/usr/local/lib/python3.10/dist-packages/typer/main.py", line 311, in __call__
return get_command(self)(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1157, in __call__
return self.main(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/typer/core.py", line 716, in main
return _main(
File "/usr/local/lib/python3.10/dist-packages/typer/core.py", line 216, in _main
rv = self.invoke(ctx)
File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/typer/main.py", line 683, in wrapper
return callback(**use_params) # type: ignore
File "/usr/src/backends/python/server/text_embeddings_server/cli.py", line 50, in serve
server.serve(model_path, dtype, uds_path)
File "/usr/src/backends/python/server/text_embeddings_server/server.py", line 79, in serve
asyncio.run(serve_inner(model_path, dtype))
File "/usr/lib/python3.10/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/usr/lib/python3.10/asyncio/base_events.py", line 636, in run_until_complete
self.run_forever()
File "/usr/lib/python3.10/asyncio/base_events.py", line 603, in run_forever
self._run_once()
File "/usr/lib/python3.10/asyncio/base_events.py", line 1909, in _run_once
handle._run()
File "/usr/lib/python3.10/asyncio/events.py", line 80, in _run
self._context.run(self._callback, *self._args)
> File "/usr/src/backends/python/server/text_embeddings_server/server.py", line 48, in serve_inner
model = get_model(model_path, dtype)
File "/usr/src/backends/python/server/text_embeddings_server/models/__init__.py", line 51, in get_model
raise ValueError("CPU device only supports float32 dtype")
ValueError: CPU device only supports float32 dtype
Error: Could not create backend
Caused by:
Could not start backend: Python backend failed to start
@wsfowler,
Thank you for raising these issues. We're actively refactoring the GenAIExamples to adhere to a microservice-based architecture; please refer to the latest version of the README for updated instructions.
Setting HABANA_VISIBLE_DEVICES to "all" signifies that the system will allocate any available HPU device to the service. If you encounter a "Device acquire failed" error, it indicates that there are no free HPU devices available in the system.
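As a quick check, running hl-smi on the host before starting the stack will show whether any compute processes are already holding the cards. If some cards are busy, you can pin each service to free ones, for example:

# show per-card utilization and any compute processes currently holding a card
hl-smi
# then pin each service to free cards in docker_compose.yaml, e.g.
#   HABANA_VISIBLE_DEVICES: "0,1"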
@lvliang-intel
Understood on the refactoring; I'll try again as things get updated. I did find another issue after some of the refactoring: #153
Also, on the HPU device error, how would I go about troubleshooting it? I can load the Habana PyTorch container and run hl-smi and see the cards, but when I try to run it in the opea/tei-gaudi container I get an error about the driver not being loaded.
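For reference, this is roughly how I'm testing inside a container. It assumes the habana-container-runtime is installed and registered with Docker, and the image path/tag is just the Habana PyTorch image I had pulled, so adjust as needed:

# run hl-smi inside a container via the Habana container runtime
docker run --rm --runtime=habana -e HABANA_VISIBLE_DEVICES=all \
  vault.habana.ai/gaudi-docker/1.15.1/ubuntu22.04/habanalabs/pytorch-installer-2.2.0:latest \
  hl-smi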
I get the following if I run hl-smi on the host:
root@ip-172-31-88-161:/opt/GenAIExamples/ChatQnA/microservice/gaudi# hl-smi
+-----------------------------------------------------------------------------+
| HL-SMI Version: hl-1.15.1-fw-49.0.0.0 |
| Driver Version: 1.15.1-62f612b |
|-------------------------------+----------------------+----------------------+
| AIP Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | AIP-Util Compute M. |
|===============================+======================+======================|
| 0 HL-205 N/A | 0000:10:1d.0 N/A | 0 |
| N/A 46C N/A 101W / 350W | 512MiB / 32768MiB | 0% N/A |
|-------------------------------+----------------------+----------------------+
| 1 HL-205 N/A | 0000:90:1d.0 N/A | 0 |
| N/A 48C N/A 99W / 350W | 512MiB / 32768MiB | 0% N/A |
|-------------------------------+----------------------+----------------------+
| 2 HL-205 N/A | 0000:90:1e.0 N/A | 0 |
| N/A 49C N/A 100W / 350W | 512MiB / 32768MiB | 0% N/A |
|-------------------------------+----------------------+----------------------+
| 3 HL-205 N/A | 0000:a0:1d.0 N/A | 0 |
| N/A 47C N/A 108W / 350W | 512MiB / 32768MiB | 0% N/A |
|-------------------------------+----------------------+----------------------+
| 4 HL-205 N/A | 0000:a0:1e.0 N/A | 0 |
| N/A 46C N/A 100W / 350W | 512MiB / 32768MiB | 0% N/A |
|-------------------------------+----------------------+----------------------+
| 5 HL-205 N/A | 0000:10:1e.0 N/A | 0 |
| N/A 47C N/A 98W / 350W | 512MiB / 32768MiB | 0% N/A |
|-------------------------------+----------------------+----------------------+
| 6 HL-205 N/A | 0000:20:1e.0 N/A | 0 |
| N/A 47C N/A 103W / 350W | 512MiB / 32768MiB | 0% N/A |
|-------------------------------+----------------------+----------------------+
| 7 HL-205 N/A | 0000:20:1d.0 N/A | 0 |
| N/A 48C N/A 102W / 350W | 512MiB / 32768MiB | 0% N/A |
|-------------------------------+----------------------+----------------------+
| Compute Processes: AIP Memory |
| AIP PID Type Process name Usage |
|=============================================================================|
| 0 N/A N/A N/A N/A |
| 1 N/A N/A N/A N/A |
| 2 N/A N/A N/A N/A |
| 3 N/A N/A N/A N/A |
| 4 N/A N/A N/A N/A |
| 5 N/A N/A N/A N/A |
| 6 N/A N/A N/A N/A |
| 7 N/A N/A N/A N/A |
+=============================================================================+