
[Bug] ChatQnA - compose.yaml for Gaudi - Habana devices

pallavijaini0525 opened this issue.

Priority

Undecided

OS type

Ubuntu

Hardware type

Gaudi2

Installation method

  • [X] Pull docker images from hub.docker.com
  • [ ] Build docker images from source

Deploy method

  • [X] Docker compose
  • [ ] Docker
  • [ ] Kubernetes
  • [ ] Helm

Running nodes

Single Node

What's the version?

https://github.com/opea-project/GenAIExamples/blob/main/ChatQnA/docker/gaudi/compose.yaml

Description

For the ChatQnA application (https://github.com/opea-project/GenAIExamples/blob/main/ChatQnA/docker/gaudi/compose.yaml), compose.yaml defines two containers that both request HABANA_VISIBLE_DEVICES=all. For multi-tenancy, the device IDs need to be specified explicitly instead of all.

With the existing compose.yaml, the following error occurs:

Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: exposing interfaces: failed creating temporary link on host: invalid argument

Reproduce steps

After setting the environment variables described in https://github.com/opea-project/GenAIExamples/tree/main/ChatQnA/docker/gaudi#setup-environment-variables, run the Docker Compose file at https://github.com/opea-project/GenAIExamples/blob/main/ChatQnA/docker/gaudi/compose.yaml

Raw log

No response

pallavijaini0525 (Sep 05 '24 04:09)

Gaudi docs page: https://docs.habana.ai/en/latest/Orchestration/Multiple_Tenants_on_HPU/Multiple_Dockers_each_with_Single_Workload.html

You can set HABANA_VISIBLE_DEVICES=0,1,2,3 to specify the device IDs instead of all.
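As a minimal sketch of how explicit IDs could be handed out so that containers never share a card, the hypothetical helper `assign_devices` below splits a list of card IDs (assumed here to be 0-3, matching the example value) into disjoint HABANA_VISIBLE_DEVICES values, one consecutive slice per service. The service labels and the one-card-per-service default are illustrative assumptions, not part of the ChatQnA compose file.

```python
from typing import Dict, List, Sequence


def assign_devices(services: Sequence[str], device_ids: Sequence[int],
                   per_service: int = 1) -> Dict[str, str]:
    """Give each service its own disjoint slice of Gaudi device IDs.

    Hypothetical helper. Returns a mapping from service name to a
    HABANA_VISIBLE_DEVICES value such as "0" or "2,3", so that no two
    services are handed the same card.
    """
    if per_service < 1:
        raise ValueError("per_service must be at least 1")
    needed = len(services) * per_service
    if needed > len(device_ids):
        raise ValueError(
            f"need {needed} devices for {len(services)} services, "
            f"but only {len(device_ids)} are listed"
        )
    assignment: Dict[str, str] = {}
    for i, name in enumerate(services):
        # Consecutive, non-overlapping slice of IDs for service i.
        chunk: List[int] = list(device_ids[i * per_service:(i + 1) * per_service])
        assignment[name] = ",".join(str(d) for d in chunk)
    return assignment


if __name__ == "__main__":
    # Hypothetical host exposing four Gaudi cards with IDs 0-3.
    values = assign_devices(["tei_embedding", "llm_service"], [0, 1, 2, 3])
    for service, devices in values.items():
        print(f"{service}: HABANA_VISIBLE_DEVICES={devices}")
```

With the defaults this prints `tei_embedding: HABANA_VISIBLE_DEVICES=0` and `llm_service: HABANA_VISIBLE_DEVICES=1`; passing `per_service=2` would give `0,1` and `2,3` instead. Failing loudly when there are too few cards is a deliberate choice, since silently reusing a card is exactly what leads to the container start-up error.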

feng-intel (Sep 05 '24 08:09)

Yes, I have made the change and was able to run it. I opened this issue as a placeholder, or as a prompt to add a note in the README, so that users do not miss updating the devices.

pallavijaini0525 (Sep 05 '24 17:09)

Note:
Gaudi docs, Device Management section:

Sharing 1 device between multiple processes | No | No

That means llm_service and the TEI embedding service have to run on different Gaudi cards.
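Given that constraint, a quick sanity check is whether two HABANA_VISIBLE_DEVICES settings could collide. The hypothetical helpers below are a sketch, not part of OPEA or the Habana tooling. They conservatively treat all as possibly overlapping, because a container given all can see every card and this sketch does not model any automatic allocation.

```python
from typing import Optional, Set


def parse_visible_devices(value: str) -> Optional[Set[int]]:
    """Parse a HABANA_VISIBLE_DEVICES value (hypothetical helper).

    Returns None for "all", otherwise the set of listed card IDs.
    """
    value = value.strip()
    if value == "all":
        return None
    # int() tolerates surrounding whitespace, e.g. "0, 1".
    return {int(part) for part in value.split(",") if part.strip()}


def may_share_device(a: str, b: str) -> bool:
    """Return True if two containers could end up on the same Gaudi card."""
    devs_a = parse_visible_devices(a)
    devs_b = parse_visible_devices(b)
    if devs_a is None or devs_b is None:
        # "all" exposes every card, so overlap cannot be ruled out here.
        return True
    return bool(devs_a & devs_b)
```

With the values from the original compose.yaml, `may_share_device("all", "all")` returns True, which matches the situation reported in this issue, while `may_share_device("0", "1")` returns False.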

@lvliang-intel In ChatQnA/docker_compose/intel/hpu/gaudi/compose.yaml, why were ${tei_embedding_devices} and ${llm_service_devices} replaced with all?
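The compose file mentioned above refers to ${tei_embedding_devices} and ${llm_service_devices}; Docker Compose substitutes such variables from the shell environment or from a .env file in the project directory. A minimal sketch that writes such a .env file, assuming those variables feed the two services' HABANA_VISIBLE_DEVICES values and assuming cards 0 and 1 are free (`write_env_file` is a hypothetical helper):

```python
from pathlib import Path
from typing import Mapping


def write_env_file(path: Path, values: Mapping[str, str]) -> str:
    """Write KEY=value lines usable for ${KEY} substitution by Docker Compose.

    Hypothetical helper; returns the text it wrote.
    """
    text = "".join(f"{key}={value}\n" for key, value in values.items())
    path.write_text(text)
    return text


if __name__ == "__main__":
    # Assumed card choice: one dedicated Gaudi card per service, no sharing.
    print(write_env_file(Path(".env"), {
        "tei_embedding_devices": "0",
        "llm_service_devices": "1",
    }), end="")
```

Keeping the IDs in .env rather than editing compose.yaml leaves the compose file unchanged, so the same file can still default to all on machines where that works.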

feng-intel (Oct 12 '24 07:10)

"all" means the system will allocate the device automatically. Users don't need to set the device number.

lvliang-intel (Oct 18 '24 07:10)

Are you sure the "system" can allocate a different device to each container?

feng-intel (Oct 21 '24 01:10)

Yes, the system will automatically allocate a Gaudi card. Allowing users to specify the card number may not be a good idea, since typical users have little knowledge of the Gaudi system.

lvliang-intel (Nov 03 '24 10:11)

@pallavijaini0525 Can we close the issue?

feng-intel (Nov 07 '24 05:11)

Yes, please.

pallavijaini0525 (Nov 07 '24 05:11)