
[ChatQnA] TGI Service fail on a system with only 1 Gaudi card.

Open louie-tsai opened this issue 1 year ago • 3 comments

I used a 1-card VM instance from IDC, and tgi-service didn't run successfully in that VM. (screenshot)

When I tried to restart it with `docker compose -f docker_compose.yaml up tgi-service`, I saw the issue below. (screenshot) However, everything works fine if I use an 8-card IDC instance.

I suggest at least adding a note to inform users of this limitation, as in PR https://github.com/opea-project/GenAIExamples/pull/293

louie-tsai avatar Jul 03 '24 00:07 louie-tsai

I haven't tried using Gaudis (nor Docker Compose), but I thought of a few possible issues...

Based on your error output, sharding is enabled. By default TGI tries to use all available devices, but when sharding is enabled, TGI expects the number of devices to match the number of shards: https://huggingface.co/docs/text-generation-inference/en/basic_tutorials/launcher#numshard

Could you try removing TGI sharding options?
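For reference, sharding is typically controlled by the TGI launcher's `--num-shard` option (or the `NUM_SHARD` environment variable). A hypothetical compose fragment for a single-card setup might look like the sketch below; the service name, image tag, and model variable are illustrative, not the exact ChatQnA configuration:

```yaml
# Illustrative single-card TGI service: either pin --num-shard to 1
# or drop the sharding flags entirely so TGI uses the one device it sees.
tgi-service:
  image: ghcr.io/huggingface/tgi-gaudi:latest   # illustrative image/tag
  command: --model-id ${LLM_MODEL_ID} --num-shard 1
```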

Also, could you check what Gaudi device file name(s) are present inside the VM? If that single Gaudi device's index is not 0 (i.e. the VM uses the host's device file name for it), it's possible that the driver does not find it (if it starts scanning from index 0).
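A quick way to check is sketched below; the exact device paths depend on the habanalabs driver version (I believe newer drivers expose `/dev/accel/accel*` and older ones `/dev/hl*`, so both patterns are tried):

```shell
# List Gaudi accelerator device nodes visible inside the VM.
# Tries both the newer (/dev/accel/accel*) and older (/dev/hl*) naming schemes;
# prints a fallback message if neither pattern matches anything.
ls /dev/accel/accel* /dev/hl* 2>/dev/null || echo "no Gaudi device files found"
```

If the only device listed has a non-zero index, that would support the renaming theory above.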

eero-t avatar Jul 10 '24 15:07 eero-t

@eero-t

We saw a Gaudi device file with index 0, as in the snapshot below. (screenshot)

Not sure about removing TGI sharding, but we could assign visible devices in Docker Compose for both the TEI and TGI services. (TEI and TGI screenshots)

The problem here is that TEI and TGI seem to compete with each other for the single Gaudi card, and TGI fails with the error message.
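On a machine with more than one card, the contention could be avoided by pinning each service to its own device. A hedged sketch of such a compose fragment, assuming the container runtime honors `HABANA_VISIBLE_DEVICES` (service names and card indices here are illustrative):

```yaml
# Illustrative device pinning: give each Gaudi-backed service its own card
# so TEI and TGI never open the same device concurrently.
tei-embedding-service:
  environment:
    - HABANA_VISIBLE_DEVICES=0   # TEI uses card 0
tgi-service:
  environment:
    - HABANA_VISIBLE_DEVICES=1   # TGI uses card 1
```

On a 1-card VM there is no second index to assign, which is why pinning alone cannot fix this case.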

louie-tsai avatar Jul 11 '24 16:07 louie-tsai

> The problem here is that TEI and TGI seem to compete with each other for the single Gaudi card, and TGI fails with the error message.

Ah, yes, sharing the device between multiple processes concurrently is not supported by the Gaudi drivers.

TGI is the heaviest of the ChatQnA services, so it makes sense to run it on the fastest accelerator. That means you need to dedicate another device to TEI, or run TEI on CPU (for which you already filed #368).

(I could also imagine setups where TGI runs on Gaudi, the TEI services share a GPU, and the rest run on CPU, but I think there's still some work to do for that kind of mixing.)

eero-t avatar Jul 12 '24 09:07 eero-t

@eero-t Then, should we at least add a note to inform users of this limitation?

louie-tsai avatar Aug 21 '24 17:08 louie-tsai

@louie-tsai Please don't assign things to me as I'm not a developer in this project (just another user testing it).

eero-t avatar Aug 21 '24 17:08 eero-t

Fixed by https://github.com/opea-project/GenAIExamples/pull/293. Thank you!

yinghu5 avatar Aug 22 '24 00:08 yinghu5