
[ChatQnA] TGI Service fail on a system with only 1 Gaudi card.

Open louie-tsai opened this issue 1 year ago • 3 comments

I used a 1-card VM instance from IDC, and tgi-service didn't run successfully in that VM. (screenshot)

When I tried to restart it with `docker compose -f docker_compose.yaml up tgi-service`, I saw the issue below. (screenshot) However, everything works fine if I use an 8-card IDC instance.

I suggest at least adding a note to inform users of this limitation, as in PR https://github.com/opea-project/GenAIExamples/pull/293

louie-tsai avatar Jul 03 '24 00:07 louie-tsai

I haven't tried using Gaudis (nor Docker Compose), but I thought of a few possible issues...

Based on your error output, sharding is enabled. By default TGI tries to use all available devices, but when sharding is enabled, TGI expects the number of devices to match the number of shards: https://huggingface.co/docs/text-generation-inference/en/basic_tutorials/launcher#numshard

Could you try removing TGI sharding options?
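For reference, sharding is typically controlled by the TGI launcher's `--num-shard` option (or the `NUM_SHARD` environment variable). A hypothetical compose fragment for a single-card setup might look like the sketch below; the service name, image tag, and model variable are illustrative, not the exact ChatQnA configuration:

```yaml
# Illustrative single-card TGI service: either pin --num-shard to 1
# or drop the sharding flags entirely so TGI uses the one device it sees.
tgi-service:
  image: ghcr.io/huggingface/tgi-gaudi:latest   # illustrative image/tag
  command: --model-id ${LLM_MODEL_ID} --num-shard 1
```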

Also, could you check what Gaudi device file name(s) are present inside the VM? If that single Gaudi device's index is not 0 (i.e. the VM uses the host's device file name for it), it's possible that the driver does not find it (if it starts scanning from index 0).
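A quick way to check is sketched below; the exact device paths depend on the habanalabs driver version (I believe newer drivers expose `/dev/accel/accel*` and older ones `/dev/hl*`, so both patterns are tried):

```shell
# List Gaudi accelerator device nodes visible inside the VM.
# Tries both the newer (/dev/accel/accel*) and older (/dev/hl*) naming schemes;
# prints a fallback message if neither pattern matches anything.
ls /dev/accel/accel* /dev/hl* 2>/dev/null || echo "no Gaudi device files found"
```

If the only device listed has a non-zero index, that would support the renaming theory above.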

eero-t avatar Jul 10 '24 15:07 eero-t

@eero-t

We saw a Gaudi device file with index 0, as in the snapshot below. (screenshot)

Not sure about removing TGI sharding, but we could assign visible devices in Docker Compose for both the TEI and TGI services. (TEI and TGI screenshots)

The problem here is that TEI and TGI seem to compete with each other for the single Gaudi card, and TGI fails with the error message.
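On a machine with more than one card, the contention could be avoided by pinning each service to its own device. A hedged sketch of such a compose fragment, assuming the container runtime honors `HABANA_VISIBLE_DEVICES` (service names and card indices here are illustrative):

```yaml
# Illustrative device pinning: give each Gaudi-backed service its own card
# so TEI and TGI never open the same device concurrently.
tei-embedding-service:
  environment:
    - HABANA_VISIBLE_DEVICES=0   # TEI uses card 0
tgi-service:
  environment:
    - HABANA_VISIBLE_DEVICES=1   # TGI uses card 1
```

On a 1-card VM there is no second index to assign, which is why pinning alone cannot fix this case.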

louie-tsai avatar Jul 11 '24 16:07 louie-tsai

> The problem here is that TEI and TGI seem to compete with each other for the single Gaudi card, and TGI fails with the error message.

Ah, yes, sharing the device between multiple processes concurrently is not supported by the Gaudi drivers.

TGI is the heaviest of the ChatQnA services, so it makes sense to run it on the fastest accelerator. That means you need to dedicate another device to TEI, or run TEI on CPU (for which you already filed #368).

(I could also imagine setups where TGI runs on Gaudi, the TEI services share a GPU, and the rest run on CPU, but I think there's still some work to do for that kind of mixing.)

eero-t avatar Jul 12 '24 09:07 eero-t

@eero-t Then, should we at least add a note to inform users of this limitation?

louie-tsai avatar Aug 21 '24 17:08 louie-tsai

@louie-tsai Please don't assign things to me as I'm not a developer in this project (just another user testing it).

eero-t avatar Aug 21 '24 17:08 eero-t

Fixed by https://github.com/opea-project/GenAIExamples/pull/293. Thank you!

yinghu5 avatar Aug 22 '24 00:08 yinghu5