Pravin Gadakh
So I upgraded the torch CUDA version to 11.8 with `conda install --force pytorch==2.0.0 pytorch-cuda=11.8 -c pytorch -c nvidia`, and the Python interpreter now correctly shows 11.8. However, that did not help...
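For anyone following along, the upgrade-and-verify sequence above can be sketched as below; the `conda` command mirrors the one in this thread, and the `python -c` check is just one way to confirm which CUDA toolkit the installed torch build targets (the expected output assumes the 11.8 build actually gets installed):

```shell
# Force-reinstall the CUDA 11.8 build of pytorch 2.0.0 (command from this thread).
conda install --force pytorch==2.0.0 pytorch-cuda=11.8 -c pytorch -c nvidia

# Verify what the interpreter actually reports for the torch CUDA build.
python -c "import torch; print(torch.version.cuda)"   # should print 11.8 if the upgrade took effect
```

Note that `torch.version.cuda` reports the CUDA version torch was *built* against, which is what matters for the custom-kernel compatibility issue discussed here; it can differ from the driver's CUDA version shown by `nvidia-smi`.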
@Narsil I tried with the 0.9.4 image; `torch.version.cuda` shows only 11.7. However, even after upgrading the CUDA version it did not help. Disabling custom kernels helped, but I would prefer not...
Even with [sha-7766fee](https://github.com/orgs/huggingface/packages/container/text-generation-inference/114363477?tag=sha-7766fee) I am seeing the same issue.
@Narsil Were you able to figure out the issue? Also, I have a llama2 model deployed on 8 A100 (40 GB) GPUs and had a couple of quick questions around that. Can...
@Narsil With the [sha-f91e9d2](https://github.com/orgs/huggingface/packages/container/text-generation-inference/115569670?tag=sha-f91e9d2) image I now see the torch CUDA version as 11.8. However, I still need to add `--disable-custom-kernels` in order to deploy the llama model. @marioluan You can try...
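For reference, the workaround above amounts to passing `--disable-custom-kernels` to the text-generation-inference launcher. A minimal sketch of the docker invocation follows; the model id, port, and volume path are illustrative, not from this thread:

```shell
# Sketch: run the TGI image from this thread with custom kernels disabled.
# meta-llama/Llama-2-7b-hf, the port mapping, and /data volume are assumptions.
docker run --gpus all -p 8080:80 \
  -v "$PWD/data:/data" \
  ghcr.io/huggingface/text-generation-inference:sha-f91e9d2 \
  --model-id meta-llama/Llama-2-7b-hf \
  --disable-custom-kernels
```

Disabling custom kernels falls back to slower non-fused code paths, so it is a workaround for the CUDA-version mismatch rather than a fix.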
@c21 I see you worked on the original distributed example; would you be able to help me figure out what I am missing here?
@c21 Apologies for the delay in responding; I got occupied with other work. Our RayCluster setup has 6 worker nodes, each with 2 A100 80 GB GPUs (18 CPUs)...
@davidhyun We are stuck with the same issue as well. May I know what approach you took to resolve it?