John Jawed (JJ)
Never seen that before; not sure why it happens. Could you please post the output of a run with `bash -x`?
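In case it helps, here is a minimal sketch of capturing such a trace. `demo.sh` is a hypothetical stand-in for the actual script, and `trace.log` is an arbitrary file name.

```shell
# demo.sh is a hypothetical stand-in; use the real script's path instead.
cat > demo.sh <<'EOF'
echo installing
true
EOF

# -x makes bash print each command, prefixed with "+ ", to stderr before running it.
# 2>&1 merges that trace into stdout so tee saves it together with the normal output.
bash -x ./demo.sh 2>&1 | tee trace.log
```

Attaching `trace.log` shows exactly which command ran last before the failure.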
@netnut404 bump
Great suggestion, I’ll add this.
Haven’t had a chance to test these in depth yet. Due to the nature of the process, a failure is sometimes OK. I suspect I need to add a flag...
Was this 7Server or desktop?
Can you please provide some output from the script when it attempts to do a yum install?
Testing environment: Ubuntu 22.04, no MIG setup (A100). Command:

```
podman run --network host --shm-size 1g --rm \
  --security-opt=label=disable \
  --device=nvidia.com/gpu=all \
  -e CUDA_VISIBLE_DEVICES="all" \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id bigscience/bloom-560m
```
Hi @OlivierDehaene, the missing env var in my comment was a copy/paste error; good catch. Without `CUDA_VISIBLE_DEVICES=all` this works fine, although only with CPU support and 1 shard...
`CUDA_VISIBLE_DEVICES=all` could be the problem; however, it is currently (mis)used, especially in container setups [1]. Here is how I ended up supporting `CUDA_VISIBLE_DEVICES=all`.

[1] https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/user-guide.html#environment-variables-oci-spec `all` is a supported value for...
> For the doc you linked the env variable is `NVIDIA_VISIBLE_DEVICES` not `CUDA_VISIBLE_DEVICES`. Maybe that explains it?

Yeah, it feels like there is a lot of ambiguity between what...
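To make the distinction concrete: the NVIDIA Container Toolkit documents `all` as a value for `NVIDIA_VISIBLE_DEVICES`, while the CUDA documentation describes `CUDA_VISIBLE_DEVICES` as a comma-separated list of device indices or UUIDs. Below is a hypothetical bash sketch of how a launcher could accept both forms. `count_visible_devices` is an illustrative name, not text-generation-inference's actual code, and the host GPU count is assumed to be supplied by the caller (for example from `nvidia-smi -L | wc -l` on a non-MIG host).

```shell
#!/usr/bin/env bash
# Hypothetical helper, not TGI's real implementation.
# $1: value of CUDA_VISIBLE_DEVICES (e.g. "all", "0,1", or GPU UUIDs)
# $2: number of GPUs on the host (assumed to be known by the caller)
count_visible_devices() {
  local value=$1 total=$2
  if [ "$value" = "all" ]; then
    # Container-style "all": treat every GPU on the host as visible.
    echo "$total"
    return
  fi
  # Otherwise split the comma-separated list; an empty value yields 0.
  local -a ids
  IFS=, read -ra ids <<< "$value"
  echo "${#ids[@]}"
}

count_visible_devices all 4   # prints 4
count_visible_devices 0,1 4   # prints 2
```

Mapping `all` to the host count up front keeps the rest of the launcher working with a plain device count, regardless of which convention the user followed.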