h2ogpt
"Unable to locate package nvidia-container-toolkit" on Debian (Ubuntu) x86_64
Hi Team,
Nice work and appreciate your efforts on this project 🫡
I am trying to run the Docker container and ran into the following issue when executing the command sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit-base:
Hit:1 http://eu-central-1.ec2.archive.ubuntu.com/ubuntu jammy InRelease
Hit:2 http://eu-central-1.ec2.archive.ubuntu.com/ubuntu jammy-updates InRelease
Hit:3 http://eu-central-1.ec2.archive.ubuntu.com/ubuntu jammy-backports InRelease
Hit:4 https://download.docker.com/linux/ubuntu jammy InRelease
Get:5 http://security.ubuntu.com/ubuntu jammy-security InRelease [110 kB]
Fetched 110 kB in 1s (195 kB/s)
Reading package lists... Done
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
E: Unable to locate package nvidia-container-toolkit-base
The solution I found was:
wget https://nvidia.github.io/nvidia-docker/gpgkey --no-check-certificate
sudo apt-key add gpgkey
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
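As a quick sanity check (not part of the original workaround), you can confirm that the package now resolves from the new repository and was installed:
# show the candidate version picked up from the NVIDIA repository
apt-cache policy nvidia-container-toolkit
# confirm the package is actually installed
dpkg -l | grep nvidia-container-toolkit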
This fixes the problem, but I still get the following error for the command docker run --runtime=nvidia --shm-size=64g -p 7860:7860 -v ${HOME}/.cache:/root/.cache --rm h2o-llm -it generate.py --base_model=EleutherAI/gpt-neox-20b --lora_weights=h2ogpt_lora_weights --prompt_type=human_bot:
docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #1: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy' nvidia-container-cli: initialization error: load library failed: libnvidia-ml.so.1: cannot open shared object file: no such file or directory: unknown.
Could someone help me with this? I am trying to run the Docker container. I also tried docker compose up, but I get the same error.
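Note on the error itself: libnvidia-ml.so.1 is the NVML library that ships with the NVIDIA driver, so a quick host-side check that the driver is installed and loadable is:
# should list the GPUs and driver version if the driver is healthy
nvidia-smi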
Hi, please try the documentation here: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#docker
Specifically, try doing this first:
distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
&& curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
&& curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
This may be required to find the correct packages; it was likely missed in the instructions because I had already done it on my system.
Let us know if this fixes it; in the meantime, I'll update the instructions to include this step.
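For reference, once the repository is in place, NVIDIA's install guide continues roughly as follows (standard commands from their documentation, not specific to h2ogpt):
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
# register the NVIDIA runtime with Docker and restart the daemon
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker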
Thanks!
Hi @pseudotensor, thank you for the commands. Yes, it fixes the earlier problem, but I am still having issues with the latter, which is:
nvidia-container-cli: initialization error: load library failed: libnvidia-ml.so.1: cannot open shared object file: no such file or directory: unknown
Could you also specify the minimum CPU/Memory requirements for a machine to run this Docker container?
Thank you, Best Regards
The system requirements scale with the model size. E.g., the 20B model requires 4x 48GB GPUs for generation, unless you use 8-bit, in which case 2x 48GB GPUs are enough.
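For a rough sense of where those numbers come from (weights only; activations, KV cache, and framework overhead add on top):
# back-of-the-envelope weight memory for a 20B-parameter model
# 20e9 params * 2 bytes (fp16) ≈ 40 GB
# 20e9 params * 1 byte  (int8) ≈ 20 GB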
Hi @iamdempa, just checking again if you are still experiencing issues with the latest changes.
If so, I would be happy to help. We typically use the steps here to set up the CUDA toolkit: https://github.com/h2oai/h2ogpt/blob/main/docs/INSTALL.md#installing-cuda-toolkit
However, under different pre-conditions on your system, the CUDA libs may not be found. In that case, check the /etc/ld.so.conf.d/cuda... config and make sure it points to the right location of libnvidia-ml, assuming you can confirm that libnvidia-ml.so.1 is indeed installed somewhere on your system (find / -name libnvidia-ml* 2> /dev/null).
If you can share the result of the find command and how the ld cache is set up for your CUDA install, we can debug; see the sketch below.
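A rough sketch of those checks (illustrative commands; the file names under /etc/ld.so.conf.d/ vary between installs):
# 1. confirm the driver's NVML library exists somewhere on the host
find / -name 'libnvidia-ml*' 2>/dev/null
# 2. see which directories the dynamic loader is configured to search
ls /etc/ld.so.conf.d/
cat /etc/ld.so.conf.d/*.conf
# 3. check whether the loader cache currently resolves libnvidia-ml
ldconfig -p | grep libnvidia-ml
# 4. after fixing or adding an entry, rebuild the loader cache
sudo ldconfig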