IMPLICIT: No CUDA extension has been built, can't train on GPU
Hi! I'm trying to run an ALS model on the GPU, but I get the following error:
ValueError: No CUDA extension has been built, can't train on GPU.
I also tried to run it in Google Colab, but got the same error. It seems that implicit.gpu.HAS_CUDA is always returning False. Any ideas?
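For context, a minimal example along the lines of what I'm running fails the same way (the matrix here is just random data standing in for mine):

```python
import numpy as np
import scipy.sparse as sparse
import implicit

print(implicit.gpu.HAS_CUDA)  # False, even though nvidia-smi sees the GPU

# Small random user-item matrix, only to trigger training
user_items = sparse.random(1000, 500, density=0.05, format="csr", dtype=np.float32)

# Asking for the GPU implementation fails with the ValueError above
model = implicit.als.AlternatingLeastSquares(factors=64, use_gpu=True)
model.fit(user_items)
```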
I'm running on Debian 11, and this is the nvidia-smi output:
```
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.42.06              Driver Version: 555.42.06      CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  Tesla T4                       On  |   00000000:00:04.0 Off |                    0 |
| N/A   38C    P8             10W /  70W  |       1MiB / 15360MiB  |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
```
I had a similar issue when trying to use CUDA 12. CUDA 11 works for me, though.
I tried editing these lines, https://github.com/benfred/implicit/blob/main/implicit/gpu/__init__.py#L16-L17, to something like
```python
except ImportError as e:
    print(f"{e}")
```
and when importing implicit I got: ImportError: libcublas.so.11: cannot open shared object file: No such file or directory.
Looks like the CUDA extension is specifically looking for CUDA 11 libraries.
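A quick way to check what the dynamic linker can actually resolve (just a sketch, checking only the two cuBLAS major versions relevant here):

```python
import ctypes

# Try to dlopen the cuBLAS runtime for CUDA 11 and CUDA 12. On a CUDA 12-only
# install, libcublas.so.11 should fail to load here, which matches the
# "libcublas.so.11: cannot open shared object file" error above.
for name in ("libcublas.so.11", "libcublas.so.12"):
    try:
        ctypes.CDLL(name)
        print(f"{name}: found")
    except OSError as exc:
        print(f"{name}: not found ({exc})")
```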
I can confirm that implicit can't find CUDA 12 (but finds CUDA 11).
Quick way to reproduce in Docker:
```dockerfile
# Dockerfile
FROM nvidia/cuda:12.6.1-cudnn-runtime-ubuntu24.04

WORKDIR /app

ENV DEBIAN_FRONTEND=noninteractive
ENV TZ=Europe/Berlin
ENV PATH="/opt/venv/bin:$PATH"

# Add the deadsnakes PPA so Python 3.11 is available on Ubuntu 24.04
RUN gpg --keyserver keyserver.ubuntu.com --recv-keys F23C5A6CF475977595C89F51BA6932366A755776 && \
    gpg --export F23C5A6CF475977595C89F51BA6932366A755776 | tee /usr/share/keyrings/deadsnakes.gpg > /dev/null && \
    echo "deb [signed-by=/usr/share/keyrings/deadsnakes.gpg] https://ppa.launchpadcontent.net/deadsnakes/ppa/ubuntu $(. /etc/os-release && echo $VERSION_CODENAME) main" | tee /etc/apt/sources.list.d/deadsnakes.list

# Install Python 3.11 plus libgomp1, which implicit needs at runtime
RUN apt-get update && \
    apt-get install -y --no-install-recommends \
        curl libgomp1 \
        python3.11 python3.11-dev python3.11-venv

# Bootstrap pip and create a virtualenv (already first on PATH, see ENV above)
RUN curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py && \
    python3.11 get-pip.py && \
    python3.11 -m venv /opt/venv && \
    rm get-pip.py

RUN pip install implicit

CMD python -c "import implicit; print(implicit.gpu.HAS_CUDA)"
```
Then, on the host:

```bash
sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker

docker build -t implicit -f Dockerfile .
docker run --gpus all -it implicit
```
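If it helps to see the actual import failure inside the container without patching the installed package, here is a small diagnostic sketch (the file name check_gpu.py is just a placeholder; it tries to import every submodule of implicit.gpu, since implicit/gpu/__init__.py appears to swallow the ImportError and only set HAS_CUDA = False):

```python
# check_gpu.py -- run inside the container instead of the CMD one-liner
import importlib
import pkgutil
import traceback

import implicit.gpu

print("implicit.gpu.HAS_CUDA =", implicit.gpu.HAS_CUDA)

# Import every submodule of implicit.gpu (including the compiled CUDA
# extension, if any) and print the full traceback when one fails.
for module_info in pkgutil.iter_modules(implicit.gpu.__path__):
    name = f"implicit.gpu.{module_info.name}"
    try:
        importlib.import_module(name)
        print(f"{name}: imported OK")
    except Exception:
        print(f"{name}: import FAILED")
        traceback.print_exc()
```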
@win845, could you please provide more information about the Docker image tag that works for you?
My solution was to run Docker with the --runtime=nvidia option, just like in https://github.com/NVIDIA/nvidia-docker/issues/700#issuecomment-381073278.
I am also facing this issue, but on CUDA 11.8. It is weird because I have a published pipeline on AzureML that works fine, yet if I publish it now it fails with this error (without any dependency change, using implicit==0.7.2 and implicit-proc=*=gpu).
EDIT: Looking here or via conda search implicit=0.7.2 --info -c conda-forge, there seem to be new builds published that force cuda-version>=12; probably one of those gets installed and breaks the whole thing.
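A quick way to see which builds actually ended up in the environment (a sketch; the field names are assumed from conda's --json output, adjust as needed):

```python
import json
import subprocess

# Print the builds of the packages that matter here, to check whether a
# cuda-version>=12 build of implicit was pulled in by the solver.
out = subprocess.run(["conda", "list", "--json"], capture_output=True, text=True, check=True)
for pkg in json.loads(out.stdout):
    if pkg["name"] in {"implicit", "implicit-proc", "cuda-version", "libcublas"}:
        print(pkg["name"], pkg["version"], pkg.get("build_string", ""))
```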