CUDA error when running predictions without GPU
I am running segger in the danielunyi42/segger_dev:cuda121 docker container on a system without a GPU.
Training went well by setting
trainer = Trainer(
    accelerator="cpu",
    ...
)
I then attempt to run predictions with the trained model:
model_version = 0
model_path = MODELS_DIR / "lightning_logs" / f"version_{model_version}"
model = load_model(model_path / "checkpoints")
receptive_field = {'k_bd': 4, 'dist_bd': 12, 'k_tx': 15, 'dist_tx': 3}
segment(
    model,
    dm,
    save_dir=TMP_DIR,
    seg_tag='segger_output',
    transcript_file=TRANSCRIPTS_PARQUET,
    receptive_field=receptive_field,
    min_transcripts=5,
    cell_id_col='segger_cell_id',
    use_cc=False,
    knn_method='kd_tree',
    verbose=True,
)
With this, I run into the following error:
Processing Train batches: 0%| | 0/1258 [00:00<?, ?it/s]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/workspace/segger_dev/src/segger/prediction/predict_parquet.py", line 524, in segment
predict_batch(
File "/workspace/segger_dev/src/segger/prediction/predict_parquet.py", line 322, in predict_batch
with cp.cuda.Device(gpu_id):
File "cupy/cuda/device.pyx", line 173, in cupy.cuda.device.Device.__enter__
File "cupy_backends/cuda/api/runtime.pyx", line 202, in cupy_backends.cuda.api.runtime.getDevice
File "cupy_backends/cuda/api/runtime.pyx", line 146, in cupy_backends.cuda.api.runtime.check_status
cupy_backends.cuda.api.runtime.CUDARuntimeError: cudaErrorInsufficientDriver: CUDA driver version is insufficient for CUDA runtime version
I believe the issue could be resolved if there were a way to tell the function to use the CPU. Is this possible in some way?
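For reference, a quick sanity check of what the runtime actually sees (torch is assumed to be installed, since Lightning training worked; this is just a generic check, not segger API):

```python
# Sanity check: does this environment expose a usable CUDA device?
# (torch is assumed available, since Lightning training worked.)
try:
    import torch
    has_cuda = torch.cuda.is_available()
except ImportError:
    has_cuda = False
print("CUDA available:", has_cuda)
```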
There's indeed a way to do this. I recommend cloning and reinstalling the repo, then using segger.prediction.predict_parquet.segment. I also recommend using this set of params:
receptive_field = {'k_bd': 4, 'dist_bd': 7.5, 'k_tx': 15, 'dist_tx': 3} # <-- change dist_bd to 7.5 for smaller/more sensible cell radius.
segment(
    model,
    dm,
    score_cut=0.75,  # <-- add this
    save_dir=TMP_DIR,
    seg_tag='segger_output',
    transcript_file=TRANSCRIPTS_PARQUET,
    receptive_field=receptive_field,
    min_transcripts=5,
    cell_id_col='segger_cell_id',
    use_cc=False,
    knn_method='kd_tree',
    verbose=True,
)
Thanks!
I tried cloning and reinstalling. However, since I use the danielunyi42/segger_dev:cuda121 docker container, I ran into this issue:
ERROR: Package 'segger' requires a different Python: 3.10.12 not in '>=3.11'
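The check behind that error is just a version comparison: the container's Python 3.10.12 falls outside segger's `>=3.11` specifier. A minimal illustration with a plain tuple comparison (not pip's actual specifier logic):

```python
def satisfies_min(version: str, minimum: tuple) -> bool:
    """Check a dotted version string against a minimum (major, minor) tuple."""
    parts = tuple(int(p) for p in version.split("."))
    return parts[:2] >= minimum

print(satisfies_min("3.10.12", (3, 11)))  # the container's Python -> False
print(satisfies_min("3.11.0", (3, 11)))   # what segger requires -> True
```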
The native Python version of Ubuntu 22.04 is 3.10, so the Dockerfile would need to be adjusted. I tried to build one from scratch:
# Base image with CUDA and cuDNN
FROM nvidia/cuda:12.1.1-cudnn8-runtime-ubuntu22.04
# Install essential tools and Python 3.11 from deadsnakes PPA
RUN apt-get update -y && \
    apt-get install -y --no-install-recommends \
        git \
        wget \
        curl \
        unzip \
        htop \
        vim \
        build-essential \
        software-properties-common && \
    add-apt-repository -y ppa:deadsnakes/ppa && \
    apt-get update -y && \
    apt-get install -y --no-install-recommends \
        python3.11 \
        python3.11-venv \
        python3.11-dev \
        python3-pip && \
    rm -f /usr/bin/python3 && \
    ln -s /usr/bin/python3.11 /usr/bin/python3 && \
    ln -s /usr/bin/python3.11 /usr/bin/python && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*
With this, the newest segger version can be installed.
My issue now is that I also need spatialdata in the environment for data handling, and I run into the dependency issue described here: https://github.com/EliHei2/segger_dev/issues/123
For testing, I tried to run the code with the latest segger installed from git (leaving out my spatialdata data prep). I still ran into the same CUDARuntimeError.
@LouisK92 for now, the prediction step is only available on GPUs (requiring CUDA). This is, however, not essential: in theory the model does not require GPUs, but we assumed that, given the training, people would have/need GPUs anyway. I will make a PR at some point to circumvent this requirement, but for now we do require GPUs. Would GPUs not be available in your benchmarking config? In that case I'll prioritise this task.
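For what it's worth, the hard dependency in predict_batch is the `cp.cuda.Device(gpu_id)` context. A hedged sketch of how such a PR might make it optional (`device_context` is a hypothetical helper, not segger API):

```python
import contextlib

def device_context(gpu_id: int = 0):
    """Return a CUDA device context when CuPy and a driver are usable,
    otherwise a no-op context so the same code path can run on CPU.
    (Hypothetical helper, not part of segger.)"""
    try:
        import cupy as cp
        cp.cuda.runtime.getDeviceCount()  # raises without a usable driver
        return cp.cuda.Device(gpu_id)
    except Exception:
        return contextlib.nullcontext()

# predict_batch could then enter this instead of cp.cuda.Device(gpu_id):
with device_context():
    pass  # array work would fall back to numpy on CPU here
```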
@daniel-unyi-42 could you look into the docker issues?
For running the large-scale benchmark, GPUs are available on my side. I am just limited on the development and testing side: the openproblems GitHub Actions tests only run on CPU, and the local tests require docker containers, which I can't use on our cluster, which is why I develop locally on a Mac without an Nvidia GPU. I can work around this by disabling the tests and testing on the cluster with a converted Singularity container, but it would be super helpful if I could run the tests on CPU.