pytorch-CycleGAN-and-pix2pix

RuntimeError: CUDA driver initialization failed, you might not have a CUDA gpu

Open DovydasSunnus opened this issue 3 years ago • 2 comments

The prebuilt Docker image works, but it contains an older version of the code. I built a new image, but training aborts at startup with this error:

python train.py --dataroot /mnt/sfs_turbo/training-images --name arma_cyclegan --preprocess scale_width_and_crop --load_size 1080 --crop_size 360 --model cycle_gan

Traceback (most recent call last):
  File "/workspace/pytorch-CycleGAN-and-pix2pix/train.py", line 28, in <module>
    opt = TrainOptions().parse()  # get training options
  File "/workspace/pytorch-CycleGAN-and-pix2pix/options/base_options.py", line 134, in parse
    torch.cuda.set_device(opt.gpu_ids[0])
  File "/miniconda/lib/python3.9/site-packages/torch/cuda/__init__.py", line 313, in set_device
    torch._C._cuda_setDevice(device)
  File "/miniconda/lib/python3.9/site-packages/torch/cuda/__init__.py", line 216, in _lazy_init
    torch._C._cuda_init()
RuntimeError: CUDA driver initialization failed, you might not have a CUDA gpu.

`root@60c73da267ab:/workspace/pytorch-CycleGAN-and-pix2pix# nvidia-smi

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.226.00   Driver Version: 418.226.00   CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  Off  | 00000000:21:01.0 Off |                    0 |
| N/A   32C    P0    35W / 300W |      0MiB / 16130MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
`

DovydasSunnus, Jun 06 '22 09:06
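For anyone hitting the same error, it can help to check what PyTorch itself reports inside the container before launching train.py. The following is a minimal diagnostic sketch, not part of the repository; the function name `cuda_report` is made up for illustration. It only uses `torch.__version__`, `torch.version.cuda`, `torch.cuda.is_available()` and `torch.cuda.device_count()`, and it degrades gracefully if PyTorch is not installed.

```python
def cuda_report():
    """Collect basic facts about the PyTorch/CUDA setup without raising.

    Hypothetical helper for illustration; not part of pytorch-CycleGAN-and-pix2pix.
    """
    try:
        import torch
    except ImportError:
        return {"torch_installed": False}

    report = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # CUDA version this PyTorch build targets (None for CPU-only builds).
        "built_for_cuda": torch.version.cuda,
    }
    try:
        # is_available() normally returns False instead of raising
        # when no usable driver is found.
        available = bool(torch.cuda.is_available())
        report["cuda_available"] = available
        report["device_count"] = torch.cuda.device_count() if available else 0
    except RuntimeError as err:
        # Defensive: record the message rather than crashing the diagnostic.
        report["cuda_available"] = False
        report["device_count"] = 0
        report["error"] = str(err)
    return report


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

If `built_for_cuda` is newer than the "CUDA Version" shown in the nvidia-smi header, a driver/toolkit mismatch is a plausible cause worth ruling out. The repository also documents `--gpu_ids -1` for running on CPU, which skips the `torch.cuda.set_device` call seen in the traceback.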

I have no clue :(

junyanz, Jun 14 '22 19:06

Got it working now with CUDA 11.4.

Tesla T4, NVIDIA-SMI 470.129.06, Driver Version: 470.129.06, CUDA Version: 11.4

Dockerfile:

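# CUDA 11.4 base image, matching the CUDA version reported by the host driver
FROM nvidia/cuda:11.4.0-base

# Fetch NVIDIA's CUDA repository signing key so apt can verify the CUDA repo
RUN apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/3bf863cc.pub

# Tools needed to download Miniconda and clone the repository
RUN apt-get update && apt-get install -y wget unzip curl bzip2 git
# Install Miniconda into /miniconda (-b: batch mode, -p: install prefix)
RUN curl -LO http://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
RUN bash Miniconda3-latest-Linux-x86_64.sh -p /miniconda -b
RUN rm Miniconda3-latest-Linux-x86_64.sh
ENV PATH=/miniconda/bin:${PATH}
RUN conda update -y conda

# Install PyTorch, then clone the project and install its Python requirements
RUN conda install -y pytorch torchvision -c pytorch
RUN mkdir /workspace/ && cd /workspace/ && git clone https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix.git && cd pytorch-CycleGAN-and-pix2pix && pip install -r requirements.txt

WORKDIR /workspace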

JoeThanks67, Jun 15 '22 08:06
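The two nvidia-smi headers in this thread differ in the "CUDA Version" field (10.1 on the failing host, 11.4 on the working one). That field reports the highest CUDA version the installed driver supports, so one plausible explanation is that the new image needed a newer CUDA than driver 418.226.00 could provide. The thread does not confirm which CUDA version the failing image's PyTorch build targeted, so the `"11.4"` below is an assumption taken from the working base image. The helper names are hypothetical, and the comparison is a rough sketch; CUDA's actual compatibility rules have more nuance.

```python
import re


def parse_smi_header(text):
    """Extract driver and CUDA versions from an nvidia-smi header line."""
    driver = re.search(r"Driver Version:\s*(\d+(?:\.\d+)*)", text)
    cuda = re.search(r"CUDA Version:\s*(\d+(?:\.\d+)*)", text)
    return {
        "driver": driver.group(1) if driver else None,
        "cuda": cuda.group(1) if cuda else None,
    }


def version_tuple(version):
    """Turn '11.4' into (11, 4) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def driver_covers(header_text, required_cuda):
    """True if the CUDA version reported by nvidia-smi is >= required_cuda."""
    reported = parse_smi_header(header_text)["cuda"]
    if reported is None:
        return False
    return version_tuple(reported) >= version_tuple(required_cuda)


# Header lines copied from the two posts in this thread.
failing = "NVIDIA-SMI 418.226.00 Driver Version: 418.226.00 CUDA Version: 10.1"
working = "NVIDIA-SMI 470.129.06 Driver Version: 470.129.06 CUDA Version: 11.4"

# "11.4" is an assumed requirement, chosen to match the working base image.
print(driver_covers(failing, "11.4"))  # False
print(driver_covers(working, "11.4"))  # True
```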