cog-sdxl

Nvidia not found on device in replicate.com

Open firetix opened this issue 1 year ago • 0 comments

Situation:

  • I've built the image locally and run it with the command cog predict -i "input_image=@"
  • I've deployed the cog image to my firetix/dsxl-test model and selected A40 (Large), but it fails with the following error:

    Input captioning text: a photo of TOK
    0%| | 0/10 [00:00<?, ?it/s]
    0%| | 0/10 [00:00<?, ?it/s]
    Traceback (most recent call last):
      File "/root/.pyenv/versions/3.9.17/lib/python3.9/site-packages/cog/server/worker.py", line 217, in _predict
        result = predict(**payload)
      File "train.py", line 138, in train
        input_dir = preprocess(
      File "/src/preprocess.py", line 78, in preprocess
        load_and_save_masks_and_captions(
      File "/src/preprocess.py", line 424, in load_and_save_masks_and_captions
        captions = blip_captioning_dataset(
      File "/root/.pyenv/versions/3.9.17/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
        return func(*args, **kwargs)
      File "/src/preprocess.py", line 221, in blip_captioning_dataset
        inputs = processor(image, text=text, return_tensors="pt").to("cuda")
      File "/root/.pyenv/versions/3.9.17/lib/python3.9/site-packages/transformers/feature_extraction_utils.py", line 224, in to
        new_data[k] = v.to(*args, **kwargs)
      File "/root/.pyenv/versions/3.9.17/lib/python3.9/site-packages/torch/cuda/__init__.py", line 247, in _lazy_init
        torch._C._cuda_init()
    RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx

It seems the NVIDIA driver isn't being recognized on the device. What could be the issue? I can't debug the machine where the Docker image runs because I can't SSH into it.
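Without shell access, one option might be to log what the container itself can see shortly before the .to("cuda") call in preprocess.py. The following is only a minimal sketch: the helper name gpu_diagnostics is hypothetical (it is not part of cog or cog-sdxl), and it assumes that nvidia-smi would be on the PATH whenever the NVIDIA runtime is attached to the container.

```python
import os
import shutil
import subprocess


def gpu_diagnostics():
    """Collect basic facts about GPU visibility inside the running container.

    Hypothetical helper for illustration; not part of cog or cog-sdxl.
    """
    info = {
        # Path to nvidia-smi, or None if the NVIDIA tools are not on the PATH.
        "nvidia_smi_path": shutil.which("nvidia-smi"),
        # Environment variables commonly used to expose or hide GPUs.
        "cuda_visible_devices": os.environ.get("CUDA_VISIBLE_DEVICES"),
        "nvidia_visible_devices": os.environ.get("NVIDIA_VISIBLE_DEVICES"),
        "torch_cuda_available": None,
        "nvidia_smi_output": None,
    }

    # torch.cuda.is_available() returns False instead of raising when no
    # driver is found, so it is safe to call before any .to("cuda").
    try:
        import torch

        info["torch_cuda_available"] = torch.cuda.is_available()
    except Exception:
        # torch is missing or failed to import; leave the value as None.
        pass

    # "nvidia-smi -L" lists the GPUs the driver exposes, if any.
    if info["nvidia_smi_path"]:
        try:
            result = subprocess.run(
                [info["nvidia_smi_path"], "-L"],
                capture_output=True,
                text=True,
                timeout=30,
            )
            info["nvidia_smi_output"] = (result.stdout or result.stderr).strip()
        except (OSError, subprocess.SubprocessError) as exc:
            info["nvidia_smi_output"] = f"failed to run nvidia-smi: {exc}"

    return info


if __name__ == "__main__":
    for key, value in gpu_diagnostics().items():
        print(f"{key}: {value}")
```

Printing this at the start of train.py would put the values in the training logs. If nvidia_smi_path comes back as None on the hosted run, that would suggest the container was started without GPU access rather than a problem in the training code itself.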

firetix Aug 07 '23 15:08