cog-sdxl
NVIDIA driver not found on device on replicate.com
Situation:
- I've built the image locally and run it with the command cog predict -i "input_image=@"
- I've deployed the Cog image to firetix/dsxl-test and selected A40 (Large), but it fails with the following error:
Input captioning text: a photo of TOK
0%| | 0/10 [00:00<?, ?it/s]
0%| | 0/10 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/root/.pyenv/versions/3.9.17/lib/python3.9/site-packages/cog/server/worker.py", line 217, in _predict
    result = predict(**payload)
  File "train.py", line 138, in train
    input_dir = preprocess(
  File "/src/preprocess.py", line 78, in preprocess
    load_and_save_masks_and_captions(
  File "/src/preprocess.py", line 424, in load_and_save_masks_and_captions
    captions = blip_captioning_dataset(
  File "/root/.pyenv/versions/3.9.17/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/src/preprocess.py", line 221, in blip_captioning_dataset
    inputs = processor(image, text=text, return_tensors="pt").to("cuda")
  File "/root/.pyenv/versions/3.9.17/lib/python3.9/site-packages/transformers/feature_extraction_utils.py", line 224, in to
    new_data[k] = v.to(*args, **kwargs)
  File "/root/.pyenv/versions/3.9.17/lib/python3.9/site-packages/torch/cuda/__init__.py", line 247, in _lazy_init
    torch._C._cuda_init()
RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx
It seems the NVIDIA driver is not being recognized on the device. What could be the issue? I can't debug the machine where the Docker image runs, since I can't SSH into it.
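Since SSH isn't available, one option might be to print some GPU diagnostics into the run logs before the first .to("cuda") call. The following is only a rough sketch: gpu_diagnostics is a hypothetical helper that is not part of cog-sdxl, and calling it at the top of train in train.py is an assumption about where it would fit. It only reports what the container can see and does not fix anything.

```python
import os
import shutil
import subprocess


def gpu_diagnostics():
    """Collect basic facts about GPU visibility inside the container.

    Hypothetical helper for debugging; not part of cog-sdxl.
    """
    info = {
        "nvidia_smi_path": shutil.which("nvidia-smi"),
        "cuda_visible_devices": os.environ.get("CUDA_VISIBLE_DEVICES"),
        "nvidia_visible_devices": os.environ.get("NVIDIA_VISIBLE_DEVICES"),
        "nvidia_smi_output": None,
        "torch_cuda_available": None,
    }

    # If the driver utilities are mounted into the container, list the GPUs.
    if info["nvidia_smi_path"]:
        try:
            result = subprocess.run(
                [info["nvidia_smi_path"], "-L"],
                capture_output=True,
                text=True,
                timeout=10,
            )
            info["nvidia_smi_output"] = (result.stdout or result.stderr).strip()
        except (OSError, subprocess.SubprocessError) as exc:
            info["nvidia_smi_output"] = f"failed to run nvidia-smi: {exc}"

    # torch is optional here so the helper also runs where it is not installed.
    try:
        import torch

        info["torch_cuda_available"] = torch.cuda.is_available()
    except ImportError:
        pass

    return info


if __name__ == "__main__":
    for key, value in gpu_diagnostics().items():
        print(f"{key}: {value}")
```

If nvidia_smi_path comes back as None and torch_cuda_available is False, that would suggest the container was started without GPU access, rather than a problem in preprocess.py itself.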