
Does not see GPU on start

miguemely opened this issue 2 years ago · 9 comments

When attempting to start the backend, I see the following:

~/Desktop/dalle-playground/backend$ python3 app.py 8000
--> Starting DALL-E Server. This might take up to two minutes.
2022-06-12 13:16:41.035513: I external/org_tensorflow/tensorflow/core/tpu/tpu_initializer_helper.cc:259] Libtpu path is: libtpu.so
2022-06-12 13:16:51.303810: I external/org_tensorflow/tensorflow/compiler/xla/service/service.cc:174] XLA service 0x90bd0b0 initialized for platform Interpreter (this does not guarantee that XLA will be used). Devices:
2022-06-12 13:16:51.303839: I external/org_tensorflow/tensorflow/compiler/xla/service/service.cc:182]   StreamExecutor device (0): Interpreter, <undefined>
2022-06-12 13:16:51.305845: I external/org_tensorflow/tensorflow/compiler/xla/pjrt/tfrt_cpu_pjrt_client.cc:176] TfrtCpuClient created.
2022-06-12 13:16:51.306543: I external/org_tensorflow/tensorflow/stream_executor/tpu/tpu_platform_interface.cc:74] No TPU platform found.
WARNING:absl:No GPU/TPU found, falling back to CPU. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)
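For anyone triaging this: the backend uses JAX, not TensorFlow, so a quick way to see which backend JAX actually selected is to query it directly. A minimal diagnostic sketch (the helper name is illustrative, and it guards against JAX not being importable):

```python
def report_jax_backend() -> str:
    """Return the platform JAX selected ("cpu", "gpu", or "tpu"),
    or a sentinel string if JAX is not installed."""
    try:
        import jax
    except ImportError:
        return "jax-not-installed"
    # default_backend() reflects the platform jax.numpy ops will run on.
    return jax.default_backend()

print(report_jax_backend())
```

If this prints "cpu" while TensorFlow sees the GPU, the problem is with the JAX/jaxlib installation rather than the driver.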

When checking tensorflow to make sure I didn't mess something up, I see the device is seen:

Python 3.8.10 (default, Mar 15 2022, 12:22:08) 
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from tensorflow.python.client import device_lib
>>> print(device_lib.list_local_devices())
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 9644359123142212818
xla_global_id: -1
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 10567155712
locality {
  bus_id: 1
  links {
  }
}
incarnation: 80875608830553824
physical_device_desc: "device: 0, name: NVIDIA GeForce RTX 3060, pci bus id: 0000:01:00.0, compute capability: 8.6"
xla_global_id: 416903419
]
>>>
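For comparison, the newer `tf.config` API reports the same information more compactly than `device_lib`. A small sketch, assuming TensorFlow is installed (it degrades to an empty list otherwise):

```python
def list_tf_gpus():
    """Return the names of GPUs TensorFlow can see, or [] if
    TensorFlow is missing or finds none."""
    try:
        import tensorflow as tf
    except ImportError:
        return []
    return [d.name for d in tf.config.list_physical_devices("GPU")]

print(list_tf_gpus())
```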

Output of nvcc -V

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Tue_May__3_18:49:52_PDT_2022
Cuda compilation tools, release 11.7, V11.7.64
Build cuda_11.7.r11.7/compiler.31294372_0

miguemely avatar Jun 12 '22 17:06 miguemely

I see exactly the same issue. Running Proxmox 6 and passing a Quadro P400 with CUDA through to an LXC container. nvcc works and I can run some demo code I have; TensorFlow sees the GPU just like above.

amidg avatar Jun 14 '22 02:06 amidg

I have that problem, too.

arch-user-france1 avatar Jun 14 '22 06:06 arch-user-france1

I got that warning about two days ago (around when you logged these, now that I look) and for me it occurred when there weren't enough free Google Colab resources. When you connected, there were three prompts: the first about the notebook, the second about the RAM, and the third about no GPU/TPU resources being available. If you opted to connect anyway and ran the start-up cell for the backend, you got that error message, or at least I did. I waited an hour or two and then connected fine. Not sure if that's the same issue everyone else experienced, but if it is, we can close this, I suppose.

VeXHarbinger avatar Jun 15 '22 06:06 VeXHarbinger

This seems like they're running off a local device (the path starts with ~/Desktop).

I do have similar issues with my gaming laptop: if I run the app directly, it won't find the GPU. It will with Docker (though it then OOMs, since laptops aren't known for having more than a bare minimum of VRAM). I haven't dug into it beyond verifying that, as the backend takes a while to start on that system and it's otherwise a suboptimal setup for this sort of thing.

trekkie1701c avatar Jun 15 '22 09:06 trekkie1701c

I am running it off a local device, with an RTX 3060.

miguemely avatar Jun 18 '22 17:06 miguemely

From my understanding, TensorFlow/PyTorch ship with their own bundled CUDA libraries, but JAX doesn't, and I think you need the "right" CUDA-enabled build of jax/jaxlib to match whatever you have installed locally. The command below finally got my GPU detected by JAX. YMMV.

pip install --upgrade "jax[cuda]" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
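After reinstalling, you can confirm the fix by checking whether JAX now enumerates any GPU devices. A hedged sketch (helper name is illustrative; it returns an empty list if JAX is absent or still CPU-only):

```python
def gpu_devices():
    """List the JAX devices whose platform is a GPU;
    empty if there are none or JAX is not installed."""
    try:
        import jax
    except ImportError:
        return []
    # Older jaxlib reports the platform as "gpu", newer as "cuda".
    return [d for d in jax.devices() if d.platform in ("gpu", "cuda")]

devs = gpu_devices()
print("GPU devices:", devs if devs else "none found, still on CPU")
```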

rmartin16 avatar Jun 18 '22 17:06 rmartin16

I had to explicitly install cuDNN separately.

kk49 avatar Jun 18 '22 21:06 kk49

cuDNN is already installed on my system and the GPU is detected; DALL-E just doesn't use it.

arch-user-france1 avatar Jun 19 '22 09:06 arch-user-france1

From my understanding, TensorFlow/PyTorch ship with their own bundled CUDA libraries, but JAX doesn't, and I think you need the "right" CUDA-enabled build of jax/jaxlib to match whatever you have installed locally. The command below finally got my GPU detected by JAX. YMMV.

pip install --upgrade "jax[cuda]" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html

This fixes it, at least for me and him. @saharmor

arch-user-france1 avatar Jun 19 '22 09:06 arch-user-france1