dalle-playground
Does not see GPU on start
When attempting to start the backend, I see the following:
~/Desktop/dalle-playground/backend$ python3 app.py 8000
--> Starting DALL-E Server. This might take up to two minutes.
2022-06-12 13:16:41.035513: I external/org_tensorflow/tensorflow/core/tpu/tpu_initializer_helper.cc:259] Libtpu path is: libtpu.so
2022-06-12 13:16:51.303810: I external/org_tensorflow/tensorflow/compiler/xla/service/service.cc:174] XLA service 0x90bd0b0 initialized for platform Interpreter (this does not guarantee that XLA will be used). Devices:
2022-06-12 13:16:51.303839: I external/org_tensorflow/tensorflow/compiler/xla/service/service.cc:182] StreamExecutor device (0): Interpreter, <undefined>
2022-06-12 13:16:51.305845: I external/org_tensorflow/tensorflow/compiler/xla/pjrt/tfrt_cpu_pjrt_client.cc:176] TfrtCpuClient created.
2022-06-12 13:16:51.306543: I external/org_tensorflow/tensorflow/stream_executor/tpu/tpu_platform_interface.cc:74] No TPU platform found.
WARNING:absl:No GPU/TPU found, falling back to CPU. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)
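For context, the dalle-playground backend runs on JAX rather than TensorFlow (the absl warning above is JAX's), so the more relevant check is what JAX itself reports. A minimal sketch, assuming jax is installed in the same environment the backend runs in:

python3 -c "import jax; print(jax.default_backend()); print(jax.devices())"
# prints "gpu" plus a GPU device when a CUDA build of jaxlib is active,
# and "cpu" / [CpuDevice(id=0)] when only the default CPU-only wheel is installed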
When checking TensorFlow to make sure I didn't mess something up, I can see the device is detected:
Python 3.8.10 (default, Mar 15 2022, 12:22:08)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from tensorflow.python.client import device_lib
>>> print(device_lib.list_local_devices())
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 9644359123142212818
xla_global_id: -1
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 10567155712
locality {
bus_id: 1
links {
}
}
incarnation: 80875608830553824
physical_device_desc: "device: 0, name: NVIDIA GeForce RTX 3060, pci bus id: 0000:01:00.0, compute capability: 8.6"
xla_global_id: 416903419
]
>>>
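For what it's worth, a shorter way to run the same TensorFlow check (assuming TF 2.x) is:

import tensorflow as tf
# Lists the physical GPUs TensorFlow can see; should include one entry for the RTX 3060
print(tf.config.list_physical_devices('GPU'))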
Output of nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Tue_May__3_18:49:52_PDT_2022
Cuda compilation tools, release 11.7, V11.7.64
Build cuda_11.7.r11.7/compiler.31294372_0
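Since the backend's GPU detection goes through JAX, it's also worth confirming that the installed jaxlib is actually a CUDA build; a plain pip install of jax/jaxlib pulls the CPU-only wheel no matter what nvcc reports. A quick check (the version string in the comment below is only an illustrative assumption):

pip show jaxlib
# CUDA builds typically carry a local version suffix, e.g. 0.3.x+cuda11.cudnn82;
# a bare version like 0.3.x suggests the CPU-only wheel
python3 -c "import jaxlib; print(jaxlib.__version__)"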
I see exactly the same issue. I'm running Proxmox 6 and passing a Quadro P400 with CUDA through to an LXC container. nvcc works and I can run some demo code I have; TensorFlow sees the GPU just like above.
I have that problem, too.
I got that warning about two days ago (around when you logged these, now that I look at the timestamps), and for me it occurred when there weren't enough free Google Colab resources. When you connected, there were three prompts: the first about the notebook, the second about the RAM, and the third about no GPU/TPU resources being available. If you opted to connect anyway and ran the start-up cell for the backend, you got that error message, or at least I did. I waited an hour or two and then connected fine. I'm not sure if that's the same issue everyone else experienced, but if it is, we can close this, I suppose.
This seems like they're running off of a local device (path starts with ~/Desktop)
I do have similar issues with my gaming laptop; if running the app directly, it won't find the GPU. It will with Docker (though it will then OOM, since laptops aren't known for having more than a bare minimum of VRAM). I haven't dug into it more than verifying that, as the backend takes a while to start on that system and it's otherwise a suboptimal setup for this sort of thing.
I am running it off a local device, with an RTX 3060
From my understanding, TensorFlow/PyTorch ship with their own version of CUDA et al., but JAX doesn't, and I think you need the "right" version of JAX for whatever you have installed locally. The command below finally got my GPU detected by JAX. YMMV.
pip install --upgrade "jax[cuda]" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
I had to explicitly install cuDNN separately.
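After reinstalling with the jax[cuda] command above (and with cuDNN in place), a quick sanity check that JAX now picks the GPU backend, as a sketch:

python3 -c "import jax; print(jax.default_backend(), jax.devices())"
# should now report "gpu" and list a GPU device instead of CpuDevice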
cuDNN is already installed on my system and the GPU is detected; it's just that DALL-E doesn't use it. What should I do?
The pip install --upgrade "jax[cuda]" command above fixes it. At least for me and him. @saharmor