dalle-playground
CPU & GPU usage
Apologies if this isn't the appropriate venue to ask these questions.
I've been toying a bit with this, but I'm seeing two problems so far:
- The CPU seems heavily underused. Right now the process is only using a single core of my 12-core machine, and I couldn't figure out a way to tell it to use more resources.
- The GPU is totally unused. In fact, there's a message at the beginning about it:
WARNING:absl:No GPU/TPU found, falling back to CPU. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)
--> Starting DALL-E Server. This might take up to two minutes.
Running with TF_CPP_MIN_LOG_LEVEL=0 doesn't give much more info:
2022-06-06 13:07:53.761716: I external/org_tensorflow/tensorflow/core/util/util.cc:168] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2022-06-06 13:07:53.761954: I external/org_tensorflow/tensorflow/core/tpu/tpu_initializer_helper.cc:259] Libtpu path is: libtpu.so
2022-06-06 13:07:53.785866: I external/org_tensorflow/tensorflow/core/util/util.cc:168] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2022-06-06 13:07:54.419499: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2022-06-06 13:07:55.545835: I external/org_tensorflow/tensorflow/compiler/xla/service/service.cc:174] XLA service 0x555e2489cb80 initialized for platform Interpreter (this does not guarantee that XLA will be used). Devices:
2022-06-06 13:07:55.545853: I external/org_tensorflow/tensorflow/compiler/xla/service/service.cc:182] StreamExecutor device (0): Interpreter, <undefined>
2022-06-06 13:07:55.547821: I external/org_tensorflow/tensorflow/compiler/xla/pjrt/tfrt_cpu_pjrt_client.cc:176] TfrtCpuClient created.
2022-06-06 13:07:55.548231: I external/org_tensorflow/tensorflow/stream_executor/tpu/tpu_platform_interface.cc:74] No TPU platform found.
WARNING:absl:No GPU/TPU found, falling back to CPU. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)
--> Starting DALL-E Server. This might take up to two minutes.
I did follow the instructions to install PyTorch, including the verification that CUDA is detectable:
$ python
Python 3.9.12 (main, Apr 5 2022, 06:56:58)
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.is_available()
True
To no avail.
Feel free to redirect me to other resources if this isn't the right medium to discuss the problem.
Try building it using Docker. It works like a charm once all the env vars are set properly. I have tested it using an NVIDIA card through Docker.
+1 to @mikaczma's suggestion. Can you please try that and report back?
The work environment I was trying to run this in disallows the use of Docker, unfortunately. Using Docker at home is fine, sure.
Did you try using the Google Colab notebook?
I was able to run DALL-E Mega full on my laptop with 32 GB of RAM by adding "jax.config.update('jax_platform_name', 'cpu')" after the imports in dalle_model.py.
It takes a veeeery long time to generate just one image (2 hours or more on my i7-8750H), but it works.
It also looks like it defaults to a single thread for DALL-E Mega full, but correctly goes multi-threaded for DALL-E Mega and DALL-E Mini.
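For what it's worth, the same CPU-forcing effect can also be achieved before jax is imported, via JAX's documented JAX_PLATFORM_NAME environment variable; a minimal sketch, not part of dalle-playground itself:

```python
import os

# Force JAX onto the CPU backend. This must run before jax is imported
# anywhere in the process; it has the same effect as calling
# jax.config.update('jax_platform_name', 'cpu') after the imports
# in dalle_model.py.
os.environ["JAX_PLATFORM_NAME"] = "cpu"
```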
Hi @nicolasnoble, sorry, can I ask a beginnerish question? How do I get TF_CPP_MIN_LOG_LEVEL=0 to work? When I try to set it, I don't get extra output:
root@DESKTOP-OB9D3NI:~/code/dalle-playground/backend# set TF_CPP_MIN_LOG_LEVEL=0
root@DESKTOP-OB9D3NI:~/code/dalle-playground/backend# TF_CPP_MIN_LOG_LEVEL=0
root@DESKTOP-OB9D3NI:~/code/dalle-playground/backend# python3 app.py 8080
--> Starting DALL-E Server. This might take up to two minutes.
WARNING:absl:No GPU/TPU found, falling back to CPU. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)
Later in the output I see messages that make me think logging is suppressed:
* Environment: production
WARNING: This is a development server. Do not use it in a production deployment.
Use a production WSGI server instead.
* Debug mode: off
@fschwiet The easiest way to set it is from within the app.py file. At the top, near the imports, enter:
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '0'
Additionally, if you want some more info, you can turn debugging on for Flask: on the last line of the app.py file, change debug from False to True.
app.run(host="0.0.0.0", port=int(sys.argv[1]), debug=False)
Change to:
app.run(host="0.0.0.0", port=int(sys.argv[1]), debug=True)
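Put together, the two edits above look roughly like this in app.py (a sketch of just the changed spots, not the full file; the run line is shown commented out so the snippet stands alone):

```python
import os
import sys  # used by the run line at the bottom of app.py

# Must be set before TensorFlow/JAX get imported, or the C++ log level
# is never picked up.
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '0'

# ... the rest of app.py's imports and routes go here ...

# Last line of app.py, with Flask debug output switched on:
# app.run(host="0.0.0.0", port=int(sys.argv[1]), debug=True)
```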
Thank you, will give it a try.
I wanted to note that even though absl was reporting no GPU found, GPU usage does go up to 30-40% while I'm generating images. My setup might be an outlier: I'm running the app via Windows Subsystem for Linux on Windows 10. I see WSL doesn't support graphical UIs on Windows 10 but does on 11, so I'm upgrading now to see if that makes a difference. Maybe WSL is limiting the resources available to the app (CPU usage goes up to 50%).
UPDATE: I had installed the CUDA toolkit before installing PyTorch via the instructions generated at https://developer.nvidia.com/cuda-downloads, but did not verify it was in the PATH (I blew away that WSL instance so I can't check that now).
I also had WARNING:absl:No GPU/TPU found, falling back to CPU. while using WSL on Windows 11. I'm just amateurishly tinkering about, so I may have missed something, but I was able to resolve it by installing the CUDA Toolkit separately (I used version 11.3 specifically, and for WSL the Ubuntu-WSL version, not the normal Ubuntu one).
I believe the PyTorch installation includes the toolkit, which means that PyTorch is able to see it, but JAX wasn't seeing it until I installed the toolkit standalone. It errored when trying the below, even though the torch test came back True:
Python: import torch, jax; print(torch.cuda.is_available()); print(jax.devices())
Once I installed the toolkit separately and ensured the cuda-11.* folder it places was added to PATH (https://docs.nvidia.com/cuda/archive/11.3.0/cuda-installation-guide-linux/index.html#post-installation-actions), it then seemed to pick up the GPU properly.
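For reference, the relevant post-installation actions boil down to two exports (a sketch assuming the default /usr/local/cuda-11.3 install prefix; adjust the version to match your install):

```shell
# Make the standalone CUDA toolkit visible to JAX and anything else
# that looks it up at runtime. Paths assume the default install prefix.
export PATH=/usr/local/cuda-11.3/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-11.3/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
```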
I then also needed cuDNN and to install its libraries to resolve a follow-up issue. Once I did that, it seems to all be working and generates images within a few seconds.
I'm trying to do the same as you described, but it doesn't seem to be working for me. I have installed the CUDA Toolkit, although somehow it installed 11.7 instead of 11.3, but I don't think that should be the problem?
I've added the cuda-11.7 directory to the PATH, as the Post-installation Actions say to do. I put it in my .profile file with PATH=/usr/local/cuda-11.7/bin${PATH:+:${PATH}}.
When I do
$ echo $PATH
/usr/local/cuda-11.7/bin:/home/technicjelle/.local/bin: etc. etc.
you can see that it is there on the path. But when I go into Python and run the one-liner you sent:
$ python
Python 3.8.10 (default, Mar 15 2022, 12:22:08)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch, jax; print(torch.cuda.is_available()); print(jax.devices())
True
WARNING:absl:No GPU/TPU found, falling back to CPU. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)
[CpuDevice(id=0)]
Also running on Ubuntu WSL, but on Windows 10, because apparently that also supports CUDA in WSL now:
$ nvidia-smi -L
GPU 0: NVIDIA GeForce RTX 3070 Laptop GPU (UUID: GPU-a806ee65-1125-2137-1947-a968a545aa27)
@Ly-Zxzy What CPU and GPU usage do you reach when generating images?
Hmm, might be worth also upgrading jax then, if you haven't yet.
pip install --upgrade "jax[cuda]" -f https://storage.googleapis.com/jax-releases/jax_releases.html
That's the only thing I can recall potentially doing outside of the CUDA toolkit instructions that would have otherwise also fixed it. Perhaps also try shutting down and relaunching WSL (and ensure to re-add cuda-11.*/bin to path after if it doesn't stick).
If not, might be worth trying toolkit 11.3 regardless just in case there is some weirdness going on with the versioning, I picked 11.3 at first since it's the one pytorch uses. Otherwise, I'm unfortunately not too sure. https://developer.nvidia.com/cuda-toolkit-archive
@Ly-Zxzy What CPU and GPU usage do you reach when generating images?
Going by Windows Task Manager, 95%ish on my 3090, 20-25%ish on my i9-12900K.
Hmm, might be worth also upgrading jax then, if you haven't yet.
pip install --upgrade "jax[cuda]" -f https://storage.googleapis.com/jax-releases/jax_releases.html
THAT DID IT!
$ python
Python 3.8.10 (default, Mar 15 2022, 12:22:08)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch, jax; print(torch.cuda.is_available()); print(jax.devices())
True
[GpuDevice(id=0, process_index=0)]
Thanks a lot! I'm now going to try the actual dalle backend
EDIT: Yep also having that DNN library issue now.. Let's see if I can get that fixed
Hmm, might be worth also upgrading jax then, if you haven't yet.
pip install --upgrade "jax[cuda]" -f https://storage.googleapis.com/jax-releases/jax_releases.html
THAT DID IT!
Ah, glad to hear! I could have sworn I tried upgrading jax separately before and that on its own didn't work, so I assume it needs both that and the toolkit installed separately (or I just messed it up another way the first time around somehow, quite possible; not going to nuke WSL now to find out, but if someone else needs to, then...).
And yep, all I tried for the DNN library issue was installing cuDNN plus its runtime & developer libraries, as detailed in the install guide (https://docs.nvidia.com/deeplearning/cudnn/install-guide/index.html), so hopefully that should do the trick for you. I had issues apt-getting the libraries, so I had to download them manually from the source and install them locally from the .deb files, in case you hit the same thing.
I'm redoing my WSL install and had a question: how did you install PyTorch? https://pytorch.org/get-started/locally/ tells me to run:
pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu113
But I wonder if I should change it to cu117, since that's the version of CUDA I have.
When you click the link to download.pytorch.org/whl/cu113, it leads to a page with some more links. If you edit the URL to download.pytorch.org/whl/cu117, it results in an Access Denied XML page, probably because that doesn't exist. I tried the same today in the hope that it would work, but no such luck, sadly...
Yes, I just used that standard provided install command. I don't think there's a version of PyTorch specifically for cu117; from what I gathered, they only incrementally update the CUDA version for it.
If I'm assuming correctly, PyTorch includes its own bundled CUDA version which it uses by itself, and then you can install the CUDA toolkit standalone for anything else to use (e.g. JAX).
If the CUDA bundled with PyTorch were supposed to be sufficient for everything, that would explain why we're running into these issues, as it's evidently not being found/used; but I wouldn't think that's the case, as I'm pretty sure it's not a full install?
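One quick way to tell the two situations apart is to check whether the system itself, as opposed to PyTorch's bundled runtime, can find the toolkit; a small diagnostic sketch:

```shell
# PyTorch wheels ship their own CUDA runtime, so torch.cuda.is_available()
# can return True even with no system-wide toolkit installed. JAX needs
# the real toolkit on PATH, so check what the shell itself can see:
if command -v nvcc >/dev/null 2>&1; then
    cuda_status="$(nvcc --version)"
else
    cuda_status="no system-wide nvcc on PATH - JAX will not find CUDA"
fi
echo "$cuda_status"
```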
It seems I have successfully managed to install cuDNN, as DALL-E is no longer complaining about not having it. But now it's complaining that it fails to allocate memory:
Full log:
$ python app.py 8080
--> Starting DALL-E Server. This might take up to two minutes.
2022-06-13 20:23:42.255457: I external/org_tensorflow/tensorflow/compiler/xla/service/service.cc:174] XLA service 0x688eca0 initialized for platform Interpreter (this does not guarantee that XLA will be used). Devices:
2022-06-13 20:23:42.255525: I external/org_tensorflow/tensorflow/compiler/xla/service/service.cc:182] StreamExecutor device (0): Interpreter, <undefined>
2022-06-13 20:23:42.274099: I external/org_tensorflow/tensorflow/compiler/xla/pjrt/tfrt_cpu_pjrt_client.cc:176] TfrtCpuClient created.
2022-06-13 20:23:42.782836: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:969] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2022-06-13 20:23:42.783227: I external/org_tensorflow/tensorflow/compiler/xla/service/service.cc:174] XLA service 0x68a9820 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2022-06-13 20:23:42.783256: I external/org_tensorflow/tensorflow/compiler/xla/service/service.cc:182] StreamExecutor device (0): NVIDIA GeForce RTX 3070 Laptop GPU, Compute Capability 8.6
2022-06-13 20:23:42.786088: I external/org_tensorflow/tensorflow/compiler/xla/pjrt/gpu_device.cc:345] Using platform allocator.
2022-06-13 20:23:42.789998: I external/org_tensorflow/tensorflow/stream_executor/tpu/tpu_platform_interface.cc:74] No TPU platform found.
2022-06-13 20:24:41.089062: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:802] failed to alloc 1073741824 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2022-06-13 20:24:41.089120: W external/org_tensorflow/tensorflow/core/common_runtime/device/device_host_allocator.h:46] could not allocate pinned host memory of size: 1073741824
2022-06-13 20:24:41.244141: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:802] failed to alloc 966367744 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2022-06-13 20:24:41.244212: W external/org_tensorflow/tensorflow/core/common_runtime/device/device_host_allocator.h:46] could not allocate pinned host memory of size: 966367744
2022-06-13 20:24:41.403421: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:802] failed to alloc 869731072 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2022-06-13 20:24:41.403480: W external/org_tensorflow/tensorflow/core/common_runtime/device/device_host_allocator.h:46] could not allocate pinned host memory of size: 869731072
2022-06-13 20:24:41.529725: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:802] failed to alloc 782758144 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2022-06-13 20:24:41.529779: W external/org_tensorflow/tensorflow/core/common_runtime/device/device_host_allocator.h:46] could not allocate pinned host memory of size: 782758144
2022-06-13 20:24:41.657712: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:802] failed to alloc 704482304 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2022-06-13 20:24:41.657788: W external/org_tensorflow/tensorflow/core/common_runtime/device/device_host_allocator.h:46] could not allocate pinned host memory of size: 704482304
2022-06-13 20:24:41.824076: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:802] failed to alloc 634034176 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2022-06-13 20:24:41.824160: W external/org_tensorflow/tensorflow/core/common_runtime/device/device_host_allocator.h:46] could not allocate pinned host memory of size: 634034176
2022-06-13 20:24:41.947798: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:802] failed to alloc 570630912 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2022-06-13 20:24:41.947863: W external/org_tensorflow/tensorflow/core/common_runtime/device/device_host_allocator.h:46] could not allocate pinned host memory of size: 570630912
2022-06-13 20:24:42.381231: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:802] failed to alloc 513568000 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2022-06-13 20:24:42.381319: W external/org_tensorflow/tensorflow/core/common_runtime/device/device_host_allocator.h:46] could not allocate pinned host memory of size: 513568000
2022-06-13 20:24:42.512404: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:802] failed to alloc 462211328 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2022-06-13 20:24:42.512468: W external/org_tensorflow/tensorflow/core/common_runtime/device/device_host_allocator.h:46] could not allocate pinned host memory of size: 462211328
2022-06-13 20:24:42.643017: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:802] failed to alloc 415990272 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2022-06-13 20:24:42.643087: W external/org_tensorflow/tensorflow/core/common_runtime/device/device_host_allocator.h:46] could not allocate pinned host memory of size: 415990272
2022-06-13 20:24:42.773190: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:802] failed to alloc 374391296 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2022-06-13 20:24:42.773258: W external/org_tensorflow/tensorflow/core/common_runtime/device/device_host_allocator.h:46] could not allocate pinned host memory of size: 374391296
2022-06-13 20:24:42.907371: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:802] failed to alloc 336952320 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2022-06-13 20:24:42.907462: W external/org_tensorflow/tensorflow/core/common_runtime/device/device_host_allocator.h:46] could not allocate pinned host memory of size: 336952320
2022-06-13 20:24:43.040635: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:802] failed to alloc 303257088 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2022-06-13 20:24:43.040690: W external/org_tensorflow/tensorflow/core/common_runtime/device/device_host_allocator.h:46] could not allocate pinned host memory of size: 303257088
2022-06-13 20:24:43.168833: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:802] failed to alloc 272931584 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2022-06-13 20:24:43.168885: W external/org_tensorflow/tensorflow/core/common_runtime/device/device_host_allocator.h:46] could not allocate pinned host memory of size: 272931584
2022-06-13 20:24:43.295806: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:802] failed to alloc 245638656 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2022-06-13 20:24:43.295900: W external/org_tensorflow/tensorflow/core/common_runtime/device/device_host_allocator.h:46] could not allocate pinned host memory of size: 245638656
2022-06-13 20:24:43.435744: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:802] failed to alloc 221074944 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2022-06-13 20:24:43.435823: W external/org_tensorflow/tensorflow/core/common_runtime/device/device_host_allocator.h:46] could not allocate pinned host memory of size: 221074944
2022-06-13 20:24:43.561985: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:802] failed to alloc 198967552 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2022-06-13 20:24:43.562044: W external/org_tensorflow/tensorflow/core/common_runtime/device/device_host_allocator.h:46] could not allocate pinned host memory of size: 198967552
2022-06-13 20:24:43.692361: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:802] failed to alloc 179070976 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2022-06-13 20:24:43.692429: W external/org_tensorflow/tensorflow/core/common_runtime/device/device_host_allocator.h:46] could not allocate pinned host memory of size: 179070976
2022-06-13 20:25:35.182914: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_blas.cc:1786] TensorFloat-32 will be used for the matrix multiplication. This will only be logged once.
2022-06-13 20:25:47.334977: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_dnn.cc:384] Loaded cuDNN version 8401
2022-06-13 20:25:47.446692: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_dnn.cc:5025] Disabling cuDNN frontend for the following convolution:
input: {count: 1 feature_map_count: 256 spatial: 16 16 value_min: 0.000000 value_max: 0.000000 layout: BatchDepthYX}
filter: {output_feature_map_count: 256 input_feature_map_count: 256 layout: OutputInputYX shape: 1 1 }
{zero_padding: 0 0 pad_alignment: default filter_strides: 1 1 dilation_rates: 1 1 }
... because it uses an identity activation.
2022-06-13 20:25:48.983453: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_dnn.cc:5025] Disabling cuDNN frontend for the following convolution:
input: {count: 1 feature_map_count: 256 spatial: 16 16 value_min: 0.000000 value_max: 0.000000 layout: BatchDepthYX}
filter: {output_feature_map_count: 512 input_feature_map_count: 256 layout: OutputInputYX shape: 3 3 }
{zero_padding: 1 1 pad_alignment: default filter_strides: 1 1 dilation_rates: 1 1 }
... because it uses an identity activation.
2022-06-13 20:25:49.117916: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_dnn.cc:5025] Disabling cuDNN frontend for the following convolution:
input: {count: 1 feature_map_count: 512 spatial: 16 16 value_min: 0.000000 value_max: 0.000000 layout: BatchDepthYX}
filter: {output_feature_map_count: 512 input_feature_map_count: 512 layout: OutputInputYX shape: 3 3 }
{zero_padding: 1 1 pad_alignment: default filter_strides: 1 1 dilation_rates: 1 1 }
... because it uses an identity activation.
2022-06-13 20:25:49.213197: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_dnn.cc:5025] Disabling cuDNN frontend for the following convolution:
input: {count: 1 feature_map_count: 512 spatial: 16 16 value_min: 0.000000 value_max: 0.000000 layout: BatchDepthYX}
filter: {output_feature_map_count: 512 input_feature_map_count: 512 layout: OutputInputYX shape: 3 3 }
{zero_padding: 1 1 pad_alignment: default filter_strides: 1 1 dilation_rates: 1 1 }
... because it uses an identity activation.
2022-06-13 20:25:49.289459: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_dnn.cc:5025] Disabling cuDNN frontend for the following convolution:
input: {count: 1 feature_map_count: 512 spatial: 16 16 value_min: 0.000000 value_max: 0.000000 layout: BatchDepthYX}
filter: {output_feature_map_count: 512 input_feature_map_count: 512 layout: OutputInputYX shape: 1 1 }
{zero_padding: 0 0 pad_alignment: default filter_strides: 1 1 dilation_rates: 1 1 }
... because it uses an identity activation.
2022-06-13 20:25:49.357557: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_dnn.cc:5025] Disabling cuDNN frontend for the following convolution:
input: {count: 1 feature_map_count: 512 spatial: 16 16 value_min: 0.000000 value_max: 0.000000 layout: BatchDepthYX}
filter: {output_feature_map_count: 512 input_feature_map_count: 512 layout: OutputInputYX shape: 1 1 }
{zero_padding: 0 0 pad_alignment: default filter_strides: 1 1 dilation_rates: 1 1 }
... because it uses an identity activation.
2022-06-13 20:25:49.400862: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_dnn.cc:5025] Disabling cuDNN frontend for the following convolution:
input: {count: 1 feature_map_count: 512 spatial: 32 32 value_min: 0.000000 value_max: 0.000000 layout: BatchDepthYX}
filter: {output_feature_map_count: 512 input_feature_map_count: 512 layout: OutputInputYX shape: 3 3 }
{zero_padding: 1 1 pad_alignment: default filter_strides: 1 1 dilation_rates: 1 1 }
... because it uses an identity activation.
2022-06-13 20:25:49.436433: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_dnn.cc:5025] Disabling cuDNN frontend for the following convolution:
input: {count: 1 feature_map_count: 512 spatial: 32 32 value_min: 0.000000 value_max: 0.000000 layout: BatchDepthYX}
filter: {output_feature_map_count: 256 input_feature_map_count: 512 layout: OutputInputYX shape: 3 3 }
{zero_padding: 1 1 pad_alignment: default filter_strides: 1 1 dilation_rates: 1 1 }
... because it uses an identity activation.
2022-06-13 20:25:49.473602: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_dnn.cc:5025] Disabling cuDNN frontend for the following convolution:
input: {count: 1 feature_map_count: 512 spatial: 32 32 value_min: 0.000000 value_max: 0.000000 layout: BatchDepthYX}
filter: {output_feature_map_count: 256 input_feature_map_count: 512 layout: OutputInputYX shape: 1 1 }
{zero_padding: 0 0 pad_alignment: default filter_strides: 1 1 dilation_rates: 1 1 }
... because it uses an identity activation.
2022-06-13 20:25:49.500309: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_dnn.cc:5025] Disabling cuDNN frontend for the following convolution:
input: {count: 1 feature_map_count: 256 spatial: 32 32 value_min: 0.000000 value_max: 0.000000 layout: BatchDepthYX}
filter: {output_feature_map_count: 256 input_feature_map_count: 256 layout: OutputInputYX shape: 3 3 }
{zero_padding: 1 1 pad_alignment: default filter_strides: 1 1 dilation_rates: 1 1 }
... because it uses an identity activation.
2022-06-13 20:25:49.528508: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_dnn.cc:5025] Disabling cuDNN frontend for the following convolution:
input: {count: 1 feature_map_count: 256 spatial: 32 32 value_min: 0.000000 value_max: 0.000000 layout: BatchDepthYX}
filter: {output_feature_map_count: 256 input_feature_map_count: 256 layout: OutputInputYX shape: 3 3 }
{zero_padding: 1 1 pad_alignment: default filter_strides: 1 1 dilation_rates: 1 1 }
... because it uses an identity activation.
2022-06-13 20:25:49.561103: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_dnn.cc:5025] Disabling cuDNN frontend for the following convolution:
input: {count: 1 feature_map_count: 256 spatial: 64 64 value_min: 0.000000 value_max: 0.000000 layout: BatchDepthYX}
filter: {output_feature_map_count: 256 input_feature_map_count: 256 layout: OutputInputYX shape: 3 3 }
{zero_padding: 1 1 pad_alignment: default filter_strides: 1 1 dilation_rates: 1 1 }
... because it uses an identity activation.
2022-06-13 20:25:49.605634: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_dnn.cc:5025] Disabling cuDNN frontend for the following convolution:
input: {count: 1 feature_map_count: 256 spatial: 64 64 value_min: 0.000000 value_max: 0.000000 layout: BatchDepthYX}
filter: {output_feature_map_count: 256 input_feature_map_count: 256 layout: OutputInputYX shape: 3 3 }
{zero_padding: 1 1 pad_alignment: default filter_strides: 1 1 dilation_rates: 1 1 }
... because it uses an identity activation.
2022-06-13 20:25:49.674962: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_dnn.cc:5025] Disabling cuDNN frontend for the following convolution:
input: {count: 1 feature_map_count: 256 spatial: 128 128 value_min: 0.000000 value_max: 0.000000 layout: BatchDepthYX}
filter: {output_feature_map_count: 256 input_feature_map_count: 256 layout: OutputInputYX shape: 3 3 }
{zero_padding: 1 1 pad_alignment: default filter_strides: 1 1 dilation_rates: 1 1 }
... because it uses an identity activation.
2022-06-13 20:25:49.739771: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_dnn.cc:5025] Disabling cuDNN frontend for the following convolution:
input: {count: 1 feature_map_count: 256 spatial: 128 128 value_min: 0.000000 value_max: 0.000000 layout: BatchDepthYX}
filter: {output_feature_map_count: 128 input_feature_map_count: 256 layout: OutputInputYX shape: 3 3 }
{zero_padding: 1 1 pad_alignment: default filter_strides: 1 1 dilation_rates: 1 1 }
... because it uses an identity activation.
2022-06-13 20:25:49.794224: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_dnn.cc:5025] Disabling cuDNN frontend for the following convolution:
input: {count: 1 feature_map_count: 256 spatial: 128 128 value_min: 0.000000 value_max: 0.000000 layout: BatchDepthYX}
filter: {output_feature_map_count: 128 input_feature_map_count: 256 layout: OutputInputYX shape: 1 1 }
{zero_padding: 0 0 pad_alignment: default filter_strides: 1 1 dilation_rates: 1 1 }
... because it uses an identity activation.
2022-06-13 20:25:49.846780: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_dnn.cc:5025] Disabling cuDNN frontend for the following convolution:
input: {count: 1 feature_map_count: 128 spatial: 128 128 value_min: 0.000000 value_max: 0.000000 layout: BatchDepthYX}
filter: {output_feature_map_count: 128 input_feature_map_count: 128 layout: OutputInputYX shape: 3 3 }
{zero_padding: 1 1 pad_alignment: default filter_strides: 1 1 dilation_rates: 1 1 }
... because it uses an identity activation.
2022-06-13 20:25:49.889979: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_dnn.cc:5025] Disabling cuDNN frontend for the following convolution:
input: {count: 1 feature_map_count: 128 spatial: 128 128 value_min: 0.000000 value_max: 0.000000 layout: BatchDepthYX}
filter: {output_feature_map_count: 128 input_feature_map_count: 128 layout: OutputInputYX shape: 3 3 }
{zero_padding: 1 1 pad_alignment: default filter_strides: 1 1 dilation_rates: 1 1 }
... because it uses an identity activation.
2022-06-13 20:25:49.997860: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_dnn.cc:5025] Disabling cuDNN frontend for the following convolution:
input: {count: 1 feature_map_count: 128 spatial: 256 256 value_min: 0.000000 value_max: 0.000000 layout: BatchDepthYX}
filter: {output_feature_map_count: 128 input_feature_map_count: 128 layout: OutputInputYX shape: 3 3 }
{zero_padding: 1 1 pad_alignment: default filter_strides: 1 1 dilation_rates: 1 1 }
... because it uses an identity activation.
2022-06-13 20:25:50.127998: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_dnn.cc:5025] Disabling cuDNN frontend for the following convolution:
input: {count: 1 feature_map_count: 128 spatial: 256 256 value_min: 0.000000 value_max: 0.000000 layout: BatchDepthYX}
filter: {output_feature_map_count: 128 input_feature_map_count: 128 layout: OutputInputYX shape: 3 3 }
{zero_padding: 1 1 pad_alignment: default filter_strides: 1 1 dilation_rates: 1 1 }
... because it uses an identity activation.
2022-06-13 20:25:50.187356: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_dnn.cc:5025] Disabling cuDNN frontend for the following convolution:
input: {count: 1 feature_map_count: 128 spatial: 256 256 value_min: 0.000000 value_max: 0.000000 layout: BatchDepthYX}
filter: {output_feature_map_count: 3 input_feature_map_count: 128 layout: OutputInputYX shape: 3 3 }
{zero_padding: 1 1 pad_alignment: default filter_strides: 1 1 dilation_rates: 1 1 }
... because it uses an identity activation.
--> DALL-E Server is up and running!
--> Model selected - DALL-E ModelSize.MINI
* Serving Flask app 'app' (lazy loading)
* Environment: production
WARNING: This is a development server. Do not use it in a production deployment.
Use a production WSGI server instead.
* Debug mode: on
INFO:werkzeug: * Running on all addresses (0.0.0.0)
WARNING: This is a development server. Do not use it in a production deployment.
* Running on http://127.0.0.1:8080
* Running on http://172.18.183.74:8080 (Press CTRL+C to quit)
INFO:werkzeug: * Restarting with watchdog (inotify)
--> Starting DALL-E Server. This might take up to two minutes.
I would have thought a 3070's 8 GB would have been enough... Actually, I know it should be, because someone I know managed to run this on their 2070 Super, which also has 8 GB of VRAM. Although they ran it on a real Linux install and not in WSL...
Scrolling up in the console, it appears I also received preliminary OOM errors after launching it (despite having a 24 GB 3090), but after it finished starting the server it still seemed to all be working for me, apparently running correctly in mega mode. See if you can connect to it using the frontend?
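For anyone else hitting those preallocation OOM lines: JAX has documented allocator environment variables that may be worth trying; this is a general JAX tip, not something verified in this thread:

```shell
# By default, JAX's XLA backend tries to grab ~90% of GPU memory up front,
# which can surface as CUDA_ERROR_OUT_OF_MEMORY during startup. These
# documented JAX flags switch to on-demand allocation, or cap the share
# of GPU memory JAX may claim:
export XLA_PYTHON_CLIENT_PREALLOCATE=false
export XLA_PYTHON_CLIENT_MEM_FRACTION=.75
```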
It restarts itself automatically, but the second time it runs, it fails to find the GPU again, which puts me right back at square one...
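For what it's worth, the restart comes from werkzeug's debug reloader (the "Restarting with watchdog (inotify)" line, and "Debug mode: on" above it), and it's the restarted process that fails to re-acquire the GPU. A purely hypothetical edit to the `app.run(...)` call in `backend/app.py`, assuming it currently runs with `debug=True`, might keep the original GPU-holding process alive:

```python
# Sketch only: disabling debug mode / the reloader stops werkzeug from
# spawning the second process, which is the one that hits
# CUDA_ERROR_OUT_OF_MEMORY on re-initialization.
from flask import Flask

app = Flask(__name__)


def serve(port: int) -> None:
    # use_reloader=False is the key change; debug=False alone also disables it
    app.run(host="0.0.0.0", port=port, debug=False, use_reloader=False)
```

This is a guess at the mechanism, not a confirmed fix, but it would at least tell us whether the restart is what loses the GPU.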
Full log, from beginning to end:
$ python app.py 8080
--> Starting DALL-E Server. This might take up to two minutes.
2022-06-13 20:35:45.255020: I external/org_tensorflow/tensorflow/compiler/xla/service/service.cc:174] XLA service 0x7062250 initialized for platform Interpreter (this does not guarantee that XLA will be used). Devices:
2022-06-13 20:35:45.255087: I external/org_tensorflow/tensorflow/compiler/xla/service/service.cc:182] StreamExecutor device (0): Interpreter, <undefined>
2022-06-13 20:35:45.265041: I external/org_tensorflow/tensorflow/compiler/xla/pjrt/tfrt_cpu_pjrt_client.cc:176] TfrtCpuClient created.
2022-06-13 20:35:45.653564: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:969] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2022-06-13 20:35:45.653782: I external/org_tensorflow/tensorflow/compiler/xla/service/service.cc:174] XLA service 0x7140bd0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2022-06-13 20:35:45.653807: I external/org_tensorflow/tensorflow/compiler/xla/service/service.cc:182] StreamExecutor device (0): NVIDIA GeForce RTX 3070 Laptop GPU, Compute Capability 8.6
2022-06-13 20:35:45.655773: I external/org_tensorflow/tensorflow/compiler/xla/pjrt/gpu_device.cc:345] Using platform allocator.
2022-06-13 20:35:45.657172: I external/org_tensorflow/tensorflow/stream_executor/tpu/tpu_platform_interface.cc:74] No TPU platform found.
2022-06-13 20:35:55.432614: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:802] failed to alloc 1073741824 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2022-06-13 20:35:55.432836: W external/org_tensorflow/tensorflow/core/common_runtime/device/device_host_allocator.h:46] could not allocate pinned host memory of size: 1073741824
2022-06-13 20:35:55.557084: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:802] failed to alloc 966367744 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2022-06-13 20:35:55.557146: W external/org_tensorflow/tensorflow/core/common_runtime/device/device_host_allocator.h:46] could not allocate pinned host memory of size: 966367744
2022-06-13 20:35:55.686281: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:802] failed to alloc 869731072 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2022-06-13 20:35:55.686327: W external/org_tensorflow/tensorflow/core/common_runtime/device/device_host_allocator.h:46] could not allocate pinned host memory of size: 869731072
2022-06-13 20:35:55.812935: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:802] failed to alloc 782758144 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2022-06-13 20:35:55.812989: W external/org_tensorflow/tensorflow/core/common_runtime/device/device_host_allocator.h:46] could not allocate pinned host memory of size: 782758144
2022-06-13 20:35:55.940620: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:802] failed to alloc 704482304 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2022-06-13 20:35:55.940668: W external/org_tensorflow/tensorflow/core/common_runtime/device/device_host_allocator.h:46] could not allocate pinned host memory of size: 704482304
2022-06-13 20:35:56.067654: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:802] failed to alloc 634034176 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2022-06-13 20:35:56.067705: W external/org_tensorflow/tensorflow/core/common_runtime/device/device_host_allocator.h:46] could not allocate pinned host memory of size: 634034176
2022-06-13 20:35:56.194430: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:802] failed to alloc 570630912 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2022-06-13 20:35:56.194478: W external/org_tensorflow/tensorflow/core/common_runtime/device/device_host_allocator.h:46] could not allocate pinned host memory of size: 570630912
2022-06-13 20:35:56.329509: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:802] failed to alloc 513568000 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2022-06-13 20:35:56.329561: W external/org_tensorflow/tensorflow/core/common_runtime/device/device_host_allocator.h:46] could not allocate pinned host memory of size: 513568000
2022-06-13 20:35:56.486342: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:802] failed to alloc 462211328 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2022-06-13 20:35:56.486395: W external/org_tensorflow/tensorflow/core/common_runtime/device/device_host_allocator.h:46] could not allocate pinned host memory of size: 462211328
2022-06-13 20:35:56.614502: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:802] failed to alloc 415990272 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2022-06-13 20:35:56.614554: W external/org_tensorflow/tensorflow/core/common_runtime/device/device_host_allocator.h:46] could not allocate pinned host memory of size: 415990272
2022-06-13 20:35:56.742596: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:802] failed to alloc 374391296 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2022-06-13 20:35:56.742659: W external/org_tensorflow/tensorflow/core/common_runtime/device/device_host_allocator.h:46] could not allocate pinned host memory of size: 374391296
2022-06-13 20:35:56.869749: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:802] failed to alloc 336952320 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2022-06-13 20:35:56.869809: W external/org_tensorflow/tensorflow/core/common_runtime/device/device_host_allocator.h:46] could not allocate pinned host memory of size: 336952320
2022-06-13 20:35:56.997594: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:802] failed to alloc 303257088 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2022-06-13 20:35:56.997655: W external/org_tensorflow/tensorflow/core/common_runtime/device/device_host_allocator.h:46] could not allocate pinned host memory of size: 303257088
2022-06-13 20:35:57.123879: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:802] failed to alloc 272931584 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2022-06-13 20:35:57.123933: W external/org_tensorflow/tensorflow/core/common_runtime/device/device_host_allocator.h:46] could not allocate pinned host memory of size: 272931584
2022-06-13 20:35:57.254382: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:802] failed to alloc 245638656 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2022-06-13 20:35:57.254436: W external/org_tensorflow/tensorflow/core/common_runtime/device/device_host_allocator.h:46] could not allocate pinned host memory of size: 245638656
2022-06-13 20:35:57.417535: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:802] failed to alloc 221074944 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2022-06-13 20:35:57.417602: W external/org_tensorflow/tensorflow/core/common_runtime/device/device_host_allocator.h:46] could not allocate pinned host memory of size: 221074944
2022-06-13 20:35:57.553782: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:802] failed to alloc 198967552 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2022-06-13 20:35:57.553837: W external/org_tensorflow/tensorflow/core/common_runtime/device/device_host_allocator.h:46] could not allocate pinned host memory of size: 198967552
2022-06-13 20:35:57.682845: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:802] failed to alloc 179070976 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2022-06-13 20:35:57.682910: W external/org_tensorflow/tensorflow/core/common_runtime/device/device_host_allocator.h:46] could not allocate pinned host memory of size: 179070976
2022-06-13 20:36:11.568363: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_blas.cc:1786] TensorFloat-32 will be used for the matrix multiplication. This will only be logged once.
2022-06-13 20:36:23.193609: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_dnn.cc:384] Loaded cuDNN version 8401
2022-06-13 20:36:23.299712: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_dnn.cc:5025] Disabling cuDNN frontend for the following convolution:
input: {count: 1 feature_map_count: 256 spatial: 16 16 value_min: 0.000000 value_max: 0.000000 layout: BatchDepthYX}
filter: {output_feature_map_count: 256 input_feature_map_count: 256 layout: OutputInputYX shape: 1 1 }
{zero_padding: 0 0 pad_alignment: default filter_strides: 1 1 dilation_rates: 1 1 }
... because it uses an identity activation.
2022-06-13 20:36:27.123563: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_dnn.cc:5025] Disabling cuDNN frontend for the following convolution:
input: {count: 1 feature_map_count: 256 spatial: 16 16 value_min: 0.000000 value_max: 0.000000 layout: BatchDepthYX}
filter: {output_feature_map_count: 512 input_feature_map_count: 256 layout: OutputInputYX shape: 3 3 }
{zero_padding: 1 1 pad_alignment: default filter_strides: 1 1 dilation_rates: 1 1 }
... because it uses an identity activation.
2022-06-13 20:36:27.210119: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_dnn.cc:5025] Disabling cuDNN frontend for the following convolution:
input: {count: 1 feature_map_count: 512 spatial: 16 16 value_min: 0.000000 value_max: 0.000000 layout: BatchDepthYX}
filter: {output_feature_map_count: 512 input_feature_map_count: 512 layout: OutputInputYX shape: 3 3 }
{zero_padding: 1 1 pad_alignment: default filter_strides: 1 1 dilation_rates: 1 1 }
... because it uses an identity activation.
2022-06-13 20:36:27.281264: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_dnn.cc:5025] Disabling cuDNN frontend for the following convolution:
input: {count: 1 feature_map_count: 512 spatial: 16 16 value_min: 0.000000 value_max: 0.000000 layout: BatchDepthYX}
filter: {output_feature_map_count: 512 input_feature_map_count: 512 layout: OutputInputYX shape: 3 3 }
{zero_padding: 1 1 pad_alignment: default filter_strides: 1 1 dilation_rates: 1 1 }
... because it uses an identity activation.
2022-06-13 20:36:27.342902: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_dnn.cc:5025] Disabling cuDNN frontend for the following convolution:
input: {count: 1 feature_map_count: 512 spatial: 16 16 value_min: 0.000000 value_max: 0.000000 layout: BatchDepthYX}
filter: {output_feature_map_count: 512 input_feature_map_count: 512 layout: OutputInputYX shape: 1 1 }
{zero_padding: 0 0 pad_alignment: default filter_strides: 1 1 dilation_rates: 1 1 }
... because it uses an identity activation.
2022-06-13 20:36:27.391476: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_dnn.cc:5025] Disabling cuDNN frontend for the following convolution:
input: {count: 1 feature_map_count: 512 spatial: 16 16 value_min: 0.000000 value_max: 0.000000 layout: BatchDepthYX}
filter: {output_feature_map_count: 512 input_feature_map_count: 512 layout: OutputInputYX shape: 1 1 }
{zero_padding: 0 0 pad_alignment: default filter_strides: 1 1 dilation_rates: 1 1 }
... because it uses an identity activation.
2022-06-13 20:36:27.431441: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_dnn.cc:5025] Disabling cuDNN frontend for the following convolution:
input: {count: 1 feature_map_count: 512 spatial: 32 32 value_min: 0.000000 value_max: 0.000000 layout: BatchDepthYX}
filter: {output_feature_map_count: 512 input_feature_map_count: 512 layout: OutputInputYX shape: 3 3 }
{zero_padding: 1 1 pad_alignment: default filter_strides: 1 1 dilation_rates: 1 1 }
... because it uses an identity activation.
2022-06-13 20:36:27.468872: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_dnn.cc:5025] Disabling cuDNN frontend for the following convolution:
input: {count: 1 feature_map_count: 512 spatial: 32 32 value_min: 0.000000 value_max: 0.000000 layout: BatchDepthYX}
filter: {output_feature_map_count: 256 input_feature_map_count: 512 layout: OutputInputYX shape: 3 3 }
{zero_padding: 1 1 pad_alignment: default filter_strides: 1 1 dilation_rates: 1 1 }
... because it uses an identity activation.
2022-06-13 20:36:27.494028: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_dnn.cc:5025] Disabling cuDNN frontend for the following convolution:
input: {count: 1 feature_map_count: 512 spatial: 32 32 value_min: 0.000000 value_max: 0.000000 layout: BatchDepthYX}
filter: {output_feature_map_count: 256 input_feature_map_count: 512 layout: OutputInputYX shape: 1 1 }
{zero_padding: 0 0 pad_alignment: default filter_strides: 1 1 dilation_rates: 1 1 }
... because it uses an identity activation.
2022-06-13 20:36:27.522832: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_dnn.cc:5025] Disabling cuDNN frontend for the following convolution:
input: {count: 1 feature_map_count: 256 spatial: 32 32 value_min: 0.000000 value_max: 0.000000 layout: BatchDepthYX}
filter: {output_feature_map_count: 256 input_feature_map_count: 256 layout: OutputInputYX shape: 3 3 }
{zero_padding: 1 1 pad_alignment: default filter_strides: 1 1 dilation_rates: 1 1 }
... because it uses an identity activation.
2022-06-13 20:36:27.553243: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_dnn.cc:5025] Disabling cuDNN frontend for the following convolution:
input: {count: 1 feature_map_count: 256 spatial: 32 32 value_min: 0.000000 value_max: 0.000000 layout: BatchDepthYX}
filter: {output_feature_map_count: 256 input_feature_map_count: 256 layout: OutputInputYX shape: 3 3 }
{zero_padding: 1 1 pad_alignment: default filter_strides: 1 1 dilation_rates: 1 1 }
... because it uses an identity activation.
2022-06-13 20:36:27.592985: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_dnn.cc:5025] Disabling cuDNN frontend for the following convolution:
input: {count: 1 feature_map_count: 256 spatial: 64 64 value_min: 0.000000 value_max: 0.000000 layout: BatchDepthYX}
filter: {output_feature_map_count: 256 input_feature_map_count: 256 layout: OutputInputYX shape: 3 3 }
{zero_padding: 1 1 pad_alignment: default filter_strides: 1 1 dilation_rates: 1 1 }
... because it uses an identity activation.
2022-06-13 20:36:27.640910: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_dnn.cc:5025] Disabling cuDNN frontend for the following convolution:
input: {count: 1 feature_map_count: 256 spatial: 64 64 value_min: 0.000000 value_max: 0.000000 layout: BatchDepthYX}
filter: {output_feature_map_count: 256 input_feature_map_count: 256 layout: OutputInputYX shape: 3 3 }
{zero_padding: 1 1 pad_alignment: default filter_strides: 1 1 dilation_rates: 1 1 }
... because it uses an identity activation.
2022-06-13 20:36:27.716977: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_dnn.cc:5025] Disabling cuDNN frontend for the following convolution:
input: {count: 1 feature_map_count: 256 spatial: 128 128 value_min: 0.000000 value_max: 0.000000 layout: BatchDepthYX}
filter: {output_feature_map_count: 256 input_feature_map_count: 256 layout: OutputInputYX shape: 3 3 }
{zero_padding: 1 1 pad_alignment: default filter_strides: 1 1 dilation_rates: 1 1 }
... because it uses an identity activation.
2022-06-13 20:36:27.790647: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_dnn.cc:5025] Disabling cuDNN frontend for the following convolution:
input: {count: 1 feature_map_count: 256 spatial: 128 128 value_min: 0.000000 value_max: 0.000000 layout: BatchDepthYX}
filter: {output_feature_map_count: 128 input_feature_map_count: 256 layout: OutputInputYX shape: 3 3 }
{zero_padding: 1 1 pad_alignment: default filter_strides: 1 1 dilation_rates: 1 1 }
... because it uses an identity activation.
2022-06-13 20:36:27.858333: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_dnn.cc:5025] Disabling cuDNN frontend for the following convolution:
input: {count: 1 feature_map_count: 256 spatial: 128 128 value_min: 0.000000 value_max: 0.000000 layout: BatchDepthYX}
filter: {output_feature_map_count: 128 input_feature_map_count: 256 layout: OutputInputYX shape: 1 1 }
{zero_padding: 0 0 pad_alignment: default filter_strides: 1 1 dilation_rates: 1 1 }
... because it uses an identity activation.
2022-06-13 20:36:27.922493: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_dnn.cc:5025] Disabling cuDNN frontend for the following convolution:
input: {count: 1 feature_map_count: 128 spatial: 128 128 value_min: 0.000000 value_max: 0.000000 layout: BatchDepthYX}
filter: {output_feature_map_count: 128 input_feature_map_count: 128 layout: OutputInputYX shape: 3 3 }
{zero_padding: 1 1 pad_alignment: default filter_strides: 1 1 dilation_rates: 1 1 }
... because it uses an identity activation.
2022-06-13 20:36:27.970452: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_dnn.cc:5025] Disabling cuDNN frontend for the following convolution:
input: {count: 1 feature_map_count: 128 spatial: 128 128 value_min: 0.000000 value_max: 0.000000 layout: BatchDepthYX}
filter: {output_feature_map_count: 128 input_feature_map_count: 128 layout: OutputInputYX shape: 3 3 }
{zero_padding: 1 1 pad_alignment: default filter_strides: 1 1 dilation_rates: 1 1 }
... because it uses an identity activation.
2022-06-13 20:36:28.094454: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_dnn.cc:5025] Disabling cuDNN frontend for the following convolution:
input: {count: 1 feature_map_count: 128 spatial: 256 256 value_min: 0.000000 value_max: 0.000000 layout: BatchDepthYX}
filter: {output_feature_map_count: 128 input_feature_map_count: 128 layout: OutputInputYX shape: 3 3 }
{zero_padding: 1 1 pad_alignment: default filter_strides: 1 1 dilation_rates: 1 1 }
... because it uses an identity activation.
2022-06-13 20:36:28.258603: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_dnn.cc:5025] Disabling cuDNN frontend for the following convolution:
input: {count: 1 feature_map_count: 128 spatial: 256 256 value_min: 0.000000 value_max: 0.000000 layout: BatchDepthYX}
filter: {output_feature_map_count: 128 input_feature_map_count: 128 layout: OutputInputYX shape: 3 3 }
{zero_padding: 1 1 pad_alignment: default filter_strides: 1 1 dilation_rates: 1 1 }
... because it uses an identity activation.
2022-06-13 20:36:28.330350: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_dnn.cc:5025] Disabling cuDNN frontend for the following convolution:
input: {count: 1 feature_map_count: 128 spatial: 256 256 value_min: 0.000000 value_max: 0.000000 layout: BatchDepthYX}
filter: {output_feature_map_count: 3 input_feature_map_count: 128 layout: OutputInputYX shape: 3 3 }
{zero_padding: 1 1 pad_alignment: default filter_strides: 1 1 dilation_rates: 1 1 }
... because it uses an identity activation.
--> DALL-E Server is up and running!
--> Model selected - DALL-E ModelSize.MINI
* Serving Flask app 'app' (lazy loading)
* Environment: production
WARNING: This is a development server. Do not use it in a production deployment.
Use a production WSGI server instead.
* Debug mode: on
INFO:werkzeug: * Running on all addresses (0.0.0.0)
WARNING: This is a development server. Do not use it in a production deployment.
* Running on http://127.0.0.1:8080
* Running on http://172.18.178.198:8080 (Press CTRL+C to quit)
INFO:werkzeug: * Restarting with watchdog (inotify)
--> Starting DALL-E Server. This might take up to two minutes.
2022-06-13 20:36:35.217664: I external/org_tensorflow/tensorflow/compiler/xla/service/service.cc:174] XLA service 0x2a9e590 initialized for platform Interpreter (this does not guarantee that XLA will be used). Devices:
2022-06-13 20:36:35.217708: I external/org_tensorflow/tensorflow/compiler/xla/service/service.cc:182] StreamExecutor device (0): Interpreter, <undefined>
2022-06-13 20:36:35.233213: I external/org_tensorflow/tensorflow/compiler/xla/pjrt/tfrt_cpu_pjrt_client.cc:176] TfrtCpuClient created.
2022-06-13 20:36:42.141732: W external/org_tensorflow/tensorflow/compiler/xla/service/platform_util.cc:200] unable to create StreamExecutor for CUDA:0: failed initializing StreamExecutor for CUDA device ordinal 0: INTERNAL: failed call to cuDevicePrimaryCtxRetain: CUDA_ERROR_OUT_OF_MEMORY: out of memory; total memory reported: 8589410304
2022-06-13 20:36:42.142927: I external/org_tensorflow/tensorflow/stream_executor/tpu/tpu_platform_interface.cc:74] No TPU platform found.
WARNING:absl:No GPU/TPU found, falling back to CPU. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)
^CTraceback (most recent call last):
File "/home/technicjelle/build/dalle-playground/backend/app.py", line 50, in <module>
dalle_model = DalleModel(dalle_version)
File "/home/technicjelle/build/dalle-playground/backend/dalle_model.py", line 74, in __init__
self.processor = DalleBartProcessor.from_pretrained(dalle_model, revision=DALLE_COMMIT_ID)
File "/home/technicjelle/.local/lib/python3.8/site-packages/dalle_mini/model/utils.py", line 25, in from_pretrained
return super(PretrainedFromWandbMixin, cls).from_pretrained(
File "/usr/lib/python3.8/tempfile.py", line 966, in __exit__
self.cleanup()
File "/usr/lib/python3.8/tempfile.py", line 970, in cleanup
self._rmtree(self.name)
File "/usr/lib/python3.8/tempfile.py", line 952, in _rmtree
_rmtree(name, onerror=onerror)
File "/usr/lib/python3.8/shutil.py", line 718, in rmtree
_rmtree_safe_fd(fd, path, onerror)
File "/usr/lib/python3.8/shutil.py", line 673, in _rmtree_safe_fd
os.unlink(entry.name, dir_fd=topfd)
KeyboardInterrupt
Traceback (most recent call last):
File "/home/technicjelle/build/dalle-playground/backend/app.py", line 50, in <module>
dalle_model = DalleModel(dalle_version)
File "/home/technicjelle/build/dalle-playground/backend/dalle_model.py", line 74, in __init__
self.processor = DalleBartProcessor.from_pretrained(dalle_model, revision=DALLE_COMMIT_ID)
File "/home/technicjelle/.local/lib/python3.8/site-packages/dalle_mini/model/utils.py", line 25, in from_pretrained
return super(PretrainedFromWandbMixin, cls).from_pretrained(
File "/usr/lib/python3.8/tempfile.py", line 966, in __exit__
self.cleanup()
File "/usr/lib/python3.8/tempfile.py", line 970, in cleanup
self._rmtree(self.name)
File "/usr/lib/python3.8/tempfile.py", line 952, in _rmtree
_rmtree(name, onerror=onerror)
File "/usr/lib/python3.8/shutil.py", line 718, in rmtree
_rmtree_safe_fd(fd, path, onerror)
File "/usr/lib/python3.8/shutil.py", line 673, in _rmtree_safe_fd
os.unlink(entry.name, dir_fd=topfd)
KeyboardInterrupt
Ah yes, odd. Looking further, on startup I get only the OOM errors, not the cuDNN or NUMA-node ones, so I assume the OOM errors are what you'll want to solve to prevent the restart.
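If the OOM errors come from jax/XLA preallocating most of the GPU up front, the jax docs describe environment variables that tame its memory behavior. This is only a guess at the cause, but it's cheap to try before starting the server:

```shell
# Make jax/XLA allocate GPU memory on demand instead of preallocating ~90%
# of VRAM at startup; alternatively, cap it at a fraction of total memory.
export XLA_PYTHON_CLIENT_PREALLOCATE=false
# export XLA_PYTHON_CLIENT_MEM_FRACTION=.70
# then start the server as usual: python app.py 8080
```

Note these flags govern device memory; the failed allocations in the log above are pinned *host* memory, so this may only help indirectly.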
Alright, I see. Thanks. Let's see if we can get one step closer again...
It seems like it might be due to "incompatible versions of CUDA, TensorFlow, NVIDIA drivers, etc."... I really don't want to go through the painful process of installing everything again 🥲 I think I'll just wait until I've got real Linux on this machine; this is too much of a headache and a time sink.
Thanks a lot for your help, though!! I really appreciate it!
Yeah, my latest attempt failed. I'm going to try again later following all of @Ly-Zxzy's instructions; in particular, I didn't try 11.3 this time, nor did I worry about the environment path.
@Ly-Zxzy can you share the output of lspci? It should include your video driver. The doc you linked in another section recommends using it to check your driver (section 2), but I suspect that applies to people running Linux directly, not via WSL. Per the other docs at https://docs.nvidia.com/cuda/wsl-user-guide/index.html#getting-started-with-cuda-on-wsl, a stub driver is installed that works with the native Windows driver. Anyhow, when I run it I get:
0482:00:00.0 SCSI storage controller: Red Hat, Inc. Virtio filesystem (rev 01)
7341:00:00.0 SCSI storage controller: Red Hat, Inc. Virtio filesystem (rev 01)
882a:00:00.0 3D controller: Microsoft Corporation Device 008e
cb34:00:00.0 SCSI storage controller: Red Hat, Inc. Virtio filesystem (rev 01)
I'm guessing 882a:00:00.0 is the stub driver that the latter document talks about.
Hmm, it might also be worth upgrading jax, if you haven't yet:
pip install --upgrade "jax[cuda]" -f https://storage.googleapis.com/jax-releases/jax_releases.html
FYI, I only had a successful jax upgrade/install when I used this command (I've also spent a long time trying to get CUDA 11.3 and cuDNN 8 properly installed). After this, jax finally found my GPU.
pip install --upgrade "jax[cuda]" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
source: https://github.com/google/jax#installation
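Assuming the reinstall went through, a quick way to confirm jax is actually using the GPU rather than silently falling back to CPU, as in the warning in the log above:

```python
# Prints "gpu" and a list of GPU devices when the CUDA wheel, driver, and
# cuDNN all line up; "cpu" means jax fell back, as in the absl warning.
import jax

print(jax.default_backend())
print(jax.devices())
```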
For the people who have gotten it working on GPU, I was wondering how much CPU is being used. For some reason, my model (jax, PyTorch, all on GPU) is running on my GPU, but one CPU thread is at 100%. Not sure if that is supposed to happen, but it seems to be the only thing bottlenecking the run. GPU usage is high, but not bottlenecked.
@raylin01, my 3060 is pegged at 100% while a single thread on a gen-4 i7 is also at 100%.