
Kernel dies with GTX 1050 4GB

Open loretoparisi opened this issue 2 years ago • 0 comments

My Jupyter notebook kernel dies ("The kernel appears to have died. It will restart automatically.") when trying to load the main model after downloading it:

device = 'cuda'
dalle = get_rudalle_model('Malevich', pretrained=True, fp16=True, device=device, cache_dir='./')

I load the VAE, tokenizer and CLIP in separate cells, and they all load fine. My GPU info and nvidia-smi output are the following:

Total GPU RAM: 3.94 Gb
CPU: 4
RAM GB: 7.8
PyTorch version: 1.10.1+cu102
CUDA version: 10.2
cuDNN version: 7605
Allowed GPU RAM: 3.5 Gb
GPU part 0.8886
Tue Jan  4 18:22:08 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GTX 105...  Off  | 00000000:01:00.0  On |                  N/A |
| 45%   25C    P0    N/A /  75W |    849MiB /  4033MiB |      6%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1104      G   /usr/lib/xorg/Xorg                 84MiB |
|    0   N/A  N/A      1682      G   /usr/bin/gnome-shell               31MiB |
|    0   N/A  N/A     12024      G   ...AAAAAAAA== --shared-files       38MiB |
|    0   N/A  N/A     13204      C   /usr/bin/python                   689MiB |
+-----------------------------------------------------------------------------+

while system memory is

loreto@ombromanto:~/Projects/notebooks/rudalle$ free -h
              total        used        free      shared  buff/cache   available
Mem:           7,8G        3,8G        2,8G         97M        1,1G        3,6G
Swap:          2,0G        993M        1,0G

and the CPU is


loreto@ombromanto:~/Projects/notebooks/rudalle$ cat /proc/cpuinfo  | grep 'name'| uniq
model name	: Intel(R) Core(TM)2 Quad  CPU   Q9550  @ 2.83GHz

With this configuration I'm able to load models like CLIP, GLIDE, LAMA, etc. with only minor limitations.
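As a rough sanity check on the numbers above, here is a back-of-the-envelope estimate of the memory needed for the weights alone. It is only a sketch: the parameter count of about 1.3 billion for Malevich is an assumption (it does not come from the logs above), and activations, CUDA context and framework overhead are ignored.

```python
GIB = 1024 ** 3

def weights_size_gib(n_params, bytes_per_param):
    """Memory needed for the weights alone, in GiB."""
    return n_params * bytes_per_param / GIB

# ASSUMPTION: roughly 1.3 billion parameters for Malevich.
# This figure is not taken from the logs in this issue.
N_PARAMS = 1_300_000_000

fp32_gib = weights_size_gib(N_PARAMS, 4)  # 4 bytes per FP32 value
fp16_gib = weights_size_gib(N_PARAMS, 2)  # 2 bytes per FP16 value

allowed_vram_gib = 3.5    # "Allowed GPU RAM" reported above
available_ram_gib = 3.6   # "available" column of `free -h` above

print(f"FP32 weights: {fp32_gib:.2f} GiB")
print(f"FP16 weights: {fp16_gib:.2f} GiB")
print("FP16 fits in allowed VRAM:", fp16_gib < allowed_vram_gib)
print("FP32 fits in available system RAM:", fp32_gib < available_ram_gib)
```

If that assumption is close, the FP16 weights should fit in the allowed VRAM, but an FP32 copy would not fit in the system RAM that is currently available. So if the checkpoint passes through system RAM in FP32 at any point, the kernel may be killed for running out of host memory rather than GPU memory. This is a guess, not something the logs confirm.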

I have also tried to follow this approach:

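import torch
has_cuda = torch.cuda.is_available()  # assumed definition; not shown in my original snippet

# Load the FP32 weights on the CPU first, then move the model to the GPU.
device = 'cpu'
dalle = get_rudalle_model('Malevich', pretrained=True, fp16=False, device=device, cache_dir='./')
if has_cuda:
    device = 'cuda'
    dalle.to(device)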

that is, loading the model on the CPU and then moving it to CUDA, but the kernel still dies. The notebook log shows:

[D 18:22:25.471 NotebookApp] activity on 464591fd-7e62-4cd7-80e8-0ac4f3f9ac05: status (busy)
[D 18:22:25.476 NotebookApp] activity on 464591fd-7e62-4cd7-80e8-0ac4f3f9ac05: execute_input
[D 18:22:25.477 NotebookApp] activity on 464591fd-7e62-4cd7-80e8-0ac4f3f9ac05: status (idle)
[D 18:22:30.023 NotebookApp] activity on 464591fd-7e62-4cd7-80e8-0ac4f3f9ac05: status (busy)
[D 18:22:30.024 NotebookApp] activity on 464591fd-7e62-4cd7-80e8-0ac4f3f9ac05: execute_input
[I 18:23:41.356 NotebookApp] KernelRestarter: restarting kernel (1/5), keep random ports
kernel 464591fd-7e62-4cd7-80e8-0ac4f3f9ac05 restarted
[D 18:23:41.831 NotebookApp] Starting kernel: ['/usr/bin/python', '-m', 'ipykernel_launcher', '-f', '/home/loreto/.local/share/jupyter/runtime/kernel-464591fd-7e62-4cd7-80e8-0ac4f3f9ac05.json']
[D 18:23:42.303 NotebookApp] Connecting to: tcp://127.0.0.1:36147
[D 18:23:44.736 NotebookApp] activity on 464591fd-7e62-4cd7-80e8-0ac4f3f9ac05: status (starting)
[D 18:23:44.759 NotebookApp] activity on 464591fd-7e62-4cd7-80e8-0ac4f3f9ac05: status (busy)
[D 18:23:44.761 NotebookApp] activity on 464591fd-7e62-4cd7-80e8-0ac4f3f9ac05: status (idle)
[D 18:23:45.040 NotebookApp] 200 GET /static/base/images/favicon-notebook.ico (127.0.0.1) 122.080000ms
[D 18:23:46.533 NotebookApp] 200 GET /api/contents/rudalle/Malevich_3_5GB_vRAM_usage.ipynb?content=0&_=1641316902647 (127.0.0.1) 19.390000ms
[D 18:23:54.294 NotebookApp] KernelRestarter: restart apparently succeeded

Of course, in this case I would still need to convert the model to FP16 afterwards, with something like dalle.convert_to_fp16(), but I'm not sure how to do that.
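One possible way to do that conversion, as a minimal sketch: if the object returned by get_rudalle_model is a regular torch.nn.Module (an assumption), the standard .half() method casts its floating-point weights to FP16, and .to(device) then moves them. The helper name half_then_move is hypothetical and not part of ru-dalle, and I have not verified whether ruDALL-E needs any extra step after the cast.

```python
def half_then_move(model, device):
    """Cast a model to FP16 on the CPU, then move it to `device`.

    Hypothetical helper, not part of ru-dalle. It relies only on the
    .half() and .to() methods that torch.nn.Module provides.
    """
    # Cast first, so that only the FP16 weights are copied to the GPU.
    model = model.half()
    # Move the already-halved weights to the target device.
    model = model.to(device)
    return model
```

With the snippet above it would be called as dalle = half_then_move(dalle, 'cuda'). Note that this still needs the full FP32 model to fit in system RAM before the cast.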

loretoparisi, Jan 04 '22 16:01