ru-dalle
Kernel dies with GTX 1050 4GB
My Jupyter notebook kernel dies ("The kernel appears to have died. It will restart automatically.") when trying to load the main model after downloading it:
device = 'cuda'
dalle = get_rudalle_model('Malevich', pretrained=True, fp16=True, device=device, cache_dir='./')
I have separate cells for the vae, tokenizer, and clip, which all load fine. My nvidia-smi output is the following:
Total GPU RAM: 3.94 Gb
CPU: 4
RAM GB: 7.8
PyTorch version: 1.10.1+cu102
CUDA version: 10.2
cuDNN version: 7605
Allowed GPU RAM: 3.5 Gb
GPU part 0.8886
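For context, a rough back-of-the-envelope estimate (assuming Malevich is the ~1.3B-parameter XL model, as publicly advertised; the exact count is approximate) shows why a 3.5 GB allowance is tight even in fp16:

```python
# Rough memory estimate for loading ruDALL-E Malevich (~1.3B parameters).
# The 1.3B figure is the advertised model size; treat it as approximate.
PARAMS = 1.3e9

def weight_gb(num_params, bytes_per_param):
    """Memory needed for the weights alone, in GiB (ignores activations/cache)."""
    return num_params * bytes_per_param / 1024**3

fp32 = weight_gb(PARAMS, 4)  # ~4.8 GiB: cannot fit in 4 GB at all
fp16 = weight_gb(PARAMS, 2)  # ~2.4 GiB: fits, but leaves little headroom
print(f"fp32 weights: {fp32:.1f} GiB, fp16 weights: {fp16:.1f} GiB")
```

With Xorg and other processes already holding ~850 MiB, the fp16 weights plus activations and CUDA context can plausibly push past the 3.5 GB cap during loading, which would be consistent with the kernel being killed.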
Tue Jan 4 18:22:08 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03 Driver Version: 460.32.03 CUDA Version: 11.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 GeForce GTX 105... Off | 00000000:01:00.0 On | N/A |
| 45% 25C P0 N/A / 75W | 849MiB / 4033MiB | 6% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1104 G /usr/lib/xorg/Xorg 84MiB |
| 0 N/A N/A 1682 G /usr/bin/gnome-shell 31MiB |
| 0 N/A N/A 12024 G ...AAAAAAAA== --shared-files 38MiB |
| 0 N/A N/A 13204 C /usr/bin/python 689MiB |
+-----------------------------------------------------------------------------+
while system memory is:
loreto@ombromanto:~/Projects/notebooks/rudalle$ free -h
total used free shared buff/cache available
Mem: 7,8G 3,8G 2,8G 97M 1,1G 3,6G
Swap: 2,0G 993M 1,0G
while the CPU is:
loreto@ombromanto:~/Projects/notebooks/rudalle$ cat /proc/cpuinfo | grep 'name'| uniq
model name : Intel(R) Core(TM)2 Quad CPU Q9550 @ 2.83GHz
With this configuration I'm able to load models like CLIP, GLIDE, LAMA, etc with minor limitations.
I have also tried to follow this approach:
device = 'cpu'
dalle = get_rudalle_model('Malevich', pretrained=True, fp16=False, device=device, cache_dir='./')
if has_cuda:
    device = 'cuda'
    dalle.to(device)
i.e., loading the model on CPU and then moving it to CUDA, but the kernel still dies:
[D 18:22:25.471 NotebookApp] activity on 464591fd-7e62-4cd7-80e8-0ac4f3f9ac05: status (busy)
[D 18:22:25.476 NotebookApp] activity on 464591fd-7e62-4cd7-80e8-0ac4f3f9ac05: execute_input
[D 18:22:25.477 NotebookApp] activity on 464591fd-7e62-4cd7-80e8-0ac4f3f9ac05: status (idle)
[D 18:22:30.023 NotebookApp] activity on 464591fd-7e62-4cd7-80e8-0ac4f3f9ac05: status (busy)
[D 18:22:30.024 NotebookApp] activity on 464591fd-7e62-4cd7-80e8-0ac4f3f9ac05: execute_input
[I 18:23:41.356 NotebookApp] KernelRestarter: restarting kernel (1/5), keep random ports
kernel 464591fd-7e62-4cd7-80e8-0ac4f3f9ac05 restarted
[D 18:23:41.831 NotebookApp] Starting kernel: ['/usr/bin/python', '-m', 'ipykernel_launcher', '-f', '/home/loreto/.local/share/jupyter/runtime/kernel-464591fd-7e62-4cd7-80e8-0ac4f3f9ac05.json']
[D 18:23:42.303 NotebookApp] Connecting to: tcp://127.0.0.1:36147
[D 18:23:44.736 NotebookApp] activity on 464591fd-7e62-4cd7-80e8-0ac4f3f9ac05: status (starting)
[D 18:23:44.759 NotebookApp] activity on 464591fd-7e62-4cd7-80e8-0ac4f3f9ac05: status (busy)
[D 18:23:44.761 NotebookApp] activity on 464591fd-7e62-4cd7-80e8-0ac4f3f9ac05: status (idle)
[D 18:23:45.040 NotebookApp] 200 GET /static/base/images/favicon-notebook.ico (127.0.0.1) 122.080000ms
[D 18:23:46.533 NotebookApp] 200 GET /api/contents/rudalle/Malevich_3_5GB_vRAM_usage.ipynb?content=0&_=1641316902647 (127.0.0.1) 19.390000ms
[D 18:23:54.294 NotebookApp] KernelRestarter: restart apparently succeeded
Of course, in this case it would then be necessary to convert the model to FP16, e.g. with something like dalle.convert_to_fp16(), but I'm not sure how to do that.
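I don't know whether the library actually exposes a convert_to_fp16 helper (that name is my guess), but since the returned model should be a regular torch.nn.Module, a generic sketch would be to call .half() before moving it to the GPU, converting normalization layers back to fp32 for stability:

```python
import torch
import torch.nn as nn

def to_fp16(model: nn.Module) -> nn.Module:
    """Convert a model's floating-point parameters and buffers to fp16 in place.

    Plain .half() converts everything; LayerNorm is commonly kept in fp32
    for numerical stability, so it is converted back afterwards.
    """
    model.half()
    for module in model.modules():
        if isinstance(module, nn.LayerNorm):
            module.float()
    return model

# Tiny stand-in model (the real dalle object is assumed to be an nn.Module).
demo = nn.Sequential(nn.Linear(8, 8), nn.LayerNorm(8))
to_fp16(demo)
print(demo[0].weight.dtype)  # torch.float16
print(demo[1].weight.dtype)  # torch.float32
```

After that, to_fp16(dalle) followed by dalle.to('cuda') would mirror the fp16=True path, though whether this reproduces the library's own fp16 handling exactly is an assumption on my part.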