gpt-j-6b-gpu-docker
Model tokenizer created 5 secs Killed
Dear Devforth, I am currently getting the following error:
$ docker run -p 8081:8080 --gpus all --rm -it devforth/gpt-j-6b-gpu
Downloading: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.04M/1.04M [00:00<00:00, 2.09MB/s]
Downloading: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 456k/456k [00:00<00:00, 1.05MB/s]
Downloading: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.36M/1.36M [00:00<00:00, 3.12MB/s]
⌚ Model tokenizer created 5 secs
Killed
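In case it is relevant, one sanity check is to verify that the container can see the GPU at all. This is just a sketch, assuming the NVIDIA Container Toolkit is installed and that overriding the entrypoint with nvidia-smi works for this image:
$ docker run --rm --gpus all --entrypoint nvidia-smi devforth/gpt-j-6b-gpu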
These are the details of my GPU
$ nvidia-smi
Fri Jan 27 09:23:26 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.60.13    Driver Version: 525.60.13    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla M60           On   | 00000000:13:00.0 Off |                  Off |
| N/A   16C    P8    14W / 150W |      0MiB /  8192MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
These are the details of the card:
13:00.0 VGA compatible controller: NVIDIA Corporation GM204GL [Tesla M60] (rev a1)
OS:
Ubuntu 22.04 LTS
I'm getting the same issue. It looks like it's running out of RAM; the process is getting killed by something, maybe an OS config issue. Interestingly, it does the same thing if you run it without any GPUs, so perhaps it's not finding the GPU, falling back to the CPU, and then running out of RAM? Not sure how to fix this.
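One way to check whether it really is the kernel OOM killer is something like this (just a sketch, assuming a standard Ubuntu host; dmesg may need sudo):
$ free -h                                                    # how much RAM and swap the host has
$ sudo dmesg -T | grep -iE 'killed process|out of memory'    # look for OOM-killer entries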
We are both using different CUDA versions, both newer than the one in the readme. Could that be it? I'm on CUDA 11.7.
Could try downgrading to 11.6 and see if that helps.
Same issue here. "Killed" means you ran out of free RAM; add more RAM or use a swapfile on fast storage. Adding swap solved the Killed issue for me, but then I ran into PyTorch compatibility problems with my A4000.
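For reference, a minimal swapfile sketch on Ubuntu (the 32G size is only an example; pick whatever your disk allows):
$ sudo fallocate -l 32G /swapfile    # reserve space for the swapfile
$ sudo chmod 600 /swapfile           # restrict permissions
$ sudo mkswap /swapfile              # format it as swap
$ sudo swapon /swapfile              # enable it immediately
$ free -h                            # swap should now show up here
To make it persist across reboots you would also add an entry for /swapfile to /etc/fstab.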