text-generation-webui
Error code 137 when loading RWKV-4-RAVEN model
Describe the bug
Attempting to load the RWKV-4-RAVEN model RWKV-4-Raven-14B-v10-Eng99%-Other1%-20230427-ctx8192.pth fails. (I've tried a couple of the newer large RWKV models with the same effect.)
With GPU (24 GB), I get out-of-memory errors from CUDA.
With CPU (64 GB RAM), the container exits with error code 137.
Is there an existing issue for this?
- [X] I have searched the existing issues
Reproduction
Download a large RWKV-4-RAVEN model from Hugging Face (RWKV-4-Raven-14B-v10-Eng99%-Other1%-20230427-ctx8192.pth).
Run text-generation-webui via Docker.
Load the model in the container and watch the container die.
Screenshot
No response
Logs
> text-generation-webui-text-generation-webui-1 | Loading models/RWKV-4-Raven-14B-v10-Eng99%-Other1%-20230427-ctx8192.pth ...
> text-generation-webui-text-generation-webui-1 | Strategy: (total 40+1=41 layers)
> text-generation-webui-text-generation-webui-1 | * cpu [float32, float32], store 41 layers
> text-generation-webui-text-generation-webui-1 | 0-cpu-float32-float32 1-cpu-float32-float32 2-cpu-float32-float32 3-cpu-float32-float32 4-cpu-float32-float32 5-cpu-float32-float32 6-cpu-float32-float32 7-cpu-float32-float32 8-cpu-float32-float32 9-cpu-float32-float32 10-cpu-float32-float32 11-cpu-float32-float32 12-cpu-float32-float32 13-cpu-float32-float32 14-cpu-float32-float32 15-cpu-float32-float32 16-cpu-float32-float32 17-cpu-float32-float32 18-cpu-float32-float32 19-cpu-float32-float32 20-cpu-float32-float32 21-cpu-float32-float32 22-cpu-float32-float32 23-cpu-float32-float32 24-cpu-float32-float32 25-cpu-float32-float32 26-cpu-float32-float32 27-cpu-float32-float32 28-cpu-float32-float32 29-cpu-float32-float32 30-cpu-float32-float32 31-cpu-float32-float32 32-cpu-float32-float32 33-cpu-float32-float32 34-cpu-float32-float32 35-cpu-float32-float32 36-cpu-float32-float32 37-cpu-float32-float32 38-cpu-float32-float32 39-cpu-float32-float32 40-cpu-float32-float32
> text-generation-webui-text-generation-webui-1 | emb.weight f32 cpu 50277 5120
> text-generation-webui-text-generation-webui-1 | blocks.0.ln1.weight f32 cpu 5120
> text-generation-webui-text-generation-webui-1 | blocks.0.ln1.bias f32 cpu 5120
> text-generation-webui-text-generation-webui-1 | blocks.0.ln2.weight f32 cpu 5120
> text-generation-webui-text-generation-webui-1 | blocks.0.ln2.bias f32 cpu 5120
> text-generation-webui-text-generation-webui-1 | blocks.0.att.time_decay f32 cpu 5120
> text-generation-webui-text-generation-webui-1 | blocks.0.att.time_first f32 cpu 5120
> text-generation-webui-text-generation-webui-1 | blocks.0.att.time_mix_k f32 cpu 5120
> text-generation-webui-text-generation-webui-1 | blocks.0.att.time_mix_v f32 cpu 5120
> text-generation-webui-text-generation-webui-1 | blocks.0.att.time_mix_r f32 cpu 5120
> text-generation-webui-text-generation-webui-1 | blocks.0.att.key.weight f32 cpu 5120 5120
> text-generation-webui-text-generation-webui-1 | blocks.0.att.value.weight f32 cpu 5120 5120
> text-generation-webui-text-generation-webui-1 | blocks.0.att.receptance.weight f32 cpu 5120 5120
> text-generation-webui-text-generation-webui-1 | blocks.0.att.output.weight f32 cpu 5120 5120
> text-generation-webui-text-generation-webui-1 | blocks.0.ffn.time_mix_k f32 cpu 5120
> text-generation-webui-text-generation-webui-1 | blocks.0.ffn.time_mix_r f32 cpu 5120
> text-generation-webui-text-generation-webui-1 | blocks.0.ffn.key.weight f32 cpu 5120 20480
> text-generation-webui-text-generation-webui-1 | blocks.0.ffn.receptance.weight f32 cpu 5120 5120
> text-generation-webui-text-generation-webui-1 | blocks.0.ffn.value.weight f32 cpu 20480 5120
> text-generation-webui-text-generation-webui-1 | ...............................................................................................................................................................................................................................................................................................................................................................................................................................................Killed
> text-generation-webui-text-generation-webui-1 exited with code 137
>
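For readers unfamiliar with the exit status in the log above: shells and container runtimes conventionally report a process killed by signal N as 128 + N, so 137 corresponds to signal 9. The sketch below decodes such a status; `describe_exit_code` is a hypothetical helper name, not part of Docker or text-generation-webui. A SIGKILL right after the `Killed` line is consistent with the kernel's out-of-memory killer, although the exit code alone does not prove that.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Translate a shell-style exit status into a readable description.

    Hypothetical helper for illustration only. Statuses above 128
    conventionally mean "terminated by signal (code - 128)".
    """
    if code > 128:
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not known on this platform.
            name = f"signal {signum}"
        return f"killed by {name}"
    return f"exited with status {code}"


if __name__ == "__main__":
    print(describe_exit_code(137))
```

On a Linux host this reports SIGKILL for 137. Whether the OOM killer was responsible can usually be checked in the kernel log or in the container state reported by `docker inspect`.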
System Info
OS: Arch with Docker version 23.0.1, build a5ee5b1dfc
GPU: Nvidia Tesla M40 24 GB
CPU RAM: 64 GB
I'm just trying RWKV and it's working well for me. I'm not running in a container, though. I'm using an Ubuntu 22.04 KVM VM with 64 GB RAM and passing through the host's 2 x RTX 3060 12 GB GPUs.
The smaller model RWKV-4-Raven-3B-v8-Eng-20230408-ctx4096.pth is the last one I tried, and it works. The speed is impressive, > 10 tokens/sec. Next I'll try some larger models.
The corresponding lines from my log, at the point where yours blows up, are:
blocks.0.ffn.receptance.weight f16 cuda:0 2560 2560
blocks.0.ffn.value.weight f16 cuda:0 10240 2560
............................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
blocks.31.ln1.weight f16 cuda:0 2560
Actually, this appears to be my mistake on the CPU/RAM load. I tried a different server and was able to load the 14B model at 66 GB of RAM, so it would appear my first server simply did not have enough memory for the 14B version.
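As a rough cross-check of that memory figure, the tensor shapes printed in the log from the original report can be summed to estimate the weight footprint. This is a back-of-the-envelope sketch under stated assumptions: the output head does not appear in the log excerpt and is assumed to have the same shape as `emb.weight`, small tensors such as the final layer norm are ignored, and any temporary copies made while loading are not counted.

```python
# Shapes copied from the log in the original report.
N_EMBD = 5120    # e.g. blocks.0.att.key.weight is 5120 x 5120
N_FFN = 20480    # blocks.0.ffn.key.weight is 5120 x 20480
VOCAB = 50277    # emb.weight is 50277 x 5120
N_BLOCKS = 40    # "total 40+1=41 layers"


def block_params(n_embd: int, n_ffn: int) -> int:
    """Parameters in one block, counted from the tensors listed in the log."""
    vectors = 11 * n_embd               # ln1/ln2 weight+bias, time_* vectors
    att = 4 * n_embd * n_embd           # key, value, receptance, output
    ffn = 2 * n_embd * n_ffn + n_embd * n_embd  # key, value, receptance
    return vectors + att + ffn


def total_params() -> int:
    # ASSUMPTION: an output head of the same shape as emb.weight.
    return N_BLOCKS * block_params(N_EMBD, N_FFN) + 2 * VOCAB * N_EMBD


def weight_gib(n_params: int, bytes_per_param: int) -> float:
    """Size of the weights alone, in GiB."""
    return n_params * bytes_per_param / 2**30


if __name__ == "__main__":
    n = total_params()
    print(f"{n / 1e9:.2f}B parameters")
    for name, size in (("float32", 4), ("float16", 2)):
        print(f"{name}: {weight_gib(n, size):.1f} GiB")
```

Under these assumptions the float32 weights alone come to a little under 53 GiB, which leaves very little headroom on a 64 GB machine once the checkpoint being read and the rest of the system are counted. The float16 estimate is above 24 GiB, which would also be consistent with the CUDA out-of-memory errors on a 24 GB card.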
I'm still not sure why it quietly quits when loading on the GPU.
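One possible workaround for the GPU case, offered as a sketch and not a confirmed fix, is to keep only part of the model on the GPU. The `rwkv` Python package accepts strategy strings such as `cuda fp16 *20 -> cpu fp32`, meaning the first 20 layers on the GPU in float16 and the rest on the CPU in float32. The helper below only builds such a string; `split_strategy`, its parameters, and the per-layer size used in the example are hypothetical, and how the string is passed to text-generation-webui should be checked against the version in use.

```python
def split_strategy(n_layers: int, layer_gib: float, gpu_budget_gib: float) -> str:
    """Build an rwkv-style strategy string that puts as many layers on the
    GPU as fit in the given budget and leaves the remainder on the CPU.

    Hypothetical helper: the sizes are caller-supplied estimates, and the
    budget should leave room for activations and CUDA overhead.
    """
    on_gpu = min(n_layers, int(gpu_budget_gib // layer_gib))
    if on_gpu <= 0:
        return "cpu fp32"
    if on_gpu == n_layers:
        return "cuda fp16"
    return f"cuda fp16 *{on_gpu} -> cpu fp32"


if __name__ == "__main__":
    # ASSUMPTION: 41 layers of roughly 0.65 GiB each in float16,
    # with 20 GiB of a 24 GB card set aside for weights.
    print(split_strategy(n_layers=41, layer_gib=0.65, gpu_budget_gib=20.0))
```

The layers left on the CPU are held in float32 in this sketch, so host RAM still has to cover them.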
This issue has been closed due to inactivity for 6 weeks. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.