text-generation-webui

Error code 137 when loading RWKV-4-RAVEN model

Open DIGist opened this issue 1 year ago • 2 comments

Describe the bug

Attempting to load the RWKV-4-Raven model RWKV-4-Raven-14B-v10-Eng99%-Other1%-20230427-ctx8192.pth fails. (I've tried a couple of the newer large RWKV models, with the same effect.)

With GPU (24 GB), I get out-of-memory errors from CUDA.

With CPU (64 GB RAM), the container exits with error code 137.
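For context on the exit code: shells and container runtimes commonly report a process killed by signal N as exit status 128 + N, so 137 would correspond to signal 9 (SIGKILL). Together with the `Killed` line in the logs, that is consistent with the kernel's out-of-memory killer stopping the process, although the exit code alone does not prove it. A minimal sketch of the decoding, where `describe_exit_code` is a hypothetical helper and not part of text-generation-webui or Docker:

```python
import signal


def describe_exit_code(code: int) -> str:
    """Explain an exit status, assuming the common 128 + N convention for signals."""
    if code > 128:
        signum = code - 128
        try:
            # Signal names are platform-dependent; signal 9 is SIGKILL on Linux.
            name = signal.Signals(signum).name
        except ValueError:
            name = f"signal {signum}"
        return f"terminated by {name}"
    return f"exited with status {code}"


if __name__ == "__main__":
    print(describe_exit_code(137))
```

On a Linux host, `dmesg` output around the time of the crash would usually show whether the OOM killer was involved.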

Is there an existing issue for this?

  • [X] I have searched the existing issues

Reproduction

Download a large RWKV-4-Raven model from Hugging Face (RWKV-4-Raven-14B-v10-Eng99%-Other1%-20230427-ctx8192.pth).

Run text-generation-webui via Docker.

Load the model in the container and watch the container die.

Screenshot

No response

Logs

> text-generation-webui-text-generation-webui-1  | Loading models/RWKV-4-Raven-14B-v10-Eng99%-Other1%-20230427-ctx8192.pth ...
> text-generation-webui-text-generation-webui-1  | Strategy: (total 40+1=41 layers)
> text-generation-webui-text-generation-webui-1  | * cpu [float32, float32], store 41 layers
> text-generation-webui-text-generation-webui-1  | 0-cpu-float32-float32 1-cpu-float32-float32 2-cpu-float32-float32 3-cpu-float32-float32 4-cpu-float32-float32 5-cpu-float32-float32 6-cpu-float32-float32 7-cpu-float32-float32 8-cpu-float32-float32 9-cpu-float32-float32 10-cpu-float32-float32 11-cpu-float32-float32 12-cpu-float32-float32 13-cpu-float32-float32 14-cpu-float32-float32 15-cpu-float32-float32 16-cpu-float32-float32 17-cpu-float32-float32 18-cpu-float32-float32 19-cpu-float32-float32 20-cpu-float32-float32 21-cpu-float32-float32 22-cpu-float32-float32 23-cpu-float32-float32 24-cpu-float32-float32 25-cpu-float32-float32 26-cpu-float32-float32 27-cpu-float32-float32 28-cpu-float32-float32 29-cpu-float32-float32 30-cpu-float32-float32 31-cpu-float32-float32 32-cpu-float32-float32 33-cpu-float32-float32 34-cpu-float32-float32 35-cpu-float32-float32 36-cpu-float32-float32 37-cpu-float32-float32 38-cpu-float32-float32 39-cpu-float32-float32 40-cpu-float32-float32 
> text-generation-webui-text-generation-webui-1  | emb.weight                        f32      cpu  50277  5120 
> text-generation-webui-text-generation-webui-1  | blocks.0.ln1.weight               f32      cpu   5120       
> text-generation-webui-text-generation-webui-1  | blocks.0.ln1.bias                 f32      cpu   5120       
> text-generation-webui-text-generation-webui-1  | blocks.0.ln2.weight               f32      cpu   5120       
> text-generation-webui-text-generation-webui-1  | blocks.0.ln2.bias                 f32      cpu   5120       
> text-generation-webui-text-generation-webui-1  | blocks.0.att.time_decay           f32      cpu   5120       
> text-generation-webui-text-generation-webui-1  | blocks.0.att.time_first           f32      cpu   5120       
> text-generation-webui-text-generation-webui-1  | blocks.0.att.time_mix_k           f32      cpu   5120       
> text-generation-webui-text-generation-webui-1  | blocks.0.att.time_mix_v           f32      cpu   5120       
> text-generation-webui-text-generation-webui-1  | blocks.0.att.time_mix_r           f32      cpu   5120       
> text-generation-webui-text-generation-webui-1  | blocks.0.att.key.weight           f32      cpu   5120  5120 
> text-generation-webui-text-generation-webui-1  | blocks.0.att.value.weight         f32      cpu   5120  5120 
> text-generation-webui-text-generation-webui-1  | blocks.0.att.receptance.weight    f32      cpu   5120  5120 
> text-generation-webui-text-generation-webui-1  | blocks.0.att.output.weight        f32      cpu   5120  5120 
> text-generation-webui-text-generation-webui-1  | blocks.0.ffn.time_mix_k           f32      cpu   5120       
> text-generation-webui-text-generation-webui-1  | blocks.0.ffn.time_mix_r           f32      cpu   5120       
> text-generation-webui-text-generation-webui-1  | blocks.0.ffn.key.weight           f32      cpu   5120 20480 
> text-generation-webui-text-generation-webui-1  | blocks.0.ffn.receptance.weight    f32      cpu   5120  5120 
> text-generation-webui-text-generation-webui-1  | blocks.0.ffn.value.weight         f32      cpu  20480  5120 
> text-generation-webui-text-generation-webui-1  | ...............................................................................................................................................................................................................................................................................................................................................................................................................................................Killed
> text-generation-webui-text-generation-webui-1 exited with code 137
>
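The tensor shapes printed in the log are enough for a rough estimate of how much memory the weights alone need. The sketch below counts parameters from the shapes shown for `blocks.0` and multiplies by 40 blocks. It assumes every block has the same shapes as block 0, and that the output head (not visible in the truncated log) has the same shape as `emb.weight`. It ignores activations, RWKV state, and any temporary copies made during loading, so treat it as a lower bound rather than a measurement.

```python
# Shapes copied from the log above.
N_EMBD = 5120    # width of emb.weight and the square att matrices
N_FFN = 20480    # inner width of ffn.key / ffn.value
VOCAB = 50277    # rows of emb.weight
N_BLOCKS = 40    # "total 40+1=41 layers" in the log


def block_params() -> int:
    """Parameters in one block, using the tensors listed for blocks.0."""
    # ln1/ln2 weight+bias (4), att.time_* (5), ffn.time_mix_* (2)
    vectors = 11 * N_EMBD
    # att: key, value, receptance, output
    att = 4 * N_EMBD * N_EMBD
    # ffn: key and value are 5120 x 20480, receptance is 5120 x 5120
    ffn = 2 * N_EMBD * N_FFN + N_EMBD * N_EMBD
    return vectors + att + ffn


def total_params(include_head: bool = True) -> int:
    total = VOCAB * N_EMBD + N_BLOCKS * block_params()
    if include_head:
        # ASSUMPTION: the head matches emb.weight in size; it is not in the log.
        total += VOCAB * N_EMBD
    return total


def weights_gb(bytes_per_param: int) -> float:
    """Size of the weights in GB (10^9 bytes) at the given precision."""
    return total_params() * bytes_per_param / 1e9


if __name__ == "__main__":
    print(f"parameters: {total_params():,}")
    print(f"float32:    {weights_gb(4):.1f} GB")
    print(f"float16:    {weights_gb(2):.1f} GB")
```

Under these assumptions the model has about 14.1 billion parameters, which is roughly 56.6 GB at float32 and 28.3 GB at float16. The first figure leaves little headroom on a 64 GB host, and the second already exceeds a 24 GB GPU.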

System Info

OS: Arch with Docker version 23.0.1, build a5ee5b1dfc
GPU: NVIDIA Tesla M40 24 GB
CPU RAM: 64 GB

DIGist, May 05 '23 00:05

I'm just trying RWKV and it's working well for me. Not running in a container though. I'm using an Ubuntu 22.04 KVM VM with 64 GB RAM and passing through the host's 2 x RTX3060 12 GB GPUs.

This smaller model, RWKV-4-Raven-3B-v8-Eng-20230408-ctx4096.pth, is the last one I tried (working). The speed is impressive, > 10 tokens/sec. Next I'll try some larger models.

The corresponding lines from my log, at the point where yours blows up, are:

blocks.0.ffn.receptance.weight f16 cuda:0 2560 2560 blocks.0.ffn.value.weight f16 cuda:0 10240 2560 ............................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................ blocks.31.ln1.weight f16 cuda:0 2560

dblacknc, May 05 '23 15:05

Actually, this appears to be my bad on the CPU/RAM load. I tried a different server and was able to load the 14B model at 66 GB of RAM, so it would appear my first server was just underpowered for the 14B version.
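One way to catch this before a load attempt is to compare the memory the kernel reports as available with what the model is expected to need. The sketch below parses the `MemAvailable` field of `/proc/meminfo` (Linux only). The helper names and the 20% headroom factor are illustrative assumptions, and the 66 GB figure is simply the usage reported in this comment. Inside a container, the cgroup memory limit may be lower than what `/proc/meminfo` shows, so this is only a first check.

```python
from pathlib import Path


def mem_available_gb(meminfo_text: str) -> float:
    """Return MemAvailable from /proc/meminfo-style text, in GB (10^9 bytes)."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            kib = int(line.split()[1])  # the kernel reports this value in kB (KiB)
            return kib * 1024 / 1e9
    raise ValueError("MemAvailable not found")


def can_load(required_gb: float, meminfo_text: str, headroom: float = 1.2) -> bool:
    """True if available memory covers the requirement plus a safety margin.

    The default headroom of 1.2 is an arbitrary illustrative value.
    """
    return mem_available_gb(meminfo_text) >= required_gb * headroom


if __name__ == "__main__":
    meminfo = Path("/proc/meminfo")
    if meminfo.exists():
        text = meminfo.read_text()
        print(f"available: {mem_available_gb(text):.1f} GB")
        print(f"enough for ~66 GB load: {can_load(66, text)}")
```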

Still not sure why it quietly quits on the GPU load.

DIGist, May 06 '23 01:05

This issue has been closed due to inactivity for 6 weeks. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.

github-actions[bot], Aug 23 '23 23:08