RWKV-LM

VRAM performance

cuuupid opened this issue 2 years ago · 1 comment

Hi @BlinkDL! First off, this is amazing and seems very promising for scaling down large Transformers to be more production-friendly.

I'm wondering if you have any benchmarks regarding VRAM performance? Specifically, I've got three questions:

1. How much VRAM does this model (or rather, the CUDA version) need for training? Are we talking 1060 size (6 GB), 3090 size (24 GB), or A6000+ size (40+ GB)?
2. Same question as 1, but for inference?
3. Can this run on CPU reasonably?

cuuupid · Apr 08 '22

  1. Similar to a usual GPT of the same size, because training uses parallelization to increase speed. However, you can definitely train it like an RNN to save VRAM (but that will be much slower).
  2. More friendly than a usual GPT, because you don't need to keep a huge context (or KV cache); you just need the hidden state of the last single token (see the sketch after this list).
  3. Yes! Inference is very fast even on CPU. Please try run.py.
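To make the constant-memory point in answer 2 concrete, here is a minimal, hypothetical sketch of RNN-mode generation: only a fixed-size hidden state is carried from step to step, so memory does not grow with context length the way a Transformer KV cache does. The names (`toy_rwkv_step`, the toy weight matrices) are illustrative assumptions, not the actual RWKV-LM API; for the real implementation, see run.py in this repo.

```python
# Minimal sketch of RNN-mode inference with a constant-size state.
# Everything here is a toy stand-in, NOT the real RWKV-LM model or API.
import numpy as np

VOCAB_SIZE = 16   # hypothetical toy vocabulary
HIDDEN_SIZE = 8   # hypothetical hidden width

rng = np.random.default_rng(0)
W_in = rng.standard_normal((VOCAB_SIZE, HIDDEN_SIZE)) * 0.1
W_rec = rng.standard_normal((HIDDEN_SIZE, HIDDEN_SIZE)) * 0.1
W_out = rng.standard_normal((HIDDEN_SIZE, VOCAB_SIZE)) * 0.1

def toy_rwkv_step(token_id, state):
    """Consume one token, return (logits, new_state).

    The state has a fixed size (HIDDEN_SIZE), so inference memory stays
    constant no matter how long the context is, unlike a Transformer KV
    cache, which grows linearly with the number of processed tokens.
    """
    x = W_in[token_id]
    new_state = np.tanh(x + state @ W_rec)
    logits = new_state @ W_out
    return logits, new_state

# Greedy generation: only the last hidden state is kept between steps.
state = np.zeros(HIDDEN_SIZE)
token = 1  # hypothetical start token
generated = [token]
for _ in range(20):
    logits, state = toy_rwkv_step(token, state)
    token = int(np.argmax(logits))
    generated.append(token)

print(generated)
```

Because the per-step cost involves no matrices that scale with context length, the same loop also runs comfortably on CPU, which is the point of answer 3.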

BlinkDL · Apr 15 '22