RWKV-LM
VRAM performance
Hi @BlinkDL! First off, this is amazing and seems very promising for scaling large Transformers down to something more production-friendly.
I'm wondering if you have any benchmarks regarding VRAM performance? Specifically I've got 3 questions:
1 - How much VRAM does this model (or rather, the CUDA version) need for training? Are we talking 1060 size (6 GB), 3090 size (24 GB), or A6000+ size (40+ GB)?
2 - Same question as 1, but for inference?
3 - Can this run on CPU reasonably?
1 - Similar to a usual GPT of the same size, because we use parallelization to increase training speed. However, you can definitely train it like an RNN to save VRAM (though that will be much slower).
2 - Friendlier than a usual GPT, because you don't need to keep a huge context (or KV cache) around; you only need the hidden state of the last token.
3 - Yes! Inference is very fast even on CPU. Please try run.py.
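To illustrate point 2, here is a minimal sketch of RNN-style generation where only a fixed-size hidden state is carried between steps, so inference memory stays O(1) in context length instead of growing like a KV cache. The `forward(token, state)` interface and the toy model below are hypothetical stand-ins for illustration, not the actual RWKV-LM API (see run.py in the repo for the real thing):

```python
# Hedged sketch: token-by-token generation carrying only a hidden state.
# `forward` is an assumed interface: (token, state) -> (logits, new_state).

def generate(forward, prompt_tokens, n_new, sample):
    state = None  # fixed-size state, unlike a KV cache that grows per token
    logits = None
    for tok in prompt_tokens:          # ingest the prompt one token at a time
        logits, state = forward(tok, state)
    out = []
    for _ in range(n_new):             # autoregressive generation
        tok = sample(logits)
        out.append(tok)
        logits, state = forward(tok, state)
    return out

# Toy stand-in model (NOT RWKV): "logits" echo the last token,
# and the state is just a step counter.
def toy_forward(tok, state):
    step = 0 if state is None else state + 1
    return tok, step

tokens = generate(toy_forward, [1, 2, 3], 4, sample=lambda logits: logits + 1)
# tokens == [4, 5, 6, 7]
```

The key point is that `state` has the same size no matter how long the context gets, which is why inference is much lighter on memory than a standard Transformer.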