RWKV-LM
Does RWKV only show lower GPU memory occupancy during inference?
I tried to use RWKV (e.g., Vision-RWKV) for CV tasks, but I found that RWKV shows GPU memory occupancy similar to a full-attention Transformer (like ViT) during training. Both the RWKV and Vision-RWKV papers only report memory occupancy for inference.
The high memory consumption is not friendly for my tasks. Do you have any advice?
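For reference, this is a minimal sketch of how one could measure peak GPU memory of a single training step for both backbones under the same ctx_len; `model`, `batch`, and `loss_fn` are placeholders for your own setup, not part of the RWKV codebase:

```python
import torch

def peak_training_memory_mb(model, batch, loss_fn):
    """Run one forward + backward pass and return peak CUDA memory in MB."""
    torch.cuda.reset_peak_memory_stats()
    out = model(batch)        # forward pass stores activations
    loss = loss_fn(out)
    loss.backward()           # backward pass adds gradient buffers
    torch.cuda.synchronize()
    return torch.cuda.max_memory_allocated() / 1024**2
```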
Hi, may I know your ctx_len?
ctx_len is 8192
Please check whether the attention/RWKV block is your bottleneck.
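One way to check this is to profile a single training step and sort ops by CUDA memory; the sketch below assumes a standard PyTorch training setup (`model`, `batch`, `loss_fn` are placeholders) and is not specific to RWKV:

```python
import torch
from torch.profiler import profile, ProfilerActivity

def profile_memory_step(model, batch, loss_fn):
    """Profile one forward + backward pass and print the ops that hold the most CUDA memory."""
    with profile(activities=[ProfilerActivity.CUDA],
                 profile_memory=True, record_shapes=True) as prof:
        loss = loss_fn(model(batch))
        loss.backward()
    # If the top entries come from the token-mixing (attention/RWKV) kernels,
    # that block is the bottleneck; otherwise activation memory elsewhere dominates.
    print(prof.key_averages().table(sort_by="self_cuda_memory_usage", row_limit=15))
```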