RWKV-LM
Does RWKV only show lower GPU memory occupancy during inference?
I tried to use RWKV (e.g., Vision-RWKV) for CV tasks, but I found that RWKV shows GPU memory occupancy similar to a full-attention Transformer (like ViT) during training. Both the RWKV and Vision-RWKV papers only report memory occupancy for inference.
The high memory consumption is not friendly for my tasks. Do you have any advice?
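For reference, this is a minimal sketch of how one could measure peak GPU memory of a single training step for both backbones under the same ctx_len; `model`, `batch`, and `loss_fn` are placeholders for your own setup, not part of the RWKV codebase:

```python
import torch

def peak_training_memory_mb(model, batch, loss_fn):
    """Run one forward + backward pass and return peak CUDA memory in MB."""
    torch.cuda.reset_peak_memory_stats()
    out = model(batch)        # forward pass stores activations
    loss = loss_fn(out)
    loss.backward()           # backward pass adds gradient buffers
    torch.cuda.synchronize()
    return torch.cuda.max_memory_allocated() / 1024**2
```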
Hi, may I know your ctx_len?
ctx_len is 8192
Please check whether the attention/RWKV block is your bottleneck.
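One way to check this is to profile a single training step and sort ops by CUDA memory; the sketch below assumes a standard PyTorch training setup (`model`, `batch`, `loss_fn` are placeholders) and is not specific to RWKV:

```python
import torch
from torch.profiler import profile, ProfilerActivity

def profile_memory_step(model, batch, loss_fn):
    """Profile one forward + backward pass and print the ops that hold the most CUDA memory."""
    with profile(activities=[ProfilerActivity.CUDA],
                 profile_memory=True, record_shapes=True) as prof:
        loss = loss_fn(model(batch))
        loss.backward()
    # If the top entries come from the token-mixing (attention/RWKV) kernels,
    # that block is the bottleneck; otherwise activation memory elsewhere dominates.
    print(prof.key_averages().table(sort_by="self_cuda_memory_usage", row_limit=15))
```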