RWKV
Model description
RWKV: Parallelizable RNN with Transformer-level LLM Performance (pronounced "RwaKuv", from its 4 major parameters: R, W, K, V). RWKV is an RNN with Transformer-level LLM performance that can also be trained directly like a GPT transformer (parallelizable), and it is 100% attention-free: only the hidden state at position t is needed to compute the state at position t+1. The "GPT" mode can be used to quickly compute the hidden state for the "RNN" mode.
So it combines the best of RNNs and transformers: great performance, fast inference, low VRAM usage, fast training, "infinite" ctx_len, and free sentence embeddings (using the final hidden state).
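To make the "RNN mode vs GPT mode" claim concrete, here is a minimal, unstabilized sketch (not the library implementation) of the RWKV-4 WKV mixing step in NumPy. The sequential version carries only a small per-channel state between tokens, while the parallel version computes every position from the full sequence; both should agree. Names (`wkv_sequential`, `wkv_parallel`) and the scalar-channel simplification are illustrative, and real kernels additionally track a running max exponent for numerical stability.

```python
import numpy as np

def wkv_sequential(w, u, k, v):
    """RNN mode: process one token at a time, carrying only (num, den) state."""
    T = len(k)
    num = 0.0  # decayed running sum of e^{k_i} * v_i over past tokens
    den = 0.0  # decayed running sum of e^{k_i} over past tokens
    out = np.empty(T)
    for t in range(T):
        # Current token gets the bonus u instead of the time decay.
        out[t] = (num + np.exp(u + k[t]) * v[t]) / (den + np.exp(u + k[t]))
        # Fold the current token into the state, decaying older terms by e^{-w}.
        num = np.exp(-w) * num + np.exp(k[t]) * v[t]
        den = np.exp(-w) * den + np.exp(k[t])
    return out

def wkv_parallel(w, u, k, v):
    """GPT mode: compute each position directly from the whole sequence."""
    T = len(k)
    out = np.empty(T)
    for t in range(T):
        i = np.arange(t)
        weights = np.exp(-(t - 1 - i) * w + k[:t])  # decayed weights for past tokens
        num = weights @ v[:t] + np.exp(u + k[t]) * v[t]
        den = weights.sum() + np.exp(u + k[t])
        out[t] = num / den
    return out
```

Because the two modes produce identical outputs, a server can prefill a prompt in parallel and then generate token-by-token from the tiny recurrent state, with no KV cache growing with context length.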
The model is also implemented in the transformers library. Using the RNN mode for inference to push the context length could be quite interesting, and an implementation in this server would also drive adoption of the model. There are cpp projects running this model in the browser and it is still fast, so an optimized version here could be very impressive.
Open source status
- [X] The model implementation is available
- [X] The model weights are available
Provide useful links for the implementation
No response
This would be interesting; however, it is pretty far from how this repo operates (the transformer assumption is pretty strong).
But since there are no past key values and no attention, it should make the whole thing even easier to write. Would you be willing to write a PR for it? Any existing models on hf.co?
https://huggingface.co/spaces/BlinkDL/RWKV-World-7B Demo
https://huggingface.co/models?other=rwkv Models
I will check other PRs and try to make an integration
It's already in the code base btw