RWKV
Model description
RWKV: Parallelizable RNN with Transformer-level LLM Performance (pronounced "RwaKuv", from its 4 major parameters: R, W, K, V). RWKV is an RNN with Transformer-level LLM performance that can also be trained directly like a GPT transformer (parallelizable), and it is 100% attention-free: only the hidden state at position t is needed to compute the state at position t+1. The "GPT" mode can be used to quickly compute the hidden state for the "RNN" mode.
So it combines the best of RNNs and transformers: great performance, fast inference, low VRAM usage, fast training, "infinite" ctx_len, and free sentence embeddings (using the final hidden state).
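To make the "RNN mode vs GPT mode" claim concrete, here is a minimal, unstabilized sketch (not the library implementation) of the RWKV-4 WKV mixing step in NumPy. The sequential version carries only a small per-channel state between tokens, while the parallel version computes every position from the full sequence; both should agree. Names (`wkv_sequential`, `wkv_parallel`) and the scalar-channel simplification are illustrative, and real kernels additionally track a running max exponent for numerical stability.

```python
import numpy as np

def wkv_sequential(w, u, k, v):
    """RNN mode: process one token at a time, carrying only (num, den) state."""
    T = len(k)
    num = 0.0  # decayed running sum of e^{k_i} * v_i over past tokens
    den = 0.0  # decayed running sum of e^{k_i} over past tokens
    out = np.empty(T)
    for t in range(T):
        # Current token gets the bonus u instead of the time decay.
        out[t] = (num + np.exp(u + k[t]) * v[t]) / (den + np.exp(u + k[t]))
        # Fold the current token into the state, decaying older terms by e^{-w}.
        num = np.exp(-w) * num + np.exp(k[t]) * v[t]
        den = np.exp(-w) * den + np.exp(k[t])
    return out

def wkv_parallel(w, u, k, v):
    """GPT mode: compute each position directly from the whole sequence."""
    T = len(k)
    out = np.empty(T)
    for t in range(T):
        i = np.arange(t)
        weights = np.exp(-(t - 1 - i) * w + k[:t])  # decayed weights for past tokens
        num = weights @ v[:t] + np.exp(u + k[t]) * v[t]
        den = weights.sum() + np.exp(u + k[t])
        out[t] = num / den
    return out
```

Because the two modes produce identical outputs, a server can prefill a prompt in parallel and then generate token-by-token from the tiny recurrent state, with no KV cache growing with context length.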
The model is also implemented in the transformers library. Using the RNN mode for inference to push the context length could be quite interesting, and an implementation in this server would also drive adoption of the model. There are cpp projects running this model in the browser and it is still fast, so an optimized version here could be very impressive.
Open source status
- [X] The model implementation is available
- [X] The model weights are available
Provide useful links for the implementation
No response
This would be interesting; however, it is pretty far from how this repo operates (the transformer assumption is pretty strong).
But since there are no past key values and no attention, it should make the whole thing even easier to write. Would you be willing to write a PR for it? Any existing models on hf.co?
https://huggingface.co/spaces/BlinkDL/RWKV-World-7B Demo
https://huggingface.co/models?other=rwkv Models
I will check other PRs and try to make an integration
It's already in the code base btw