TensorRT-LLM: Any support for RWKV plz?

RWKV is an RNN with Transformer-level LLM performance, which can also be directly trained like a GPT transformer (parallelizable). And it's 100% attention-free. You only need the hidden state at position t to compute the state at position t+1. You can use the "GPT" mode to quickly compute the hidden state for the "RNN" mode.

So it's combining the best of RNN and transformer - great performance, fast inference, saves VRAM, fast training, "infinite" ctx_len, and free sentence embedding (using the final hidden state).

Project Homepage: https://github.com/BlinkDL/RWKV-LM

Does TensorRT-LLM support such projects?

Oct 20 '23 13:10 Pevernow
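
The equivalence between "RNN" mode and "GPT" mode mentioned in the opening post can be pictured with a toy example. The sketch below is not the actual RWKV formula; it assumes a simplified scalar recurrence s_{t+1} = w * s_t + x_t with s_0 = 0, and the function names rnn_mode and gpt_mode are made up for illustration. It only shows that a state carried step by step can also be computed from all positions at once.

```python
# Toy scalar recurrence (an assumption for illustration, NOT the RWKV kernel):
#   s_{t+1} = w * s_t + x_t,  with s_0 = 0

def rnn_mode(xs, w):
    """Step through the sequence, carrying only the previous state."""
    s = 0.0
    for x in xs:
        s = w * s + x
    return s


def gpt_mode(xs, w):
    """Compute the same final state from all positions in one pass."""
    T = len(xs)
    # Position i is decayed by w ** (T - 1 - i): exponents T-1, ..., 1, 0.
    return sum((w ** (T - 1 - i)) * x for i, x in enumerate(xs))


xs = [1.0, 2.0, 3.0, 4.0]
w = 0.5
# Both modes agree on the final state.
assert abs(rnn_mode(xs, w) - gpt_mode(xs, w)) < 1e-12
```

In this toy version the exponents T-1, ..., 1, 0 form a descending index, which is the kind of sequence a call like arange(T-1, -1, ...) is meant to produce.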

Hi @Pevernow, thanks for your message. For the moment, RWKV is not on our roadmap. However, we welcome external contributions, and if you are willing to contribute an implementation of RWKV, we could evaluate it and eventually merge it into TensorRT-LLM. Would you be interested in contributing?

Oct 20 '23 14:10 jdemouth-nvidia

Maybe this is a little difficult for me. But I'll try to find another developer to do it.

Oct 21 '23 22:10 Pevernow

Hi, I'd like to work on it. Should I open an issue for proposal before starting it?

Nov 04 '23 16:11 SanftMonster

Of course, it depends on your preference. Thank you for your contribution to the community.

Nov 05 '23 05:11 Pevernow

Hey, I need help with RWKV support in #384. I would appreciate it if anyone could help me.

The model's forward pass needs ind = arange(T-1, -1, self.dtype), where T depends on the input shape. When building the model, T is deduced as -1, so the build fails. Any idea how to deal with this case? @byshiue @jdemouth-nvidia

Dec 05 '23 16:12 SanftMonster

@AsakusaRinne For dynamic shapes, you should use shape(x, -1) instead of x.shape[-1] to get a dimension of a tensor.

Please try:

T = shape(q, -1)
xxx
ind = arange(T-1, -1, self.dtype)

Dec 12 '23 01:12 QiJune

I'll have a try. Thank you very much!

Dec 12 '23 04:12 SanftMonster

@QiJune It seems that this does not work. I got an ind with shape (0), while the correct shape should be (T): whatever the value of T, the length of the range is T - 1 - (-1) = T. I would appreciate it if you could help me with this. It has really bothered me for a long time.

Dec 12 '23 17:12 SanftMonster
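
The length arithmetic in the comment above can be checked with plain Python's built-in range. This is only a sketch of the counting argument: it uses Python's range, not TensorRT-LLM's arange, and the helper name range_length is hypothetical. The thread does not say which step TensorRT-LLM's arange uses, so this is not a diagnosis of the shape (0) result.

```python
def range_length(start, end, step):
    """Number of elements in a half-open range with a fixed step."""
    return len(range(start, end, step))


T = 5
# Counting down from T-1 and stopping before -1 gives T elements.
assert list(range(T - 1, -1, -1)) == [4, 3, 2, 1, 0]
assert range_length(T - 1, -1, -1) == T
# With the same bounds but a step of +1, Python's range is empty.
assert range_length(T - 1, -1, 1) == 0
```

So the expected length T relies on the range counting downward; with the same bounds and an upward step, Python produces no elements at all.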

@AsakusaRinne It seems that arange does not support -1; you need to set the end value explicitly.

Dec 13 '23 01:12 QiJune

I also tried start=-1 and end=T-1 last night and got the same result. Does arange just not support negative numbers as input?

Dec 13 '23 01:12 SanftMonster

@AsakusaRinne Yes, arange does not support negative numbers.

Dec 13 '23 06:12 QiJune

@QiJune I tried ind = arange(concat([0]), T, self.dtype), but it still does not seem to work.

I saw the following error printed:

[TRT] [E] 4: [fillNode.cpp::lowerParams::75] Error Code 4: Internal Error ((Unnamed Layer* 233) [Fill]: LINSPACE requires that input 1 have rank 0)
[TRT] [E] 4: [graphShapeAnalyzer.cpp::needTypeAndDimensions::2235] Error Code 4: Internal Error (RwkvForCausalLM/layers/0/attention/FILL_0: output shape can not be computed)

If I print the shape of ind, I get (0).

Besides, I noticed that if I use ws = pow(w, T), the result is just the same.

Dec 13 '23 08:12 SanftMonster

How about ind = arange(0, T, self.dtype)?

Dec 14 '23 01:12 QiJune

That gives an assertion error:

  File "/home/rinne/TensorRT-LLM/tensorrt_llm/models/rwkv/model.py", line 104, in forward
    ind = arange(0, T, self.dtype)
  File "/home/rinne/TensorRT-LLM/tensorrt_llm/functional.py", line 1131, in arange
    assert isinstance(end, int)
AssertionError

Dec 14 '23 03:12 SanftMonster

We have a test case for the arange function: https://github.com/NVIDIA/TensorRT-LLM/blob/main/tests/functional/test_arange.py#L70

It should be ind = arange(np.array(0, dtype=np.int32), T, self.dtype)

Dec 14 '23 06:12 QiJune
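
If a descending index is still needed once an ascending range from 0 builds, one option is to flip the ascending values by subtracting them from T - 1, which never requires a negative bound. The sketch below shows the idea in plain Python; list(range(0, T)) stands in for the ascending arange call, the helper name descending_from_ascending is hypothetical, and it is an untested assumption that the same subtraction carries over to TensorRT-LLM tensors.

```python
def descending_from_ascending(T):
    """Build [T-1, ..., 1, 0] using only a non-negative ascending range."""
    ascending = list(range(0, T))  # stand-in for an ascending arange(0, T)
    # Flipping each index around T - 1 reverses the order.
    return [(T - 1) - i for i in ascending]


assert descending_from_ascending(4) == [3, 2, 1, 0]
```

The result has length T for any non-negative T, matching the shape expected earlier in the thread.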

Any update? When will RWKV be ready in TRT-LLM?

Feb 01 '24 08:02 wujinzhong