
Any support for RWKV plz?

Open • Pevernow opened this issue 2 years ago • 16 comments

RWKV is an RNN with Transformer-level LLM performance that can also be trained directly like a GPT transformer (parallelizable). It is also 100% attention-free: you only need the hidden state at position t to compute the state at position t+1. You can use the "GPT" mode to quickly compute the hidden state for the "RNN" mode.

So it combines the best of RNNs and transformers: great performance, fast inference, low VRAM usage, fast training, "infinite" ctx_len, and free sentence embeddings (using the final hidden state).

Project Homepage: https://github.com/BlinkDL/RWKV-LM

Does TensorRT-LLM support such projects?

Pevernow • Oct 20 '23 13:10
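The opening post's claim that an RNN mode (state at t+1 depends only on the state at t) and a parallel "GPT" mode produce the same result can be illustrated with a simplified, RWKV-4-style decayed weighted average. This is a NumPy sketch under stated assumptions: it omits the per-token bonus term and the numerical-stability tricks of the real RWKV kernels, `w` is assumed to be a per-channel decay in (0, 1), and `wkv_rnn` / `wkv_gpt` are hypothetical names, not functions from RWKV-LM or TensorRT-LLM.

```python
import numpy as np

# Simplified, hypothetical sketch: no bonus term, no overflow protection.

def wkv_rnn(k, v, w):
    """RNN mode: process tokens one at a time, carrying only (a, b) as state.

    k, v: arrays of shape (T, C); w: per-channel decay in (0, 1), shape (C,).
    """
    T, C = k.shape
    a = np.zeros(C)  # running decayed sum of exp(k_i) * v_i
    b = np.zeros(C)  # running decayed sum of exp(k_i)
    out = np.empty((T, C))
    for t in range(T):
        a = w * a + np.exp(k[t]) * v[t]
        b = w * b + np.exp(k[t])
        out[t] = a / b
    return out


def wkv_gpt(k, v, w):
    """GPT mode: compute every position at once from a (T, T, C) decay table."""
    T, C = k.shape
    lag = np.arange(T)[:, None] - np.arange(T)[None, :]  # t - i for each (t, i)
    causal = lag >= 0                                     # only i <= t contributes
    # w ** (t - i) where i <= t, zero elsewhere; clip keeps exponents non-negative
    decay = np.where(causal[..., None],
                     w ** np.clip(lag, 0, None)[..., None], 0.0)
    ek = np.exp(k)                                        # (T, C)
    num = np.einsum("tic,ic->tc", decay, ek * v)          # sum over i of decay * e^k * v
    den = np.einsum("tic,ic->tc", decay, ek)              # sum over i of decay * e^k
    return num / den
```

Both functions should agree to floating-point tolerance, e.g. `np.allclose(wkv_rnn(k, v, w), wkv_gpt(k, v, w))`. The recurrent form keeps O(C) state per step, while the parallel form materializes a (T, T, C) table, which is the usual trade-off between the two modes.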

Hi @Pevernow, thanks for your message. For the moment, RWKV is not on our roadmap. However, we welcome external contributions, and if you are willing to contribute an implementation of RWKV, we could evaluate it and eventually merge it into TensorRT-LLM. Would you be interested in contributing?

jdemouth-nvidia • Oct 20 '23 14:10

This may be a little difficult for me, but I'll try to find another developer to do it.

Pevernow • Oct 21 '23 22:10

Hi, I'd like to work on it. Should I open an issue for a proposal before starting?

SanftMonster • Nov 04 '23 16:11

Of course, it depends on your preference. Thank you for your contribution to the community.

Pevernow • Nov 05 '23 05:11

Hey, I need help with RWKV support in #384. I would appreciate it if anyone could help me.

In the model's forward pass, ind = arange(T-1, -1, self.dtype) is needed, where T depends on the input shape. When building the model, T is deduced as -1, so the build fails. Any idea how to deal with this case? @byshiue @jdemouth-nvidia

SanftMonster • Dec 05 '23 16:12

@AsakusaRinne For dynamic shapes, you should use shape(x, -1) instead of x.shape[-1] to get a dimension of a tensor.

Please try:

T = shape(q, -1)
xxx
ind = arange(T-1, -1, self.dtype)

QiJune • Dec 12 '23 01:12

I'll have a try. Thank you very much!

SanftMonster • Dec 12 '23 04:12

@QiJune It seems that it does not work. I got an ind with shape (0), while the correct shape should be (T), because whatever the value of T, the range length is (T - 1) - (-1) = T. I'd appreciate it if you could help me with it; it has really bothered me for a long time.

SanftMonster • Dec 12 '23 17:12
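For reference, NumPy's `arange` (not TensorRT-LLM's, whose semantics may differ) shows that a start of T-1 and an end of -1 only give T elements when counting downward. With the default step of +1, the same start and end produce an empty range, which has the same (0) shape reported above. This illustrates NumPy semantics only and is not a diagnosis of the TensorRT-LLM behavior.

```python
import numpy as np

T = 5  # example sequence length

# NumPy's default step is +1, so going from T-1 "up" to -1 yields nothing.
empty = np.arange(T - 1, -1)

# With step -1 the range counts down from T-1 and stops before -1: T elements.
descending = np.arange(T - 1, -1, -1)

print(empty.shape)          # (0,)
print(descending.tolist())  # [4, 3, 2, 1, 0]
```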

@AsakusaRinne It seems that arange does not support -1; you need to set the end value explicitly.

QiJune • Dec 13 '23 01:12

I also tried start=-1 and end=T-1 last night and got the same result. Does arange just not support negative numbers as input?

SanftMonster • Dec 13 '23 01:12

@AsakusaRinne Yes, arange does not support negative numbers.

QiJune • Dec 13 '23 06:12
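Since arange reportedly rejects negative inputs, one possible workaround is to build the ascending range 0..T-1 and flip it arithmetically as (T - 1) - arange(0, T). This is an assumption, not something confirmed in this thread. The NumPy sketch below only checks that identity. Applying it in TensorRT-LLM would still require T to come from shape(...) as a tensor, and `descending_indices` is a hypothetical helper name.

```python
import numpy as np

def descending_indices(T):
    """Hypothetical helper: return [T-1, ..., 1, 0] without a negative
    start, end, or step, by reversing an ascending range arithmetically."""
    ascending = np.arange(0, T)  # [0, 1, ..., T-1], all inputs non-negative
    return (T - 1) - ascending   # [T-1, ..., 1, 0]

print(descending_indices(5).tolist())  # [4, 3, 2, 1, 0]
```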

@QiJune I tried ind = arange(concat([0]), T, self.dtype), but it still doesn't seem to work.

I saw the following error printed:

[TRT] [E] 4: [fillNode.cpp::lowerParams::75] Error Code 4: Internal Error ((Unnamed Layer* 233) [Fill]: LINSPACE requires that input 1 have rank 0)
[TRT] [E] 4: [graphShapeAnalyzer.cpp::needTypeAndDimensions::2235] Error Code 4: Internal Error (RwkvForCausalLM/layers/0/attention/FILL_0: output shape can not be computed)

If I print the shape of ind, I get (0).

Besides, I noticed that if I use ws = pow(w, T), the result is the same.

SanftMonster • Dec 13 '23 08:12

How about ind = arange(0, T, self.dtype)

QiJune • Dec 14 '23 01:12

I'll get an assertion error:

  File "/home/rinne/TensorRT-LLM/tensorrt_llm/models/rwkv/model.py", line 104, in forward
    ind = arange(0, T, self.dtype)
  File "/home/rinne/TensorRT-LLM/tensorrt_llm/functional.py", line 1131, in arange
    assert isinstance(end, int)
AssertionError

SanftMonster • Dec 14 '23 03:12

We have a test case for the arange function: https://github.com/NVIDIA/TensorRT-LLM/blob/main/tests/functional/test_arange.py#L70

It should be ind = arange(np.array(0, dtype=np.int32), T, self.dtype)

QiJune • Dec 14 '23 06:12

Any update? When will RWKV be ready in TRT-LLM?

wujinzhong • Feb 01 '24 08:02