accelerated-scan
Is there any solution to support variable sequence length during training?
Hi! The current workaround is to pad the input to the next power of two before scanning. Could you tell us more about your use case?
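For reference, a minimal sketch of that workaround, assuming the `scan(gates, tokens)` entry point with `(batch, dim, seqlen)`-shaped inputs and the recurrence $h_t = g_t \cdot h_{t-1} + x_t$; adjust names to your setup:

```python
import torch
from accelerated_scan.warp import scan  # assumed entry point

def padded_scan(gates: torch.Tensor, tokens: torch.Tensor) -> torch.Tensor:
    batch, dim, seqlen = tokens.shape
    padded = 1 << max(seqlen - 1, 0).bit_length()  # next power of two
    if padded != seqlen:
        # The scan is causal, so right-padding with zeros leaves the
        # outputs at the first seqlen positions unchanged.
        gates = torch.nn.functional.pad(gates, (0, padded - seqlen))
        tokens = torch.nn.functional.pad(tokens, (0, padded - seqlen))
    # Slice the padding back off the result.
    return scan(gates, tokens)[..., :seqlen]
```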
Not the original author here, but my specific use case is linear RNNs. These have an initial hidden state $h_0$, which can be provided as the first token with gate[0] = 0 in case an implementation, such as this one, does not allow providing an initial state explicitly. However, this shortens the remaining actual sequence length to $2^{n}-1$. This constraint may seem odd to an outsider who is not familiar with the details of the underlying parallel scan implementation.
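For concreteness, here is a rough sketch of the trick I mean, assuming the recurrence $h_t = g_t \cdot h_{t-1} + x_t$ and the same `scan(gates, tokens)` interface as above; names are illustrative:

```python
import torch
from accelerated_scan.warp import scan  # assumed entry point

def scan_with_initial_state(h0, gates, tokens):
    # h0: (batch, dim); gates, tokens: (batch, dim, seqlen).
    # Prepend h0 as an extra token whose gate is zero, so the first
    # scan step computes 0 * h_prev + h0 = h0 regardless of the
    # implicit initial state.
    gates = torch.cat([torch.zeros_like(h0).unsqueeze(-1), gates], dim=-1)
    tokens = torch.cat([h0.unsqueeze(-1), tokens], dim=-1)
    # Note: the total length is now seqlen + 1, which must be a power
    # of two, leaving only 2**n - 1 positions for the real sequence.
    return scan(gates, tokens)[..., 1:]  # drop the h0 slot
```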
I would therefore suggest two solutions for improving this situation:
- Allow providing an initial element explicitly. From my understanding of the underlying CUDA implementation, this should be relatively easy to do (see the sketch after this list).
- Drop the power-of-two sequence length constraint. This is probably more difficult to implement and may come with a slight performance penalty, but it would also cover more use cases such as variable-length inference.
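To illustrate the first suggestion, a hypothetical Python-level wrapper (not the library's actual API) could fold $h_0$ into the first token instead of prepending it, since $h_1 = g_1 \cdot h_0 + x_1$; this keeps the sequence length unchanged, though native kernel support would still be cleaner:

```python
import torch
from accelerated_scan.warp import scan  # assumed entry point

def scan_with_h0(gates, tokens, h0=None):
    # Hypothetical signature for illustration only. Folding h0 into
    # the first token makes the zero-initialized scan compute
    # h_1 = g_1 * h0 + x_1, as if h0 had been passed to the kernel.
    if h0 is not None:
        tokens = tokens.clone()
        tokens[..., 0] = tokens[..., 0] + gates[..., 0] * h0
    return scan(gates, tokens)
```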