
Is there any solution to support variable sequence length during training?

Open ruoyxue opened this issue 1 year ago • 2 comments

ruoyxue · Mar 11 '24

Hi! The current workaround is to pad the input to the nearest power of two before scanning. Could you tell us more about your use case?
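
For reference, a minimal sketch of this padding workaround, assuming the `(batch, dim, seqlen)` layout and `scan(gates, tokens)` entrypoint from the README (`scan_padded` is a hypothetical helper name, and the warp kernel may additionally impose a minimum length):

```python
import torch
from accelerated_scan.warp import scan  # or: from accelerated_scan.triton import scan

def scan_padded(gates: torch.Tensor, tokens: torch.Tensor) -> torch.Tensor:
    """Scan a sequence whose length need not be a power of two.

    Right-pads gates and tokens (batch, dim, seqlen) with zeros up to
    the next power of two, runs the scan, and slices the result back.
    The scan is causal, so zero-padding the tail cannot change the
    outputs at real positions.
    """
    seqlen = tokens.shape[-1]
    padded = 1 << (seqlen - 1).bit_length()  # next power of two >= seqlen
    if padded != seqlen:
        pad = (0, padded - seqlen)
        gates = torch.nn.functional.pad(gates, pad)
        tokens = torch.nn.functional.pad(tokens, pad)
    return scan(gates, tokens)[..., :seqlen]
```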

proger · Mar 19 '24

Not the original author here, but my specific use case is linear RNNs. These have an initial hidden state $h_0$, which can be provided as the first token with gate[0] = 0 in case an implementation does not allow passing an initial state explicitly, as is the case here. However, this shortens the remaining actual sequence to $2^{n}-1$ elements. This constraint may seem odd to an outsider who is not familiar with the details of the underlying parallel scan implementation.
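
To illustrate the trick, here is a minimal sketch for the recurrence $h_t = g_t h_{t-1} + x_t$, again assuming the `(batch, dim, seqlen)` layout and `scan` entrypoint from the README (`scan_with_initial_state` is a hypothetical helper name):

```python
import torch
from accelerated_scan.warp import scan

def scan_with_initial_state(gates: torch.Tensor, tokens: torch.Tensor,
                            h0: torch.Tensor) -> torch.Tensor:
    """Emulate an initial hidden state for h_t = g_t * h_{t-1} + x_t.

    gates, tokens: (batch, dim, seqlen); h0: (batch, dim).
    Prepending h0 as a pseudo-token with gate 0 makes the scan emit
    exactly h0 at position 0, so every real position sees h0 as its
    predecessor state. Dropping the h0 slot from the output leaves
    2**n - 1 positions for the actual sequence whenever the total
    length must be a power of two.
    """
    h0 = h0.unsqueeze(-1)                                   # (batch, dim, 1)
    gates = torch.cat([torch.zeros_like(h0), gates], dim=-1)
    tokens = torch.cat([h0, tokens], dim=-1)
    return scan(gates, tokens)[..., 1:]                     # drop the h0 slot
```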

I would therefore suggest two solutions for improving this situation:

  • Allow providing an initial element explicitly. From my understanding of the underlying CUDA implementation, this should be relatively easy to do.
  • Drop the power-of-two sequence length constraint. This is probably more difficult to implement and may come with a slight performance penalty, but it would also cover more use cases such as variable-length inference.

kklemon · Oct 10 '24