accelerated-scan
Is there any solution to support variable sequence length during training?
Hi! The current workaround is to pad the input to the next power of two before scanning. Could you tell us more about your use case?
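For reference, a minimal sketch of that workaround, assuming the `scan(gates, tokens)` entry point with `(batch, dim, seqlen)`-shaped inputs and the recurrence $h_t = g_t \cdot h_{t-1} + x_t$; adjust names to your setup:

```python
import torch
from accelerated_scan.warp import scan  # assumed entry point

def padded_scan(gates: torch.Tensor, tokens: torch.Tensor) -> torch.Tensor:
    batch, dim, seqlen = tokens.shape
    padded = 1 << max(seqlen - 1, 0).bit_length()  # next power of two
    if padded != seqlen:
        # The scan is causal, so right-padding with zeros leaves the
        # outputs at the first seqlen positions unchanged.
        gates = torch.nn.functional.pad(gates, (0, padded - seqlen))
        tokens = torch.nn.functional.pad(tokens, (0, padded - seqlen))
    # Slice the padding back off the result.
    return scan(gates, tokens)[..., :seqlen]
```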
Not the original author here, but my specific use case is linear RNNs. These have an initial hidden state $h_0$, which can be provided as the first token with gate[0] = 0 in case an implementation, such as this one, does not allow providing an initial state explicitly. However, this shortens the remaining actual sequence length to $2^{n}-1$. This constraint may seem odd to an outsider who is not familiar with the details of the underlying parallel scan implementation.
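For concreteness, here is a rough sketch of the trick I mean, assuming the recurrence $h_t = g_t \cdot h_{t-1} + x_t$ and the same `scan(gates, tokens)` interface as above; names are illustrative:

```python
import torch
from accelerated_scan.warp import scan  # assumed entry point

def scan_with_initial_state(h0, gates, tokens):
    # h0: (batch, dim); gates, tokens: (batch, dim, seqlen).
    # Prepend h0 as an extra token whose gate is zero, so the first
    # scan step computes 0 * h_prev + h0 = h0 regardless of the
    # implicit initial state.
    gates = torch.cat([torch.zeros_like(h0).unsqueeze(-1), gates], dim=-1)
    tokens = torch.cat([h0.unsqueeze(-1), tokens], dim=-1)
    # Note: the total length is now seqlen + 1, which must be a power
    # of two, leaving only 2**n - 1 positions for the real sequence.
    return scan(gates, tokens)[..., 1:]  # drop the h0 slot
```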
I would therefore suggest two solutions for improving this situation:
- Allow providing an initial element explicitly. From my understanding of the underlying CUDA implementation, this should be relatively easy to do (see the sketch after this list).
- Drop the power-of-two sequence length constraint. This is probably more difficult to implement and may come with a slight performance penalty, but it would also cover more use cases such as variable-length inference.
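To illustrate the first suggestion, a hypothetical Python-level wrapper (not the library's actual API) could fold $h_0$ into the first token instead of prepending it, since $h_1 = g_1 \cdot h_0 + x_1$; this keeps the sequence length unchanged, though native kernel support would still be cleaner:

```python
import torch
from accelerated_scan.warp import scan  # assumed entry point

def scan_with_h0(gates, tokens, h0=None):
    # Hypothetical signature for illustration only. Folding h0 into
    # the first token makes the zero-initialized scan compute
    # h_1 = g_1 * h0 + x_1, as if h0 had been passed to the kernel.
    if h0 is not None:
        tokens = tokens.clone()
        tokens[..., 0] = tokens[..., 0] + gates[..., 0] * h0
    return scan(gates, tokens)
```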