StreamingTransformer
Question about chunk mask
In the chunk-based streaming strategy, the encoder mask is calculated by the method "adaptive_enc_mask". I am trying to reproduce the mask shown in the figure. As the figure shows, the encoder has full history context and the future context is 32 * n_encoder_layer. Is that right?
The future context depends on your right window size. For example, if each frame can only access the context within the same chunk (right_window=0), the future context is always 32 for all the encoder layers. However, if it can access the context in the next chunk (right_window=1), the future context is 32 * 2 * n_encoder_layer.
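To make the effect of the right window concrete, here is a minimal NumPy sketch of a chunk mask with full history. It is not the repository's "adaptive_enc_mask"; the names chunk_mask and future_context, and the convention that right_window counts whole chunks, are assumptions for illustration only. The exact frame counts depend on how the library defines chunks and windows, so treat this as a way to check the qualitative behavior: with right_window=0 the look-ahead does not grow with depth, while with right_window=1 it grows with every stacked layer.

```python
import numpy as np

def chunk_mask(n_frames, chunk_size, right_window=0):
    """Hypothetical chunk mask: mask[i, j] is True if frame i may attend to frame j.

    Assumes full history and a look-ahead of `right_window` whole chunks.
    """
    chunk_idx = np.arange(n_frames) // chunk_size
    return chunk_idx[None, :] <= chunk_idx[:, None] + right_window

def future_context(mask, n_layers):
    """Look-ahead (in frames) of each frame after stacking n_layers masked layers."""
    step = mask.astype(int)
    reach = mask.copy()
    for _ in range(n_layers - 1):
        # Frame i reaches j if it reaches some k that attends to j.
        reach = (reach.astype(int) @ step) > 0
    idx = np.arange(mask.shape[0])
    farthest = np.where(reach, idx[None, :], -1).max(axis=1)
    return farthest - idx

if __name__ == "__main__":
    for rw in (0, 1):
        m = chunk_mask(n_frames=256, chunk_size=32, right_window=rw)
        for layers in (1, 2, 3):
            print(rw, layers, future_context(m, layers)[0])
```

Printing the look-ahead of the first frame for a few depths shows whether the receptive field stays inside the current chunk or keeps extending, which can then be compared against the mask produced by the actual implementation.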