neural_sp
Question on masking in transformer encoder
Hello Mr. Hirofumi, thanks for open-sourcing this excellent repository. I have some questions about the masking mechanism in the transformer encoder:
In the code here
https://github.com/hirofumi0810/neural_sp/blob/3be0bac8a1b009ee36f10ca901f4c64160a5ce45/neural_sp/models/seq2seq/encoders/transformer.py#L394
there is a comment that says: # NOTE: no mask to avoid masking all frames in a chunk
If you would be so kind, could you explain briefly why masking all frames in a chunk should be avoided? Isn't it OK if all frames in a chunk are masked? In that case, I think both the masks and the encoder output would still be correct after reshaping from the chunkwise shapes back to their normal shapes.
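Roughly what I have in mind (a toy sketch of my own, not code from neural_sp): chunking a padding mask and reshaping it back recovers the original mask, even when the tail chunk is entirely padding, so I would expect masking whole chunks to be safe in that respect.

import torch

B, T, N_l = 1, 8, 4                                 # batch, frames, chunk size (assumed values)
xx_mask = torch.tensor([[1, 1, 1, 0, 0, 0, 0, 0]],  # last 5 frames are padding,
                       dtype=torch.bool)            # so the second chunk is fully masked

mask_chunk = xx_mask.reshape(B * (T // N_l), N_l)   # chunkwise shape: (n_chunks, N_l)
mask_back = mask_chunk.reshape(B, T)                # back to the normal shape
assert torch.equal(mask_back, xx_mask)              # the round trip is lossless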
Another question: how do you deal with overlapping chunks when streaming_type=='mask'?
All frames in the tail chunk of some utterances might be completely masked out. That's why I set xx_mask to None. But I'm trying to implement strict masking in the tail chunk now.
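To illustrate one concrete problem with a fully masked chunk (a toy sketch in plain PyTorch, not the actual encoder code): when every key position of a chunk is masked, the attention softmax normalizes a row of -inf and the whole row becomes NaN, which then propagates NaN through the outputs and gradients.

import torch
import torch.nn.functional as F

scores = torch.randn(1, 4, 4)                       # (batch, query, key) attention logits
pad_mask = torch.ones(1, 1, 4, dtype=torch.bool)    # pretend every frame in the chunk is padding
scores = scores.masked_fill(pad_mask, float('-inf'))
attn = F.softmax(scores, dim=-1)                    # softmax over a row of -inf
print(attn)                                         # every entry is NaN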
Regarding the 'mask' option, the context size accumulates with depth. The 'reshape' option avoids this at the cost of extra memory consumption.
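A back-of-the-envelope illustration of that accumulation (my own numbers, not from any config in the repo): if each self-attention layer is allowed to look N_l frames to the left under the 'mask' option, stacking layers widens the effective receptive field, whereas 'reshape' physically splits the input so the context never grows beyond the chunk.

# Assumed values for illustration only.
N_l = 64        # left context per layer (frames)
n_layers = 12   # encoder depth

print('mask   :', n_layers * N_l, 'frames of effective left context')   # grows with depth
print('reshape:', N_l, 'frames, independent of depth')                  # fixed per chunk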
Thank you. Will it make any difference to the parameter updates or the final performance if all frames of some padded tail chunks are masked out?
@t13m I checked it and found that the explicit tail masking did not change the performance.
Hi @hirofumi0810, I have found that transformer_enc_pe_type 'add' is used with lc_type 'reshape' in the MMA example 'lc_transformer_mma_hie_subsample8_ma4H_ca4H_w16_from4L_64_64_32.yaml'. In this config, different chunks will have the same positional encoding. Don't you think this will degrade the performance?
@SoonSYJ I remember trying different indices for each chunk, but it was not helpful, so I simply reuse the same indices in every chunk. Note that such positional encoding is still helpful on AISHELL-1.
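For readers following along, a toy sketch of why the 'reshape' option naturally reuses the same positional indices per chunk (my own code and shapes, not the neural_sp implementation):

import torch

B, T, d_model, N_l = 2, 128, 256, 32      # batch, frames, model dim, chunk size (assumed)
xs = torch.randn(B, T, d_model)

# (B, T, d_model) -> (B * T // N_l, N_l, d_model): each chunk is treated as its
# own short "utterance", so a positional table indexed 0..N_l-1 is added to all
# chunks identically.
xs_chunk = xs.reshape(B * (T // N_l), N_l, d_model)
pos = torch.arange(N_l)                   # identical position indices in every chunk
print(xs_chunk.shape, pos[:5])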