
Question on masking in transformer encoder

Open t13m opened this issue 4 years ago • 5 comments

Hello Mr. Hirofumi, thanks for open-sourcing this excellent repository. I have a few questions about the masking mechanism in the transformer encoder:

In the code here https://github.com/hirofumi0810/neural_sp/blob/3be0bac8a1b009ee36f10ca901f4c64160a5ce45/neural_sp/models/seq2seq/encoders/transformer.py#L394 there is a note that says: `# NOTE: no mask to avoid masking all frames in a chunk`. If you would be so kind, could you explain a bit why masking all frames in a chunk should be avoided? Isn't it OK if all frames in a chunk are masked? In that case, I think both the masks and the encoder output would still be correct after reshaping from the chunkwise shapes back to their normal shapes.

Another question: how do you deal with overlapping chunks when streaming_type == 'mask'?

t13m avatar Sep 30 '20 02:09 t13m

All frames in the tail chunk of some utterances might be completely masked out. That's why I set xx_mask to None. But I'm working on implementing strict masking for the tail chunk now.
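
For anyone reading along, here is a minimal standalone sketch (my own toy example, not neural_sp code) of the failure mode being avoided: if every key position in an attention row is masked, the softmax input is -inf everywhere and the output becomes NaN, which then propagates through the encoder output and the gradients.

```python
import torch
import torch.nn.functional as F

scores = torch.randn(1, 4, 4)             # (batch, query, key) attention logits
mask = torch.zeros(1, 4, 4, dtype=torch.bool)
mask[:, -1, :] = True                     # last query row: every key masked, e.g. a fully padded tail chunk

attn = F.softmax(scores.masked_fill(mask, float('-inf')), dim=-1)
print(attn[0, -1])                        # tensor([nan, nan, nan, nan])
```

Setting xx_mask to None sidesteps this, since no attention row can end up fully masked.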

Regarding the 'mask' option, the context size accumulates with depth. The 'reshape' option avoids this at the cost of extra memory consumption.
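
To make the accumulation with depth concrete, here is a rough per-frame analogue (my own illustration under simplified assumptions, not the chunkwise code in this repo): if each layer may look C frames back, then after L layers a frame effectively depends on about L*C past frames, because each layer attends to hidden states that already summarize earlier frames.

```python
import torch

T, C, L = 12, 2, 3                        # frames, per-layer left context, number of layers

idx = torch.arange(T)
diff = idx[:, None] - idx[None, :]
adjacency = (diff >= 0) & (diff <= C)     # one layer: frame i may attend to frames [i-C, i]

reach = torch.eye(T, dtype=torch.bool)
for _ in range(L):
    reach = (reach.float() @ adjacency.float()) > 0   # compose the dependency relation layer by layer

print(int(reach[-1].sum()) - 1)           # past frames the last frame depends on: L*C = 6
```

As I understand the 'reshape' option, chunks are instead processed independently (folded into the batch dimension), so attention never crosses a chunk boundary no matter how many layers are stacked, which keeps the context fixed but costs memory.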

hirofumi0810 avatar Oct 01 '20 09:10 hirofumi0810

Thank you. Will it make any difference to the parameter updates or the final performance if all frames of some padded tail chunks are masked out?

t13m avatar Oct 12 '20 08:10 t13m

@t13m I checked it and found that the explicit tail masking did not change the performance.

hirofumi0810 avatar Oct 29 '20 05:10 hirofumi0810

Hi @hirofumi0810, I noticed that transformer_enc_pe_type 'add' is used together with lc_type 'reshape' in the MMA example 'lc_transformer_mma_hie_subsample8_ma4H_ca4H_w16_from4L_64_64_32.yaml'. In this config, different chunks get the same positional encoding. Don't you think this will degrade performance?

SoonSYJ avatar Jan 04 '21 08:01 SoonSYJ

@SoonSYJ I remember trying different indices in each chunk, but it was not helpful, so I simply reuse the same indices in every chunk. Note that such positional encoding is still helpful on AISHELL-1.
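
For context, a small sketch (my own illustration, assuming standard sinusoidal encodings and a batch-folded reshape; not the neural_sp implementation) of what reusing the same indices per chunk looks like: once an utterance is split into chunks, adding positions 0..chunk_len-1 gives every chunk an identical encoding.

```python
import math
import torch

def sinusoidal_pe(length: int, d_model: int) -> torch.Tensor:
    """Standard sinusoidal positional encoding of shape (length, d_model)."""
    pos = torch.arange(length, dtype=torch.float32)[:, None]
    div = torch.exp(torch.arange(0, d_model, 2, dtype=torch.float32)
                    * (-math.log(10000.0) / d_model))
    pe = torch.zeros(length, d_model)
    pe[:, 0::2] = torch.sin(pos * div)
    pe[:, 1::2] = torch.cos(pos * div)
    return pe

B, T, D, chunk_len = 1, 8, 16, 4
xs = torch.randn(B, T, D)
n_chunks = T // chunk_len
xs_chunk = xs.view(B * n_chunks, chunk_len, D)      # fold chunks into the batch dimension
xs_chunk = xs_chunk + sinusoidal_pe(chunk_len, D)   # positions restart at 0 in every chunk
```

So each chunk only sees relative positions within itself, which matches the observation above that reusing per-chunk indices was enough in practice.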

hirofumi0810 avatar Jan 11 '21 16:01 hirofumi0810