Results: 2 comments by Henry Liu

In `Megatron-DeepSpeed/megatron/model/bert_model.py`, there is a line:

```python
extended_attention_mask = bert_extended_attention_mask(attention_mask)
```

where `bert_extended_attention_mask` is defined as:

```python
def bert_extended_attention_mask(attention_mask):
    # We create a 3D attention mask from a 2D tensor mask.
    ...
```

> In `Megatron-DeepSpeed/megatron/model/bert_model.py`, there is a line:
>
> ```python
> extended_attention_mask = bert_extended_attention_mask(attention_mask)
> ```
>
> where `bert_extended_attention_mask` is defined as:
>
> ```python
> def bert_extended_attention_mask(attention_mask):
>     #...
> ```
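
The comment in the quoted snippet mentions creating a 3D attention mask from a 2D mask, but the function body is cut off in both results. As a rough illustration of what such an expansion can look like, here is a minimal sketch that assumes a per-example outer product of a 0/1 padding mask. The name `expand_padding_mask` is hypothetical, plain Python lists stand in for tensors, and this is not presented as the body of `bert_extended_attention_mask`.

```python
def expand_padding_mask(attention_mask):
    """Hypothetical helper (not from Megatron-DeepSpeed).

    Turns a [batch, seq] 0/1 padding mask into a [batch, seq, seq]
    pairwise mask by taking an outer product for each example.
    """
    expanded = []
    for row in attention_mask:
        # Entry (i, j) is 1 only if positions i and j are both real tokens.
        expanded.append([[qi * kj for kj in row] for qi in row])
    return expanded


# One sequence of length 3 whose last position is padding.
mask_2d = [[1, 1, 0]]
mask_3d = expand_padding_mask(mask_2d)
print(mask_3d)  # [[[1, 1, 0], [1, 1, 0], [0, 0, 0]]]
```

With a tensor library, the same outer product is usually written as a broadcast multiply of the mask reshaped to `[batch, 1, seq]` and `[batch, seq, 1]`, which avoids the explicit loops.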