dancingpipi

Results 21 comments of dancingpipi

The config is:

```
ds_config = DeepSpeedTransformerConfig(
    batch_size=1,
    hidden_size=768,
    intermediate_size=768*4,
    heads=12,
    hidden_dropout_ratio=0,
    attn_dropout_ratio=0,
    num_hidden_layers=12,
    initializer_range=0.02,
    local_rank=0,
    layer_norm_eps=1e-12,
    fp16=fp16,
    training=True,
    pre_layer_norm=False,
    seq_length=128
)
```

Since padding will lead to different results,...

> Hi @dancingpipi
>
> I will look into this and will send a fix soon.
>
> Best, Reza

Thank you very much for your attention to this issue,...

@RezaYazdaniAminabadi May I ask whether there has been any progress on this issue?

@RezaYazdaniAminabadi Thanks a lot! I found that the difference was caused by my not casting the mask to fp16. Sorry for wasting your time.
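
For anyone hitting the same mismatch, here is a minimal sketch of the fix described above, assuming a PyTorch setup where the hidden states are fp16 and the attention mask was created with the default fp32 dtype. The helper name `match_mask_dtype` and the tensor shapes are hypothetical and only for illustration; they are not taken from the original issue.

```python
import torch


def match_mask_dtype(mask: torch.Tensor, hidden_states: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: cast the mask to the dtype of the hidden states."""
    return mask.to(dtype=hidden_states.dtype)


# Illustrative shapes only: batch 1, sequence length 128, hidden size 768.
hidden_states = torch.zeros(1, 128, 768, dtype=torch.float16)

# torch.zeros defaults to float32, so this mask does not match the fp16 input.
attention_mask = torch.zeros(1, 1, 1, 128)

# Cast the mask before passing it to the layer.
attention_mask = match_mask_dtype(attention_mask, hidden_states)
print(attention_mask.dtype)  # torch.float16
```

Casting against `hidden_states.dtype` rather than calling `.half()` directly keeps the same code path usable when fp16 is switched off.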

> Try to add `net.eval()` to switch to eval mode

Oh my god, I forgot that! I will try it!
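
A small sketch of why the suggestion matters, assuming a PyTorch model that contains dropout. The tiny `net` below is a made-up stand-in, not the model from the original thread. In eval mode dropout is disabled, so two forward passes on the same input give identical outputs.

```python
import torch
from torch import nn

# Made-up stand-in model: one linear layer followed by dropout.
net = nn.Sequential(nn.Linear(8, 8), nn.Dropout(p=0.5))
x = torch.ones(2, 8)

# Switch to eval mode so dropout acts as the identity.
net.eval()

with torch.no_grad():
    out_a = net(x)
    out_b = net(x)

# Both passes see the same weights and no random masking.
same_in_eval = torch.equal(out_a, out_b)
print(same_in_eval)  # True
```

Without the `net.eval()` call the module stays in training mode, and dropout randomly zeroes activations on every pass, which makes output comparisons unreliable.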

I ran into this error too. If you solve it, please let me know how.

> Hi z13974509906,
>
> Thanks for reaching out! I would like to try and reproduce this issue.
>
> Do you mind sharing the model that you were attempting...