jiaxin
Results
3 comments of jiaxin
> Yes Thank you
@xxa783 Could you tell me how many GPUs you used for training? I suspect this is an overlooked factor in reproducing the result.
I think it's a typo too. The layer norms norm2 and norm3 are not used in the forward pass.