Results 3 comments of jiaxin

@xxa783 Could you tell me how many GPUs you used for training? I suspect this is an overlooked factor in reproducing the result.

I think it's a typo too. The layer norms norm2 and norm3 are not used in the forward pass.