mesh-transformer-jax

CausalTransformerV2 or CausalTransformer?

Open leejason opened this issue 3 years ago • 0 comments

Was GPT-J-6B pretrained using CausalTransformerV2 or the original CausalTransformer? Why was that one chosen?

Thanks for any advice.

leejason, Apr 16 '22 01:04