mesh-transformer-jax
CausalTransformerV2 or CausalTransformer?
Was GPT-J-6B pretrained with CausalTransformerV2 or with the original CausalTransformer? What was the reason for choosing one over the other?
Thanks for any advice.