mesh-transformer-jax
About RoPE embedding
Why are the rotary position encodings (RoPE) applied to only 64 dimensions of each head rather than to the full head dimension?
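To make the question concrete, here is a minimal sketch of what applying RoPE to only part of each head looks like. It is written in plain NumPy rather than taken from the repository, and the function names, the head size of 256 in the usage note, the base of 10000, and the interleaved pairing of dimensions are all assumptions made for illustration. Only the figure of 64 rotated dimensions comes from the question itself.

```python
import numpy as np

def rotary_angles(seq_len, rotary_dim, base=10000.0):
    # One frequency per pair of rotated dimensions (base is an assumed default).
    inv_freq = 1.0 / (base ** (np.arange(0, rotary_dim, 2) / rotary_dim))
    pos = np.arange(seq_len)
    angles = np.einsum("i,j->ij", pos, inv_freq)  # (seq_len, rotary_dim // 2)
    return np.sin(angles), np.cos(angles)

def rotate_pairs(x, sin, cos):
    # x: (seq_len, n_heads, rotary_dim).
    # Rotate each adjacent pair (x[2i], x[2i+1]) by its position-dependent angle.
    x1, x2 = x[..., 0::2], x[..., 1::2]
    sin, cos = sin[:, None, :], cos[:, None, :]  # broadcast over heads
    out = np.stack([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)
    return out.reshape(x.shape)

def apply_partial_rope(x, rotary_dim=64):
    # Hypothetical helper: rotate the first `rotary_dim` dimensions of each head
    # and pass the remaining dimensions through unchanged.
    seq_len = x.shape[0]
    sin, cos = rotary_angles(seq_len, rotary_dim)
    x_rot, x_pass = x[..., :rotary_dim], x[..., rotary_dim:]
    return np.concatenate([rotate_pairs(x_rot, sin, cos), x_pass], axis=-1)
```

For example, with `x` of shape `(seq_len, n_heads, 256)`, `apply_partial_rope(x, rotary_dim=64)` rotates the first 64 dimensions of every head and leaves the other 192 untouched, while `rotary_dim=256` would rotate the full head. The NumPy calls used here also exist under the same names in `jax.numpy`.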