mesh-transformer-jax

About rope embedding

Open eyuansu62 opened this issue 8 months ago • 0 comments

Why are rotary position encodings (RoPE) applied to only 64 dimensions of each head rather than to the full head dimension?
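To make the question concrete, here is a minimal sketch of what "partial" RoPE means as I understand it: only the first `rotary_dim` entries of a head vector are rotated by position-dependent angles, and the remaining entries pass through unchanged. This is not the repository's code. The function name `apply_partial_rope`, the head size of 256, the base of 10000, and the pairing of adjacent entries are all assumptions made for illustration.

```python
import math


def apply_partial_rope(x, position, rotary_dim, base=10000.0):
    """Rotate the first `rotary_dim` entries of one head vector.

    Hypothetical helper for illustration only. Entries at index
    `rotary_dim` and beyond are copied through without any
    positional rotation.
    """
    assert rotary_dim % 2 == 0 and rotary_dim <= len(x)
    out = list(x)
    for i in range(rotary_dim // 2):
        # Angle for the i-th pair: position times a per-pair frequency.
        theta = position * base ** (-2.0 * i / rotary_dim)
        c, s = math.cos(theta), math.sin(theta)
        # Assumed pairing: adjacent entries (2i, 2i + 1) form one 2-D plane.
        a, b = x[2 * i], x[2 * i + 1]
        out[2 * i] = a * c - b * s
        out[2 * i + 1] = a * s + b * c
    return out


head_dim, rotary_dim = 256, 64  # head_dim is an assumed value
q = [1.0] * head_dim
q_rot = apply_partial_rope(q, position=5, rotary_dim=rotary_dim)
```

With these assumed sizes, only a quarter of each head carries positional information through the rotation; the other 192 entries are position-independent. The question is why this split was chosen over rotating all `head_dim` entries.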

eyuansu62, Oct 26 '23 06:10