
question about RotaryPEMultiHeadAttention: rotary_percentage

Open YOONSEOKHEO opened this issue 1 year ago • 1 comment

I see that the RotaryPEMultiHeadAttention class applies the rotary embedding to only a fraction of each head's dimensions, controlled by a parameter called rope_percentage. (URL: https://github.com/labmlai/annotated_deep_learning_paper_implementations/blob/285cb3735bde02fbc8c19ddeb24d0ae7e77135c1/labml_nn/transformers/rope/__init__.py#L205)

I am curious in what cases you would set rope_percentage to a value less than 1.

(I did confirm that in experiment.py, rope_percentage is set to 1.0.)

YOONSEOKHEO · Mar 13 '24 19:03
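For context, here is a minimal sketch of the kind of split rope_percentage controls (simplified and illustrative, not the repository's exact code; the function name `apply_partial_rope` and the tensor shapes are assumptions): only the first `int(d_k * rope_percentage)` dimensions of each head receive the rotary rotation, and the remaining dimensions pass through unchanged.

```python
import torch

def apply_partial_rope(x: torch.Tensor, rope_percentage: float) -> torch.Tensor:
    """Rotate only the first `rope_percentage` fraction of the head dimensions.

    x: [seq_len, batch, heads, d_k]. Illustrative sketch; assumes the resulting
    d_rope is even so dimensions can be paired.
    """
    seq_len, _, _, d_k = x.shape
    # Number of dimensions that receive the rotary embedding
    d_rope = int(d_k * rope_percentage)

    x_rope, x_pass = x[..., :d_rope], x[..., d_rope:]

    # Angles theta_i = 10000^(-2i / d_rope), one per pair of rotated dimensions
    theta = 10000.0 ** (-torch.arange(0, d_rope, 2).float() / d_rope)
    positions = torch.arange(seq_len).float()
    angles = torch.einsum('s,d->sd', positions, theta)   # [seq_len, d_rope / 2]
    angles = torch.cat([angles, angles], dim=-1)          # [seq_len, d_rope]
    cos = angles.cos()[:, None, None, :]
    sin = angles.sin()[:, None, None, :]

    # RoPE rotation, pairing dimension i with dimension i + d_rope / 2
    half = d_rope // 2
    rotated = torch.cat([-x_rope[..., half:], x_rope[..., :half]], dim=-1)
    x_rope = x_rope * cos + rotated * sin

    # The remaining dimensions are not rotated at all (no positional signal)
    return torch.cat([x_rope, x_pass], dim=-1)
```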

I'm not sure either. I usually set it to 1. I have seen implementations where it's set to 0.5. I guess the idea is that some dimensions never get rotated, which makes it easier for the model to attend based on content alone, without interference from the positional information.

vpj · Jun 24 '24 10:06
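A toy numerical check of that intuition (an illustrative sketch, not code from the repository; the `rotate` helper is an assumption): with rope_percentage = 0.5, the dot product over the un-rotated half of q and k is identical for every position pair, so those channels can carry purely content-based attention while the rotated half encodes relative position.

```python
import torch

torch.manual_seed(0)
d_k, d_rope = 8, 4                     # e.g. rope_percentage = 0.5
q = torch.randn(d_k)
k = torch.randn(d_k)

def rotate(v: torch.Tensor, pos: int) -> torch.Tensor:
    """Apply RoPE to the first d_rope dims of a single head vector; pass the rest through."""
    half = d_rope // 2
    theta = 10000.0 ** (-torch.arange(half).float() * 2 / d_rope)
    angles = pos * theta
    cos = torch.cat([angles.cos()] * 2)
    sin = torch.cat([angles.sin()] * 2)
    v_rope = v[:d_rope]
    rotated = torch.cat([-v_rope[half:], v_rope[:half]])
    return torch.cat([v_rope * cos + rotated * sin, v[d_rope:]])

for q_pos, k_pos in [(0, 0), (3, 7), (10, 2)]:
    score = rotate(q, q_pos) @ rotate(k, k_pos)
    # The pass-through half contributes the same value at every position pair
    content_only = q[d_rope:] @ k[d_rope:]
    print(f"positions ({q_pos}, {k_pos}): score={score.item():.4f}, "
          f"content-only part={content_only.item():.4f}")
```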