Baichuan-7B
[Question] RoPE implementation differs from the paper
Required prerequisites
- [X] I have read the documentation https://github.com/baichuan-inc/baichuan-7B/blob/HEAD/README.md.
- [X] I have searched the Issue Tracker and Discussions that this hasn't already been reported. (+1 or comment there if it has.)
- [X] Consider asking first in a Discussion.
Questions
Why does the implementation here differ from the one in the paper?
```python
import torch

def rotate_half(x):
    """Rotates half the hidden dims of the input."""
    x1 = x[..., : x.shape[-1] // 2]
    x2 = x[..., x.shape[-1] // 2 :]
    return torch.cat((-x2, x1), dim=-1)
```
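A minimal sketch of how `rotate_half` is typically combined with `cos`/`sin` tables, to make the resulting pairing explicit. It assumes the Hugging Face LLaMA-style layout in which the angle vector $m\theta_i$ is concatenated with itself; `rope_cache` and `apply_rope_half` are hypothetical helper names, not Baichuan-7B's API. The check confirms that dimension `i` is rotated together with dimension `i + dim/2`.

```python
import torch


def rotate_half(x: torch.Tensor) -> torch.Tensor:
    """Rotates half the hidden dims of the input: [x1, x2] -> [-x2, x1]."""
    x1 = x[..., : x.shape[-1] // 2]
    x2 = x[..., x.shape[-1] // 2 :]
    return torch.cat((-x2, x1), dim=-1)


def rope_cache(seq_len: int, dim: int, base: float = 10000.0):
    """Hypothetical helper: cos/sin tables of shape (seq_len, dim)."""
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2, dtype=torch.float64) / dim))  # theta_i
    positions = torch.arange(seq_len, dtype=torch.float64)
    freqs = torch.outer(positions, inv_freq)   # (seq_len, dim/2), entry [m, i] = m * theta_i
    emb = torch.cat((freqs, freqs), dim=-1)    # assumed layout: same angles for both halves
    return emb.cos(), emb.sin()


def apply_rope_half(x: torch.Tensor, cos: torch.Tensor, sin: torch.Tensor) -> torch.Tensor:
    """Rotates each pair (x[..., i], x[..., i + dim/2]) by the angle m * theta_i."""
    return x * cos + rotate_half(x) * sin


torch.manual_seed(0)
seq_len, dim = 8, 6
x = torch.randn(seq_len, dim, dtype=torch.float64)   # one vector per position m
cos, sin = rope_cache(seq_len, dim)
y = apply_rope_half(x, cos, sin)

# Compare one output pair with an explicit 2-D rotation of (x[m, i], x[m, i + dim/2]).
m, i, half = 3, 1, dim // 2
c, s = cos[m, i].item(), sin[m, i].item()
a, b = x[m, i].item(), x[m, i + half].item()
print(abs(y[m, i].item() - (a * c - b * s)) < 1e-12)         # True
print(abs(y[m, i + half].item() - (a * s + b * c)) < 1e-12)  # True
```

Note that `rotate_half` only needs two contiguous slices and one `torch.cat`; the paper's adjacent pairing would need strided slices such as `x[..., 0::2]` and `x[..., 1::2]`.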
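The paper computes it as follows, where $\theta_i = 10000^{-2(i-1)/d}$:

$$
R^d_{\Theta,m}\boldsymbol{x} =
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ \vdots \\ x_{d-1} \\ x_d \end{pmatrix}
\otimes
\begin{pmatrix} \cos m\theta_1 \\ \cos m\theta_1 \\ \cos m\theta_2 \\ \cos m\theta_2 \\ \vdots \\ \cos m\theta_{d/2} \\ \cos m\theta_{d/2} \end{pmatrix}
+
\begin{pmatrix} -x_2 \\ x_1 \\ -x_4 \\ x_3 \\ \vdots \\ -x_d \\ x_{d-1} \end{pmatrix}
\otimes
\begin{pmatrix} \sin m\theta_1 \\ \sin m\theta_1 \\ \sin m\theta_2 \\ \sin m\theta_2 \\ \vdots \\ \sin m\theta_{d/2} \\ \sin m\theta_{d/2} \end{pmatrix}
$$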
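With this implementation, the final result would instead be (assuming `cos` and `sin` repeat the angles $m\theta_1, \dots, m\theta_{d/2}$ once for each half, i.e. they are built from `torch.cat((freqs, freqs), dim=-1)`):

$$
\begin{pmatrix} x_1 \\ \vdots \\ x_{d/2} \\ x_{d/2+1} \\ \vdots \\ x_d \end{pmatrix}
\otimes
\begin{pmatrix} \cos m\theta_1 \\ \vdots \\ \cos m\theta_{d/2} \\ \cos m\theta_1 \\ \vdots \\ \cos m\theta_{d/2} \end{pmatrix}
+
\begin{pmatrix} -x_{d/2+1} \\ \vdots \\ -x_d \\ x_1 \\ \vdots \\ x_{d/2} \end{pmatrix}
\otimes
\begin{pmatrix} \sin m\theta_1 \\ \vdots \\ \sin m\theta_{d/2} \\ \sin m\theta_1 \\ \vdots \\ \sin m\theta_{d/2} \end{pmatrix}
$$

so $x_i$ is rotated together with $x_{i+d/2}$ instead of with its neighbor.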
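The two formulas use the same angles and differ only in which dimensions are paired. A minimal sketch comparing them, assuming $\theta_i = 10000^{-2(i-1)/d}$ in both (`rope_interleaved` and `rope_half_split` are hypothetical names): reordering the input so that the paper's pair $(x_{2i-1}, x_{2i})$ lands at positions $(i, i + d/2)$ makes the half-split version reproduce the paper's output, reordered the same way.

```python
import torch


def rope_interleaved(x: torch.Tensor, m: int, base: float = 10000.0) -> torch.Tensor:
    """Paper-style RoPE: rotates pairs (x[2i], x[2i + 1]) by the angle m * theta_i."""
    d = x.shape[-1]
    theta = base ** (-torch.arange(0, d, 2, dtype=x.dtype) / d)   # theta_i, shape (d/2,)
    cos = torch.cos(m * theta).repeat_interleave(2)              # [c1, c1, c2, c2, ...]
    sin = torch.sin(m * theta).repeat_interleave(2)
    x_even, x_odd = x[..., 0::2], x[..., 1::2]
    rotated = torch.stack((-x_odd, x_even), dim=-1).flatten(-2)  # [-x2, x1, -x4, x3, ...]
    return x * cos + rotated * sin


def rope_half_split(x: torch.Tensor, m: int, base: float = 10000.0) -> torch.Tensor:
    """rotate_half-style RoPE: rotates pairs (x[i], x[i + d/2]) by the angle m * theta_i."""
    d = x.shape[-1]
    theta = base ** (-torch.arange(0, d, 2, dtype=x.dtype) / d)
    cos = torch.cos(m * theta).repeat(2)                          # [c1, ..., c_{d/2}, c1, ..., c_{d/2}]
    sin = torch.sin(m * theta).repeat(2)
    x1, x2 = x[..., : d // 2], x[..., d // 2 :]
    rotated = torch.cat((-x2, x1), dim=-1)                        # same as rotate_half(x)
    return x * cos + rotated * sin


torch.manual_seed(0)
d, m = 8, 5
x = torch.randn(d, dtype=torch.float64)

# perm sends the paper's pair (x[2i], x[2i + 1]) to positions (i, i + d/2).
perm = torch.cat((torch.arange(0, d, 2), torch.arange(1, d, 2)))   # [0, 2, 4, 6, 1, 3, 5, 7]

out_paper = rope_interleaved(x, m)
out_half = rope_half_split(x[perm], m)
print(torch.allclose(out_half, out_paper[perm]))         # True
print(torch.allclose(rope_half_split(x, m), out_paper))  # False: without reordering, the pairs differ
```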
I see that Hugging Face's implementation does the same. I'm curious why this implementation was chosen.
Checklist
- [X] I have provided all relevant and necessary information above.
- [X] I have chosen a suitable title for this issue.
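The dimensions (neurons) of an embedding have no inherent order, so it does not matter which dimensions are paired: you can simply pick any half of them to swap and negate against the other half. Since `W_q` and `W_k` are learned, permuting their output rows turns the paper's adjacent pairing $(x_1, x_2), (x_3, x_4), \dots$ into the half-split pairing $(x_1, x_{d/2+1}), (x_2, x_{d/2+2}), \dots$, so both versions can represent exactly the same attention scores.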
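The argument above can be checked numerically. This is a sketch under stated assumptions: `rope_with_pairing` and `score` are hypothetical helpers (the first rotates an arbitrary set of dimension pairs), and the random `W_q`, `W_k` stand in for learned projection weights. It shows that any pairing keeps the relative-position property, and that permuting the rows of `W_q` and `W_k` turns the paper's pairing into the `rotate_half` one without changing the score.

```python
import torch


def rope_with_pairing(x, m, idx_a, idx_b, base=10000.0):
    """Rotates each pair (x[idx_a[i]], x[idx_b[i]]) by the angle m * theta_i.

    Hypothetical helper: idx_a and idx_b are disjoint and together cover every dim once.
    """
    d = x.shape[-1]
    theta = base ** (-torch.arange(0, d, 2, dtype=x.dtype) / d)   # theta_i, shape (d/2,)
    c, s = torch.cos(m * theta), torch.sin(m * theta)
    a, b = x[..., idx_a], x[..., idx_b]
    out = torch.empty_like(x)
    out[..., idx_a] = a * c - b * s
    out[..., idx_b] = a * s + b * c
    return out


def score(q, k, m, n, pairing):
    """Attention logit between a query at position m and a key at position n."""
    return torch.dot(rope_with_pairing(q, m, *pairing), rope_with_pairing(k, n, *pairing))


torch.manual_seed(0)
d_model, d = 16, 8
interleaved = (torch.arange(0, d, 2), torch.arange(1, d, 2))      # paper: (x1, x2), (x3, x4), ...
half_split = (torch.arange(0, d // 2), torch.arange(d // 2, d))   # rotate_half: (x1, x5), (x2, x6), ...

W_q = torch.randn(d, d_model, dtype=torch.float64)   # stand-ins for learned projections
W_k = torch.randn(d, d_model, dtype=torch.float64)
h_q, h_k = torch.randn(d_model, dtype=torch.float64), torch.randn(d_model, dtype=torch.float64)
q, k = W_q @ h_q, W_k @ h_k

# With either pairing, the score depends only on the offset m - n (6 in both calls).
for pairing in (interleaved, half_split):
    print(torch.allclose(score(q, k, 10, 4, pairing), score(q, k, 17, 11, pairing)))  # True

# Reordering the rows of W_q and W_k maps the paper's pairing onto rotate_half's,
# so the two layouts can express exactly the same attention scores.
perm = torch.cat(interleaved)   # tensor([0, 2, 4, 6, 1, 3, 5, 7])
q_p, k_p = W_q[perm] @ h_q, W_k[perm] @ h_k
print(torch.allclose(score(q, k, 10, 4, interleaved), score(q_p, k_p, 10, 4, half_split)))  # True
```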