DeepSeek-V2
Exploring the Combined Effects of YaRN and Adjusted rope_base Values in DeepSeek-V2
DeepSeek-V2 uses static YaRN with rope_base=10000 and achieves excellent length-extrapolation results. Could the authors clarify whether they have tried setting rope_base to 500000 while still applying YaRN? If so, does the combination have a synergistic effect, outperforming both YaRN alone (rope_base=10000) and NTK-aware scaling alone (rope_base=500000)? @luofuli
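To make the two settings concrete, here is a minimal sketch of how the per-dimension RoPE inverse frequencies might be computed under YaRN for each rope_base. It follows the commonly published YaRN recipe (a linear ramp between extrapolated and interpolated frequencies) and is not taken from the DeepSeek-V2 code. The helper names and the defaults `dim=64`, `factor=40`, `orig_max_pos=4096`, `beta_fast=32`, `beta_slow=1` are assumptions chosen for illustration; only the two rope_base values come from the question above.

```python
import math


def plain_inv_freq(base, dim):
    """Standard RoPE inverse frequencies: base^(-2i/dim) for i in [0, dim/2)."""
    return [base ** (-2.0 * i / dim) for i in range(dim // 2)]


def correction_dim(num_rotations, dim, base, orig_max_pos):
    """Index of the dimension that completes `num_rotations` turns over the original context."""
    return dim * math.log(orig_max_pos / (num_rotations * 2 * math.pi)) / (2 * math.log(base))


def yarn_inv_freq(base, dim=64, factor=40.0, orig_max_pos=4096,
                  beta_fast=32, beta_slow=1):
    """YaRN-style blend (all defaults are illustrative assumptions).

    High-frequency dimensions keep their original frequency, low-frequency
    dimensions are divided by `factor`, and a linear ramp blends the two.
    """
    low = max(math.floor(correction_dim(beta_fast, dim, base, orig_max_pos)), 0)
    high = min(math.ceil(correction_dim(beta_slow, dim, base, orig_max_pos)), dim - 1)
    if low == high:
        high += 0.001  # avoid division by zero in the ramp
    blended = []
    for i, freq in enumerate(plain_inv_freq(base, dim)):
        # ramp = 0 keeps the original frequency, ramp = 1 fully interpolates it
        ramp = min(max((i - low) / (high - low), 0.0), 1.0)
        blended.append(freq * (1.0 - ramp) + (freq / factor) * ramp)
    return blended


# Compare the two configurations discussed in the question.
for rope_base in (10000.0, 500000.0):
    freqs = yarn_inv_freq(rope_base)
    print(f"rope_base={rope_base:.0f}: highest={freqs[0]:.3e}, lowest={freqs[-1]:.3e}")
```

In this sketch, raising rope_base already lowers the low-frequency end before YaRN divides it by `factor`, and it also shifts the ramp boundaries, because `correction_dim` depends on `base`. Whether that interaction helps or hurts extrapolation quality is exactly the empirical question being asked.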