
Exploring the Combined Effects of YaRN and Adjusted rope_base Values in DeepSeek-V2

Open hannlp opened this issue 5 months ago • 0 comments

In DeepSeek-V2, static YaRN with rope_base=10000 was used and yielded excellent extrapolation results. Have the authors tried setting rope_base to 500000 while still using YaRN? If so, does the combination produce a synergistic effect, outperforming both YaRN alone (rope_base=10000) and NTK-aware scaling alone (rope_base=500000)? @luofuli
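
To make the question concrete, here is a minimal sketch of how the YaRN inverse frequencies depend on rope_base. It follows the standard YaRN formulation (a linear ramp between extrapolated and interpolated frequencies); the function name `yarn_inv_freq` is hypothetical, and the values dim=64, factor=40, orig_max_pos=4096, beta_fast=32, beta_slow=1 are illustrative assumptions, not confirmed training settings. It only shows which RoPE dimensions end up interpolated under each base and says nothing about model quality.

```python
import math

def yarn_inv_freq(dim, base, factor, orig_max_pos, beta_fast=32.0, beta_slow=1.0):
    """Sketch of YaRN inverse frequencies for a rotary dimension `dim`.

    All argument values used below are assumptions for illustration.
    """
    def correction_dim(num_rotations):
        # Dimension index at which a frequency completes `num_rotations`
        # full rotations over the original context length.
        return dim * math.log(orig_max_pos / (num_rotations * 2 * math.pi)) / (2 * math.log(base))

    low = max(math.floor(correction_dim(beta_fast)), 0)
    high = min(math.ceil(correction_dim(beta_slow)), dim - 1)
    if low == high:
        high += 0.001  # avoid division by zero in the ramp

    inv_freq = []
    for i in range(dim // 2):
        extrap = 1.0 / base ** (2 * i / dim)  # plain RoPE frequency
        interp = extrap / factor              # position-interpolated frequency
        # ramp = 0 keeps the plain frequency, ramp = 1 fully interpolates
        ramp = min(max((i - low) / (high - low), 0.0), 1.0)
        inv_freq.append(extrap * (1 - ramp) + interp * ramp)
    return inv_freq, low, high

# Compare the two bases mentioned above under otherwise identical settings.
freq_10k, low_10k, high_10k = yarn_inv_freq(64, 10000, 40, 4096)
freq_500k, low_500k, high_500k = yarn_inv_freq(64, 500000, 40, 4096)
print("base=10000 :", low_10k, high_10k, freq_10k[-1])
print("base=500000:", low_500k, high_500k, freq_500k[-1])
```

With a larger base the ramp boundaries move toward lower dimension indices, so more of the dimensions are interpolated, and the lowest frequency is smaller to begin with. Whether that helps or hurts in practice is exactly the empirical question.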

hannlp • Sep 04 '24 07:09