Orion
What is the technique used to extend the context size to 200,000 tokens?
+1. What was the maximum context length used during the pre-training/SFT stages, and what extrapolation method was used?
Thanks for your attention. We pre-trained with a longer context window and also applied some existing extrapolation methods.
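The reply does not name the specific method. For readers curious what an "existing extrapolation method" looks like, one widely used option is linear position interpolation for RoPE (Chen et al., 2023), which compresses positions of a long sequence back into the position range seen during pre-training. The sketch below is illustrative only, with a toy `head_dim` and `scale`; it is not confirmed to be what Orion actually uses.

```python
# A minimal sketch of ONE common context-extrapolation technique:
# linear position interpolation for RoPE. This is NOT confirmed to be
# Orion's method; the maintainers only say "existing extrapolation methods".
import torch


def rope_frequencies(head_dim: int, base: float = 10000.0) -> torch.Tensor:
    """Standard RoPE inverse frequencies for a given head dimension."""
    return 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))


def rope_angles(seq_len: int, head_dim: int, scale: float = 1.0) -> torch.Tensor:
    """Per-position rotation angles.

    scale > 1 linearly compresses ("interpolates") positions so that a
    sequence longer than the pre-training window maps back into the
    position range the model saw during training.
    """
    inv_freq = rope_frequencies(head_dim)
    positions = torch.arange(seq_len).float() / scale  # position interpolation
    return torch.outer(positions, inv_freq)  # (seq_len, head_dim // 2)


def apply_rope(x: torch.Tensor, angles: torch.Tensor) -> torch.Tensor:
    """Rotate pairs of query/key channels by the per-position angles."""
    x1, x2 = x[..., 0::2], x[..., 1::2]
    cos, sin = angles.cos(), angles.sin()
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)


# Hypothetical example: a model pre-trained on a 4,096-token window and
# extended to 200,000 tokens would use scale = 200_000 / 4_096 ≈ 48.8.
q = torch.randn(200, 64)  # (seq_len, head_dim) toy query tensor
angles = rope_angles(seq_len=200, head_dim=64, scale=4.0)
q_rotated = apply_rope(q, angles)
print(q_rotated.shape)  # torch.Size([200, 64])
```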