Orion
What is the technique used to extend the context size to 200,000 tokens?
+1. What was the maximum context length used during the pre-training/SFT stages, and what extrapolation method was used?
Thanks for your attention. We pre-trained with a longer context window and also applied some existing extrapolation methods.
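The reply does not name the specific method. For readers curious what an "existing extrapolation method" looks like, one widely used option is linear position interpolation for RoPE (Chen et al., 2023), which compresses positions of a long sequence back into the position range seen during pre-training. The sketch below is illustrative only, with a toy `head_dim` and `scale`; it is not confirmed to be what Orion actually uses.

```python
# A minimal sketch of ONE common context-extrapolation technique:
# linear position interpolation for RoPE. This is NOT confirmed to be
# Orion's method; the maintainers only say "existing extrapolation methods".
import torch


def rope_frequencies(head_dim: int, base: float = 10000.0) -> torch.Tensor:
    """Standard RoPE inverse frequencies for a given head dimension."""
    return 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))


def rope_angles(seq_len: int, head_dim: int, scale: float = 1.0) -> torch.Tensor:
    """Per-position rotation angles.

    scale > 1 linearly compresses ("interpolates") positions so that a
    sequence longer than the pre-training window maps back into the
    position range the model saw during training.
    """
    inv_freq = rope_frequencies(head_dim)
    positions = torch.arange(seq_len).float() / scale  # position interpolation
    return torch.outer(positions, inv_freq)  # (seq_len, head_dim // 2)


def apply_rope(x: torch.Tensor, angles: torch.Tensor) -> torch.Tensor:
    """Rotate pairs of query/key channels by the per-position angles."""
    x1, x2 = x[..., 0::2], x[..., 1::2]
    cos, sin = angles.cos(), angles.sin()
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)


# Hypothetical example: a model pre-trained on a 4,096-token window and
# extended to 200,000 tokens would use scale = 200_000 / 4_096 ≈ 48.8.
q = torch.randn(200, 64)  # (seq_len, head_dim) toy query tensor
angles = rope_angles(seq_len=200, head_dim=64, scale=4.0)
q_rotated = apply_rope(q, angles)
print(q_rotated.shape)  # torch.Size([200, 64])
```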