
A series of large language models trained from scratch by developers @01-ai

76 Yi issues, sorted by recently updated

Hello! Is there any information about how to finetune the 6B-200K context window model?

question
sft
doc-not-needed

Both Yi-6B-200K and Yi-34B-200K appear to offer a 200K-token context window, which is quite exciting (GPT-4-turbo only offers 128K). This has attracted the attention of many practitioners at home and abroad, and I am curious how I should go about verifying it. The official team does not seem to have published any experimental data or a guide on this. Could anyone help? Thanks.

doc
doc-not-needed

Hi, I've tried the 34b-chat model on Replicate and found that the model's safety guardrails can be bypassed quite easily with minimal adversarial prompting. The same prompt will fail on any...

regression
sft
doc-not-needed

Could you please run this context test for the `Yi-6B-Chat` model? Here is the code: [link](https://github.com/gkamradt/LLMTest_NeedleInAHaystack) Below are the results for the `Qwen-72B-Chat` model: ![image](https://github.com/QwenLM/Qwen/raw/main/assets/qwen_72b_needle_in_a_haystack.png) This request is not a...

enhancement
help wanted

Could you run this long context test for `Yi-34B-200k`? code: [link](https://github.com/gkamradt/LLMTest_NeedleInAHaystack) The results for `GPT-4-128k` and `Claude 2.1` are as follows: ![image](https://github.com/gkamradt/LLMTest_NeedleInAHaystack/raw/main/img/GPT_4_testing.png) ![image](https://github.com/gkamradt/LLMTest_NeedleInAHaystack/raw/main/img/Claude_2_1_testing.png)

enhancement
sft
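The two requests above both refer to the "needle in a haystack" protocol: plant a known fact at a chosen depth inside long filler text and check whether the model can retrieve it. A minimal, model-agnostic sketch of that harness is below; `ask_model`, `FILLER`, and the passphrase needle are all hypothetical stand-ins, not part of the linked repository.

```python
# Minimal needle-in-a-haystack probe: insert a known fact ("the needle")
# at a fractional depth inside filler text, ask the model to retrieve it,
# and record pass/fail per (context length, depth) cell.

FILLER = "The grass is green. The sky is blue. "  # repeated background text
NEEDLE = "The secret passphrase is 7d3f."

def build_haystack(total_chars: int, depth: float) -> str:
    """Build roughly total_chars of filler with NEEDLE at fractional depth."""
    body = (FILLER * (total_chars // len(FILLER) + 1))[:total_chars]
    pos = int(len(body) * depth)
    return body[:pos] + " " + NEEDLE + " " + body[pos:]

def score_answer(answer: str) -> bool:
    """A retrieval counts as correct if the needle's payload appears verbatim."""
    return "7d3f" in answer

def run_grid(ask_model, lengths=(1000, 4000), depths=(0.0, 0.5, 1.0)):
    """Sweep context length x insertion depth. `ask_model` is a placeholder
    for a real completion call (e.g. generation with Yi-34B-200K)."""
    results = {}
    for n in lengths:
        for d in depths:
            prompt = build_haystack(n, d) + "\nWhat is the secret passphrase?"
            results[(n, d)] = score_answer(ask_model(prompt))
    return results
```

The real test in the linked repo additionally varies context length up to the full window and plots the pass/fail grid as a heatmap; this sketch only shows the scoring loop.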

Looking at the code, it seems NTK scaling isn't used? Does the model rely on directly extending RoPE, fine-tuned on 200K-length data?

question
doc-not-needed
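For readers unfamiliar with the distinction raised in this question: "NTK-aware" scaling stretches RoPE's low-frequency components by enlarging the base, whereas direct extension keeps the standard frequencies and relies on long-context fine-tuning. The sketch below contrasts the two frequency schedules; it illustrates the NTK-aware formula in general, not necessarily what Yi actually does (that is exactly what the issue asks).

```python
def rope_inv_freq(dim: int, base: float = 10000.0):
    """Standard RoPE inverse frequencies: theta_i = base^(-2i/dim)."""
    return [base ** (-2 * i / dim) for i in range(dim // 2)]

def ntk_scaled_inv_freq(dim: int, scale: float, base: float = 10000.0):
    """NTK-aware scaling: enlarge the base so the longest wavelength grows
    by exactly `scale`, while the highest frequency (i=0) stays unchanged."""
    new_base = base * scale ** (dim / (dim - 2))
    return [new_base ** (-2 * i / dim) for i in range(dim // 2)]
```

With `scale=8`, the lowest-frequency component's wavelength is stretched 8x (covering an 8x longer context) while the i=0 component is identical, which is why NTK scaling degrades short-range behavior less than uniform position interpolation.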

Is there any technical report? Thanks a lot!

question

- [ ] Add cross-link

doc
good first issue
help wanted

What performance could be achieved by merging the 6B model with the 6B-200K model? https://github.com/cg123/mergekit

enhancement
help wanted
performance
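For context on what the merge in this issue would involve: the simplest method mergekit supports is a linear (weighted-average) merge of matching parameters. The sketch below shows that operation on plain Python dicts of lists standing in for checkpoint state dicts; whether averaging Yi-6B with Yi-6B-200K preserves either model's strengths is exactly the open question here.

```python
# Illustrative linear merge of two checkpoints' parameter dicts.
# Real tools operate on tensors; plain lists are used here for clarity.

def linear_merge(state_a: dict, state_b: dict, alpha: float = 0.5) -> dict:
    """Element-wise weighted average; both dicts must share keys and shapes."""
    assert state_a.keys() == state_b.keys(), "checkpoints must match"
    merged = {}
    for name in state_a:
        a, b = state_a[name], state_b[name]
        merged[name] = [alpha * x + (1 - alpha) * y for x, y in zip(a, b)]
    return merged
```

Note that a linear merge only makes sense when both models share an architecture and tokenizer, as the 6B and 6B-200K checkpoints presumably do; mergekit also offers more sophisticated methods (e.g. SLERP) for the same setup.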