Yi
A series of large language models trained from scratch by developers @01-ai
Hello! Is there any information about how to fine-tune the 6B-200K context-window model?
Because both Yi-6B-200K and Yi-34B-200K appear to have 200K-length context capability, this is quite exciting (GPT-4-Turbo only offers 128K). It has attracted attention from many practitioners at home and abroad, and I'm curious how I should go about verifying the claim. The official team doesn't seem to have provided any experimental data or a guide for this yet. Could anyone offer some help? Thanks.
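One common way to verify a long-context claim is a "needle in a haystack" retrieval probe, as in the gkamradt/LLMTest_NeedleInAHaystack test mentioned below. A minimal sketch of the prompt construction (the model name, generation settings, and helper function here are assumptions, not official Yi tooling):

```python
# Minimal sketch of a "needle in a haystack" long-context probe.
# The model id and generation call below are assumptions and are
# left commented out since they require a GPU to actually run.

def build_haystack(needle: str, filler: str, total_chars: int, depth: float) -> str:
    """Repeat `filler` up to ~total_chars and bury `needle` at a
    relative position `depth` (0.0 = start, 1.0 = end)."""
    body = (filler * (total_chars // len(filler) + 1))[:total_chars]
    pos = int(len(body) * depth)
    return body[:pos] + " " + needle + " " + body[pos:]

needle = "The magic number is 42417."
haystack = build_haystack(
    needle,
    filler="The grass is green. The sky is blue. ",
    total_chars=200_000,  # raise this to approach the 200K-token claim
    depth=0.5,            # sweep depth over [0, 1] to map retrieval accuracy
)
prompt = haystack + "\n\nWhat is the magic number? Answer with the number only."

# Hypothetical model call using the transformers library:
# from transformers import AutoModelForCausalLM, AutoTokenizer
# tok = AutoTokenizer.from_pretrained("01-ai/Yi-6B-200K")
# model = AutoModelForCausalLM.from_pretrained("01-ai/Yi-6B-200K", device_map="auto")
# inputs = tok(prompt, return_tensors="pt").to(model.device)
# out = model.generate(**inputs, max_new_tokens=16)
# A successful retrieval should mention "42417" in the decoded output.
```

Repeating this over a grid of context lengths and needle depths gives the heat-map style plots shown in the linked test results.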
Hi, I've tried the 34B-Chat model on Replicate and found that the model's safety guardrails can be bypassed quite easily with minimal adversarial prompting. The same prompt will fail on any...
Could you please run this context test for the `Yi-6B-Chat` model? Here is the code: [link](https://github.com/gkamradt/LLMTest_NeedleInAHaystack) Below are the results for the `Qwen-72B-Chat` model: ![image](https://github.com/QwenLM/Qwen/raw/main/assets/qwen_72b_needle_in_a_haystack.png) This request is not a...
Could you run this long context test for `Yi-34B-200k`? code: [link](https://github.com/gkamradt/LLMTest_NeedleInAHaystack) The results for `GPT-4-128k` and `Claude 2.1` are as follows: ![image](https://github.com/gkamradt/LLMTest_NeedleInAHaystack/raw/main/img/GPT_4_testing.png) ![image](https://github.com/gkamradt/LLMTest_NeedleInAHaystack/raw/main/img/Claude_2_1_testing.png)
What kind of performance could merging the 6B model with the 6B-200K model achieve? https://github.com/cg123/mergekit
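A merge like the one asked about above is typically described to mergekit with a YAML config. The sketch below assumes a simple linear (weight-averaged) merge of the two checkpoints; the exact schema and supported merge methods should be checked against the mergekit README:

```yaml
# Hypothetical mergekit config: average Yi-6B and Yi-6B-200K weights.
merge_method: linear
models:
  - model: 01-ai/Yi-6B
    parameters:
      weight: 0.5
  - model: 01-ai/Yi-6B-200K
    parameters:
      weight: 0.5
dtype: float16
```

Whether the merged model retains the 200K context behavior would still need to be verified empirically, e.g. with the needle-in-a-haystack test discussed in this thread.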