Guofeng Yi comments

51 comments by Guofeng Yi

Yi-34B-Chat-4bit cannot be used as an agent with LangChain

Sure, no problem. I'll add it later~

Yi-34B-Chat-4bits throws an error when run

AutoAWQ does not support the V100: https://github.com/casper-hansen/AutoAWQ/issues/290

Yi-34B-Chat-4bits throws an error when run

We recommend various third-party quantized versions in different formats here: https://github.com/01-ai/Yi?tab=readme-ov-file#%EF%B8%8F-quantitation

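When fine-tuning Yi-6B, I keep getting "loss scale overflow", the scale then drops to min_loss_scale and training errors out, while Yi-6B-Chat works fine. Are the training parameters for the chat model set differently?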

Both Yi-6B and Yi-6B-Chat should show this, but the INFO message has no effect on training. If you want to get rid of it, you can set fp16 to False [here](https://github.com/01-ai/Yi/blob/6d3863190ec9d7649bb9ea001d1d3680995c6a4d/finetune/utils/ds_utils.py#L41C14-L41C14). For the underlying cause, see: https://github.com/PKU-Alignment/safe-rlhf/issues/21#issuecomment-1562420980
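As a hedged illustration of the fp16 switch: the snippet below is a minimal sketch of a DeepSpeed config dict, not the actual contents of Yi's `ds_utils.py`. `build_ds_config` is a hypothetical helper; the `fp16`/`bf16` sections and their `enabled`, `loss_scale` and `loss_scale_window` keys are standard DeepSpeed config fields. Disabling `fp16` turns off dynamic loss scaling, and with it the overflow messages. Assuming your GPU supports it, `bf16` is one alternative; with both disabled, training runs in fp32.

```python
def build_ds_config(precision: str = "fp16", train_micro_batch_size_per_gpu: int = 1) -> dict:
    """Build a minimal DeepSpeed config dict (hypothetical helper, not Yi's ds_utils.py)."""
    if precision not in ("fp16", "bf16", "fp32"):
        raise ValueError(f"unsupported precision: {precision}")
    return {
        "train_micro_batch_size_per_gpu": train_micro_batch_size_per_gpu,
        # Dynamic loss scaling only applies to fp16; "loss_scale": 0 selects dynamic scaling.
        "fp16": {
            "enabled": precision == "fp16",
            "loss_scale": 0,
            "loss_scale_window": 100,
        },
        # bf16 has the same exponent range as fp32, so it needs no loss scaling.
        "bf16": {"enabled": precision == "bf16"},
    }


if __name__ == "__main__":
    # Equivalent of "set fp16 to False": plain fp32 training.
    print(build_ds_config("fp32"))
```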

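When fine-tuning Yi-6B, I keep getting "loss scale overflow", the scale then drops to min_loss_scale and training errors out, while Yi-6B-Chat works fine. Are the training parameters for the chat model set differently?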

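> > Both Yi-6B and Yi-6B-Chat should show this, but the INFO message has no effect on training. If you want to get rid of it, you can set fp16 to False [here](https://github.com/01-ai/Yi/blob/6d3863190ec9d7649bb9ea001d1d3680995c6a4d/finetune/utils/ds_utils.py#L41C14-L41C14). For the underlying cause, see: [PKU-Alignment/safe-rlhf#21 (comment)](https://github.com/PKU-Alignment/safe-rlhf/issues/21#issuecomment-1562420980)
>
> The cause I found: some tokens in the embedding layer of the Yi-6B model I used had not been trained, so their vectors had very small values. After re-assigning them, the problem went away.

👍 Thanks for sharing~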
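The fix described in the follow-up (re-assigning untrained embedding rows) might be sketched as follows. This is a hypothetical helper working on plain Python lists so it stays self-contained. The norm threshold and the choice to fill tiny rows with the mean of the trained rows are assumptions, not the commenter's exact method. With Hugging Face Transformers, the same idea would apply to the rows of `model.get_input_embeddings().weight`.

```python
import math


def reinit_small_rows(embeddings: list[list[float]], threshold: float = 1e-3) -> list[int]:
    """Replace rows whose L2 norm is below `threshold` (assumed value) with the
    mean of the remaining rows, in place. Returns the indices that were replaced.
    Hypothetical helper for illustration only."""
    norms = [math.sqrt(sum(x * x for x in row)) for row in embeddings]
    small = [i for i, n in enumerate(norms) if n < threshold]
    trained = [row for row, n in zip(embeddings, norms) if n >= threshold]
    if not small or not trained:
        return []  # nothing to fix, or no trained rows to average from
    dim = len(trained[0])
    mean = [sum(row[d] for row in trained) / len(trained) for d in range(dim)]
    for i in small:
        embeddings[i] = list(mean)  # copy so rows do not share one list object
    return small
```

Averaging the trained rows is a common heuristic for new or unused tokens because it keeps their vectors within the distribution the rest of the network has already seen.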

Any technical report?

The technical report has been released; please see: https://arxiv.org/abs/2403.04652

How to extend from 4k to 200k

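First, most current open-source LLMs use a RoPE base of 10000, our Yi-9B included. For models aimed at long-context scenarios, such as 6B/34B-200K, we followed the dynamic NTK implementation: we raised the RoPE base to 5M or even 10M and continued training, which gives the model the ability to extrapolate to window lengths of 200K and beyond. For a more detailed explanation, see https://zhuanlan.zhihu.com/p/660073229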
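To make the role of the RoPE base concrete: each dimension pair i in a head of size d rotates with frequency θ_i = base^(-2i/d), so a larger base slows the slowest-rotating pairs and stretches the range of positions the model can tell apart. The sketch below computes these frequencies and the commonly cited NTK-aware rescaling base' = base * s^(d/(d-2)) for a context scale factor s. The helpers are hypothetical and the head dimension of 128 is an assumption. Per the comment above, Yi's long-context models set the base directly to 5M to 10M and continued training, which this code does not reproduce.

```python
import math


def rope_inv_freq(base: float, head_dim: int) -> list[float]:
    """Per-pair rotation frequencies theta_i = base^(-2i/d) for i = 0 .. d/2 - 1."""
    return [base ** (-2 * i / head_dim) for i in range(head_dim // 2)]


def longest_wavelength(base: float, head_dim: int) -> float:
    """Number of tokens the slowest-rotating pair needs to complete one full turn."""
    return 2 * math.pi / min(rope_inv_freq(base, head_dim))


def ntk_aware_base(base: float, scale: float, head_dim: int) -> float:
    """NTK-aware rescaling of the base for a context extended by `scale`."""
    return base * scale ** (head_dim / (head_dim - 2))


if __name__ == "__main__":
    d = 128  # assumed head dimension
    for b in (1e4, 5e6, 1e7):
        print(f"base={b:.0e}: longest wavelength ~ {longest_wavelength(b, d):,.0f} tokens")
    # Scale factor 50 corresponds to going from a 4k to a 200k window.
    print(f"NTK-aware base for 4k -> 200k: {ntk_aware_base(1e4, 50, d):.3g}")
```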

How to extend from 4k to 200k

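@YouYouCoding I'm not sure about the specific details either. I lean toward the latter, i.e. training at a shorter length (32k?) and relying on the model's extrapolation ability to reach 200K. See references 22 and 87 in the report.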

200K chat model performance

Please see our technical report, which includes the results of the "Needle in a Haystack" test.

It would be nice to test the model on more benchmarks

The technical report has been released: https://arxiv.org/abs/2403.04652