Junyang Lin
Let me explain. In this Qwen2 release, the MoE model is intended as our main medium-size model: it activates only 14B parameters yet matches the quality of a 30B model. However, the current ecosystem's support for MoE is still incomplete, and the 57B total parameter count has high memory requirements. We are planning to add the 14B and 32B sizes, but since these are fairly large models, they will take some time. We previously hoped that the MoE model could be your choice for a medium-size model. It activates only 14B params in each forward pass, but it...
I'll close this, as we are moving fast with community efforts. Reopen this for discussion of MetaGPT for this project.
> @huybery can you merge it

Sorry, I still need @neubig (Graham) to take a final look. @huybery and I do not have expertise in this field. Apparently I think...
Yes, it is urgent to build a small evaluation pipeline. Xingyao just uploaded a container (https://github.com/OpenDevin/OpenDevin/pull/60), and we also have SWE-Lite; Jiaxin just found out what the Devin team...
Things to notice:
1. We do not have a bos token or an eos token; we use `` to separate the docs.
2. Pay attention to the rope theta. Ours are usually...
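A minimal sketch of how one might double-check these settings before re-implementing the model. It assumes the Hugging Face `config.json` conventions (`bos_token_id`, `eos_token_id`, `rope_theta`); the values below are placeholders, not the real Qwen settings, so always read the file shipped with the actual checkpoint:

```python
import json

# Illustrative config.json snippet -- placeholder values, not the real
# Qwen settings; inspect the file in the checkpoint you downloaded.
config_text = """
{
  "bos_token_id": null,
  "eos_token_id": null,
  "rope_theta": 1000000.0
}
"""

config = json.loads(config_text)

# 1. No bos/eos tokens: both ids should be absent (null in JSON).
assert config.get("bos_token_id") is None
assert config.get("eos_token_id") is None

# 2. rope_theta may differ from the common 10000.0 default, so never
#    assume the default when porting the model to another framework.
print(config["rope_theta"])
```

The same check works against a real checkpoint by opening its `config.json` instead of the inline string.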
https://qwen.readthedocs.io/zh-cn/latest/benchmark/hf_infer.html#
Actually there are a lot of problems in the evaluation pipeline that need fixing. For now we are still updating the whole evaluation. Stay tuned.
Could you please try `docker ps` and `docker stop `, and then rerun the project?
We have never done this before; we usually use the default generation params. Check `generation_config.json` for each model. We have tested and settled on the values that we believed to be good before...
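As a hedged sketch of this workflow: read the shipped `generation_config.json`, start from its defaults, and only override parameters deliberately. The field names (`temperature`, `top_p`, `repetition_penalty`) follow Hugging Face conventions; the numbers here are placeholders, not the values shipped with any particular model:

```python
import json

# Illustrative generation_config.json -- placeholder values; use the
# file shipped alongside each model checkpoint rather than these numbers.
gen_config_text = """
{
  "temperature": 0.7,
  "top_p": 0.8,
  "repetition_penalty": 1.05
}
"""

defaults = json.loads(gen_config_text)

# Start from the shipped defaults and override only what you need.
params = {**defaults}
params["temperature"] = 0.6  # hypothetical override for a specific task

print(sorted(params))
```

Keeping the shipped defaults as the baseline makes it easy to tell which behavior comes from your overrides and which from the model's recommended settings.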
Please provide the details for reproduction: which code and which model checkpoints. Even better, provide a Colab notebook for reproduction. Usually, this case happens with quantized models and low temperature.