王昊天
I have this question too and need some advice.
Same question... Summarizing a sentence by going through the same steps as training (str to .bin) seems too slow and wastes I/O, but rewriting the source code is quite difficult (have tried to...
Same problem. Have you solved it?
> Last I looked there were a lot of changes in v1.8 that break the build. It will probably take me some time before I can address them. If you...
> Thanks for your response. I have two questions: 1. Though dbpn would be wrong if the generated belief state is wrong, using the true turn domain in the evaluation procedure is...
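Our China really is so powerful~~~~~~
You fine-tuned the chat model. Baichuan formats its Q&A data differently from firefly; if you had fine-tuned the base model, this would not have happened.
Could I take a look at your training script? I am also trying to SFT deepseekv3-base.
Have you run any benchmarks on the trained MoE model? I am worried about numerical issues.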