gengdongyang
Maybe lower num_batches_per_epoch? I guess learning_rate is already low but num_batches_per_epoch is high, so during the first epoch of training the arguments of the distribution ran into numerical problems when calculating log...
> In my understanding, gradient descent doesn't even "know" about the concept of epochs.

Yes, I think you are right. num_batches_per_epoch is not the root cause. I thought if...
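@JianxinMa Hello, I'd like to ask whether the dashscope function call implementation has been released yet. How will it differ from the function calling interface wrapped by the Qwen-Agent project?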
@Stephen-SMJ Hi, could you please share some working reference code?
> Thanks to @JianxinMa. Since some people want the code, I am considering releasing it to the Qwen1.5 repo for your reference. Thanks.

Hi @Stephen-SMJ, do you have a...