liushixuan

Results 4 issues of liushixuan

Hello! I recently read the adversarial training part of your code and found it very interesting. I have a question about the SMART algorithm part, # runs at the start of each epoch self.init_tilda_op = tilda.assign(param) # runs at the end of each epoch self.update_tilda_op = tilda.assign( (1 - tilda_beta) * param + tilda_beta...

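Hello, I would like to ask what labels != -100 does in the code. From my reading of the paper, the mask should cover the query so that only the response length is counted, but the way the code is written, it appears to use the fixed max_length instead. I hope you can help clarify this. Thanks!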
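A minimal sketch of the masking convention this question refers to, assuming the common practice (matching the default `ignore_index` of PyTorch's `CrossEntropyLoss`) that a label of -100 marks a position excluded from the loss. The token ids and the helper name `response_lengths` are hypothetical and not taken from the repository. It illustrates that rows padded to the same length can still yield different response lengths once the mask is applied.

```python
IGNORE_INDEX = -100  # assumed convention: positions with this label are ignored


def response_lengths(labels):
    """Count the unmasked (response) positions in each sequence."""
    return [sum(1 for tok in row if tok != IGNORE_INDEX) for row in labels]


# Hypothetical batch: every row is padded to the same length of 6.
labels = [
    [-100, -100, 11, 12, 13, -100],    # 2 query tokens, 3 response tokens, 1 pad
    [-100, 21, 22, -100, -100, -100],  # 1 query token, 2 response tokens, 3 pads
]

lengths = response_lengths(labels)
print(lengths)  # [3, 2]
```

Under this assumption the padded length is the same for both rows, but only the response tokens survive the `!= -100` comparison.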

Hi, I see there is a bool variable in **_get_batch_logps** of **trainers.py** that controls whether to return the average log probability. I have two questions. 1. Did you...
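For readers unfamiliar with the distinction, here is a small sketch of summed versus averaged sequence log probabilities. It is not the repository's `_get_batch_logps`. The function name `batch_logps`, the flag name `average_log_prob`, and the numbers are hypothetical, and it assumes that positions labelled -100 are excluded.

```python
IGNORE_INDEX = -100  # assumed convention for masked positions


def batch_logps(per_token_logps, labels, average_log_prob=False):
    """Return one log probability per sequence over unmasked tokens.

    With average_log_prob=False the per-token values are summed;
    with average_log_prob=True they are divided by the unmasked token count.
    """
    results = []
    for logps, row in zip(per_token_logps, labels):
        kept = [lp for lp, tok in zip(logps, row) if tok != IGNORE_INDEX]
        total = sum(kept)
        # max(..., 1) avoids division by zero for a fully masked row
        results.append(total / max(len(kept), 1) if average_log_prob else total)
    return results


# Hypothetical per-token log probabilities for a single sequence.
per_token_logps = [[-0.5, -1.0, -1.5, -2.0]]
labels = [[-100, 5, 6, -100]]  # only the two middle tokens count

summed = batch_logps(per_token_logps, labels)
averaged = batch_logps(per_token_logps, labels, average_log_prob=True)
print(summed, averaged)  # [-2.5] [-1.25]
```

The summed form grows in magnitude with response length, while the averaged form is normalized per token, which is why a flag choosing between them can change training behaviour.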

This is an attempt to add a GRPO trainer. Qwen2.5-Math and DeepSeek-Math have shown that GRPO can be effective for math tasks. Therefore, this PR aims to...