Haoran Wang

21 comments by Haoran Wang

> Same problem. I also got different scores from two api-providers on the same inference result generated by [MiniCPM-2B-DPO-BF16](https://huggingface.co/openbmb/MiniCPM-2B-dpo-bf16). One of them is 7.090625, and the other is 6.025. Did...

In the `generate_reward` function, it's true that `self_reward_model` is referenced before assignment. But changing `self_reward_model = self_reward_model.to(device)` to `self_reward_model = self.self_reward_model.to(device)` causes other errors. def generate_reward( self, prompt: str, response:...
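A minimal sketch of why the original line fails, assuming `self_reward_model` is stored as an attribute on the class. `FakeModel` and `Trainer` are hypothetical stand-ins, not the project's code, and `FakeModel.to` only mimics the shape of a `.to(device)` call. This reproduces the unbound-local error only; it does not cover the other errors mentioned above.

```python
class FakeModel:
    """Hypothetical stand-in for a model; only mimics `.to(device)`."""

    def __init__(self):
        self.device = "cpu"

    def to(self, device):
        self.device = device
        return self


class Trainer:
    """Hypothetical class holding the reward model as an attribute."""

    def __init__(self):
        self.self_reward_model = FakeModel()

    def broken(self, device):
        # Assigning to `self_reward_model` makes it a local name for the
        # whole function, so reading it on the right-hand side fails.
        self_reward_model = self_reward_model.to(device)
        return self_reward_model

    def fixed(self, device):
        # Reading through `self.` looks up the instance attribute instead.
        self_reward_model = self.self_reward_model.to(device)
        return self_reward_model


trainer = Trainer()

try:
    trainer.broken("cuda")
    broken_error = None
except UnboundLocalError as exc:
    broken_error = exc

fixed_model = trainer.fixed("cuda")
```

Here `broken_error` holds the `UnboundLocalError` raised by the first variant, while `fixed_model` is the same object as `trainer.self_reward_model`.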

> have you solved it?

Sorry, I haven't. When I change the code to `self_reward_model = self.self_reward_model.to(device)`, the program gets stuck in an endless loop...

When I step directly into the container, I can run `source /swe_util/swe_entry.sh` within a minute. But I didn't see any execution logs even after 3600s.

> Other information: I use this model for training: deepseek-ai/DeepSeek-R1-Distill-Qwen-7B. And after several tries, it seems that I can successfully load the first two safetensors, but I always fail to...

> Also, this bug doesn't seem to have existed from the beginning. At first everything was normal, until this problem suddenly appeared yesterday.

Wait, I cannot reproduce your error given...

> > > Also, this bug doesn't seem to have existed from the beginning. At first everything was normal, until this problem suddenly appeared yesterday. > > > > >...

> [@UbeCc](https://github.com/UbeCc) [@hijkzzz](https://github.com/hijkzzz) I have encountered this problem before, and I found an interesting thing that the middle ckpt saved is right, but final saved ckpt is broken. But I...

> [@UbeCc](https://github.com/UbeCc) [@PeterSH6](https://github.com/PeterSH6) I can connect you guys. Haoran is my senior. And, good night 😂

Thanks Chenyang, enjoy your day!

> [@UbeCc](https://github.com/UbeCc) Nice suggestion! We can discuss the plan this week. Could you connect with us through WeChat or Slack?

Yeah, let me send my WeChat ID by email.