Xia Yu issues

Results: 3 issues by Xia Yu

Can this library be used for full-parameter fine-tuning of ChatGLM, and which parts of the code would need to be changed?

pending

GRPO as part of HF TRL?

### Feature request

Qwen2.5-Math and Qwen2.5-Code are two state-of-the-art models that have recently integrated GRPO (Group Relative Policy Optimization).

### Motivation

https://qwenlm.github.io/blog/qwen2.5-math/
https://arxiv.org/pdf/2402.03300

### Your contribution

This is a request-only...

✨ enhancement

UI-TARS-1.5-7B Endless Loops on Web Interfaces

We've observed that the UI-TARS-1.5-7B agent frequently gets stuck in endless retry loops when interacting with web interfaces. The agent repeatedly attempts the same ineffective action, unable to adapt or learn...