
Results 85 comments of Yifeng Wang(正经人王同学)

Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model? https://arxiv.org/pdf/2504.13837 It seems that distillation and more sampling outperform RLVR once k in pass@k gets large.
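For reference when comparing at large k, pass@k is usually computed with the unbiased estimator from the HumanEval paper (Chen et al., 2021); a minimal sketch:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    # Unbiased pass@k estimator: probability that at least one of k
    # samples is correct, given n total samples of which c are correct.
    if n - c < k:
        return 1.0  # fewer incorrect samples than k, so success is certain
    return 1.0 - comb(n - c, k) / comb(n, k)

# e.g. 1 correct answer out of 2 samples, k=1 -> 0.5
print(pass_at_k(2, 1, 1))
```

Sampling more (larger n) tightens this estimate; the paper's point is that at large k the base model's pass@k can catch up to or exceed the RLVR model's.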

TTRL: Test-Time Reinforcement Learning https://arxiv.org/pdf/2504.16084

Right Question is Already Half the Answer: Fully Unsupervised LLM Reasoning Incentivization https://arxiv.org/pdf/2504.05812

Similar simple core principle: ReTool: https://arxiv.org/abs/2504.11536 ReSearch: https://arxiv.org/abs/2503.19470 ReCall: https://github.com/Agent-RL/ReCall "Less is More for Reward Design" is currently a major trend in rewards: internalize the tool call and drive it with reinforcement learning...
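A hedged sketch of what a reward in this "less is more" spirit might look like: a single outcome signal, no shaping of intermediate tool calls (the function names and the exact-match check are illustrative assumptions, not any of these papers' actual implementations):

```python
def outcome_reward(predicted: str, gold: str) -> float:
    # Minimal outcome-only reward: 1.0 if the final answer matches the
    # reference, 0.0 otherwise. Intermediate tool calls are left to the
    # policy to discover under RL rather than being rewarded directly.
    return 1.0 if predicted.strip() == gold.strip() else 0.0

print(outcome_reward(" 42 ", "42"))  # correct answer, whitespace ignored
```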

reward design insight: https://www.bespokelabs.ai/blog/improving-multi-turn-tool-use-with-reinforcement-learning

combines learning (imitation) with exploration: https://github.com/ElliottYan/LUFFY

OTC: Optimal Tool Calls via Reinforcement Learning https://arxiv.org/pdf/2504.14870

For example, with an agent plus a toolkit, we could let the user send the API docs and then get the MCP code back.

Hello @fengju0213, for example, when we do a research task, we can let several search agents query Google/Bing/Baidu at the same time.
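A minimal sketch of fanning one query out to several engines concurrently with asyncio (the `search_engine` stub and the engine list are hypothetical placeholders, not a real search API):

```python
import asyncio

async def search_engine(engine: str, query: str) -> list[str]:
    # Hypothetical per-engine stub; a real agent would call the
    # engine's API here. Returns a list of result strings.
    await asyncio.sleep(0)  # stand-in for network latency
    return [f"{engine} result for {query!r}"]

async def multi_search(query: str) -> dict[str, list[str]]:
    engines = ["google", "bing", "baidu"]
    # Fan the query out to all engines concurrently, then collect.
    results = await asyncio.gather(
        *(search_engine(e, query) for e in engines)
    )
    return dict(zip(engines, results))

print(asyncio.run(multi_search("test-time RL")))
```

Each engine's results come back keyed by engine name, so a downstream aggregation agent can merge or deduplicate them.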