南栖
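I'm also currently reproducing this paper's model architecture and training, but I've run into some problems. Could we set up a WeChat group to discuss it?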
Wait until I open-source it.
@dhruvmullick I'm releasing an open-source framework that combines GRPO + QLoRA + DeepSpeed ZeRO-3: https://github.com/Minami-su/deepspeed-grpo-qlora-vllm
Oh, I made a mistake. For the one above, the ingestion uses gpt-4o-mini and the response uses gpt-4.1-mini. For this one, both the ingestion and response use gpt-4.1-mini: === Evaluation...
@nanxingw Added. evaluationV2
Hi @nanxingw, just wanted to check in on the status of this pull request. I've pushed the evaluation scripts you asked for last week. Is there any feedback or anything...
Hi @nanxingw, Thank you so much for the update! I am definitely interested in discussing future work on this. Thank you for sharing your email, I will reach out to...
@nanxingw 81 / 81 files viewed
datasets: https://huggingface.co/datasets/Minami-su/Amara-o1-dataset https://huggingface.co/datasets/Minami-su/Amara-o2-dataset