Support FSDP model ckpt loading and do evaluation on specific dataset: Issue #298

Open jankinf opened this issue 9 months ago • 4 comments

jankinf avatar Feb 24 '25 06:02 jankinf

Can we incorporate this functionality into main_generation?

vermouth1992 avatar Feb 24 '25 08:02 vermouth1992

CLA assistant check
All committers have signed the CLA.

CLAassistant avatar Feb 26 '25 00:02 CLAassistant

Incorporate FSDP checkpoint loading into main_generation. Fixes #298.
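For context, here is a minimal sketch of the kind of pre-check a sharded-checkpoint loader might run before generation. It is not the code from this PR. It assumes shards are saved one file per rank under names like `model_world_size_<N>_rank_<R>.pt`; both that pattern and the helper name `find_fsdp_shards` are illustrative assumptions, so adjust them to match your checkpoint directory.

```python
import re
from pathlib import Path

# Assumed shard naming (illustrative); change it if your checkpoints differ.
SHARD_PATTERN = re.compile(r"^model_world_size_(\d+)_rank_(\d+)\.pt$")


def find_fsdp_shards(ckpt_dir):
    """Hypothetical helper: return shard paths ordered by rank.

    Raises if no shards are found, if shards from different world sizes
    are mixed in one directory, or if any rank's shard is missing.
    """
    found = {}           # rank -> shard path
    world_sizes = set()  # world sizes seen in the file names
    for path in Path(ckpt_dir).iterdir():
        match = SHARD_PATTERN.match(path.name)
        if match is None:
            continue  # skip optimizer states, configs, and other files
        world_sizes.add(int(match.group(1)))
        found[int(match.group(2))] = path

    if not found:
        raise FileNotFoundError(f"no FSDP model shards found in {ckpt_dir}")
    if len(world_sizes) != 1:
        raise ValueError(f"shards from mixed world sizes: {sorted(world_sizes)}")

    world_size = world_sizes.pop()
    missing = sorted(set(range(world_size)) - set(found))
    if missing:
        raise ValueError(f"missing shards for ranks: {missing}")
    return [found[rank] for rank in range(world_size)]
```

Failing early on an incomplete shard set gives a clearer error than a partial load would. Actually merging the shards into a full state dict depends on how they were saved and is left out here.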

jankinf avatar Feb 26 '25 04:02 jankinf

Any feedback?

jankinf avatar Mar 04 '25 01:03 jankinf

fixed

jankinf avatar Apr 07 '25 05:04 jankinf

May I kindly ask whether this pull request is any closer to getting merged?

It is really weird that FSDP checkpoints, the format in which all the official documentation and recipes recommend users save their training artifacts, are incompatible with the verl.trainer.main_generation script. 🤦

w568w avatar Jun 04 '25 12:06 w568w

@w568w I tested this when I submitted the PR, and it worked at the time. However, there hasn't been any feedback from the official team for a while. You can try this code, but I can't guarantee it still works, since it has been some time since my last test.

jankinf avatar Jun 05 '25 03:06 jankinf