alexchiu
Thanks for your interest, @JihwanEom. Since some experiments in the blog were conducted on an early version of the PR, we need some time to organize our recipe before sharing...
Hi @JihwanEom, I have tested the following recipe.

```yaml
defaults: distillation_math.yaml
distillation:
  num_prompts_per_step: 512
  max_num_steps: 500
  val_batch_size: 512
  val_period: 20
loss_fn:
  kl_type: reverse
checkpointing:
  model_save_format: "torch_save"
  keep_top_k: 3
  checkpoint_dir: ...
```
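A recipe like this would typically be launched by pointing the example script at the saved YAML; the `--config` flag and the path below are placeholders, assuming the standard entrypoint:

```
uv run examples/run_distillation_math.py --config /path/to/my_distillation_recipe.yaml
```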
Regarding Q4, yes, I don’t think there’s any difference between VLM and LLM in terms of on-policy distillation except for the input. Have you tried directly changing the model name...
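For illustration, the override might look something like the sketch below; the exact config keys (`policy.model_name`, `teacher.model_name`) and the chosen VLM checkpoints are assumptions, not a verified setup:

```yaml
# Hedged sketch: key names and model choices are assumptions, adjust to the actual recipe.
policy:
  model_name: "Qwen/Qwen2.5-VL-3B-Instruct"   # student VLM
teacher:
  model_name: "Qwen/Qwen2.5-VL-7B-Instruct"   # teacher VLM
```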
You’re welcome. If you run into any issues with VLM, feel free to open a new issue so we can track it.
Sorry for the delay @JihwanEom.

> can I use the async-rollout feature for on-policy KD?

Sure, I think on-policy KD also supports async rollout. Have you tried it?
It seems that this is not a bug unique to on-policy distillation, but rather a bug in the checkpointing of DTensor V2 policy, and several similar issues #1427 #1391 have...
@uygarmv sorry for the delay. The quick fix is to use the DTensor V1 path:

```
uv run examples/run_distillation_math.py checkpointing.model_save_format=null policy.dtensor_cfg._v2=false
```

I think our colleagues will address the DTensor...
> @zpqiu sorry for the long delay, I have put some comments; could you first merge in main and then address them?

I think this solution can be further optimized...
> @zpqiu can you fix the functional test failure?
>
> Also I think the L1 functionality is run on Ampere GPUs, maybe you need to conditionally skip for cuda...
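For the conditional skip, one common pattern is gating the test on the detected CUDA compute capability; the sketch below is illustrative only, and the exact threshold and test name are assumptions:

```python
import pytest
import torch

# Skip unless an Ampere-or-newer GPU (SM 8.x+) is present; the exact capability
# requirement for this particular test is an assumption.
requires_ampere = pytest.mark.skipif(
    not torch.cuda.is_available() or torch.cuda.get_device_capability()[0] < 8,
    reason="requires an Ampere (SM 8.x) or newer GPU",
)

@requires_ampere
def test_l1_functionality():
    # Placeholder body; the real L1 functional test lives in the repo's test suite.
    assert torch.cuda.is_available()
```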
> How about using [`StatefulDataloader`](https://pytorch.org/data/beta/torchdata.stateful_dataloader.html) instead of `Dataloader`? `StatefulDataloader` provides `state_dict` and `load_state_dict` methods that may support resuming the iterator position for mid-epoch checkpointing.

Thank you. Will verl be modified...
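For what it's worth, here is a minimal sketch of how the class (actually named `StatefulDataLoader`) resumes mid-epoch, assuming a plain map-style dataset:

```python
from torchdata.stateful_dataloader import StatefulDataLoader

dataset = list(range(100))  # toy map-style dataset for illustration
loader = StatefulDataLoader(dataset, batch_size=8)

it = iter(loader)
for _ in range(3):
    next(it)  # consume a few batches mid-epoch

# Save the dataloader's position alongside the rest of the training state.
dl_state = loader.state_dict()

# After restoring from a checkpoint, rebuild the loader and reload its state.
resumed = StatefulDataLoader(dataset, batch_size=8)
resumed.load_state_dict(dl_state)
next(iter(resumed))  # continues from the 4th batch instead of restarting the epoch
```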