alexchiu
Thanks for your interest, @JihwanEom. Since some experiments in the blog were conducted on an early version of the PR, we need some time to organize our recipe before sharing...
Hi @JihwanEom, I have tested the following recipe.

```yaml
defaults: distillation_math.yaml
distillation:
  num_prompts_per_step: 512
  max_num_steps: 500
  val_batch_size: 512
  val_period: 20
loss_fn:
  kl_type: reverse
checkpointing:
  model_save_format: "torch_save"
  keep_top_k: 3
  checkpoint_dir: ...
```
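A recipe like this would typically be launched by pointing the example script at the saved YAML; the `--config` flag and the path below are placeholders, assuming the standard entrypoint:

```
uv run examples/run_distillation_math.py --config /path/to/my_distillation_recipe.yaml
```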
Regarding Q4, yes, I don’t think there’s any difference between VLM and LLM in terms of on-policy distillation except for the input. Have you tried directly changing the model name...
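For illustration, the override might look something like the sketch below; the exact config keys (`policy.model_name`, `teacher.model_name`) and the chosen VLM checkpoints are assumptions, not a verified setup:

```yaml
# Hedged sketch: key names and model choices are assumptions, adjust to the actual recipe.
policy:
  model_name: "Qwen/Qwen2.5-VL-3B-Instruct"   # student VLM
teacher:
  model_name: "Qwen/Qwen2.5-VL-7B-Instruct"   # teacher VLM
```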
You’re welcome. If you run into any issues with VLM, feel free to open a new issue so we can track it.
Sorry for the delay @JihwanEom.

> can I use the async-rollout feature for on-policy KD?

Sure, I think on-policy KD also supports async rollout. Have you tried it?
It seems that this is not a bug unique to on-policy distillation, but rather a bug in the checkpointing of DTensor V2 policy, and several similar issues #1427 #1391 have...
@uygarmv sorry for the delay. The quick fix is to use the DTensor V1 path:

```
uv run examples/run_distillation_math.py checkpointing.model_save_format=null policy.dtensor_cfg._v2=false
```

I think our colleagues will address the DTensor...
> @zpqiu sorry for the long delay, I have put some comments; could you first merge in main and then address them?

I think this solution can be further optimized...
> @zpqiu can you fix the functional test failure?
>
> Also I think the L1 functionality is run on Ampere GPUs, maybe you need to conditionally skip for cuda...
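For the conditional skip, one common pattern is gating the test on the detected CUDA compute capability; the sketch below is illustrative only, and the exact threshold and test name are assumptions:

```python
import pytest
import torch

# Skip unless an Ampere-or-newer GPU (SM 8.x+) is present; the exact capability
# requirement for this particular test is an assumption.
requires_ampere = pytest.mark.skipif(
    not torch.cuda.is_available() or torch.cuda.get_device_capability()[0] < 8,
    reason="requires an Ampere (SM 8.x) or newer GPU",
)

@requires_ampere
def test_l1_functionality():
    # Placeholder body; the real L1 functional test lives in the repo's test suite.
    assert torch.cuda.is_available()
```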
> How about using [`StatefulDataloader`](https://pytorch.org/data/beta/torchdata.stateful_dataloader.html) instead of `Dataloader`? `StatefulDataloader` provides `state_dict` and `load_state_dict` methods that may support resuming the iterator position for mid-epoch checkpointing.

Thank you. Will verl be modified...
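For what it's worth, here is a minimal sketch of how the class (actually named `StatefulDataLoader`) resumes mid-epoch, assuming a plain map-style dataset:

```python
from torchdata.stateful_dataloader import StatefulDataLoader

dataset = list(range(100))  # toy map-style dataset for illustration
loader = StatefulDataLoader(dataset, batch_size=8)

it = iter(loader)
for _ in range(3):
    next(it)  # consume a few batches mid-epoch

# Save the dataloader's position alongside the rest of the training state.
dl_state = loader.state_dict()

# After restoring from a checkpoint, rebuild the loader and reload its state.
resumed = StatefulDataLoader(dataset, batch_size=8)
resumed.load_state_dict(dl_state)
next(iter(resumed))  # continues from the 4th batch instead of restarting the epoch
```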