RL HF Conversion for On-policy Distillation Trained Models

I'm trying to convert a checkpoint created by uv run python examples/run_distillation_math.py to Hugging Face format.

I'm running the following command: uv run python examples/converters/convert_dcp_to_hf.py --config {dir_path}/config.yaml --dcp-ckpt-path {dir_path}/policy/weights/ --hf-ckpt-path {output_path}

It gives the following error:

Traceback (most recent call last):
  File "/home/coder/Uygar/nemo-rl-policy/examples/converters/convert_dcp_to_hf.py", line 73, in <module>
    main()
  File "/home/coder/Uygar/nemo-rl-policy/examples/converters/convert_dcp_to_hf.py", line 62, in main
    hf_ckpt = convert_dcp_to_hf(
              ^^^^^^^^^^^^^^^^^^
  File "/home/coder/Uygar/nemo-rl-policy/nemo_rl/utils/native_checkpoint.py", line 242, in convert_dcp_to_hf
    dcp_to_torch_save(dcp_ckpt_path, weights_path)
  File "/home/coder/Uygar/nemo-rl-policy/.venv/lib/python3.12/site-packages/torch/distributed/checkpoint/format_utils.py", line 212, in dcp_to_torch_save
    _load_state_dict(
  File "/home/coder/Uygar/nemo-rl-policy/.venv/lib/python3.12/site-packages/torch/distributed/checkpoint/state_dict_loader.py", line 245, in _load_state_dict
    _ = distW.all_gather("read", read_data)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/coder/Uygar/nemo-rl-policy/.venv/lib/python3.12/site-packages/torch/distributed/checkpoint/utils.py", line 284, in all_gather
    raise CheckpointException(step, node_failures)
torch.distributed.checkpoint.api.CheckpointException: CheckpointException ranks:dict_keys([0])
Traceback (most recent call last): (RANK 0)
  File "/home/coder/Uygar/nemo-rl-policy/.venv/lib/python3.12/site-packages/torch/distributed/checkpoint/utils.py", line 276, in all_gather
    result = map_fun()
             ^^^^^^^^^
  File "/home/coder/Uygar/nemo-rl-policy/.venv/lib/python3.12/site-packages/torch/distributed/checkpoint/logger.py", line 87, in wrapper
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "/home/coder/Uygar/nemo-rl-policy/.venv/lib/python3.12/site-packages/torch/distributed/checkpoint/state_dict_loader.py", line 240, in read_data
    all_reads = storage_reader.read_data(final_local_plan, planner)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/coder/Uygar/nemo-rl-policy/.venv/lib/python3.12/site-packages/torch/distributed/checkpoint/filesystem.py", line 829, in read_data
    with self.fs.create_stream(new_path, "rb") as stream:
  File "/usr/lib/python3.12/contextlib.py", line 137, in __enter__
    return next(self.gen)
           ^^^^^^^^^^^^^^
  File "/home/coder/Uygar/nemo-rl-policy/.venv/lib/python3.12/site-packages/torch/distributed/checkpoint/filesystem.py", line 511, in create_stream
    with path.open(mode) as stream:
         ^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/pathlib.py", line 1015, in open
    return io.open(self, mode, buffering, encoding, errors, newline)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: '/home/coder/Uygar/nemo-rl-policy/temp/checkpoints_policy/step_440/policy/weights/__0_0.distcp'
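The FileNotFoundError suggests that the checkpoint's DCP metadata points at a shard (`__0_0.distcp`) that is not actually on disk. Before digging further, it can help to list what the weights directory really contains. The sketch below is a hypothetical, stdlib-only helper (`inspect_dcp_dir` is not part of NeMo RL or PyTorch). It assumes the shard naming `__<rank>_<index>.distcp` seen in the traceback and a `.metadata` file next to the shards, which is the usual layout of PyTorch's filesystem DCP writer.

```python
import re
import sys
from pathlib import Path

# Assumed shard naming, based on the missing file in the traceback: "__<rank>_<index>.distcp".
SHARD_RE = re.compile(r"^__(\d+)_(\d+)\.distcp$")


def inspect_dcp_dir(weights_dir):
    """Summarize what a (hypothetical) DCP checkpoint directory actually contains."""
    root = Path(weights_dir)
    if not root.is_dir():
        return {"exists": False, "has_metadata": False, "shards": [], "other": []}

    shards, other = [], []
    for entry in sorted(root.iterdir()):
        match = SHARD_RE.match(entry.name)
        if match:
            # Record (rank, index) for every DCP shard file found.
            shards.append((int(match.group(1)), int(match.group(2))))
        elif entry.name != ".metadata":
            # Anything else (e.g. files in another save format) is listed separately.
            other.append(entry.name)

    return {
        "exists": True,
        "has_metadata": (root / ".metadata").is_file(),
        "shards": sorted(shards),
        "other": other,
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python inspect_dcp.py {dir_path}/policy/weights/
    print(inspect_dcp_dir(sys.argv[1]))
```

If `has_metadata` is True but `shards` is empty, or `other` lists files in a different format, the converter is probably not at fault: the directory most likely was never written as DCP shards, which points at how the checkpoint was saved rather than at the conversion command.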

Steps/Code to reproduce bug

Run uv run python examples/converters/convert_dcp_to_hf.py --config {dir_path}/config.yaml --dcp-ckpt-path {dir_path}/policy/weights/ --hf-ckpt-path {output_path} on an on-policy distilled model.

Nov 25 '25 13:11 uygarmv

@sharathts can you please take a look and opine?

Nov 25 '25 20:11 euronymous-aithal

@sharonyu-115 @zpqiu can you also help?

Dec 03 '25 02:12 snowmanwwg

It seems this bug is not unique to on-policy distillation; rather, it is a bug in checkpointing for the DTensor V2 policy. Several similar issues (#1427, #1391) have been reported previously. @terrykong

Dec 03 '25 03:12 zpqiu

@uygarmv sorry for the delay. The quick fix is to use the DTensor V1 path:

uv run examples/run_distillation_math.py checkpointing.model_save_format=null policy.dtensor_cfg._v2=false

I think our colleagues will address the DTensor V2 path issue to fully resolve this bug.

Dec 03 '25 03:12 zpqiu