Mismatch in number of generated ARC-AGI-2 puzzles compared to the trained checkpoint
I'm trying to reproduce the evaluation results on ARC-AGI-2 using the provided trained checkpoint, but loading the checkpoint fails with an error indicating a mismatch between the number of puzzles in my locally built dataset and the number the checkpoint was trained with:
[rank0]: Traceback (most recent call last):
[rank0]: File "/home/ubuntu/HRM/evaluate.py", line 48, in launch
[rank0]: train_state.model.load_state_dict(torch.load(eval_cfg.checkpoint, map_location="cuda"), assign=True)
[rank0]: File "/home/ubuntu/.pyenv/versions/3.12.11/lib/python3.12/site-packages/torch/nn/modules/module.py", line 2593, in load_state_dict
[rank0]: raise RuntimeError(
[rank0]: RuntimeError: Error(s) in loading state_dict for OptimizedModule:
[rank0]: size mismatch for _orig_mod.model.inner.puzzle_emb.weights: copying a param with shape torch.Size([1045829, 512]) from checkpoint, the shape in current model is torch.Size([1045835, 512]).
1045835 (current model, built from my local dataset) vs 1045829 (checkpoint)
This is likely because the data augmentation involves RNG, and the random sequences differ between package versions, so the number of unique puzzles produced can vary slightly from build to build. We will upload pre-built datasets soon.
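To confirm which side differs, you can inspect the puzzle-embedding table stored in the checkpoint. A minimal sketch (the checkpoint path is illustrative; the key name and expected shape are taken from the traceback above):

```python
import torch

# Load the released checkpoint on CPU and check how many puzzle
# embeddings it was trained with. The path below is illustrative.
state_dict = torch.load("checkpoints/arc-agi-2.ckpt", map_location="cpu")
emb = state_dict["_orig_mod.model.inner.puzzle_emb.weights"]
print(emb.shape)  # torch.Size([1045829, 512]) for this checkpoint
```

If your locally built dataset yields a different puzzle count, the model constructed for evaluation allocates a different-sized embedding table, and the load fails as above.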
ARC-2
python dataset/build_arc_dataset.py --dataset-dirs dataset/raw-data/ARC-AGI-2/data
(py31022) ➜ HRM git:(main) ✗ python dataset/build_arc_dataset.py --dataset-dirs dataset/raw-data/ARC-AGI-2/data --output-dir data/arc-2-aug-1000
Traceback (most recent call last):
File "/home/zdx/github/VSAHDC/HRM/dataset/build_arc_dataset.py", line 291, in
Fix: on Python versions before 3.11, `int.to_bytes` has no default `byteorder`, so the call `x.to_bytes(1)` in `puzzle_hash` raises a `TypeError` (the `py31022` environment above appears to be Python 3.10; the `byteorder='big'` default was only added in 3.11). Passing `byteorder` explicitly resolves it:

```python
def puzzle_hash(puzzle: dict):
    # Hash the puzzle for checking equivalence
    def _grid_hash(grid: np.ndarray):
        # buffer = [x.to_bytes(1) for x in grid.shape]
        buffer = [x.to_bytes(1, byteorder='little') for x in grid.shape]
```
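Since each value is serialized to a single byte (ARC grid dimensions are at most 30), the byte order doesn't affect the output, so the hashes stay identical to those produced by the `byteorder='big'` default on Python 3.11+:

```python
>>> (30).to_bytes(1, byteorder='little') == (30).to_bytes(1, byteorder='big')
True
```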
I get a similar error:
Exception occurred: RuntimeError (note: full exception trace is shown but execution is paused at: _run_module_as_main)
Error(s) in loading state_dict for ACTLossHead:
size mismatch for model.inner.puzzle_emb.weights: copying a param with shape torch.Size([1045829, 512]) from checkpoint, the shape in current model is torch.Size([1921253, 512]).
File "/mnt/d/my/work/study/ai/kaggle_code/arc/HRM/evaluate.py", line 48, in launch
train_state.model.load_state_dict(torch.load(eval_cfg.checkpoint, map_location="cuda"), assign=True)
File "/usr/local/lib/python3.12/site-packages/torch/nn/modules/module.py", line 2593, in load_state_dict
"Error(s) in loading state_dict for {}:\n\t{}".format(
self.__class__.__name__, "\n\t".join(error_msgs)
)
)
return _IncompatibleKeys(missing_keys, unexpected_keys)
RuntimeError: Error(s) in loading state_dict for ACTLossHead:
Missing key(s) in state_dict: "model.inner.H_init", "model.inner.L_init", "model.inner.embed_tokens.embedding_weight", "model.inner.lm_head.weight", "model.inner.q_head.weight", "model.inner.q_head.bias", "model.inner.puzzle_emb.weights", "model.inner.H_level.layers.0.self_attn.qkv_proj.weight", "model.inner.H_level.layers.0.self_attn.o_proj.weight", "model.inner.H_level.layers.0.mlp.gate_up_proj.weight", "model.inner.H_level.layers.0.mlp.down_proj.weight", "model.inner.H_level.layers.1.self_attn.qkv_proj.weight", "model.inner.H_level.layers.1.self_attn.o_proj.weight", "model.inner.H_level.layers.1.mlp.gate_up_proj.weight", "model.inner.H_level.layers.1.mlp.down_proj.weight", "model.inner.H_level.layers.2.self_attn.qkv_proj.weight", "model.inner.H_level.layers.2.self_attn.o_proj.weight", "model.inner.H_level.layers.2.mlp.gate_up_proj.weight", "model.inner.H_level.layers.2.mlp.down_proj.weight", "model.inner.H_level.layers.3.self_attn.qkv_proj.weight", "model.inner.H_level.layers.3.self_attn.o_proj.weight", "model.inner.H_level.layers.3.mlp.gate_up_proj.weight", "model.inner.H_level.layers.3.mlp.down_proj.weight", "model.inner.L_level.layers.0.self_attn.qkv_proj.weight", "model.inner.L_level.layers.0.self_attn.o_proj.weight", "model.inner.L_level.layers.0.mlp.gate_up_proj.weight", "model.inner.L_level.layers.0.mlp.down_proj.weight", "model.inner.L_level.layers.1.self_attn.qkv_proj.weight", "model.inner.L_level.layers.1.self_attn.o_proj.weight", "model.inner.L_level.layers.1.mlp.gate_up_proj.weight", "model.inner.L_level.layers.1.mlp.down_proj.weight", "model.inner.L_level.layers.2.self_attn.qkv_proj.weight", "model.inner.L_level.layers.2.self_attn.o_proj.weight", "model.inner.L_level.layers.2.mlp.gate_up_proj.weight", "model.inner.L_level.layers.2.mlp.down_proj.weight", "model.inner.L_level.layers.3.self_attn.qkv_proj.weight", "model.inner.L_level.layers.3.self_attn.o_proj.weight", "model.inner.L_level.layers.3.mlp.gate_up_proj.weight", "model.inner.L_level.layers.3.mlp.down_proj.weight".
Unexpected key(s) in state_dict: "_orig_mod.model.inner.H_init", "_orig_mod.model.inner.L_init", "_orig_mod.model.inner.embed_tokens.embedding_weight", "_orig_mod.model.inner.lm_head.weight", "_orig_mod.model.inner.q_head.weight", "_orig_mod.model.inner.q_head.bias", "_orig_mod.model.inner.puzzle_emb.weights", "_orig_mod.model.inner.H_level.layers.0.self_attn.qkv_proj.weight", "_orig_mod.model.inner.H_level.layers.0.self_attn.o_proj.weight", "_orig_mod.model.inner.H_level.layers.0.mlp.gate_up_proj.weight", "_orig_mod.model.inner.H_level.layers.0.mlp.down_proj.weight", "_orig_mod.model.inner.H_level.layers.1.self_attn.qkv_proj.weight", "_orig_mod.model.inner.H_level.layers.1.self_attn.o_proj.weight", "_orig_mod.model.inner.H_level.layers.1.mlp.gate_up_proj.weight", "_orig_mod.model.inner.H_level.layers.1.mlp.down_proj.weight", "_orig_mod.model.inner.H_level.layers.2.self_attn.qkv_proj.weight", "_orig_mod.model.inner.H_level.layers.2.self_attn.o_proj.weight", "_orig_mod.model.inner.H_level.layers.2.mlp.gate_up_proj.weight", "_orig_mod.model.inner.H_level.layers.2.mlp.down_proj.weight", "_orig_mod.model.inner.H_level.layers.3.self_attn.qkv_proj.weight", "_orig_mod.model.inner.H_level.layers.3.self_attn.o_proj.weight", "_orig_mod.model.inner.H_level.layers.3.mlp.gate_up_proj.weight", "_orig_mod.model.inner.H_level.layers.3.mlp.down_proj.weight", "_orig_mod.model.inner.L_level.layers.0.self_attn.qkv_proj.weight", "_orig_mod.model.inner.L_level.layers.0.self_attn.o_proj.weight", "_orig_mod.model.inner.L_level.layers.0.mlp.gate_up_proj.weight", "_orig_mod.model.inner.L_level.layers.0.mlp.down_proj.weight", "_orig_mod.model.inner.L_level.layers.1.self_attn.qkv_proj.weight", "_orig_mod.model.inner.L_level.layers.1.self_attn.o_proj.weight", "_orig_mod.model.inner.L_level.layers.1.mlp.gate_up_proj.weight", "_orig_mod.model.inner.L_level.layers.1.mlp.down_proj.weight", "_orig_mod.model.inner.L_level.layers.2.self_attn.qkv_proj.weight", "_orig_mod.model.inner.L_level.layers.2.self_attn.o_proj.weight", "_orig_mod.model.inner.L_level.layers.2.mlp.gate_up_proj.weight", "_orig_mod.model.inner.L_level.layers.2.mlp.down_proj.weight", "_orig_mod.model.inner.L_level.layers.3.self_attn.qkv_proj.weight", "_orig_mod.model.inner.L_level.layers.3.self_attn.o_proj.weight", "_orig_mod.model.inner.L_level.layers.3.mlp.gate_up_proj.weight", "_orig_mod.model.inner.L_level.layers.3.mlp.down_proj.weight".
During handling of the above exception, another exception occurred:
File "/usr/local/lib/python3.12/site-packages/torch/nn/modules/module.py", line 2593, in load_state_dict
"Error(s) in loading state_dict for {}:\n\t{}".format(
self.__class__.__name__, "\n\t".join(error_msgs)
)
)
return _IncompatibleKeys(missing_keys, unexpected_keys)
File "/mnt/d/my/work/study/ai/kaggle_code/arc/HRM/evaluate.py", line 50, in launch
train_state.model.load_state_dict({k.removeprefix("_orig_mod."): v for k, v in torch.load(eval_cfg.checkpoint, map_location="cuda").items()}, assign=True)
File "/mnt/d/my/work/study/ai/kaggle_code/arc/HRM/evaluate.py", line 68, in <module>
launch()
File "/usr/local/lib/python3.12/runpy.py", line 88, in _run_code
exec(code, run_globals)
File "/usr/local/lib/python3.12/runpy.py", line 198, in _run_module_as_main (Current frame)
return _run_code(code, main_globals, None,
RuntimeError: Error(s) in loading state_dict for ACTLossHead:
size mismatch for model.inner.puzzle_emb.weights: copying a param with shape torch.Size([1045829, 512]) from checkpoint, the shape in current model is torch.Size([1921253, 512]).
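The `_orig_mod.` prefix on the checkpoint keys comes from `torch.compile`, and `evaluate.py` already retries the load with the prefix stripped (line 50 in the trace above), so the remaining failure is purely the puzzle-count mismatch: my local build produced 1921253 puzzle embeddings where the checkpoint expects 1045829. A quick way to count the puzzles in a local build, assuming the build writes an `identifiers.json` into the output directory (the file name and path are guesses; adjust to whatever metadata your build actually produces):

```python
import json

# Count the puzzle identifiers a local dataset build produced.
# "identifiers.json" and the output path are assumptions; substitute
# whatever metadata file your build actually writes.
with open("data/arc-2-aug-1000/identifiers.json") as f:
    identifiers = json.load(f)

# Should line up with the checkpoint's embedding rows (1045829),
# modulo any padding/blank entries the model reserves.
print(len(identifiers))
```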
Hi. Has this been solved?