One

109 comments

The root cause seems to be that the system-metrics collection process is blocking, which slows down the CUDA training computations. The `log` function itself is not blocking at...
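One way to avoid that is to keep metrics sampling off the training thread entirely. Below is a minimal sketch, assuming `psutil` as the metrics source (the interval and the fields collected are illustrative, not what this repo does):

```python
import threading
import time

import psutil  # assumed metrics source; any sampler works here


def sample_metrics(interval_s: float, stop: threading.Event, sink: list) -> None:
    """Collect system metrics in a background thread so the training loop never blocks."""
    while not stop.is_set():
        sink.append({"cpu": psutil.cpu_percent(), "mem": psutil.virtual_memory().percent})
        time.sleep(interval_s)


stop, samples = threading.Event(), []
t = threading.Thread(target=sample_metrics, args=(5.0, stop, samples), daemon=True)
t.start()
# ... run training ...
stop.set()
t.join()
```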

Thanks for reporting. That's a known issue: a shared-memory dataloader is not implemented yet. We're working on shared memory via MPI or multiprocessing, which will take only 1x RAM.
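For a sense of what the multiprocessing route looks like, here is a minimal sketch assuming the dataset is a single NumPy array; `dataset.npy` is an illustrative path, not the project's layout:

```python
import numpy as np
from multiprocessing import shared_memory

# Parent process: copy the dataset into a named shared-memory block once (1x RAM total).
data = np.load("dataset.npy")  # hypothetical file
shm = shared_memory.SharedMemory(create=True, size=data.nbytes)
buf = np.ndarray(data.shape, dtype=data.dtype, buffer=shm.buf)
buf[:] = data

# Each dataloader worker attaches by name; no per-process copy is made.
worker_shm = shared_memory.SharedMemory(name=shm.name)
view = np.ndarray(data.shape, dtype=data.dtype, buffer=worker_shm.buf)

# Cleanup: workers call worker_shm.close(); the parent calls shm.close() and shm.unlink().
```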

The global batch size is too small, so the LR is too large in this case, leading to divergence. You can try setting the batch size as large as possible, then scale the LR...
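The usual heuristic is the linear scaling rule (square-root scaling is a common, more conservative alternative). The base values below are purely illustrative, not the repo's defaults:

```python
def scale_lr(base_lr: float, base_batch: int, global_batch: int) -> float:
    """Linear scaling rule: grow the LR in proportion to the global batch size."""
    return base_lr * global_batch / base_batch


# Tuned at lr=1e-4 with global batch 96; moving to batch 768 suggests lr=8e-4.
lr = scale_lr(1e-4, base_batch=96, global_batch=768)
```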

Make sure PyTorch and CUDA are present before running `pip install --no-build-isolation adam-atan2`.
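Since `--no-build-isolation` compiles against whatever `torch` is already installed, a quick preflight check like this sketch can confirm the environment first:

```python
import torch

print("torch:", torch.__version__)
print("built with CUDA:", torch.version.cuda)  # None on CPU-only builds
print("CUDA available:", torch.cuda.is_available())
```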

HRM is a universal backbone. You can give it a try.

It seems that the model hasn't shown any learning progress. Check the gradient magnitude, parameter norm, and loss trends.
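A minimal sketch of how one could track the first two in PyTorch (call it after `loss.backward()` each step; the function name is mine, not the repo's):

```python
import torch


def grad_and_param_norms(model: torch.nn.Module) -> tuple[float, float]:
    """Global L2 norms of gradients and parameters; flat trends suggest stalled learning."""
    grad_sq = param_sq = 0.0
    for p in model.parameters():
        param_sq += p.detach().float().pow(2).sum().item()
        if p.grad is not None:
            grad_sq += p.grad.detach().float().pow(2).sum().item()
    return grad_sq ** 0.5, param_sq ** 0.5
```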

H-Net operates at the token level, which enables tokenizer-free training. HRM operates in latent space for reasoning.

It may be because an RNG is involved in data augmentation, and the randomness varies between package versions. We will upload pre-built datasets soon.
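Pinning seeds helps within a fixed environment, though it can't fix drift across package versions (libraries may change how they consume random draws, which is exactly the issue here). A minimal sketch:

```python
import random

import numpy as np
import torch


def seed_everything(seed: int = 0) -> None:
    """Pin the RNGs that data augmentation typically touches."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
```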

Thanks for your reproduction run! `evaluate.py` does not handle majority voting and is only 1-shot, so its score is about 15% lower. ARC-AGI allows 2 shots per task. Could you run...
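To illustrate what the missing aggregation could look like (a sketch, not what `evaluate.py` does): sample several predictions per task and submit the two most frequent answers, since the benchmark allows two attempts.

```python
from collections import Counter


def top_k_answers(predictions: list[str], k: int = 2) -> list[str]:
    """Majority voting: return the k most frequent predictions across sampled attempts."""
    return [answer for answer, _ in Counter(predictions).most_common(k)]


# e.g. five sampled attempts -> submit the two most common answers
attempts = top_k_answers(["a", "b", "a", "c", "a"], k=2)  # ["a", "b"]
```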

What's your PyTorch version? `nn.Buffer` is supposed to be the new API.
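For anyone hitting this: `nn.Buffer` landed in recent PyTorch releases (2.5, as far as I know), so a version-tolerant module can fall back to `register_buffer`. A minimal sketch:

```python
import torch
import torch.nn as nn


class Scaler(nn.Module):
    def __init__(self) -> None:
        super().__init__()
        if hasattr(nn, "Buffer"):  # newer PyTorch: assign a Buffer attribute directly
            self.scale = nn.Buffer(torch.ones(1))
        else:  # older releases: the classic registration API
            self.register_buffer("scale", torch.ones(1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * self.scale
```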