One

109 comments

The root cause seems to be that the system-metrics collection process is blocking, which slows down the CUDA training computations. The `log` function itself is not blocking at...
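One way to avoid that is to keep metrics sampling off the training thread entirely. Below is a minimal sketch, assuming `psutil` as the metrics source (the interval and the fields collected are illustrative, not what this repo does):

```python
import threading
import time

import psutil  # assumed metrics source; any sampler works here


def sample_metrics(interval_s: float, stop: threading.Event, sink: list) -> None:
    """Collect system metrics in a background thread so the training loop never blocks."""
    while not stop.is_set():
        sink.append({"cpu": psutil.cpu_percent(), "mem": psutil.virtual_memory().percent})
        time.sleep(interval_s)


stop, samples = threading.Event(), []
t = threading.Thread(target=sample_metrics, args=(5.0, stop, samples), daemon=True)
t.start()
# ... run training ...
stop.set()
t.join()
```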

Thanks for reporting. That's a known issue: a shared-memory dataloader is not implemented yet. We're working on shared memory via MPI or multiprocessing, which will take only 1x RAM.
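For a sense of what the multiprocessing route looks like, here is a minimal sketch assuming the dataset is a single NumPy array; `dataset.npy` is an illustrative path, not the project's layout:

```python
import numpy as np
from multiprocessing import shared_memory

# Parent process: copy the dataset into a named shared-memory block once (1x RAM total).
data = np.load("dataset.npy")  # hypothetical file
shm = shared_memory.SharedMemory(create=True, size=data.nbytes)
buf = np.ndarray(data.shape, dtype=data.dtype, buffer=shm.buf)
buf[:] = data

# Each dataloader worker attaches by name; no per-process copy is made.
worker_shm = shared_memory.SharedMemory(name=shm.name)
view = np.ndarray(data.shape, dtype=data.dtype, buffer=worker_shm.buf)

# Cleanup: workers call worker_shm.close(); the parent calls shm.close() and shm.unlink().
```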

The global batch size is too small, so the LR is too large in this case, leading to divergence. You can try setting the batch size as large as possible, then scale the LR...
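The usual heuristic is the linear scaling rule (square-root scaling is a common, more conservative alternative). The base values below are purely illustrative, not the repo's defaults:

```python
def scale_lr(base_lr: float, base_batch: int, global_batch: int) -> float:
    """Linear scaling rule: grow the LR in proportion to the global batch size."""
    return base_lr * global_batch / base_batch


# Tuned at lr=1e-4 with global batch 96; moving to batch 768 suggests lr=8e-4.
lr = scale_lr(1e-4, base_batch=96, global_batch=768)
```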

Make sure PyTorch and CUDA are present before running `pip install --no-build-isolation adam-atan2`.
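Since `--no-build-isolation` compiles against whatever `torch` is already installed, a quick preflight check like this sketch can confirm the environment first:

```python
import torch

print("torch:", torch.__version__)
print("built with CUDA:", torch.version.cuda)  # None on CPU-only builds
print("CUDA available:", torch.cuda.is_available())
```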

HRM is a universal backbone. You can give it a try.

It seems that the model hasn't shown any learning progress. Check the gradient magnitude, parameter norm, and loss trends.
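A minimal sketch of how one could track the first two in PyTorch (call it after `loss.backward()` each step; the function name is mine, not the repo's):

```python
import torch


def grad_and_param_norms(model: torch.nn.Module) -> tuple[float, float]:
    """Global L2 norms of gradients and parameters; flat trends suggest stalled learning."""
    grad_sq = param_sq = 0.0
    for p in model.parameters():
        param_sq += p.detach().float().pow(2).sum().item()
        if p.grad is not None:
            grad_sq += p.grad.detach().float().pow(2).sum().item()
    return grad_sq ** 0.5, param_sq ** 0.5
```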

H-Net operates at the token level, which enables tokenizer-free training. HRM operates in latent space for reasoning.

It may be because an RNG is involved in data augmentation, and the randomness varies between package versions. We will upload pre-built datasets soon.
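Pinning seeds helps within a fixed environment, though it can't fix drift across package versions (libraries may change how they consume random draws, which is exactly the issue here). A minimal sketch:

```python
import random

import numpy as np
import torch


def seed_everything(seed: int = 0) -> None:
    """Pin the RNGs that data augmentation typically touches."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
```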

Thanks for your reproduction run! `evaluate.py` does not handle majority voting and is only 1-shot, so its score is about 15% lower. ARC-AGI allows 2 shots per task. Could you run...
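To illustrate what the missing aggregation could look like (a sketch, not what `evaluate.py` does): sample several predictions per task and submit the two most frequent answers, since the benchmark allows two attempts.

```python
from collections import Counter


def top_k_answers(predictions: list[str], k: int = 2) -> list[str]:
    """Majority voting: return the k most frequent predictions across sampled attempts."""
    return [answer for answer, _ in Counter(predictions).most_common(k)]


# e.g. five sampled attempts -> submit the two most common answers
attempts = top_k_answers(["a", "b", "a", "c", "a"], k=2)  # ["a", "b"]
```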

What's your PyTorch version? `nn.Buffer` is supposed to be the new API.
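For anyone hitting this: `nn.Buffer` landed in recent PyTorch releases (2.5, as far as I know), so a version-tolerant module can fall back to `register_buffer`. A minimal sketch:

```python
import torch
import torch.nn as nn


class Scaler(nn.Module):
    def __init__(self) -> None:
        super().__init__()
        if hasattr(nn, "Buffer"):  # newer PyTorch: assign a Buffer attribute directly
            self.scale = nn.Buffer(torch.ones(1))
        else:  # older releases: the classic registration API
            self.register_buffer("scale", torch.ones(1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * self.scale
```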