109 comments by One

Add Reasoning-Gym Experiments

Thanks for your recommendation! We welcome PRs and are happy to collaborate! I see that many Reasoning Gym tasks are trained with RL, and HRM indeed supports RL. It may...

pyproject.toml with explicit versioning requirements?

Different GPU hardware requires different versions of CUDA, PyTorch, etc. Following most repositories, such as FlashAttention and Transformers, we don't pin versions unless strictly necessary, to minimize incompatibilities during installation.

What is the purpose of ACT if turned off during evaluation / inference?

Since batched inference with ACT is complex and requires dynamically scheduling multiple sequences, in this repository we provide the simplest version that runs to the maximum number of steps, as...
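To illustrate why batched ACT needs per-sequence scheduling, here is a minimal toy sketch. It is not the repository's code: `run_fixed`, `run_with_halting`, `step_fn`, and `should_halt` are hypothetical names, and the integer "state" stands in for a model's hidden state.

```python
def run_fixed(states, step_fn, max_steps):
    """Simplest variant: every sequence takes exactly max_steps steps."""
    for _ in range(max_steps):
        states = [step_fn(s) for s in states]
    return states, [max_steps] * len(states)


def run_with_halting(states, step_fn, should_halt, max_steps):
    """ACT-style variant: each sequence stops as soon as should_halt fires."""
    states = list(states)
    steps_used = [0] * len(states)
    active = set(range(len(states)))  # indices of sequences still running
    for _ in range(max_steps):
        if not active:
            break
        for i in list(active):
            states[i] = step_fn(states[i])
            steps_used[i] += 1
            if should_halt(states[i]):
                active.discard(i)  # this slot is now free for a new sequence
    return states, steps_used


# Toy stand-ins (assumptions): a "step" increments the state,
# and a sequence halts once its state reaches 3.
step = lambda s: s + 1
halt = lambda s: s >= 3

fixed_states, fixed_steps = run_fixed([0, 1, 2], step, max_steps=4)
act_states, act_steps = run_with_halting([0, 1, 2], step, halt, max_steps=4)
```

In the halting variant the sequences finish after different numbers of steps, so an efficient batched implementation would have to refill freed slots dynamically; the fixed variant avoids that bookkeeping.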

Add Google Colab Sudoku 1k demo (T4-compatible)

Thanks for the PR! 99.3% seems amazing; it's much higher than reported in our paper. Is that result from `evaluate.py`? I can't find the result of the evaluation cell in the...

How to do c-RLFT?

C-RLFT is weighted C-SFT. You can use the "weight" field in the training data format to set up weights for different conditions.
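As a rough sketch of what "weighted SFT" means in practice: each example's negative log-likelihood is scaled by its "weight" before averaging. Only the "weight" field comes from the comment above; the `condition` and `token_logprobs` fields, the `weighted_sft_loss` helper, and the normalization by the weight sum are assumptions for illustration, not the project's actual data format or loss.

```python
import math


def weighted_sft_loss(examples):
    """Weighted average of per-example mean negative log-likelihoods."""
    total, weight_sum = 0.0, 0.0
    for ex in examples:
        logprobs = ex["token_logprobs"]
        nll = -sum(logprobs) / len(logprobs)  # mean NLL over target tokens
        total += ex["weight"] * nll
        weight_sum += ex["weight"]
    # Normalizing by the weight sum is an assumption of this sketch.
    return total / weight_sum if weight_sum > 0 else 0.0


# Hypothetical records: a higher weight for the preferred condition.
examples = [
    {"condition": "expert", "weight": 1.0,
     "token_logprobs": [math.log(0.5), math.log(0.5)]},
    {"condition": "sub-optimal", "weight": 0.1,
     "token_logprobs": [math.log(0.25)]},
]

loss = weighted_sft_loss(examples)
```

Setting a condition's weight to 0 removes its examples from the loss entirely, while equal weights reduce this to plain SFT.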

Add torch.compile support to flash attention 3

Hi @tridao, have you had a chance to take another look at this PR? This is a much-needed feature for FA3 when used with `torch.compile`.

Test with a New Puzzle out of the Dataset?

We've run the handcrafted Sudoku set from https://github.com/SakanaAI/Sudoku-Bench. The released 1000-example checkpoint achieved 92%. The following is the complete solution process for Sudoku-Bench: [sudoku-nikoli.pdf](https://github.com/user-attachments/files/21619760/sudoku-nikoli.pdf)

arcprize.org leaderboard

Paper Typo?