Add Reasoning-Gym Experiments
Hello, team! Congratulations on producing such an excellent and novel architecture.
I'm a contributor to a project called Reasoning Gym, which provides difficulty-adjustable dataset generators for more than 100 types of reasoning tasks (math, logic, games, etc.).
We're very interested to see if HRM can solve some of the harder tasks that other LLMs struggle with, and would like to run some experiments with our dataset generators and your model.
Is this an active GitHub project? Would you accept PRs if I added support for Reasoning-Gym to this repo?
Thanks so much, Rich
Thanks for your suggestion! We welcome PRs and are happy to collaborate. I see that many Reasoning Gym tasks are trained with RL, and HRM does support RL, though it may require some tuning and careful handling of sparse rewards.
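To make the sparse-reward point concrete: a binary exact-match reward gives the learner no gradient signal until it produces fully correct answers, so a dense surrogate is often used alongside it early in training. A minimal stdlib sketch (these functions are illustrative only, not an API from HRM or Reasoning Gym):

```python
import difflib

def sparse_reward(pred: str, target: str) -> float:
    """Binary exact-match reward: 1.0 only when the answer is exactly right."""
    return 1.0 if pred.strip() == target.strip() else 0.0

def shaped_reward(pred: str, target: str) -> float:
    """Dense surrogate: partial credit from string similarity, giving the RL
    learner signal before it can produce exact answers.  (Hypothetical
    shaping scheme, shown only to illustrate the sparse-reward issue.)"""
    return difflib.SequenceMatcher(None, pred.strip(), target.strip()).ratio()

# A near-miss answer scores 0 under the sparse reward but gets
# partial credit under the shaped one.
print(sparse_reward("42", "43"), shaped_reward("42", "43"))
```

In practice one might anneal from the shaped reward toward the sparse one as training progresses, so the final policy is optimized for exact correctness.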
Please post results once complete. ETA?
I looked into this a bit. Some tasks that I think could work with the current HRM implementation:
`n_queens`, `pool_matrix`, `rearc`, `rectangle_count` (if modified), `rotate_matrix`, `rotten_oranges`, `shortest_path`, `sokoban`, `spiral_matrix`, `survo`, `tsumego`
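For context on what a difficulty-adjustable generator for one of these tasks looks like, here is a self-contained sketch of a `shortest_path`-style grid instance generator, where difficulty is tuned via grid size and wall density. The function names and parameters are hypothetical, not Reasoning Gym's actual API:

```python
import random
from collections import deque

def bfs_shortest_path(grid):
    """Return the length of the shortest S->G path, or None if unreachable."""
    n = len(grid)
    queue, seen = deque([(0, 0, 0)]), {(0, 0)}
    while queue:
        r, c, d = queue.popleft()
        if (r, c) == (n - 1, n - 1):
            return d
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < n and 0 <= nc < n
                    and (nr, nc) not in seen and grid[nr][nc] != '#'):
                seen.add((nr, nc))
                queue.append((nr, nc, d + 1))
    return None

def make_shortest_path_instance(size=5, wall_prob=0.2, seed=None):
    """Generate one solvable grid instance: 'S' start, 'G' goal, '#' walls.

    Difficulty is controlled by `size` and `wall_prob`; unsolvable grids
    are resampled.  (Illustrative sketch, not Reasoning Gym's real code.)
    """
    rng = random.Random(seed)
    while True:
        grid = [['#' if rng.random() < wall_prob else '.'
                 for _ in range(size)] for _ in range(size)]
        grid[0][0], grid[size - 1][size - 1] = 'S', 'G'
        answer = bfs_shortest_path(grid)
        if answer is not None:
            return grid, answer

grid, answer = make_shortest_path_instance(size=6, wall_prob=0.25, seed=0)
```

The grid plus its ground-truth path length would form one (input, target) pair for supervised HRM training, with larger sizes and higher wall densities giving harder instances.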
The one I'm most interested in right now is `rubiks_cube`, but that requires some knowledge the model doesn't have. In fact, most of these tasks require some degree of language understanding, base knowledge, and instruction tuning.
I wonder what the roadmap is for the broader HRM project? Is there a plan to do a large general pretrain with a tokenizer and produce a base model that we can instruction-tune and interact with the way we do with today's LLMs? Or is HRM intended only for small, single-purpose models that operate on structured, grid-like problems?