ARC-AGI: Completely separate train/test examples at puzzle level

Open dywsy21 opened this issue 5 months ago • 2 comments

What

An attempt to remove Test-Time Training (TTT) and fully resolve issue #18.

Description

After reading #18, I wanted to see how the model behaves on ARC-AGI when it trains only on puzzle inputs/outputs from the train split, rather than also incorporating the inputs from the test (evaluation) split.
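A minimal sketch of what a puzzle-level separation could look like, assuming the public ARC JSON layout in which each puzzle has `train` and `test` lists of `{"input": grid, "output": grid}` pairs. The function `build_training_pairs` and the `leak_eval_demos` flag are hypothetical illustrations, not code from the repository. The flag only models one possible "mixed" setting, in which demonstration pairs of evaluation puzzles also end up in the training data.

```python
from typing import Dict, List, Tuple

Grid = List[List[int]]
Pair = Dict[str, Grid]          # {"input": grid, "output": grid}
Puzzle = Dict[str, List[Pair]]  # {"train": [pairs...], "test": [pairs...]}


def build_training_pairs(
    train_puzzles: Dict[str, Puzzle],
    eval_puzzles: Dict[str, Puzzle],
    leak_eval_demos: bool = False,
) -> List[Tuple[str, Pair]]:
    """Collect (puzzle_id, pair) examples for training (hypothetical helper).

    leak_eval_demos=False: evaluation puzzles contribute nothing, so every
    rule seen at evaluation time is new to the model.
    leak_eval_demos=True (assumed mixed setting): demonstration pairs of
    evaluation puzzles are added to the training data as well.
    """
    # A puzzle-level split requires the two puzzle sets to be disjoint.
    overlap = train_puzzles.keys() & eval_puzzles.keys()
    if overlap:
        raise ValueError(f"puzzles in both splits: {sorted(overlap)}")

    pairs: List[Tuple[str, Pair]] = []
    for pid, puzzle in train_puzzles.items():
        # Demonstration and test pairs of *training* puzzles are both usable.
        for pair in puzzle["train"] + puzzle["test"]:
            pairs.append((pid, pair))

    if leak_eval_demos:
        for pid, puzzle in eval_puzzles.items():
            for pair in puzzle["train"]:
                pairs.append((pid, pair))
    return pairs


# Tiny made-up data, not real ARC puzzles.
train = {"t1": {"train": [{"input": [[0]], "output": [[1]]}],
                "test": [{"input": [[1]], "output": [[0]]}]}}
evals = {"e1": {"train": [{"input": [[2]], "output": [[3]]}],
                "test": [{"input": [[3]]}]}}

strict = build_training_pairs(train, evals)
mixed = build_training_pairs(train, evals, leak_eval_demos=True)
print(len(strict), len(mixed))  # 2 3
```

In the strict setting no pair tagged with an evaluation puzzle id ever reaches training, which is the property the title asks for.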

I know TTT is allowed in ARC-AGI, but training on test examples beforehand gives the model an unfair head start on the implicit rules those puzzles use. It would be interesting to see whether the H&L architecture can infer rules it has never seen before, as humans do.

Removing TTT would make the model's ARC-AGI evaluation results more convincing and a better indicator of its actual generalization ability. Let me know if this approach helps; happy to chat~

Aug 04 '25 08:08 dywsy21

Aug 04 '25 11:08 helma436

Does the TTT setting for ARC-AGI allow parameter updates to carry over across evaluation examples?

If it doesn't, then doing training and TTT together is a very different setting from training followed by TTT on each evaluation instance separately, right? In the latter case each evaluation instance is handled independently (i.i.d.), and the model cannot use information generalised across the evaluation set.
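To make the distinction in this question concrete, here is a toy sketch (not the repository's code). A one-parameter model `w`, fit by plain gradient descent, stands in for the network, and each evaluation "puzzle" is a hypothetical list of `(x, y)` demonstration pairs. `per_instance_ttt` restarts from the same base weights for every puzzle, while `joint_ttt` fits one set of weights to all evaluation demonstrations at once. All names and numbers are illustrative assumptions.

```python
from typing import Dict, List, Tuple

Demo = Tuple[float, float]  # (x, y) stand-in for an input/output grid pair


def adapt(w: float, demos: List[Demo], lr: float = 0.1, steps: int = 50) -> float:
    """Gradient descent on the mean squared error of the toy model y = w * x."""
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in demos) / len(demos)
        w -= lr * grad
    return w


def per_instance_ttt(base_w: float, eval_demos: Dict[str, List[Demo]]) -> Dict[str, float]:
    # Every puzzle starts from the same base weights and its adapted weights
    # are discarded afterwards, so puzzles cannot inform one another.
    return {pid: adapt(base_w, demos) for pid, demos in eval_demos.items()}


def joint_ttt(base_w: float, eval_demos: Dict[str, List[Demo]]) -> Dict[str, float]:
    # One set of weights is fit to the demos of *all* evaluation puzzles,
    # so what is learned from one puzzle carries over to the others.
    all_demos = [d for demos in eval_demos.values() for d in demos]
    w = adapt(base_w, all_demos)
    return {pid: w for pid in eval_demos}


demos = {"A": [(1.0, 2.0)], "B": [(1.0, 3.0)]}  # made-up puzzles: y = 2x and y = 3x
sep = per_instance_ttt(0.0, demos)
joint = joint_ttt(0.0, demos)
print({k: round(v, 2) for k, v in sep.items()})    # {'A': 2.0, 'B': 3.0}
print({k: round(v, 2) for k, v in joint.items()})  # {'A': 2.5, 'B': 2.5}
```

In the per-instance setting, puzzle A's result is the same whether or not puzzle B is present; in the joint setting, adding B shifts the weights used for A, which is the cross-evaluation information the question is about.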

Aug 13 '25 03:08 shawntan