ARC-AGI: Completely separate train/test examples at puzzle level
What
- An attempt to remove Test-Time Training (TTT) and fully resolve issue #18
Description
I saw #18 and was interested in how the model would behave on ARC-AGI if it trained only on each puzzle's train input/output pairs, instead of also incorporating the inputs from test.
While I know TTT is allowed in ARC-AGI, training on the test examples beforehand gives the model an unfair head start on the implied rules they encode. It would be interesting to see whether the H&L architecture could figure out implied rules it has never seen before, just as humans do.
By removing TTT, the model's evaluation results on ARC-AGI become more convincing and more indicative of its actual generalization ability. Let me know if this approach would help, happy to chat~
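For context, here is a minimal sketch of the puzzle-level split described above, assuming the standard ARC-AGI JSON layout (each puzzle file has `train` and `test` lists of `input`/`output` grids). The function name and directory path are illustrative, not the actual code in this PR:

```python
import json
from pathlib import Path

def load_training_pairs(puzzle_dir: str):
    """Collect only the demonstration (train) input/output grids from each
    ARC-AGI puzzle JSON, leaving the test pairs out of the training set."""
    pairs = []
    for path in sorted(Path(puzzle_dir).glob("*.json")):
        puzzle = json.loads(path.read_text())
        # Each puzzle file contains "train" and "test" lists of {"input", "output"} grids.
        for example in puzzle["train"]:
            pairs.append((example["input"], example["output"]))
        # puzzle["test"] is intentionally never read here, so no test inputs
        # leak into training (i.e. no TTT-style exposure to evaluation puzzles).
    return pairs

# Illustrative path; point this at the ARC-AGI training puzzle directory.
train_pairs = load_training_pairs("data/arc-agi/training")
print(f"Collected {len(train_pairs)} train-only input/output pairs")
```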
Does the TTT setting for ARC-AGI allow for parameter updates across evaluation examples?
If it doesn't, then doing Training + TTT together represents a very different setting than Training -> TTT per evaluation instance, right? Each evaluation instance would be i.i.d. in that case, and the model could not use information generalised across the evaluation set.
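To make the distinction concrete, here is a hypothetical sketch (not code from this repo) contrasting TTT with one parameter state shared across evaluation puzzles against TTT that resets to the pre-trained checkpoint for each puzzle, which is the i.i.d. setting described above. `ttt_update` and `solve` are placeholder callables:

```python
import copy

def evaluate_shared_ttt(model, puzzles, ttt_update, solve):
    """One parameter state is carried across all evaluation puzzles,
    so later puzzles can benefit from adaptation on earlier ones."""
    predictions = []
    for puzzle in puzzles:
        ttt_update(model, puzzle["train"])            # adapt on this puzzle's demos
        predictions.append(solve(model, puzzle["test"]))
    return predictions

def evaluate_per_instance_ttt(base_model, puzzles, ttt_update, solve):
    """Each puzzle starts from a fresh copy of the pre-trained checkpoint,
    so evaluation instances stay independent (i.i.d.) of one another."""
    predictions = []
    for puzzle in puzzles:
        model = copy.deepcopy(base_model)             # reset parameters per instance
        ttt_update(model, puzzle["train"])
        predictions.append(solve(model, puzzle["test"]))
    return predictions
```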