What is the purpose of ACT if turned off during evaluation / inference?
I'm trying to wrap my head around this paper, and one thing I find confusing in the reference repo is that ACT is only active during training. Doesn't that negate the purpose of having adaptive compute at inference time? Isn't the point to show that the model learns to stop reasoning during inference when further reasoning won't improve the result?
Batched inference with ACT is complex because sequences halt at different steps and have to be scheduled dynamically. In this repository we therefore provide the simplest version, which always runs to the maximum number of steps, as this does not make the results worse.
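To make the difference concrete, here is a minimal sketch of the two inference loops. It is not the code from this repository: `run_fixed`, `run_with_halting`, `halve`, and `small_enough` are hypothetical names, the toy `halve` function stands in for one reasoning step of a model, and `small_enough` stands in for a learned halting signal. It only illustrates the extra per-sequence bookkeeping that halting introduces.

```python
from typing import Callable, List, Tuple


def run_fixed(
    states: List[float], step: Callable[[float], float], max_steps: int
) -> Tuple[List[float], List[int]]:
    """Simplest version: every sequence takes exactly max_steps steps."""
    states = list(states)
    for _ in range(max_steps):
        states = [step(s) for s in states]
    return states, [max_steps] * len(states)


def run_with_halting(
    states: List[float],
    step: Callable[[float], float],
    should_halt: Callable[[float], bool],
    max_steps: int,
) -> Tuple[List[float], List[int]]:
    """Adaptive version: each sequence stops as soon as should_halt fires.

    The set of active sequences shrinks over time, so the loop has to
    track which batch entries still need compute.
    """
    states = list(states)
    steps_used = [0] * len(states)
    active = list(range(len(states)))  # indices still being processed
    for _ in range(max_steps):
        if not active:
            break
        still_active = []
        for i in active:
            states[i] = step(states[i])
            steps_used[i] += 1
            if not should_halt(states[i]):
                still_active.append(i)
        active = still_active
    return states, steps_used


# Toy stand-ins (assumptions for illustration only).
def halve(s: float) -> float:
    return s * 0.5


def small_enough(s: float) -> bool:
    return abs(s) < 1.0


batch = [1.0, 8.0, 64.0]
_, fixed_steps = run_fixed(batch, halve, max_steps=5)
_, adaptive_steps = run_with_halting(batch, halve, small_enough, max_steps=5)
print(fixed_steps)     # [5, 5, 5]
print(adaptive_steps)  # [1, 4, 5]
```

In the fixed loop the batch shape never changes, which is why it is the easy case. In the adaptive loop the sequences finish at different steps, so a real batched implementation would also need to decide what to do with the freed slots, which is the scheduling problem mentioned above.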
Thanks for clarifying. Do you have plans to publish the dynamic scheduling version of the code?