Available configurations/hyperparameters for data science scenarios
Hi team,
First, thank you for the excellent work on the project. I’ve had success running the RD agent on several MLE-Bench/Kaggle tasks, and now I’m looking to better understand and control the full execution lifecycle of the agent.
Are there any configuration options or hyperparameters exposed for the RD agent that I can start experimenting with?
For example, in other MLE-Bench agents, I’ve seen configurations such as:
• step: maximum number of steps an agent can take
• time_limit: total allowed time for the entire process (including Python execution)
• exec_timeout: maximum time allowed for a single Python execution
…and potentially others.
I assume I can manage the overall time_limit by placing a timeout around the rdagent command itself, but I’d like to know what else is configurable and whether these parameters—or equivalents—are available for the RD agent.
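A minimal sketch of that outer timeout, assuming the agent is launched as an ordinary subprocess. The helper name run_with_time_limit is hypothetical and not part of RD-Agent, and the command in the usage comment is a placeholder with its arguments omitted. Note that this only stops the launched process itself; anything it started separately (for example containers) may need its own cleanup.

```python
import subprocess
import sys


def run_with_time_limit(cmd, time_limit_s):
    """Run cmd and stop it if it exceeds time_limit_s seconds.

    Returns the exit code of the process, or None if the limit was hit.
    (Hypothetical helper for illustration; not part of RD-Agent.)
    """
    try:
        # subprocess.run kills the child and waits for it when the timeout expires.
        completed = subprocess.run(cmd, timeout=time_limit_s)
    except subprocess.TimeoutExpired:
        return None
    return completed.returncode


# Example usage (placeholder command; the real arguments are omitted here):
# exit_code = run_with_time_limit(["rdagent", "data_science"], time_limit_s=6 * 3600)
```

A return value of None signals that the overall limit was reached rather than the run finishing on its own.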
It would be extremely helpful if the documentation could provide more detail on this. Thank you!
Hi @ShuxinLin, thanks for your question and for trying out RD-Agent!
You can check the configurable parameters for rdagent data_science in rdagent/app/data_science/loop.py.
For parameters that can be set via the .env file, see rdagent/app/data_science/conf.py.
For example, you mentioned:
- step → corresponds to the step_n and loop_n arguments in the main() function;
- time_limit → corresponds to the timeout argument in the main() function;
- exec_timeout → corresponds to the full_timeout parameter in conf.py.
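As a quick reference, the mapping above can be written down as a small lookup table. This is only an illustrative sketch: the dictionary RD_AGENT_EQUIVALENTS and the describe() helper are hypothetical names, not part of RD-Agent. Only the parameter names and file paths are taken from this reply, and the actual defaults and semantics should be checked in the source files.

```python
# Hypothetical lookup table: MLE-Bench-style option -> RD-Agent equivalent.
# Parameter names and locations are those mentioned in this reply.
RD_AGENT_EQUIVALENTS = {
    "step": {
        "names": ("step_n", "loop_n"),
        "where": "main() in rdagent/app/data_science/loop.py",
    },
    "time_limit": {
        "names": ("timeout",),
        "where": "main() in rdagent/app/data_science/loop.py",
    },
    "exec_timeout": {
        "names": ("full_timeout",),
        "where": "rdagent/app/data_science/conf.py (settable via .env)",
    },
}


def describe(option):
    """Return a one-line description of where an option is configured."""
    entry = RD_AGENT_EQUIVALENTS.get(option)
    if entry is None:
        return f"{option}: no known equivalent listed"
    return f"{option}: {', '.join(entry['names'])} in {entry['where']}"


for name in RD_AGENT_EQUIVALENTS:
    print(describe(name))
```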
We’ll update the documentation soon to make these options clearer.
Hope this helps you better control the agent’s execution!