noisepage
Refactor the mini-runners to create tables on-the-fly
Right now the mini-runners first create all the tables required to enumerate input features, and then execute the runners for every OU one by one on these tables. As a result, just starting the runners consumes a lot of memory (70-80GB), which may be a problem if we want to run them on machines with a limited memory budget.
There are two ways to address this:
- Create tables one by one instead of all together. After creating each table, run all the mini-runners that need to exercise some input feature combinations on that specific table. Then delete the table and repeat the process for the next table. This approach saves memory without slowing down the runners, but the downside is that it is a bit intrusive to how the runners are set up right now: we would need to flip the execution order so that the tables, rather than the runners, drive the outer loop.
- Keep the runner execution logic as it is, but do not create any tables at the beginning. Instead, within each runner, create a table right before exercising the input features related to that table. Then delete the table and repeat for the next table. This does not affect the runner setup and keeps the execution of each runner separate. However, it requires re-creating all the tables (takes 5-10min IIRC) for each runner, which increases the total run-time.
I'm not sure which approach is better, but currently leaning towards the first approach since it is more performant and seems cleaner.
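To make the trade-off concrete, here is a rough sketch of the two loop orders. It is not NoisePage code: `TableStore`, `run_table_major`, `run_runner_major`, and the runner and table names in `RUNNERS` are all hypothetical stand-ins, and "running" a mini-runner is reduced to recording a (runner, table) pair. The sketch only counts table creations and the number of tables alive at once.

```python
from collections import defaultdict

# Hypothetical example data: which tables each mini-runner needs.
# These names are illustrative only and do not come from the code base.
RUNNERS = {
    "seq_scan": ["t_small", "t_large"],
    "sort": ["t_small", "t_large"],
    "idx_scan": ["t_large"],
}


class TableStore:
    """Hypothetical stand-in for the database; tracks live tables only."""

    def __init__(self):
        self.live = set()
        self.creations = 0   # total number of create() calls
        self.peak_live = 0   # max number of tables alive at the same time

    def create(self, table):
        self.live.add(table)
        self.creations += 1
        self.peak_live = max(self.peak_live, len(self.live))

    def drop(self, table):
        self.live.remove(table)


def run_table_major(runners, store):
    """First approach: outer loop over tables, inner loop over runners."""
    by_table = defaultdict(list)
    for runner, tables in runners.items():
        for table in tables:
            by_table[table].append(runner)

    executed = []
    for table, needing in by_table.items():
        store.create(table)                   # each table is built once
        for runner in needing:
            executed.append((runner, table))  # placeholder for the real run
        store.drop(table)
    return executed


def run_runner_major(runners, store):
    """Second approach: outer loop over runners, tables created per runner."""
    executed = []
    for runner, tables in runners.items():
        for table in tables:
            store.create(table)               # rebuilt for every runner
            executed.append((runner, table))  # placeholder for the real run
            store.drop(table)
    return executed
```

With the example data above, both orders execute the same five (runner, table) pairs and never hold more than one table at a time. The table-major order creates 2 tables in total, while the runner-major order creates 5, one per pair, which is where the extra run-time of the second approach would come from.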