mlos_bench: implement optional early abort logic for time-based trials
Further generalizing this via async telemetry collection during the process might be nice too.
Some additional notes:
In cases of throughput- or latency-based benchmarks, it's not totally clear how to detect whether a specific trial is worse than a previous one, since a trial could theoretically speed up later in its run.
But for raw time-based ones, what we could do is track the worst value seen so far and abort once we exceed it. To do that, we'd need some additional metadata indicating that this benchmark is in fact seeking to minimize wallclock time.
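As a rough illustration, that check could look something like the following sketch, where `start_trial`, `poll_trial`, `abort_trial`, and `worst_time_seen` are hypothetical hooks for this discussion, not existing mlos_bench APIs:

```python
import time

# Hypothetical sketch: poll a running trial and abort it once its elapsed
# wallclock time exceeds the worst (largest) time seen in any prior trial.
# The callables and parameter names are illustrative, not mlos_bench APIs.

def run_with_early_abort(start_trial, poll_trial, abort_trial,
                         worst_time_seen: float, poll_interval: float = 5.0):
    """Run one trial, aborting it if it runs longer than the worst prior trial."""
    start_ts = time.monotonic()
    handle = start_trial()
    while not poll_trial(handle):  # poll_trial() returns True once the trial finishes
        elapsed = time.monotonic() - start_ts
        if elapsed > worst_time_seen:
            abort_trial(handle)    # run the proposed `abort` phase commands
            return ("ABORTED", elapsed)
        time.sleep(poll_interval)
    return ("SUCCEEDED", time.monotonic() - start_ts)
```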
What's tricky is how we then incorporate metrics from aborted trials. Imagine, for instance, that you wanted to explain why some params/trials were bad; by aborting some trials, you give up on gathering that data.
Moreover, we can't actually store a real time value for that trial, since we abort it early. Instead we need to store it in the DB as "ABORTED" or some such, and then each time we train the optimizer fabricate a value for it - likely $W+\epsilon$, where $W$ is the worst value seen up until that point (i.e., serially examining historical trial data).
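A minimal sketch of that imputation step, assuming a simple (status, wallclock) history format for illustration rather than the actual storage schema:

```python
# Hypothetical sketch: when (re)training the optimizer, replace the missing
# score of each ABORTED trial with W + epsilon, where W is the worst value
# observed among the trials that completed before it (scanned serially).
# The (status, value) tuple format is illustrative, not the mlos_bench schema.

EPSILON = 1.0  # seconds; any small positive penalty

def impute_aborted_scores(trial_history):
    """trial_history: list of (status, wallclock_seconds_or_None), oldest first."""
    worst_so_far = 0.0
    scores = []
    for status, value in trial_history:
        if status == "ABORTED" or value is None:
            scores.append(worst_so_far + EPSILON)
        else:
            scores.append(value)
            worst_so_far = max(worst_so_far, value)
    return scores

# e.g. impute_aborted_scores([("SUCCEEDED", 42.0), ("ABORTED", None),
#                             ("SUCCEEDED", 57.3)]) -> [42.0, 43.0, 57.3]
```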
Per discussions, we need:
- a new Environment phase - `abort`
  - the plan will be for users to add that as commands to their Environment configs that instruct the system how to cancel and clean up a currently running `run` phase - that will get executed asynchronously (see the first sketch after this list)
- an additional config option to inform the scheduler when to invoke early abort logic for time-based benchmarks
  - specifically it needs to know which metrics to look at
  - this should probably be per environment
    - for instance, right now we don't often tear down the VM for each trial, so the first time the VM gets set up, that necessarily takes longer, so we shouldn't include that in our elapsed time metrics
  - could be that we start tracking the elapsed time for every single phase in each environment for each trial and try to infer things, or ...
- we also add a `status` or `telemetry` phase that includes commands used to asynchronously poll the status of a `run` phase (or should it also support other phases?) in order to feed in-progress metrics back into the system and allow specifying that one of those be used for the early abort check (maybe just an implicit elapsed time, but probably not, since sometimes the db needs to be reloaded, for instance, and other times it doesn't, so the run phase overall may take longer on occasion even if the actual benchmark portion doesn't) - see the second sketch after this list
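To make the `abort` phase bullet concrete, here is a purely illustrative sketch (expressed as a Python dict rather than the actual JSON Environment config format) of where such commands might live; the `abort` key, its placement, and the example commands are assumptions from this issue, not an existing schema:

```python
# Illustrative only: what an Environment config with an `abort` phase might
# look like, expressed here as a Python dict.  The `abort` key and the example
# commands are assumptions from this issue, not an existing mlos_bench schema.
example_env_config = {
    "name": "local_benchmark",
    "config": {
        "setup": ["./prepare_db.sh"],
        "run": ["./run_benchmark.sh --output results.csv"],
        # Proposed new phase: commands to cancel and clean up a currently
        # running `run` phase; the scheduler would execute these asynchronously.
        "abort": ["pkill -f run_benchmark.sh", "./cleanup_partial_results.sh"],
        "teardown": ["./drop_db.sh"],
    },
}
```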
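Similarly, a minimal sketch of a scheduler-side loop driving a `status`/`telemetry` phase while the `run` phase is in flight; the command, its "key=value" output format, and the `report_metrics` callback are all illustrative assumptions:

```python
import subprocess
import time

# Hypothetical sketch of a scheduler-side loop that runs the `status` /
# `telemetry` commands from an Environment config while the `run` phase is
# still active, and feeds the parsed in-progress metrics back into the system.

def poll_telemetry(status_cmd, run_is_active, report_metrics, interval=10.0):
    """Periodically run `status_cmd` and report its metrics while the run phase is active."""
    while run_is_active():
        out = subprocess.run(status_cmd, capture_output=True, text=True, check=False)
        metrics = {}
        for line in out.stdout.splitlines():
            key, sep, val = line.partition("=")  # assume simple "key=value" output
            if sep:
                try:
                    metrics[key.strip()] = float(val)
                except ValueError:
                    pass  # skip non-numeric values in this simple sketch
        report_metrics(metrics)  # e.g. record as in-progress telemetry for the trial
        time.sleep(interval)
```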