automlbenchmark
Benchmark multiple AutoML systems at once
When you want to run multiple AutoML systems, you currently have to call the script twice, e.g.:
python runbenchmark.py TPOT test test
python runbenchmark.py auto-sklearn test test
It would be convenient to allow a single command to run:
- all frameworks in a definition:
  python runbenchmark.py frameworks.yaml test test
- multiple frameworks, comma-separated:
  python runbenchmark.py TPOT,auto-sklearn test test
- multiple frameworks from a text file:
  python runbenchmark.py frameworksubset.txt test test

The text file would contain either a single comma-separated line as above, or one framework per line.
The first option is convenient if you have your own framework definition from which you want to run all frameworks. The latter two are more convenient (and less error-prone) if you are just interested in running a subset of the default frameworks.yaml definition.
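For the comma-separated and text-file variants, the expansion could happen before any of the existing logic runs, by normalizing the first positional argument into a list of framework names. A minimal sketch, assuming a hypothetical expand_frameworks helper (nothing with this name exists in the codebase today):

import os

def expand_frameworks(arg):
    """Turn the framework argument into a list of framework names.

    Supports a comma-separated list (TPOT,auto-sklearn) and a plain
    text file holding names, one per line or comma-separated.
    """
    if arg.endswith('.txt') and os.path.isfile(arg):
        with open(arg) as f:
            tokens = f.read().replace('\n', ',').split(',')
    else:
        tokens = arg.split(',')
    return [t.strip() for t in tokens if t.strip()]

# expand_frameworks("TPOT,auto-sklearn") -> ['TPOT', 'auto-sklearn']

The yaml case would bypass this helper and load the definition file directly, as the app already does.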
The more I think about this issue, the less I'm convinced about it.
First, to make this really useful, we would need to implement it in such a way that all jobs (for all frameworks) are managed by the same queue; otherwise a simple loop like the one in runstable.sh is good enough, and I don't see the need to add complexity inside the app to support this.
for c in ${constraints[*]}; do
  for b in ${benchmarks[*]}; do
    for f in ${frameworks[*]}; do
      # echo "python runbenchmark.py $f $b $c -m $mode -p $parallel $extra_params"
      python runbenchmark.py $f $b $c -m $mode -p $parallel $extra_params
    done
  done
done
The problem with the loop is that each python process must complete before the next one starts, so we may be waiting on a few trailing jobs before being able to start many ($parallel) new ones. It's also possible to put each of them in the background, but then more difficulties arise, as the parallelism is no longer under control.
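As a stopgap outside the app, the same commands could be driven from a small wrapper that keeps parallelism bounded while still starting a new process as soon as a slot frees up. A sketch only, with a hardcoded combination list and a pool size standing in for the -p/$parallel budget (none of this is part of the benchmark codebase):

import itertools
import subprocess
from concurrent.futures import ThreadPoolExecutor

frameworks = ["TPOT", "auto-sklearn"]
benchmarks = ["test"]
constraints = ["test"]

def run(combo):
    f, b, c = combo
    # Each call is an independent runbenchmark.py process.
    return subprocess.run(["python", "runbenchmark.py", f, b, c]).returncode

# At most 2 processes run at any time; a new one starts as soon as a
# slot frees up, instead of waiting for a whole batch to drain.
with ThreadPoolExecutor(max_workers=2) as pool:
    codes = list(pool.map(run, itertools.product(frameworks, benchmarks, constraints)))

This only bounds the number of runbenchmark.py processes, though; it still can't merge jobs across frameworks into one queue the way in-app support would.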
Now, if we decide to support only multiple frameworks, we get the same issue with multiple benchmarks and multiple constraints, as shown in the loop above. Although this would be very practical, it would also mean adding a lot of complexity to the app to support it...
Will try to figure out how this could be implemented without compromising the existing logic, but I'm afraid this can't be done for v2.
I am not familiar enough with the job queuing to weigh in right now, but I am fine providing (approximate) support through the shell script and moving this out of scope for v2.
I may have found a way to implement it just by adding a layer on top of the various Benchmark instances: the idea is to still create one instance per framework-benchmark-constraint combination, create all the jobs for all of them without starting any (or obtain them through a generator, let's see...), and then the layer on top would be in charge of merging them and executing them.
The advantage of this approach is that it shouldn't require serious changes in current Benchmark implementations where most of the logic is done.
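A rough sketch of what that layer could look like. Benchmark here is a stand-in with only the surface relevant to the idea, and create_jobs/run_all are hypothetical names, not the actual API:

from itertools import product

class Benchmark:
    """Stand-in: one instance per framework-benchmark-constraint combination."""
    def __init__(self, framework, benchmark, constraint):
        self.framework, self.benchmark, self.constraint = framework, benchmark, constraint

    def create_jobs(self):
        # The real implementation would build one job per task/fold
        # without starting any of them.
        return [f"{self.framework}:{self.benchmark}:{self.constraint}:task{i}" for i in range(2)]

def run_all(frameworks, benchmarks, constraints):
    # The layer on top: instantiate every combination, collect their jobs
    # unstarted, then feed the merged list to a single queue.
    instances = [Benchmark(f, b, c) for f, b, c in product(frameworks, benchmarks, constraints)]
    all_jobs = [job for inst in instances for job in inst.create_jobs()]
    for job in all_jobs:  # a real queue would execute these with -p parallelism
        print("running", job)

run_all(["TPOT", "auto-sklearn"], ["test"], ["test"])

Because the queue only sees a flat list of jobs, trailing jobs from one framework no longer block the start of jobs from the next.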
I still can't promise it for v2 (I have a lot of work to do on the H2O side), but I think it's the right way to approach this.
Don't fret about pushing this into v2 👍 it's just a nice-to-have