Define abstractions for framework integration
The goal is to provide an abstraction and default implementation(s) for the most common scenarios.
This would also allow frameworks to support several versions easily.
Finally, a more structured framework runner will simplify the integration effort and standardize support for extra features like the `_save_artifacts` param.
1st suggestion (incomplete, and will change):
class FrameworkRunner:
    def __init__(self, config, dataset): pass
    def prepare_data(self): pass    # convert the benchmark dataset into the framework's input format
    def fit(self, …): pass          # train the framework within its time budget
    def predict(self, …): pass      # produce predictions on the test data
    def get_result(self): pass      # collect predictions/scores for the benchmark report
    def save_artifacts(self): pass  # optionally persist models, logs, etc. (_save_artifacts)
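To illustrate how the abstraction might be consumed, a concrete integration could subclass the runner and override only the phases it needs. This is only a sketch; the RandomForestRunner name and the use of scikit-learn are assumptions for illustration, not part of the proposal.

from sklearn.ensemble import RandomForestClassifier

# Hypothetical framework integration built on the proposed abstraction.
class RandomForestRunner(FrameworkRunner):
    def __init__(self, config, dataset):
        self.config = config
        self.dataset = dataset
        self.model = None

    def fit(self, X, y):
        # Train within the benchmark's time budget.
        self.model = RandomForestClassifier(n_estimators=100)
        self.model.fit(X, y)

    def predict(self, X):
        # Produce predictions that the benchmark can score.
        return self.model.predict(X)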
Based on our discussion, we should include some type of recovery mode.
With this refactor, it will also be easier to use the various "checkpoints" to store partial results and/or have more dynamic time cut-offs. For example, the one-hour time limit could be enforced more strictly for just the `fit` call, while being (much) more lenient in the phases after `fit`, as compared to having a single large budget for all phases combined. This avoids both the scenario where EC2 instances live needlessly long because they get hung in a `fit` call and the one where results are incomplete merely because the `predict` part took longer.
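To make the per-phase cut-offs concrete, here is a minimal sketch of how a budget could be enforced per phase; the PHASE_BUDGETS values, the run_phase_with_budget helper, and the use of multiprocessing are assumptions for illustration, not existing benchmark code.

import multiprocessing

# Example budgets: strict for fit, more lenient for the later phases.
PHASE_BUDGETS = {"fit": 3600, "predict": 900, "save_artifacts": 300}

def run_phase_with_budget(target, phase, *args):
    # Run one phase in its own process so a hung call can be terminated
    # instead of keeping the EC2 instance alive indefinitely.
    proc = multiprocessing.Process(target=target, args=args)
    proc.start()
    proc.join(timeout=PHASE_BUDGETS[phase])
    if proc.is_alive():
        proc.terminate()  # hard cut-off, e.g. for a fit call that hangs
        proc.join()
        raise TimeoutError(f"{phase} exceeded its {PHASE_BUDGETS[phase]}s budget")

Returning results from the child process is omitted here; in practice the partial results would be written to a checkpoint, which also ties into the recovery mode mentioned above.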