
MatPipe could have the ability to checkpoint and resume

Open ardunn opened this issue 6 years ago • 2 comments

Oftentimes, parts of a pipeline will work fine (e.g., featurization), but the entire pipeline will fail because something down the line throws an error. It would be nice to have a "checkpoint" option like so:

pipe = MatPipe(**some_config, checkpoint="/home/user/checkpoint_dir")

When starting anew (i.e., no checkpoints), MatPipe starts from scratch and saves intermediate objects and dataframes to the checkpoint dir.

When warm starting (i.e., the checkpoint dir exists), it loads the relevant data from the checkpoint dir so that it doesn't do extra work (it doesn't have to refit if already fit, doesn't have to refeaturize if already featurized, etc.).
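A minimal sketch of the cold-start/warm-start behavior for a single stage, assuming results can be pickled. The helper name `run_with_checkpoint` and the one-file-per-stage layout are hypothetical and are not part of automatminer:

```python
import pickle
from pathlib import Path


def run_with_checkpoint(name, compute, checkpoint_dir):
    """Return compute()'s result, reusing a pickled copy if one exists.

    Hypothetical helper for illustration; not an automatminer function.
    """
    path = Path(checkpoint_dir) / f"{name}.pkl"
    if path.exists():
        # Warm start: load the saved result instead of redoing the work.
        with path.open("rb") as f:
            return pickle.load(f)
    # Cold start: do the work, then save it for the next run.
    result = compute()
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as f:
        pickle.dump(result, f)
    return result
```

Calling it twice with the same `name` and directory runs `compute` only once; the second call reads the pickle. A real implementation would also need to detect when the config or input data has changed and invalidate stale checkpoints, which this sketch ignores.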

ardunn avatar Feb 05 '19 18:02 ardunn

This could be done by saving the mid-transform df and all fit classes after some list of classes is done fitting (or transforming), and noting the progress of the MatPipe in some auxiliary file. We would also need to define some from_checkpoint method and make some changes to MatPipe.transform and MatPipe.fit. We should also decide whether introducing all this complexity would be worth it in the end.
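A rough sketch of the auxiliary-file idea, under several assumptions: `CheckpointedPipeline` is a hypothetical stand-in for MatPipe, each stage is a callable returning `(fitted_state, transformed_data)`, and `progress.json` is an invented file name. Resuming happens in `fit` itself here rather than through a separate from_checkpoint method:

```python
import json
import pickle
from pathlib import Path


class CheckpointedPipeline:
    """Hypothetical stand-in for MatPipe; not automatminer's API."""

    def __init__(self, stages, checkpoint_dir):
        # stages: ordered (name, fit_transform) pairs, where
        # fit_transform(data) returns (fitted_state, transformed_data).
        self.stages = list(stages)
        self.dir = Path(checkpoint_dir)
        self.progress_file = self.dir / "progress.json"
        self.fitted = {}

    def _completed(self):
        # The auxiliary file lists the stages that finished earlier.
        if self.progress_file.exists():
            return json.loads(self.progress_file.read_text())["completed"]
        return []

    def fit(self, data):
        self.dir.mkdir(parents=True, exist_ok=True)
        done = self._completed()
        for name, fit_transform in self.stages:
            path = self.dir / f"{name}.pkl"
            if name in done:
                # Finished in an earlier run: restore state and data.
                with path.open("rb") as f:
                    self.fitted[name], data = pickle.load(f)
                continue
            self.fitted[name], data = fit_transform(data)
            # Save the fitted state and mid-pipeline data, then
            # record the stage as complete.
            with path.open("wb") as f:
                pickle.dump((self.fitted[name], data), f)
            done.append(name)
            self.progress_file.write_text(json.dumps({"completed": done}))
        return data
```

If a later stage raises, everything before it is already on disk, so a second `fit` call with the same checkpoint dir skips straight to the failed stage. Writing the progress file only after the pickle is saved means an interrupted write leaves the stage marked incomplete and it is simply rerun.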

ardunn avatar Apr 09 '19 02:04 ardunn

https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!msg/matminer/65XNOrsSu2s/j4wgXQ_MBAAJ

Or at least add the ability to pause a TPOT optimization?

ardunn avatar Sep 13 '19 19:09 ardunn