MatPipe could have the ability to checkpoint and resume
Often, parts of a pipeline work fine (e.g., featurization), but the entire pipeline fails because something further down the line throws an error. It would be nice to have a "checkpoint" option like so:
pipe = MatPipe(**some_config, checkpoint="/home/user/checkpoint_dir")
When starting anew (i.e., no checkpoints exist), MatPipe starts from scratch and saves intermediate objects and dataframes to the checkpoint dir.
When warm starting (i.e., the checkpoint dir exists), it loads the relevant data from the checkpoint dir so that it doesn't repeat work (doesn't have to refit if already fit, doesn't have to refeaturize if already featurized, etc.).
This could be done by saving the mid-transform df and all fitted classes after each group of classes finishes fitting (or transforming), and recording the MatPipe's progress in an auxiliary file. We would also need to define a from_checkpoint method and make some changes to MatPipe.transform and MatPipe.fit. We should also decide whether introducing all this complexity would be worth it in the end.
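A rough sketch of the save/resume logic described above, just to make the idea concrete. This is not MatPipe code: `CheckpointedPipeline`, `progress.json`, and the per-step `.pkl` files are all hypothetical names, and the steps are assumed to be simple objects with `fit(data)` and `transform(data)` methods whose fitted state lives in their instance attributes.

```python
import json
import pickle
from pathlib import Path


class CheckpointedPipeline:
    """Hypothetical stand-in for MatPipe: runs named steps in order and
    checkpoints after each one. Not the real MatPipe API."""

    def __init__(self, steps, checkpoint=None):
        # steps: list of (name, obj) pairs; obj has fit(data) and transform(data)
        self.steps = list(steps)
        self.checkpoint = Path(checkpoint) if checkpoint else None

    def _progress_file(self):
        return self.checkpoint / "progress.json"

    def _load_progress(self):
        # Names of steps that finished in a previous run (empty on a cold start).
        if self.checkpoint and self._progress_file().exists():
            return json.loads(self._progress_file().read_text())["completed"]
        return []

    def _save_step(self, name, step, data, completed):
        self.checkpoint.mkdir(parents=True, exist_ok=True)
        # Save the step's fitted attributes and the data as it looks after this step.
        with open(self.checkpoint / f"{name}.pkl", "wb") as f:
            pickle.dump({"state": vars(step), "data": data}, f)
        # Write the progress marker last, so a crash mid-save is not recorded as done.
        self._progress_file().write_text(json.dumps({"completed": completed}))

    def fit_transform(self, data):
        completed = self._load_progress()
        for name, step in self.steps:
            if name in completed:
                # Warm start: restore fitted state and intermediate data, skip the work.
                with open(self.checkpoint / f"{name}.pkl", "rb") as f:
                    saved = pickle.load(f)
                vars(step).update(saved["state"])
                data = saved["data"]
                continue
            step.fit(data)
            data = step.transform(data)
            if self.checkpoint:
                completed = completed + [name]
                self._save_step(name, step, data, completed)
        return data
```

With something like this, a run that dies in a late step can be restarted with the same checkpoint dir and will pick up after the last step that finished. The sketch ignores things a real version would have to handle, such as detecting that the config or input data changed since the checkpoint was written.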
https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!msg/matminer/65XNOrsSu2s/j4wgXQ_MBAAJ
Or at least add the ability to pause a TPOT optimization?