
have to clean cache manually

Open mromaniukcdl opened this issue 6 years ago • 2 comments

Having to call step.clean_cache() is error-prone. Ideally, we should have automatic cache invalidation.

mromaniukcdl, May 08 '18 10:05

Since caching depends on the input, any step that sets cache_output=True would need the input object as well. Given this, I propose the following API:

data_fit = {'input': ...,
            'id': 'data_fit'}

data_val = {'input': ...,
            'id': 'data_val'}

new_tfidf_step = Step(name='TF-IDF',
                      transformer=StepsTfidfTransformer(),
                      input_steps=[new_count_vec_step],        
                      input_data=['input'],
                      experiment_directory=EXPERIMENT_DIR_B,
                      cache_output=True)

The new_tfidf_step step will use the value of id as a key for caching. How do you feel about this API?
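
To make the idea concrete, here is a minimal sketch of caching keyed by the data's id. This is not steppy's implementation: CachedStep, run, and calls are hypothetical names used only for illustration, and the cache lives in memory rather than in the experiment directory.

```python
# Hypothetical sketch; CachedStep is NOT part of steppy.
class CachedStep:
    """Wraps a transform function and caches its output per data id."""

    def __init__(self, name, transform, cache_output=False):
        self.name = name
        self.transform = transform
        self.cache_output = cache_output
        self._cache = {}  # maps data id -> cached output
        self.calls = 0    # how many times transform actually ran

    def run(self, data):
        key = data.get('id')
        # Use the cache only when caching is enabled and the data carries an id.
        if self.cache_output and key is not None and key in self._cache:
            return self._cache[key]
        self.calls += 1
        output = self.transform(data['input'])
        if self.cache_output and key is not None:
            self._cache[key] = output
        return output


data_fit = {'input': [1, 2, 3], 'id': 'data_fit'}
data_val = {'input': [4, 5], 'id': 'data_val'}

step = CachedStep('double', lambda xs: [2 * x for x in xs], cache_output=True)
step.run(data_fit)  # computed and stored under 'data_fit'
step.run(data_fit)  # served from the cache; transform is not called again
step.run(data_val)  # different id, so it is computed fresh
```

With this scheme, switching from data_fit to data_val never returns stale output, so no manual clean_cache() call is needed between the two. The caller is still responsible for changing the id whenever the underlying input changes.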

thomasjpfan, Jun 23 '18 17:06

@thomasjpfan thank you for your idea and PR (and sorry for the late reply)!

I will take a closer look at it next week and let you know how we will proceed with it.

kamil-kaczmarek, Jul 24 '18 10:07