steppy
have to clean cache manually
Having to call step.clean_cache() is error-prone. Ideally, we should have automatic cache invalidation.
Since caching depends on the input, any step that sets cache_output=True
would need access to the input object as well. With that in mind, I propose the following API:
data_fit = {'input': ...,
            'id': 'data_fit'}
data_val = {'input': ...,
            'id': 'data_val'}
new_tfidf_step = Step(name='TF-IDF',
                      transformer=StepsTfidfTransformer(),
                      input_steps=[new_count_vec_step],
                      input_data=['input'],
                      experiment_directory=EXPERIMENT_DIR_B,
                      cache_output=True)
The new_tfidf_step step will use the value of id as a key for caching. How do you feel about this API?
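To make the idea concrete, here is a minimal sketch of id-keyed caching. It is not steppy code: IdKeyedCache, run, and calls are hypothetical names used only for illustration, and the cache is a plain in-memory dict rather than anything written under experiment_directory.

```python
# Hypothetical sketch of caching keyed by the input's 'id'.
# None of these names come from steppy.
class IdKeyedCache:
    """Caches a transform's output under the 'id' of the input dict."""

    def __init__(self, transform):
        self.transform = transform  # any callable applied to data['input']
        self._cache = {}            # maps data['id'] -> cached output
        self.calls = 0              # number of real computations performed

    def run(self, data):
        key = data['id']
        if key not in self._cache:
            # First time this id is seen: compute and store the result.
            self.calls += 1
            self._cache[key] = self.transform(data['input'])
        return self._cache[key]


step = IdKeyedCache(lambda texts: [t.upper() for t in texts])
data_fit = {'input': ['a', 'b'], 'id': 'data_fit'}
data_val = {'input': ['c'], 'id': 'data_val'}

out_fit = step.run(data_fit)        # computed
out_fit_again = step.run(data_fit)  # same id, served from the cache
out_val = step.run(data_val)        # different id, computed separately
```

One consequence of keying on id alone, at least in this sketch: the caller has to supply a new id whenever the underlying input changes, otherwise the old output is returned.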
@thomasjpfan thank you for your idea and PR (and sorry for the late reply)!
I will take a closer look at it next week and let you know how we will proceed with it.