
Add ability to cache preprocessing for datasets


Running preprocessing every epoch slows SQuAD training down roughly 2x. This is also an issue in #569, where preprocessing is embedded in the dataset iterator.

yoptar, Nov 22 '18
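One way such a cache could work is to key the stored result on the raw data itself, so preprocessing runs once and subsequent epochs (and subsequent runs) load the saved result. Below is a minimal sketch; `cached_preprocess`, the `preprocess` callable, and the cache path are all illustrative, not part of DeepPavlov's API.

```python
import hashlib
import pickle
from pathlib import Path

def cached_preprocess(raw_data, preprocess, cache_dir="~/.cache/preproc"):
    """Run `preprocess(raw_data)` once and reuse the pickled result."""
    cache_dir = Path(cache_dir).expanduser()
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Key the cache on the raw data so a stale result is never reused
    # after the dataset changes.
    key = hashlib.md5(pickle.dumps(raw_data)).hexdigest()
    cache_file = cache_dir / f"{key}.pkl"
    if cache_file.exists():
        with cache_file.open("rb") as f:
            return pickle.load(f)
    processed = preprocess(raw_data)
    with cache_file.open("wb") as f:
        pickle.dump(processed, f)
    return processed
```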

I think so; this would also be a solution for issue #608. In the training phase, preprocessing is repeated in every epoch even though the input data is completely constant.

rtygbwwwerr, Nov 27 '18

I think the framework could run the preprocessing pipeline once, as a first phase, and then apply a data augmenter (if we have one) and the iterator to feed batch data to the training graph. A rough sketch of that ordering is below.

rtygbwwwerr, Nov 27 '18
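A minimal sketch of the proposed ordering, assuming a generic training loop rather than DeepPavlov's actual trainer; `preprocess`, `augment`, and `model.train_on_batch` are hypothetical names used for illustration:

```python
import random

def train(raw_data, preprocess, augment, model, n_epochs=10, batch_size=32):
    # Expensive preprocessing runs exactly once, before the epoch loop.
    processed = preprocess(raw_data)
    for epoch in range(n_epochs):
        # Cheap, randomized steps (augmentation, shuffling) still run
        # per epoch so each epoch sees different batches.
        examples = [augment(x) for x in processed]
        random.shuffle(examples)
        for i in range(0, len(examples), batch_size):
            model.train_on_batch(examples[i:i + batch_size])
```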