
Add ability to cache preprocessing for datasets


Running preprocessing every epoch slows SQuAD training down roughly 2x. This is also an issue in #569, where preprocessing is embedded in the dataset iterator.

yoptar, Nov 22 '18
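One way such a cache could work is to key the stored result on the raw data itself, so preprocessing runs once and subsequent epochs (and subsequent runs) load the saved result. Below is a minimal sketch; `cached_preprocess`, the `preprocess` callable, and the cache path are all illustrative, not part of DeepPavlov's API.

```python
import hashlib
import pickle
from pathlib import Path

def cached_preprocess(raw_data, preprocess, cache_dir="~/.cache/preproc"):
    """Run `preprocess(raw_data)` once and reuse the pickled result."""
    cache_dir = Path(cache_dir).expanduser()
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Key the cache on the raw data so a stale result is never reused
    # after the dataset changes.
    key = hashlib.md5(pickle.dumps(raw_data)).hexdigest()
    cache_file = cache_dir / f"{key}.pkl"
    if cache_file.exists():
        with cache_file.open("rb") as f:
            return pickle.load(f)
    processed = preprocess(raw_data)
    with cache_file.open("wb") as f:
        pickle.dump(processed, f)
    return processed
```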

I think so; this would also be a solution for issue #608. In the training phase, preprocessing is repeated in every epoch even though the input data is completely constant.

rtygbwwwerr, Nov 27 '18

I think the framework could run the preprocessing pipeline once, as a first phase, and then apply a data augmenter (if we have one) and the iterator to feed batch data to the training graph. A rough sketch of that ordering is below.

rtygbwwwerr, Nov 27 '18
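A minimal sketch of the proposed ordering, assuming a generic training loop rather than DeepPavlov's actual trainer; `preprocess`, `augment`, and `model.train_on_batch` are hypothetical names used for illustration:

```python
import random

def train(raw_data, preprocess, augment, model, n_epochs=10, batch_size=32):
    # Expensive preprocessing runs exactly once, before the epoch loop.
    processed = preprocess(raw_data)
    for epoch in range(n_epochs):
        # Cheap, randomized steps (augmentation, shuffling) still run
        # per epoch so each epoch sees different batches.
        examples = [augment(x) for x in processed]
        random.shuffle(examples)
        for i in range(0, len(examples), batch_size):
            model.train_on_batch(examples[i:i + batch_size])
```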