Add ability to cache preprocessing for datasets
Running preprocessing every epoch slows SQuAD training down by roughly 2x. This is also related to #569, where preprocessing is embedded in the dataset iterator.
I agree, this would also be a solution for issue #608. In the training phase, preprocessing is repeated in every epoch even when the input data is completely constant.
The framework could run the preprocessing pipeline once in an initial phase, then apply a data augmenter (if we have one) and the iterator, and feed batched data to the training graph, as sketched below.
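A minimal sketch of what such caching could look like, independent of the DeepPavlov API: the function name `cached_preprocess`, the pickle-on-disk scheme, and the cache directory are all assumptions for illustration, not the framework's actual interface.

```python
# Sketch (not DeepPavlov API): run the preprocessing pipeline once and
# reuse a pickled result on later epochs/runs instead of recomputing it.
import hashlib
import pickle
from pathlib import Path


def cached_preprocess(data, preprocess_fn, cache_dir="preproc_cache"):
    """Run preprocess_fn(data) once and reuse the cached result afterwards."""
    Path(cache_dir).mkdir(exist_ok=True)
    # Key the cache on the raw data so a changed dataset invalidates the cache.
    key = hashlib.md5(pickle.dumps(data)).hexdigest()
    cache_file = Path(cache_dir) / f"{key}.pkl"
    if cache_file.exists():
        with open(cache_file, "rb") as f:
            return pickle.load(f)
    processed = preprocess_fn(data)
    with open(cache_file, "wb") as f:
        pickle.dump(processed, f)
    return processed
```

With something like this, each epoch the iterator would draw batches from the already-processed data, and only augmentation (which must vary per epoch) would run inside the training loop.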