MIScnn
Re-use prepared data sets or batches
Current situation: Preprocessed data sets (created via subfunctions) and fully prepared batches are stored in the batches directory under a unique seed. However, there is currently no way to re-use them.
Desired behaviour: A function or parameter that allows re-using already preprocessed files from the batches directory.
Implementation steps:
- Add an option to the Data IO class for specifying a seed.
- If files with the specified seed already exist, reuse them. Otherwise, create new ones.
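
The two steps above could be sketched roughly as follows. This is only an illustration, not MIScnn's actual Data IO API: the function name `load_or_create_batches`, the `create_batches` callback, and the file naming scheme `<seed>.batch_<index>.pickle` are all assumptions made for the example.

```python
import pickle
from pathlib import Path


def load_or_create_batches(batch_dir, seed, create_batches):
    """Reuse batch files tagged with `seed`, or create and store new ones.

    Hypothetical helper: returns (batches, reused), where `reused` tells
    whether the batches were loaded from disk.
    """
    batch_dir = Path(batch_dir)
    batch_dir.mkdir(parents=True, exist_ok=True)

    # Assumed naming scheme: "<seed>.batch_<index>.pickle"
    existing = sorted(
        batch_dir.glob(f"{seed}.batch_*.pickle"),
        key=lambda p: int(p.stem.rsplit("_", 1)[1]),
    )

    if existing:
        # Files for this seed are already present: load them in index order.
        batches = []
        for path in existing:
            with path.open("rb") as fh:
                batches.append(pickle.load(fh))
        return batches, True

    # No files for this seed: run the preprocessing and store the result.
    batches = list(create_batches())
    for index, batch in enumerate(batches):
        with (batch_dir / f"{seed}.batch_{index}.pickle").open("wb") as fh:
            pickle.dump(batch, fh)
    return batches, False
```

With this shape, the expensive preprocessing callback only runs when no files for the given seed are found in the batches directory.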
Open questions: Should the configuration (e.g. the data augmentation settings) be encoded in the seed as well, perhaps as some kind of hash? Example: a first run uses full data augmentation and stores its files in the batches directory. A second run uses less data augmentation but the same seed. I would expect the second run to ignore the already prepared batches and create new ones according to the new data augmentation configuration.
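
One possible answer to the hash question, sketched under assumptions: derive the identifier used for the batch files from both the seed and the configuration, so that a changed configuration never matches old files. The function name `batch_key` and the configuration keys below are hypothetical and only serve the illustration.

```python
import hashlib
import json


def batch_key(seed, config):
    """Build a short identifier from the seed and a JSON-serializable config.

    Hypothetical helper: the same seed with a different configuration
    yields a different key, so stale batches are not picked up.
    """
    # sort_keys makes the result independent of dictionary ordering.
    payload = json.dumps({"seed": seed, "config": config}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


# Hypothetical augmentation settings, for illustration only.
full_aug = {"rotation": True, "scaling": True}
less_aug = {"rotation": True, "scaling": False}

# Same seed, different configuration: the keys differ.
print(batch_key(1234, full_aug) == batch_key(1234, less_aug))
```

A trade-off of this approach is that the file names no longer show the plain seed, so the seed and configuration may need to be stored alongside the batches for readability.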