Ensure efficient IO when training on large sets of 2D images
As discussed with @atbenmurray and previously with @wyli and @luiscarlosgph, this is a follow-up of cmiclab issue #205. We now have support for 2D images, but it is rather crude and would benefit from being optimised, for example by storing the images in a dedicated high-performance database (LMDB?).
The first task would be to survey the current state of the art for this in other TF-based projects.
Probably best to stick to the recommended TFRecord format if possible; a rough sketch of what a TFRecord pipeline could look like is below. Some relevant links:
- https://www.tensorflow.org/performance/datasets_performance
- https://github.com/tensorflow/tensorflow/issues/21129
- https://stackoverflow.com/questions/48309631/tensorflow-tf-data-dataset-reading-large-hdf5-files
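For reference, here is a minimal sketch of the kind of TFRecord workflow the links above describe: serialise the 2D image/label pairs into a TFRecord file once, then stream them back with `tf.data` using parallel parsing and prefetching. This is not existing project code; the names (`write_tfrecord`, `make_dataset`, the feature keys) and the assumption that images are uint8 arrays of a common shape are illustrative only.

```python
import tensorflow as tf


def _bytes_feature(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))


def _int64_feature(value):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))


def write_tfrecord(images, labels, path):
    """Serialise 2D images (H, W, C uint8 numpy arrays) and integer labels."""
    with tf.python_io.TFRecordWriter(path) as writer:
        for image, label in zip(images, labels):
            h, w, c = image.shape
            example = tf.train.Example(features=tf.train.Features(feature={
                'image_raw': _bytes_feature(image.tobytes()),
                'height': _int64_feature(h),
                'width': _int64_feature(w),
                'channels': _int64_feature(c),
                'label': _int64_feature(int(label)),
            }))
            writer.write(example.SerializeToString())


def _parse(serialised):
    # Decode one serialised Example back into an image tensor and a label.
    features = tf.parse_single_example(serialised, features={
        'image_raw': tf.FixedLenFeature([], tf.string),
        'height': tf.FixedLenFeature([], tf.int64),
        'width': tf.FixedLenFeature([], tf.int64),
        'channels': tf.FixedLenFeature([], tf.int64),
        'label': tf.FixedLenFeature([], tf.int64),
    })
    image = tf.decode_raw(features['image_raw'], tf.uint8)
    shape = tf.stack([features['height'], features['width'], features['channels']])
    image = tf.reshape(image, shape)
    return image, features['label']


def make_dataset(path, batch_size=32):
    # Parallel parsing + prefetching, as suggested in the datasets_performance guide.
    # Batching assumes all images share the same shape.
    dataset = tf.data.TFRecordDataset(path)
    dataset = dataset.map(_parse, num_parallel_calls=4)
    dataset = dataset.shuffle(buffer_size=1000).batch(batch_size)
    dataset = dataset.prefetch(1)
    return dataset
```

Usage would be something like `write_tfrecord(images, labels, '2d_data.tfrecord')` once as a preprocessing step, then iterating over `make_dataset('2d_data.tfrecord')` during training. Whether this beats an LMDB-backed reader for our use case is exactly what the investigation should establish.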
Ok, I'll look into this today