
Using current directory for caching leads to potential data loss

Open igoyak opened this issue 4 years ago • 0 comments

When the Monocular Depth Estimation example loads its data, it uses the current directory as the cache location in its call to tf.keras.utils.get_file(): https://github.com/keras-team/keras-io/blob/master/examples/vision/depth_estimation.py#L56
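A minimal sketch of why the cache ends up in the current directory. It assumes that the example passes an absolute path such as `os.path.abspath(".")` as `cache_subdir`, and that get_file joins `cache_dir` and `cache_subdir` with `os.path.join`. `resolve_cache_dir` is a hypothetical, simplified helper, not Keras's code.

```python
import os


def resolve_cache_dir(cache_subdir, cache_dir=None):
    """Hypothetical, simplified sketch of how get_file picks its target directory.

    Omits get_file's other steps (write-permission fallback, directory creation).
    """
    if cache_dir is None:
        cache_dir = os.path.join(os.path.expanduser("~"), ".keras")
    # os.path.join discards every earlier component once it meets an absolute one,
    # so an absolute cache_subdir replaces cache_dir entirely.
    return os.path.join(os.path.expanduser(cache_dir), cache_subdir)


print(resolve_cache_dir("datasets"))            # <home>/.keras/datasets
print(resolve_cache_dir(os.path.abspath(".")))  # the current working directory
```

Any cleanup that get_file performs on its target directory therefore applies to the working directory itself.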

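If extraction fails or is interrupted (for example by a KeyboardInterrupt), get_file deletes the directory it was extracting into, which here is the cache directory: https://github.com/keras-team/keras/blob/v2.6.0/keras/utils/data_utils.py#L143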

An unfortunate Ctrl-C removed all of my uncommitted data.

It might be appropriate to set cache_subdir to a dedicated subdirectory rather than the current directory to prevent this. Because the rest of the code makes implicit assumptions about where the dataset is, this requires changes in several places.
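A minimal sketch of the suggested change, assuming get_file uses an absolute `cache_subdir` as-is (as with the current directory today). The directory name `depth_estimation_data` and the helper `download_into` are hypothetical. `fname` and `origin` should be the values the example already passes.

```python
import os

# Hypothetical dedicated directory for the dataset; the name is an assumption.
DATA_DIR = os.path.abspath("depth_estimation_data")


def download_into(data_dir, fname, origin):
    """Hypothetical helper: download and extract an archive with get_file,
    confined to data_dir.

    If extraction fails or is interrupted, the directory get_file cleans up
    is data_dir, not the current working directory.
    """
    import tensorflow as tf  # imported here so the module loads without TensorFlow

    archive_path = tf.keras.utils.get_file(
        fname,
        origin=origin,
        cache_subdir=data_dir,  # absolute path, so it replaces the default cache location
        extract=True,
    )
    return archive_path
```

An interrupted extraction can still delete the downloaded archive, but only inside `DATA_DIR`. Every later path in the example that assumed the current directory would need to be joined with `DATA_DIR`.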

Relatedly, the check for whether the dataset has already been downloaded does not seem to work, because the /dataset/ directory it looks for is never created by this code: https://github.com/keras-team/keras-io/blob/master/examples/vision/depth_estimation.py#L53
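A minimal sketch of a check that tests for a directory the extraction actually creates, rather than /dataset/. It assumes the archive unpacks into a top-level `val` folder. That name is an assumption, so adjust `expected_subdir` to match the archive. `is_extracted` is a hypothetical helper.

```python
import os


def is_extracted(data_dir, expected_subdir="val"):
    """Hypothetical helper: return True if the archive's top-level folder
    already exists under data_dir.

    Looks for the folder that extraction produces instead of a /dataset/
    directory that nothing creates.
    """
    return os.path.isdir(os.path.join(data_dir, expected_subdir))
```

Call it before downloading, and skip get_file when it returns True.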

igoyak, Oct 28 '21 18:10