Using current directory for caching leads to potential data loss
When loading data for the Monocular Depth Estimation example, the call to tf.keras.utils.get_file() uses the current directory as the cache location for the dataset: https://github.com/keras-team/keras-io/blob/master/examples/vision/depth_estimation.py#L56
In certain situations, such as when archive extraction fails or is interrupted, this method removes the cache directory: https://github.com/keras-team/keras/blob/v2.6.0/keras/utils/data_utils.py#L143
By pressing Ctrl-C at an unfortunate moment, I managed to delete all of my uncommitted data.
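To illustrate the failure mode, here is a simplified model of "clean up the target directory when extraction is interrupted". It is not the Keras source; `extract_with_cleanup`, `interrupted_extract`, and `demo` are hypothetical names, and the sketch runs in a temporary directory so nothing real is deleted.

```python
import os
import shutil
import tempfile


def extract_with_cleanup(target_dir, extract):
    """Run extract(target_dir); on failure or Ctrl-C, delete target_dir entirely."""
    try:
        extract(target_dir)
    except (RuntimeError, KeyboardInterrupt):
        # The cleanup removes the whole directory, not only what extraction wrote.
        if os.path.isdir(target_dir):
            shutil.rmtree(target_dir)
        raise


def interrupted_extract(target_dir):
    # Stand-in for an extraction that the user interrupts with Ctrl-C.
    raise KeyboardInterrupt


def demo():
    """Return True if an unrelated file survives the interrupted extraction."""
    workdir = tempfile.mkdtemp()  # plays the role of the current directory
    unrelated = os.path.join(workdir, "uncommitted_notes.txt")
    with open(unrelated, "w") as f:
        f.write("work in progress")
    try:
        extract_with_cleanup(workdir, interrupted_extract)
    except KeyboardInterrupt:
        pass
    return os.path.exists(unrelated)
```

In this model `demo()` returns `False`: the unrelated file is gone because the cleanup target and the working directory are the same path.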
It might be appropriate to set cache_subdir to a dedicated subdirectory rather than the current directory to prevent this from happening. Because the rest of the code makes implicit assumptions about where the dataset is located, changes are needed in multiple places.
Related to this, the check for whether the dataset has been downloaded does not seem to work, since the /dataset/ directory is never created by this code: https://github.com/keras-team/keras-io/blob/master/examples/vision/depth_estimation.py#L53
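A possible shape for the fix, as a rough sketch rather than a tested patch for the example: cache into a dedicated subdirectory and base the "already downloaded" check on that same path. The helper names and the `DATASET_SUBDIR` value are my own assumptions; `cache_dir`, `cache_subdir`, and `extract` are existing parameters of tf.keras.utils.get_file().

```python
import os

# Assumption: reuse the "dataset" name that the existing check already looks for.
DATASET_SUBDIR = "dataset"


def dataset_path(base_dir="."):
    """Absolute path of the dedicated cache directory under base_dir."""
    return os.path.join(os.path.abspath(base_dir), DATASET_SUBDIR)


def needs_download(base_dir="."):
    """True if the cache directory is missing or empty."""
    path = dataset_path(base_dir)
    return not os.path.isdir(path) or not os.listdir(path)


def fetch_dataset(origin, fname, base_dir="."):
    """Download and extract into base_dir/dataset only when needed."""
    if needs_download(base_dir):
        # Imported lazily so the path helpers work without TensorFlow installed.
        import tensorflow as tf

        tf.keras.utils.get_file(
            fname,
            origin=origin,
            cache_dir=os.path.abspath(base_dir),
            cache_subdir=DATASET_SUBDIR,  # cleanup can now only touch this folder
            extract=True,
        )
    return dataset_path(base_dir)
```

With this layout, an interrupted extraction can at worst remove the dataset subdirectory. The emptiness check is deliberately crude; the code that later reads the dataset would also need to build its paths from `dataset_path()` instead of assuming the current directory.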