tf.keras.utils.split_dataset does not seem to work
System information.
- GPU: Tesla T4 (Google Colab)
- Python 3.7.13
- Exact command to reproduce:
- Run this on Google Colab: `!pip3 install tf-nightly`
- Load train and validation datasets from a directory with at least 63000 RGB pictures (split into four classes/subdirectories), passing 'both' as the value of the subset parameter and 0.2 as the value of the validation_split parameter of image_dataset_from_directory.
- Call split_dataset on the resulting validation subset.
https://github.com/tensorflow/tensorflow/tree/master/tools/tf_env_collect.sh
You can obtain the TensorFlow version with: python -c "import tensorflow as tf; print(tf.version.GIT_VERSION, tf.version.VERSION)"
Describe the current behavior.
split_dataset keeps running without logging anything until the system crashes.
Describe the expected behavior.
split_dataset should return two subsets of the validation dataset.
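For reference, the expected subset sizes can be worked out ahead of time. The sketch below is a hypothetical helper (not the TensorFlow implementation, whose exact rounding may differ) that illustrates the fractional-split arithmetic for the dataset described in this report:

```python
import math

def split_sizes(total, left_fraction):
    # Hypothetical helper illustrating fractional-split arithmetic:
    # the left subset gets floor(left_fraction * total) elements,
    # the right subset gets the remainder.
    left = math.floor(left_fraction * total)
    return left, total - left

# ~63000 images with validation_split=0.2 -> ~12600 validation images;
# batched at 32 that is ceil(12600 / 32) = 394 batches.
val_batches = math.ceil(63000 * 0.2 / 32)
left, right = split_sizes(val_batches, 0.5)
print(val_batches, left, right)  # 394 197 197
```

So with left_size=0.5 the call should split the batched validation set into two halves of roughly 197 batches each, rather than hanging.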
@frigeriomtt, To expedite the trouble-shooting process, could you please provide a complete code snippet and the TensorFlow version you are using. Thank you!
@tilakrayal thanks for answering, here's what you asked for:
v1.12.1-79111-g5134091134e # tf git version
2.11.0-dev20220803 # tf version
```python
import tensorflow as tf

train_set, validation_set_ = tf.keras.utils.image_dataset_from_directory(
    dir_path,
    labels="inferred",
    label_mode="int",
    class_names=class_names,  # list of the names of the directories contained in dir_path
    color_mode="rgb",
    batch_size=batch_size,  # 32
    image_size=IMAGE_SIZE,  # (200, 200)
    shuffle=True,
    seed=seed,  # randint(0, 100000)
    validation_split=0.2,
    subset='both',
    interpolation="bilinear",
    follow_links=False,
    crop_to_aspect_ratio=True,
)

validation_set, test_set = tf.keras.utils.split_dataset(dataset=validation_set_, left_size=0.5)
```
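For what it's worth, a common alternative to split_dataset is to split with Dataset.take/Dataset.skip instead (both are standard tf.data methods); whether that avoids this particular crash is untested. The plain-Python sketch below illustrates the take/skip idea on an ordinary iterable, using 394 batches as an assumed stand-in for the batched validation set:

```python
from itertools import islice

def take_skip_split(items, left_count):
    # Mimics the tf.data pattern:
    #   left  = ds.take(left_count)
    #   right = ds.skip(left_count)
    it = iter(items)
    left = list(islice(it, left_count))  # "take" the first left_count elements
    right = list(it)                     # "skip" them and keep the rest
    return left, right

batches = list(range(394))  # assumed stand-in for the 394 validation batches
val_half, test_half = take_skip_split(batches, len(batches) // 2)
print(len(val_half), len(test_half))  # 197 197
```

On a real tf.data pipeline this would be `validation_set_.take(n)` and `validation_set_.skip(n)` with `n` computed from the dataset's cardinality.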
@frigeriomtt, I ran the code and faced a different error, please find the gist here and share all dependencies to replicate the issue or share a colab gist with the reported error. Thank you!
In your gist, the variable dir_path is not defined; it should contain the path to the directory containing the subfolders listed in class_names (which hold the pictures). I don't know how to share such a directory with you, but I assume you can use any directory from your Google Drive containing subfolders of images.
```python
import os
from google.colab import drive

gdrive = os.path.join('/content', 'gdrive')
drive.mount(gdrive, force_remount=True)
ds_dir = os.path.join(gdrive, '...')  # instead of '...' please insert the path of the directory
```
@frigeriomtt, Without the reproducible code and the dependencies, it would be difficult for us to debug the issue. In order to expedite the trouble-shooting process, could you please provide a complete code snippet you are using. Thank you!
@tilakrayal I'm sharing with you a directory with both the reproducible code and the dataset (request access if needed). You'll have to adjust the path strings to match the structure of your Google Drive; I suggest adding a shortcut to the 'shared with keras' folder from the 'shared with me' directory into MyDrive, so that the script should work as is, apart from the described problem.
@frigeriomtt, I do not have access to the link you have provided. Could you please provide the required permissions to view the files or the colab gist with the reproducible code. Thank you!
@tilakrayal I received your request and granted permission to your account, you should now be able to access the data (the file is named 'To be shared.ipynb'); thank you, and let me know if something emerges.
@frigeriomtt, Keras/Dataset contains a huge number of image files, it would be difficult for us to pinpoint the issue. Could you please get the example down to the simplest possible repro or the colab gist. That will allow us to determine the source of the issue easily. Thank you!
@tilakrayal I'm not sure what you are asking for; the Colab scripts are in the shared directory, and instructions for accessing the image dataset from your Drive are provided in one of my previous comments. If you think there are too many pictures, feel free to downsize the dataset, but keep in mind that the bug happened with the dataset as it is right now.
Hi @frigeriomtt, Looks like tf.keras.utils.split_dataset is working as expected.
I tried with a sample set of images. Please find the gist. Thank you!
Hi @gadagashwini, and thanks for taking charge of the ticket (it's been a month). Besides the first usage of image_dataset_from_directory (which fails), your gist does seem to work, or at least it does not crash during execution. Any idea why mine doesn't? Do you think it might depend on the different size of the dataset? The one I used is a lot bigger; access the directory I shared in a previous comment and try it if you like, I'll grant you access. I can think of that, or of a different set of parameters to image_dataset_from_directory (though I don't see how that could be the reason).
This issue has been automatically marked as stale because it has no recent activity. It will be closed if no further activity occurs. Thank you.
Closing as stale. Please reopen if you'd like to work on this further.
Hi @frigeriomtt, The crash can be due to the large dataset. Please share the dataset. Thank you!