Impossible to create dataset with non-inferred labels
System information.
- Have I written custom code (as opposed to using a stock example script provided in Keras): Not really
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 20.04
- TensorFlow installed from (source or binary): apt
- TensorFlow version (use command below): 2.10
- Python version: 3.8
- Bazel version (if compiling from source): N/A
- GPU model and memory: NVIDIA A100 - 40GB
- Exact command to reproduce: cf. Colab
Describe the problem.
Hi,
I've been experimenting with the ILSVRC2012 Dataset lately, in order to do some transfer learning.
I've used the tensorflow.keras.utils.image_dataset_from_directory function to create the training dataset,
and let it infer the labels (since every class has its dedicated directory).
However, for the validation dataset, the images are mixed in a single directory, so I need to label them.
For the "labels" argument of that function, the documentation states:
"[The "labels" argument can be set as] Either "inferred" (labels are generated from the directory structure), None (no labels), or a list/tuple of integer labels of the same size as the number of image files found in the directory. Labels should be sorted according to the alphanumeric order of the image file paths (obtained via os.walk(directory) in Python)."
So I passed the function a list of 50,000 integers, one class per image of the val dataset, and made sure every value lies between 1 and 1000. However, when I attempt to create the dataset, the function prints "Found 50,000 files belonging to 1 classes." and nothing else.
Despite this, my models run inference on the dataset without any problem.
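For reference, here is a minimal sketch of how a label list can be lined up with the file order the documentation describes. It assumes a flat directory and that "alphanumeric order" means a plain string sort of the file paths (so "10.jpg" comes before "2.jpg"). The names `labels_in_path_order` and `label_by_name` are hypothetical helpers for illustration, not part of Keras.

```python
import os


def labels_in_path_order(directory, label_by_name):
    """Return (paths, labels) with labels aligned to the sorted file paths.

    `label_by_name` is a hypothetical mapping from file name to integer class.
    Assumption: the loader orders files by a plain string sort of their paths.
    """
    paths = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            paths.append(os.path.join(root, name))
    # Plain string sort: "10.jpg" sorts before "2.jpg".
    paths.sort()
    labels = [label_by_name[os.path.basename(p)] for p in paths]
    return paths, labels
```

The resulting list is what I would pass as the "labels" argument, with shuffle=False while checking that the order matches.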
This issue was opened on the TensorFlow repository here: https://github.com/tensorflow/tensorflow/issues/57698 and it was suggested that I move it here.
The Colab is here: https://drive.google.com/file/d/1OcsFWK6f2erhRRmItLLceqcOLwq940NO/view?usp=sharing
The dataset is a small one I put together for this issue, containing 5 images: 3 of cats (1, 3, 5) and 2 of dogs (2, 4). The labels list is [1,2,1,2,1]. The dataset is created without errors, and I can extract the labels (albeit in the wrong order, for some reason, even though the dataset is not shuffled). However, only a single class is detected, and a classifier trained on it is unable to compute the loss, which seems to confirm that the labels are not taken into account.
Thank you for your time and attention!
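As a sanity check independent of Keras, the sketch below counts the distinct classes in the label list and remaps them to contiguous 0-based indices. This does not explain the "1 classes" message. It only rules out one possible cause of the loss problem: sparse integer losses expect labels in the range [0, num_classes), and my labels start at 1. `summarize_labels` is a hypothetical helper name.

```python
def summarize_labels(labels):
    """Return the sorted distinct classes and the labels remapped to 0..K-1."""
    classes = sorted(set(labels))
    index = {c: i for i, c in enumerate(classes)}
    remapped = [index[c] for c in labels]
    return classes, remapped


classes, remapped = summarize_labels([1, 2, 1, 2, 1])
print(len(classes), remapped)  # prints: 2 [0, 1, 0, 1, 0]
```

So the list itself clearly contains 2 classes, whatever the function reports.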
@gowthamkpr, I was able to reproduce the issue on TensorFlow v2.8 and nightly. Kindly find the gist of it here.