Supporting list of files for `keras.utils.text_dataset_from_directory`
Currently, `keras.utils.text_dataset_from_directory` creates a dataset from the files in a directory. It is targeted at text classification, so by default it expects a nested directory structure in which each folder is a class. For the simpler use case of just reading text files into a dataset, however, the API is inconvenient: it requires all files to be in the same directory and to have names ending in ".txt".
I'm proposing an additional function for creating a text dataset from a list of filenames. With the list of filenames, we could also remove the ".txt" restriction since each file is explicitly written in the list.
The new function would reuse the file-to-dataset conversion code from `keras.utils.text_dataset_from_directory`, but expose it through a convenient, minimal API similar to `tf.data.TextLineDataset`.
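A minimal sketch of what such a function might look like, assuming TensorFlow is installed. The name `text_dataset_from_files` and its parameters are hypothetical and not part of Keras; the body only uses `tf.data.Dataset.from_tensor_slices`, `tf.io.read_file`, and `tf.strings.substr`.

```python
import tensorflow as tf


def text_dataset_from_files(filenames, batch_size=32, max_length=None):
    """Hypothetical helper: one dataset element per file, any extension."""
    ds = tf.data.Dataset.from_tensor_slices(list(filenames))
    # Read each file in full as a single string tensor.
    ds = ds.map(tf.io.read_file, num_parallel_calls=tf.data.AUTOTUNE)
    if max_length is not None:
        # Truncate each text to at most max_length bytes.
        ds = ds.map(lambda text: tf.strings.substr(text, 0, max_length))
    if batch_size is not None:
        ds = ds.batch(batch_size)
    return ds.prefetch(tf.data.AUTOTUNE)
```

Because the caller passes the paths explicitly, the files can live in different directories and use any extension, for example `text_dataset_from_files(["notes/a.md", "logs/b.log"])` (illustrative paths).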
People who wish to do the simple task of converting text files into a dataset would benefit from this API. Currently, the only two ways to do so are:
- `tf.data.TextLineDataset`: This reads each line instead of the whole file.
- `keras.utils.text_dataset_from_directory`: Requires many manual steps, including moving the files into the same directory, renaming them to end in `.txt`, and setting the parameters `labels=None` and `shuffle=False`.
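
To make the first limitation concrete, here is a small sketch that writes a two-line file to a temporary directory (the file name and contents are made up for illustration) and compares `tf.data.TextLineDataset` with reading the file in full through `tf.io.read_file`.

```python
import os
import tempfile

import tensorflow as tf

# Write a small two-line sample file; the name and contents are illustrative.
path = os.path.join(tempfile.mkdtemp(), "sample.md")
with open(path, "wb") as f:
    f.write(b"line one\nline two\n")

# TextLineDataset yields one element per line, with the newline stripped.
lines = [x.numpy() for x in tf.data.TextLineDataset([path])]

# Reading the file directly yields its whole content as a single string.
whole = tf.io.read_file(path).numpy()

print(len(lines), "elements from TextLineDataset:", lines)
print("1 element from read_file:", whole)
```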
- Do you want to contribute a PR? (yes/no): yes
Hi @jessechancy, could you please share reproducible code that supports your statement, so that the issue can be easily understood? Thank you!