
Can we have an example showing how to use in-memory object, e.g., TF Dataset

leocnj opened this issue • 1 comment

Thanks for creating this project. It will be very useful for TF users to have something like PyTorch Lightning.

I already have code that converts text files into a TF Dataset, and I want to give DeepTrain a try. Could you please provide a simple example showing how to use a Dataset in DataGenerator?

leocnj avatar Sep 18 '20 00:09 leocnj

Glad you find it useful.

I'm not too familiar with tf.Dataset; I found some of its mechanisms needlessly complicated, and the performance gains from using TFRecords negligible. If you have any specific features in mind that tf.Dataset provides and DataGenerator doesn't, I can note it as a feature request (or explain how, if you're unsure). For example, there's much to gain from Dataset's train-load parallelism, for which I'm willing to consider built-in support (currently, there isn't any).

You have two options: (1) convert your data pipeline from Dataset to a DataGenerator-equivalent; (2) make Dataset work with DataGenerator. I can't say much without seeing your specific Dataset configuration, but I'll point you at where to look: DataGenerator and DataLoader here, and the DataLoader docs. For (2), you'll need to set up Dataset so that it takes set_num as input and returns data as output. Each DataGenerator has two DataLoaders, one for "data" x and the other for labels y (see here).
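As a rough sketch of what option (2) could look like: one way to get a set_num-keyed interface is to materialize the Dataset's batches into a dict keyed by set_num, then expose a loader with a `(self, set_num) -> data` signature, one per DataLoader. The names `make_set_num_loader`, `x_batches`, and `y_batches` below are illustrative, not part of DeepTrain's API, and the exact hook for plugging a custom load function into DataLoader should be checked against the docs linked above.

```python
import numpy as np

def make_set_num_loader(batches):
    """Build a loader with the assumed (self, set_num) -> data signature.

    `batches`: dict mapping set_num (str) -> np.ndarray batch.
    """
    def loader(self, set_num):
        return batches[set_num]
    return loader

# In a real setup, these dicts would be filled by iterating the tf.data.Dataset
# once up front, e.g.:
#   for i, (x, y) in enumerate(dataset):
#       x_batches[str(i + 1)] = x.numpy()
#       y_batches[str(i + 1)] = y.numpy()
# Here we fake two batches with random arrays to keep the sketch self-contained.
x_batches = {"1": np.random.randn(32, 10), "2": np.random.randn(32, 10)}
y_batches = {"1": np.random.randint(0, 2, (32, 1)),
             "2": np.random.randint(0, 2, (32, 1))}

x_loader = make_set_num_loader(x_batches)  # would back the "data" DataLoader
y_loader = make_set_num_loader(y_batches)  # would back the labels DataLoader

print(x_loader(None, "1").shape)  # (32, 10)
print(y_loader(None, "2").shape)  # (32, 1)
```

The upside of materializing once is that the rest of the pipeline sees plain arrays indexed by set_num; the downside is memory, so for large datasets the loader would instead re-read from disk per set_num.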

I'm guessing others will have similar concerns, so I'm willing to take a closer look at your case if you share sufficient code.

OverLordGoldDragon avatar Sep 18 '20 14:09 OverLordGoldDragon