
How to feed our own data?

Open ymcasky opened this issue 6 years ago • 4 comments

How can I feed my own data instead of using MNIST? Like the example in this post: https://github.com/aymericdamien/TensorFlow-Examples/blob/master/notebooks/5_DataManagement/build_an_image_dataset.ipynb Thanks for any help!

ymcasky avatar Mar 01 '18 06:03 ymcasky

Hi @ymcasky

Here is an example of a custom data provider: it's a simple interface, and you basically need to implement the next_batch method. Note, however, that the interface currently works with numpy arrays; the TensorFlow Dataset API is not supported yet.
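To illustrate the contract, here is a minimal sketch of such a provider. The class name `ArrayDataProvider` is illustrative, not hyper-engine's actual class; the only assumed requirement is a `next_batch(batch_size)` method that returns a pair of numpy arrays:

```python
import numpy as np

class ArrayDataProvider:
    """Illustrative sketch: wraps in-memory numpy arrays and serves
    shuffled mini-batches through next_batch(batch_size)."""

    def __init__(self, x, y):
        self.x = np.asarray(x)
        self.y = np.asarray(y)
        self._order = np.random.permutation(len(self.x))
        self._pos = 0

    def next_batch(self, batch_size):
        # Reshuffle and start a new epoch once the data is exhausted
        if self._pos + batch_size > len(self.x):
            self._order = np.random.permutation(len(self.x))
            self._pos = 0
        idx = self._order[self._pos:self._pos + batch_size]
        self._pos += batch_size
        return self.x[idx], self.y[idx]
```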

maxim5 avatar Mar 01 '18 07:03 maxim5

Dear @maxim5

Thanks for your reply! I have 2 questions.

  1. The example you provided loads the whole dataset into a numpy array and then implements next_batch. What if my memory can't hold the whole dataset?

  2. Keras has the API flow_from_directory, used like this:

train_datagen = ImageDataGenerator(
        horizontal_flip=True)
train_generator = train_datagen.flow_from_directory(directory=Imgpath,
                                       batch_size=Batch_SIZE,
                                       shuffle=True,
                                       target_size=(img_H, img_W))
(x_batch, y_batch) = train_generator.next()

It is similar to your example, but it uses .next() instead of .next_batch(). Can I use this API together with your tool? Thanks for your help!
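If the interfaces only differ by method name, a thin adapter could bridge them. This is a hypothetical sketch, not part of hyper-engine or Keras: it wraps any object whose .next() call returns an (x_batch, y_batch) pair of numpy arrays and exposes it as next_batch. Note that with flow_from_directory the batch size is fixed when the generator is created, so the batch_size argument is ignored here:

```python
import numpy as np

class GeneratorProvider:
    """Hypothetical adapter: exposes a Keras-style iterator (anything
    with a .next() returning (x, y) numpy arrays) through the
    next_batch(batch_size) interface."""

    def __init__(self, generator):
        self._gen = generator

    def next_batch(self, batch_size):
        # batch_size is determined by the wrapped generator, so the
        # argument is accepted but not used.
        return self._gen.next()


class FakeKerasIterator:
    """Stand-in for a Keras directory iterator, used only to demo the
    adapter without needing image files on disk."""

    def next(self):
        return np.zeros((4, 8, 8, 3)), np.zeros((4, 10))
```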

ymcasky avatar Mar 02 '18 01:03 ymcasky

Hi @ymcasky ,

  1. Since you only need to provide next_batch, you can load a new numpy array for each batch without holding the whole training set in memory. I'll make an example for this case.
  2. As far as I can see from the source code, it produces numpy arrays on each iteration, so yes, it should be compatible. Let me know the result if you try it.
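Point 1 could be sketched like this: keep only file paths in memory and load the actual arrays one batch at a time. The class and the `load_example` callback are hypothetical names for illustration; plug in your own file I/O (e.g. image decoding):

```python
import numpy as np

class LazyFileProvider:
    """Sketch of an out-of-memory provider: stores file paths and
    labels only, and calls load_example(path) to materialize each
    example when a batch is requested."""

    def __init__(self, paths, labels, load_example):
        self.paths = list(paths)
        self.labels = np.asarray(labels)
        self.load_example = load_example  # e.g. read + decode an image
        self._order = np.random.permutation(len(self.paths))
        self._pos = 0

    def next_batch(self, batch_size):
        # Reshuffle and wrap around at the end of each epoch
        if self._pos + batch_size > len(self.paths):
            self._order = np.random.permutation(len(self.paths))
            self._pos = 0
        idx = self._order[self._pos:self._pos + batch_size]
        self._pos += batch_size
        x = np.stack([self.load_example(self.paths[i]) for i in idx])
        return x, self.labels[idx]
```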

maxim5 avatar Mar 02 '18 14:03 maxim5

ok, thank you!

ymcasky avatar Mar 05 '18 01:03 ymcasky