deep-learning-with-python-notebooks

5.2 Using convnets with small datasets

Open DLumi opened this issue 4 years ago • 4 comments

I quite literally copied the code from notebooks 37-38, but training fails in the first epoch at step 63/100 with an error saying something like: "Your input ran out of data." Judging by the changes made to the generator, the batch size of 32 seems to be too much, and I managed to get everything working with the smaller batch size of 20 used in the previous example. Did I do something wrong, or is that just an error in the code?
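For reference, a minimal sketch of the setup being described (the stand-in model, the train_dir path, and the augmentation parameters are assumptions based on the chapter's cats-vs-dogs example, not a verbatim copy of the notebook):

from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Stand-in for the book's convnet, just to keep the sketch self-contained
model = keras.Sequential([
    layers.Flatten(input_shape=(150, 150, 3)),
    layers.Dense(1, activation='sigmoid')])
model.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['acc'])

train_datagen = ImageDataGenerator(rescale=1./255, rotation_range=40,
                                   width_shift_range=0.2, height_shift_range=0.2,
                                   shear_range=0.2, zoom_range=0.2,
                                   horizontal_flip=True)

# 2000 training images with batch_size=32 -> the iterator reports
# len(train_generator) == ceil(2000 / 32) == 63 batches
train_generator = train_datagen.flow_from_directory(
    train_dir,                     # directory with the 2000 training images (assumed)
    target_size=(150, 150),
    batch_size=32,
    class_mode='binary')

# steps_per_epoch=100 asks for more batches per epoch than that, so recent
# TF versions stop at step 63/100 with "Your input ran out of data"
history = model.fit(train_generator, steps_per_epoch=100, epochs=100)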

DLumi avatar Oct 16 '20 11:10 DLumi

steps_per_epoch should be sample_size // batch_size
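For example, a sketch of that calculation (assuming train_generator and validation_generator come from flow_from_directory as in the notebook; DirectoryIterator exposes the total image count as .samples):

batch_size = 32
steps_per_epoch = train_generator.samples // batch_size        # 2000 // 32 = 62
validation_steps = validation_generator.samples // batch_size  # 1000 // 32 = 31

history = model.fit(
    train_generator,
    steps_per_epoch=steps_per_epoch,
    epochs=100,
    validation_data=validation_generator,
    validation_steps=validation_steps)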

ghimireadarsh avatar Oct 19 '20 13:10 ghimireadarsh

I have the same question. Why can the book and this GitHub repo use steps_per_epoch = 100, which is greater than sample_size // batch_size, and still show a successful run? Is this because of a TensorFlow version difference? (Of course the pipeline works if I change steps_per_epoch to 63.) When I was reading the book, it seemed that data augmentation could produce many more permuted images each time, so the sample size could be expanded. But now I am confused. Does anyone know?

Thanks!

yeswzc avatar Oct 30 '20 15:10 yeswzc

I also have the same question. I tried to generate 10,000 pictures from the generator, and that worked. But when the generator is passed to the model's fit or fit_generator method, training fails at step 63. It seems that fit does not keep drawing data from the generator indefinitely.
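A small check of that difference (assuming train_generator comes from flow_from_directory with 2000 images and batch_size=32, as in the sketch above): iterating the generator by hand wraps around indefinitely, while fit treats it as a finite sequence of len(train_generator) batches.

print(len(train_generator))    # 63 == ceil(2000 / 32); this is what fit() uses

batches_drawn = 0
for x_batch, y_batch in train_generator:
    batches_drawn += 1
    if batches_drawn >= 200:   # well past 63 -- the iterator just wraps around
        break
print(batches_drawn)           # 200: the generator itself never runs out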

runzhi214 avatar Nov 18 '20 08:11 runzhi214

This change applies to newer versions. There are several ways to work around the problem: use .repeat(), or pre-generate an augmented data set of 3200 images for train_gen and 1600 for val_gen (given batch_size=32 in train_datagen.flow_from_directory). It is a pity that the data augmentation now works differently. A sketch of the .repeat() workaround follows.
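This sketch wraps the Keras iterator in a tf.data pipeline and repeats it, so fit can draw steps_per_epoch * epochs batches (output_signature requires TF 2.4+; the shapes assume the 150x150, class_mode='binary' setup from above):

import tensorflow as tf

train_ds = tf.data.Dataset.from_generator(
    lambda: train_generator,
    output_signature=(
        tf.TensorSpec(shape=(None, 150, 150, 3), dtype=tf.float32),
        tf.TensorSpec(shape=(None,), dtype=tf.float32)),
).repeat()   # the dataset now never runs out of batches

history = model.fit(train_ds, steps_per_epoch=100, epochs=100)

The validation generator can be wrapped the same way if validation_steps exceeds its length.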

Calling the generator configured for data augmentation gives non-repeating results. I generated 1000 images with no duplicates. In the snippet below, k counts pairs of identical images; my naive algorithm compares every image with every other image (including itself), so with no duplicates k should equal arr_pic.shape[0] (the number of images).

import numpy as np
from tensorflow.keras.preprocessing import image

# fnames and datagen are defined earlier in the notebook: fnames is the list of
# training image paths, datagen is the augmenting ImageDataGenerator.
img_path = fnames[1]

# Load one image and turn it into a batch of shape (1, 150, 150, 3)
img = image.load_img(img_path, target_size=(150, 150))
x = image.img_to_array(img)
x = x.reshape((1,) + x.shape)

# Draw 1000 augmented versions of this single image
arr_pic = np.array([]).reshape((0, 150, 150, 3))
i = 0
for batch in datagen.flow(x, batch_size=1):
    arr_pic = np.append(arr_pic, batch, axis=0)
    i += 1
    if i >= 1000:
        break
print(arr_pic.shape)   # (1000, 150, 150, 3)

# Brute-force duplicate count: compare every image with every image, including
# itself. With no duplicates, k equals arr_pic.shape[0].
k = 0
new_arr = arr_pic.tolist()
for i in new_arr:
    for j in new_arr:
        if i == j:
            k += 1
print(k)

This may mean that at the beginning of each epoch, when the generator runs through the set of images again, completely new images are generated (there are no duplicates in any epoch). In other words, the data set is effectively expanded from [train_count_pic] to [train_count_pic] * [epochs]; in our case that is 2000 * 100 = 200,000 non-repeating training images and, correspondingly, 1000 * 100 = 100,000 non-repeating validation images. I apologize for my English.
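A quick way to see that the random transform is sampled fresh on every draw (reusing x and datagen from the snippet above; np.array_equal just compares the two augmented arrays element by element):

a = next(datagen.flow(x, batch_size=1))
b = next(datagen.flow(x, batch_size=1))
print(np.array_equal(a, b))   # almost always False: each draw uses new random parameters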

pschdl1c avatar Nov 21 '20 23:11 pschdl1c