deep-learning-with-python-notebooks
5.2 Using convnets with small datasets
I quite literally copied the code from notebooks 37-38, but training fails in the 1st epoch at step 63/100 with an error saying something like: your input ran out of data. So, judging by the changes we made to the generator, the batch size of 32 is too large, and I managed to get everything working with the smaller batch size of 20 that we used in a previous example. Did I do something wrong, or is that just an error in the code?
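For reference, here is a minimal sketch of the batch-size-20 setup that makes the 100 steps fit exactly; it assumes the notebook's train_dir, train_datagen, validation_generator, and model are already defined.

    train_generator = train_datagen.flow_from_directory(
        train_dir,
        target_size=(150, 150),
        batch_size=20,
        class_mode='binary')

    # 2000 training images / 20 images per batch = exactly 100 steps per epoch,
    # so steps_per_epoch=100 never runs past the data.
    history = model.fit(
        train_generator,
        steps_per_epoch=100,
        epochs=30,
        validation_data=validation_generator,
        validation_steps=50)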
steps_per_epoch should be sample_size // batch_size
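A minimal sketch of that calculation, assuming the notebook's 2000/1000 train/validation split, a batch size of 32, and the model and generators built earlier in the notebook:

    train_samples = 2000
    validation_samples = 1000
    batch_size = 32

    steps_per_epoch = train_samples // batch_size        # 62 full batches (the 63rd is partial)
    validation_steps = validation_samples // batch_size  # 31

    history = model.fit(
        train_generator,
        steps_per_epoch=steps_per_epoch,
        epochs=30,
        validation_data=validation_generator,
        validation_steps=validation_steps)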
I have the same question. Why can the book and this GitHub repo use steps_per_epoch = 100, which is greater than sample_size // batch_size, and still show a successful run? Is this because of a TensorFlow version difference? (Of course, the pipeline works if I change steps_per_epoch to 63.) When I was reading the book, it seemed that data augmentation can produce many more permuted images each time, so the sample size can be expanded. But now I am confused. Does anyone know?
Thanks!
I also have the same question. I tried to generate 10,000 pictures from the generator and it worked, but when the generator was passed to the model's fit and fit_generator methods, it failed at step 63. It seems that the generator does not generate data indefinitely there.
This change applies to newer versions. There are several ways to work around the problem: use .repeat(), or create an augmented set of 3200 images for train_gen and 1600 for val_gen (provided that batch_size=32 in train_datagen.flow_from_directory), so that 100 and 50 steps fit within a single pass. It is a pity that the data augmentation works differently now.
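A minimal sketch of the .repeat() workaround, assuming train_generator was created with flow_from_directory(..., target_size=(150, 150), batch_size=32, class_mode='binary') as in the notebook:

    import tensorflow as tf

    # Wrap the Keras generator in a tf.data pipeline and repeat it indefinitely,
    # so steps_per_epoch=100 never exhausts the input.
    train_ds = tf.data.Dataset.from_generator(
        lambda: train_generator,
        output_signature=(
            tf.TensorSpec(shape=(None, 150, 150, 3), dtype=tf.float32),
            tf.TensorSpec(shape=(None,), dtype=tf.float32),
        ),
    ).repeat()

    history = model.fit(
        train_ds,
        steps_per_epoch=100,
        epochs=30)

The validation generator can be wrapped the same way, with validation_steps set explicitly (e.g. 50).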
Calling the generator configured for data augmentation gives a non-repeating result. I generated 1000 images with no duplicates. In the code below, k is the number of identical image pairs found; my naive algorithm compares each image with every other one, so k should equal arr_pic.shape[0] (the number of images) if there are no duplicates.
import numpy as np
from tensorflow.keras.preprocessing import image
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# The augmenting generator from the notebook (section 5.2).
datagen = ImageDataGenerator(
    rotation_range=40,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True)

# fnames is the list of training image paths built earlier in the notebook.
img_path = fnames[1]
img = image.load_img(img_path, target_size=(150, 150))
x = image.img_to_array(img)
x = x.reshape((1,) + x.shape)

# Collect 1000 augmented versions of the same source image.
i = 0
arr_pic = np.array([]).reshape((0, 150, 150, 3))
for batch in datagen.flow(x, batch_size=1):
    arr_pic = np.append(arr_pic, batch, axis=0)
    i += 1
    if i % 1000 == 0:
        break
print(arr_pic.shape)

# Count pairwise-identical images: k equals arr_pic.shape[0]
# only if every augmented image is unique.
k = 0
new_arr = arr_pic.tolist()
for i in new_arr:
    for j in new_arr:
        if i == j:
            k += 1
print(k)
This may mean that at the beginning of each epoch, when the generator runs through the set of images again, completely new images are generated (there are no duplicates in any epoch). This means that the dataset is effectively expanded from [train_count_pic] to [train_count_pic] * [epochs]; in our case that is 2000 * 100 = 200,000 non-repeating training images and, correspondingly, 1000 * 100 = 100,000 non-repeating validation images. I apologize for my English.
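A rough sketch of that arithmetic (the counts are the notebook's 2000/1000 split; note that in newer TF versions steps_per_epoch is still bounded by the number of batches in one pass, regardless of augmentation):

    train_count_pic = 2000
    val_count_pic = 1000
    epochs = 100
    batch_size = 32

    # Distinct augmented images seen over the whole run, if every augmentation is unique.
    print(train_count_pic * epochs)  # 200000
    print(val_count_pic * epochs)    # 100000 (only if the validation data is augmented too)

    # The number of batches available in a single pass is unchanged,
    # which is why fit stops around step 63 of 100 with batch_size=32.
    print(train_count_pic // batch_size)  # 62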