keras-yolo2
Loss nan on Jupyter notebook
Hi,
I've seen the issue was resolved for someone here: https://github.com/experiencor/keras-yolo2/issues/237
However, I've set my warmup batches to 3 and I'm still getting NaN during training.
This only occurs when training from a weights file previously created through training (in an attempt to improve on it), rather than training fresh. Any ideas?
If you're resuming training with your pretrained weights, try loading them just before compilation, like so. Worked for me!
```python
from keras.optimizers import Adam, SGD, RMSprop

optimizer = Adam(lr=0.5e-4, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.0)
# optimizer = SGD(lr=1e-4, decay=0.0005, momentum=0.9)
# optimizer = RMSprop(lr=1e-4, rho=0.9, epsilon=1e-08, decay=0.0)

model.load_weights("YOURWEIGHTS.h5")  # load pretrained weights BEFORE compiling
model.compile(loss=custom_loss, optimizer=optimizer)

model.fit_generator(generator        = train_batch,
                    steps_per_epoch  = len(train_batch),
                    epochs           = 100,
                    verbose          = 1,
                    validation_data  = valid_batch,
                    validation_steps = len(valid_batch),
                    callbacks        = [early_stop, checkpoint, tensorboard],
                    max_queue_size   = 3)
```
~~Trying to update the NaN-related issues: what worked for me was adding images and annotations to "valid_image_folder" (I previously relied on splitting the training set 80/20 as per the README, but I got NaN losses). I also changed the training nb_epochs from 1 to 10, and will likely need more.~~ Actually, it might have to do with the model anchors. Generating new ones (other than the ones given in the README example) led to the NaN values.
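For reference, the README's default anchors look like this in the notebook; anchors for a custom dataset are generated with `gen_anchors.py`, but as noted above, custom anchors have been linked to NaN losses here:

```python
# README default anchors (10 values = 5 anchor boxes as width/height pairs);
# regenerate for a custom dataset with: python gen_anchors.py -c config.json
ANCHORS = [0.57273, 0.677385, 1.87446, 2.06253, 3.33843,
           5.47434, 7.88282, 3.52778, 9.77052, 9.16828]
```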
On issue #237, what do they mean by the warmup stage?
It is a trick. It makes each cell match the size of the anchors, which gives the weights better starting values than just random ones. It is not written in the original paper, but it seems the original author does the same thing in his C++ implementation. Abraço :)
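A rough numpy sketch of the idea (this is not the repo's exact loss code; the notebook does this inside the TensorFlow loss, and all names here are illustrative):

```python
import numpy as np

WARM_UP_BATCHES = 3

def warmup_targets(true_xy, true_wh, anchors_wh, object_mask, seen_batches):
    """For the first WARM_UP_BATCHES, cells with no assigned object are
    trained towards a box centred in the cell (offset 0.5) with the
    anchor's width/height, instead of getting no coordinate target."""
    if seen_batches < WARM_UP_BATCHES:
        no_object = 1.0 - object_mask
        true_xy = true_xy * object_mask + 0.5 * no_object        # cell centre
        true_wh = true_wh * object_mask + anchors_wh * no_object  # anchor size
    return true_xy, true_wh
```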
To do the warmup stage, do I just need to assign a value to WARM_UP_BATCHES, or do I need to add something else to the code in the notebook?
I saw that I was missing the following lines in my code, and when I added them I was able to train the model for 10 epochs without the NaN loss appearing on the Pascal VOC 2007 dataset. When I added the Pascal VOC 2012 dataset to train with the same data as the YOLO paper, the NaN loss appeared again in the middle of the first epoch, starting with the conf loss.
```python
layer = model.layers[-4]  # the last convolutional layer
weights = layer.get_weights()

new_kernel = np.random.normal(size=weights[0].shape) / (GRID_H * GRID_W)
new_bias   = np.random.normal(size=weights[1].shape) / (GRID_H * GRID_W)

layer.set_weights([new_kernel, new_bias])
```
Why is the last layer referred to as [-4]? If I add more layers between the feature extractor and this layer, do I have to initialize their weights as well? Which index would I use to refer to the last 4 layers, for example?
If I recall correctly, just set WARM_UP_BATCHES and you're good. You can tell it's working because your loss will jump (probably up) once the warm-up is complete, so just keep an eye on the output.
I think layers[-4] is the last convolutional layer of the 'feature extractor', prior to the 'object detection layers' in the complete model.
I had a lot of NaN issues when I wasn't using a properly configured GPU; something to check, maybe.
Check #291. This YOLO structure is based on a model inside another model: layers[-5] is a full model with the convolutional layers, layers[-4] is the convolutional layer that does the detections, layers[-3] is a reshape to organize the outputs, and layers[-2] and [-1] are a workaround to feed the ground-truth boxes into the model during training; you can take them out after training. If you check the code, the detection layer has a special weight initializer; if you add more layers, you will need to initialize them with a Keras initializer or repeat the same initializer used on that layer.
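A hedged sketch mapping those indices (exact indices depend on the notebook version, so verify with `model.summary()` on your own model):

```python
feature_extractor = model.layers[-5]   # nested Model holding the backend convolutions
detection_conv    = model.layers[-4]   # Conv2D that produces the detections
output_reshape    = model.layers[-3]   # Reshape to (GRID_H, GRID_W, BOX, 4 + 1 + CLASS)
gt_injection      = model.layers[-2:]  # workaround layers feeding ground-truth boxes
                                       # in during training; removable afterwards
```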
> I had a lot of NaN issues when I wasn't using a properly configured GPU; something to check, maybe.
I increased the capacity of the GPU (before I was just using a fraction of its memory), but I am still getting the NaN loss on the bigger dataset. How did you configure your GPU?
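For context, the "fraction" being referred to is the standard TF1-era session configuration; this is generic tensorflow/Keras setup, not code from this repo:

```python
import tensorflow as tf
from keras import backend as K

config = tf.ConfigProto()
config.gpu_options.allow_growth = True  # grab GPU memory on demand
# config.gpu_options.per_process_gpu_memory_fraction = 0.5  # or cap a fraction
K.set_session(tf.Session(config=config))
```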
> If you check the code, the detection layer has a special weight initializer; if you add more layers, you will need to initialize them with a Keras initializer or repeat the same initializer used on that layer.
Do you mean the initializer code that I mentioned above, or the Keras kernel_initializer='lecun_normal' that you mentioned in #291? Do I have to use both, or just one?
Another question: in the YOLO paper, the author pre-trained Darknet-19 on ImageNet and then added 3 convolutional layers before the detection layer to train for detection. Why didn't you add these 3 layers when using different backends? And why do you load the weights from external .h5 files instead of using the Keras argument weights='imagenet' when you import the models?
e.g. `MobileNet(input_shape=(224,224,3), include_top=False, weights='imagenet')`
> Do you mean the initializer code that I mentioned above, or the Keras kernel_initializer='lecun_normal' that you mentioned in #291? Do I have to use both, or just one?
I don't know if this special initializer must be used on all layers.
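A hedged sketch of the two options being discussed (`GRID_H`, `GRID_W`, and `model` come from the notebook; the `Conv2D` hyperparameters are placeholders):

```python
from keras.layers import Conv2D
import numpy as np

# option 1: a standard Keras initializer on any extra layer you add
extra = Conv2D(1024, (3, 3), padding='same', kernel_initializer='lecun_normal')

# option 2: repeat the notebook's scaled-random scheme on the detection layer
layer = model.layers[-4]
kernel, bias = layer.get_weights()
layer.set_weights([np.random.normal(size=kernel.shape) / (GRID_H * GRID_W),
                   np.random.normal(size=bias.shape) / (GRID_H * GRID_W)])
```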
> And why do you load the weights from external .h5 files instead of using the Keras argument weights='imagenet' when you import the models?
In my experience I got worse results using those pretrained weights. I'm also trying to create a script to train a backend model for classification before using it for detection, but I'm getting worse results doing that as well. I'm interested in creating a custom backend capable of running at high FPS on a CPU for simple tasks, which is why I'm trying to recreate all the steps done by the original author. If anyone is interested, I invite you to help with the development on this branch.
Hi @rodrigo2019,
Do you know how to test the model speed in FPS?
From predict.py:

```python
import time
# ... (rest of predict.py unchanged)

video_reader = cv2.VideoCapture(image_path)

nb_frames = int(video_reader.get(cv2.CAP_PROP_FRAME_COUNT))
frame_h   = int(video_reader.get(cv2.CAP_PROP_FRAME_HEIGHT))
frame_w   = int(video_reader.get(cv2.CAP_PROP_FRAME_WIDTH))

video_writer = cv2.VideoWriter(video_out,
                               cv2.VideoWriter_fourcc(*'MPEG'),
                               50.0,
                               (frame_w, frame_h))

for i in tqdm(range(nb_frames)):
    _, image = video_reader.read()

    t = time.time()
    boxes = yolo.predict(image)
    print("fps: ", 1 / (time.time() - t))

    image = draw_boxes(image, boxes, config['model']['labels'])
    video_writer.write(np.uint8(image))
```
I think this is the easiest way to do it; you can improve it by taking the mean over the last N samples.
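For example, a minimal sketch of that rolling mean, reusing `yolo`, `video_reader`, and `nb_frames` from the snippet above:

```python
from collections import deque
import time

frame_times = deque(maxlen=30)  # keep only the last N frame times

for i in tqdm(range(nb_frames)):
    _, image = video_reader.read()

    t0 = time.time()
    boxes = yolo.predict(image)
    frame_times.append(time.time() - t0)

    print("mean fps over last %d frames: %.1f"
          % (len(frame_times), len(frame_times) / sum(frame_times)))
```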
PS: you could probably improve the speed a lot by predicting batches instead of single samples, but you would need to change the code architecture, and you would create a delay between the real-time video and the predictions. Maybe a very small delay, but it will be there.
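A hedged sketch of that batched approach; note that `yolo.predict_batch` is hypothetical (the repo's `YOLO` class only exposes `predict()` for single images), so you would have to implement it yourself:

```python
BATCH = 8
frame_buffer = []

for i in tqdm(range(nb_frames)):
    _, image = video_reader.read()
    frame_buffer.append(image)

    if len(frame_buffer) == BATCH:
        # hypothetical batched call over BATCH frames at once
        all_boxes = yolo.predict_batch(frame_buffer)
        for img, boxes in zip(frame_buffer, all_boxes):
            img = draw_boxes(img, boxes, config['model']['labels'])
            video_writer.write(np.uint8(img))
        frame_buffer = []  # frames are written up to BATCH frames late: the delay
```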
Hi @ZacharyForrest, what do you mean by a properly configured GPU? Any special steps you had to take? I have NaN values with randomly initialized weights. Were you using pretrained weights and still getting NaN?
@IMABUNNEH Have you solved your problem? I'm still getting NaN even though I've set my warmup batches to 3 and reset the anchors. I used the pretrained weights "full_yolo_backend.h5".