
ZeroDivisionError: division by zero

Open tony9378 opened this issue 7 years ago • 23 comments

I followed your instructions and trained the model again, but when I run the demo, it always shows this error.

tony9378 avatar Dec 12 '17 08:12 tony9378

Traceback (most recent call last):
  File "inference.py", line 192, in <module>
    generate_output(input_files, mode)
  File "inference.py", line 162, in generate_output
    image = run_inference(image_orig, model, sess, mode, sign_map)
  File "inference.py", line 77, in run_inference
    boxes = nms(y_pred_conf, y_pred_loc, prob)
  File "E:\Masterarbeit\SSD_Project\model.py", line 248, in nms
    iou = calc_iou(box[:4], other_box[:4])
  File "E:\Masterarbeit\SSD_Project\data_prep.py", line 28, in calc_iou
    iou = intersection / union
ZeroDivisionError: division by zero

tony9378 avatar Dec 12 '17 08:12 tony9378

@tony9378 ,

Have you figured out this issue? I face the same problem. Every time I finish training my network and load the new model.ckpt, this error arises.

sudonto avatar Feb 21 '18 14:02 sudonto

I think I have figured out why the error might be happening. Can someone tell me the size of their data_prep_400x260.p? Is it 514 MB, or something less?

YashBansod avatar Feb 28 '18 17:02 YashBansod

@YashBansod , My data_prep_400x260.p is 514MB (514,001,478 bytes to be precise)

sudonto avatar Feb 28 '18 23:02 sudonto

The Div by Zero occurs for one of two reasons (at least the ones I have found):

  1. Your data_prep_400x260.p was not made properly. (Verify that it is around 514 MB in size.)
  2. Your model was not trained for a sufficient number of epochs (i.e. your model must have converged to a certain extent; the value of some variable is probably very small, ~0, at the time you ended training).

To avoid the first problem, just redo the data prep process properly.

To avoid the second, change the learning rate to at least 0.01 (it is set to 0.001 in the code) and train for at least 20 epochs. Your model won't have converged much at that point, but at least you won't get the Div by Zero error.
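For reference, the division that fails is the IoU computation in calc_iou (data_prep.py, line 28 in the traceback above). A defensive guard avoids the crash when the union of the two boxes is zero; note that this only suppresses the symptom, the underlying cause is still the data prep or an under-trained model. The box layout [x1, y1, x2, y2] and the variable names below are assumptions for illustration, not the repo's actual code:

    def calc_iou(box_a, box_b):
        """Intersection-over-union of two boxes given as [x1, y1, x2, y2] (sketch)."""
        # Intersection rectangle
        x1 = max(box_a[0], box_b[0])
        y1 = max(box_a[1], box_b[1])
        x2 = min(box_a[2], box_b[2])
        y2 = min(box_a[3], box_b[3])
        intersection = max(0., x2 - x1) * max(0., y2 - y1)

        # Union = sum of the two areas minus the intersection
        area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
        area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
        union = area_a + area_b - intersection

        # Guard against degenerate (zero-area) boxes instead of dividing by zero
        if union <= 0.:
            return 0.
        return intersection / union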

@tony9378 @sudonto can you confirm if this solves your problem?

YashBansod avatar Mar 01 '18 05:03 YashBansod

@sudonto @YashBansod My data_prep_400x260.p is 2.1GB, and the result after running inference.py is not good. I have followed your steps to run data_prep.py. Could you tell me why?

DRosemei avatar Apr 03 '18 08:04 DRosemei

@DRosemei there is something wrong in what you are doing before executing data_prep.py. Please follow the pre-processing instructions (all the steps before executing python data_prep.py) and your data_prep_400x260.p should come out to around 514 MB. Also, can you try plotting the cost function and checking that it was at its lowest when you ended your training? For me, in one experiment it started to rise again after a certain number of epochs.

Anyway, it has been a long time since I worked on this, and the code is really not suitable if you plan to use it for any sort of benchmarking; try the original implementations for that. My purpose was rather to understand an implementation of SSD in TF, and this one seemed more readable than others. The inference may not be optimal, as the model overfits the data.
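If you logged (or printed) the cost per epoch during training, a quick plot makes it easy to see whether the cost was still at its lowest when training stopped. This is only a minimal sketch; costs_per_epoch is a placeholder list, not a variable from this repo:

    import matplotlib.pyplot as plt

    # Fill this with the loss values you recorded during training, one per epoch.
    costs_per_epoch = [2.31, 1.02, 0.74, 0.61, 0.58, 0.63, 0.71]  # placeholder values

    plt.plot(range(1, len(costs_per_epoch) + 1), costs_per_epoch, marker='o')
    plt.xlabel('Epoch')
    plt.ylabel('Training cost')
    plt.title('Cost vs. epoch')
    plt.show()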

YashBansod avatar Apr 03 '18 14:04 YashBansod

@YashBansod My data_raw_400x260.p is 1.4 MB, but there was no error message. What's wrong with my code? PS: I downloaded the full dataset and processed it.

Jasonsun1993 avatar Apr 09 '18 11:04 Jasonsun1993

@YashBansod Sorry for the late reply, and thank you for yours. I have done the pre-processing instructions carefully several times, but the size doesn't change. By the way, my data_raw_400x260.p is 921.7 KB. Is that right? Besides, when I ran inference.py restoring the model the author provided, the result was not good; only a few boxes are right. As for plotting the cost function, I'm sorry, I have only just started learning DL, so I can't help there. If possible, would you mind sending me a copy of your code? I have wasted a lot of time on this. My email address is [email protected]. Thank you again.

DRosemei avatar Apr 16 '18 08:04 DRosemei

Hi @DRosemei , @Jasonsun1993 The size of data_prep_400x260.p should be around 514 MB. Please read this. In particular, please pay attention to the answer from YashBansod dated 26 Feb. @DRosemei , in my case, the model can correctly detect all the signs in the sample images.

sudonto avatar Apr 16 '18 13:04 sudonto

@sudonto Thanks for your reply. I have already read #29; it saved me a lot of time. After running create_pickle.py, I got data_raw_400x260.p (921.7 KB) and resized_images_400x260 (2,600 items, totalling 139.0 MB). After running data_prep.py, I got data_prep_400x260.p (2.1 GB). Could you please tell me where I made mistakes? Besides, my detections on the sample images look like this (attached screenshots: pedestrian_1323896918 avi_image9, stop_1323804419 avi_image31). Thank you again.

DRosemei avatar Apr 16 '18 13:04 DRosemei

@DRosemei , have you checked what the content of your mergedAnnotations.csv is? Can you confirm that only stop sign and pedestrian crossing annotations are in that file?

sudonto avatar Apr 16 '18 13:04 sudonto

@sudonto Thanks for your reply. My mergedAnnotations.csv is the same as allAnnotations.csv because, as you said in #29, there is a line of code that filters out annotation tags other than the desired signs. But I found it in create_pickle.py, not in data_prep.py. Here is the code:

    sign_name = fields[1]
    if sign_name != 'stop' and sign_name != 'pedestrianCrossing':
        continue  # ignore signs that are neither stop nor pedestrianCrossing signs

Besides, could you tell me how to make a mergedAnnotations.csv that only contains stop and pedestrian signs?

DRosemei avatar Apr 17 '18 00:04 DRosemei

Ah, if those files are the same, then that is what causes data_prep_400x260.p to be over 514 MB in size. You could create a CSV file containing only the desired signs by deleting the other rows manually in Excel (sort by the tag column first, then delete the rows), although the Python script that comes with the dataset can do that for you. Yes, I mistyped the filename :)
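For anyone who prefers not to edit the file by hand, a small pandas script can do the same filtering. This is only a sketch: the semicolon delimiter and the 'Annotation tag' column name are assumptions about the LISA annotation CSV format, so adjust them to match your file:

    import pandas as pd

    # Load the full annotation file and check which sign classes are present.
    df = pd.read_csv('allAnnotations.csv', sep=';')
    print(df['Annotation tag'].unique())

    # Keep only the two classes this project trains on and write the merged file.
    keep = df['Annotation tag'].isin(['stop', 'pedestrianCrossing'])
    df[keep].to_csv('mergedAnnotations.csv', sep=';', index=False)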

sudonto avatar Apr 17 '18 01:04 sudonto

@sudonto Thanks. Could you tell me your result after running create_pickle.py? I got data_raw_400x260.p (921.7 KB) and resized_images_400x260 (2,600 items, totalling 139.0 MB), so I can check whether I need to recreate the CSV file.
I also find that there are stop or pedestrian signs in other pictures, such as keepRight ones, but this may not be so important. The problem that puzzles me most is that I can't get good results even using the restored model the author provided, as I mentioned above.

DRosemei avatar Apr 17 '18 02:04 DRosemei

I will re-run the project and tell you the result later. It is strange that the original model cannot predict the signs accurately.

sudonto avatar Apr 17 '18 03:04 sudonto

@sudonto I have found the solution. Because I use Python 2.7, "/" behaves differently from "/" in Python 3.5 (integer division vs. true division). Thanks for your help, and I'm waiting for your results :)
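For context, the difference is the division operator itself: in Python 2.7, / between two integers is floor division, while in Python 3 it is always true division, which can silently change scaling arithmetic in the data prep scripts. A minimal illustration (the __future__ import is the usual Python 2 workaround):

    # Python 2.7:  260 / 400  ->  0      (integer / integer floors the result)
    # Python 3.5+: 260 / 400  ->  0.65   (true division)

    # In Python 2, true division can be enabled explicitly:
    from __future__ import division
    print(260 / 400)  # 0.65 with the import, matching Python 3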

DRosemei avatar Apr 17 '18 06:04 DRosemei

The dependencies of the project https://github.com/georgesung/ssd_tensorflow_traffic_sign_detection#dependencies clearly state python3.5+.

YashBansod avatar Apr 17 '18 06:04 YashBansod

@sudonto did you train the whole data set or extend this code to more traffic sign classes? If you did, please tell me about the results. Thx!

Jasonsun1993 avatar Apr 17 '18 06:04 Jasonsun1993

@YashBansod Thanks. I had noticed that before, so I made some changes to the code. Now I am going to install Python 3.5 to solve the problem.

DRosemei avatar Apr 17 '18 07:04 DRosemei

@sudonto My data_prep_400x260.p is 514M now. Thanks for your help. :)

DRosemei avatar Apr 17 '18 10:04 DRosemei

@DRosemei How do you make data_prep_400x260.p come out to 514 MB? I followed #29, but failed.

youthM avatar Jun 12 '19 02:06 youthM

@youthM I don't know where exactly you are failing. First, you should make sure your environment is the same as the author's. I guess you may be having trouble with "create_pickle.py"; you may find answers in #21.

DRosemei avatar Jun 17 '19 00:06 DRosemei