keras-yolo3
Training (add tensorboard debug, and mAP Calculation)
Provide useful debug information on tensorboard
mAP scalars
Images
Distributions
Histograms
Can you help me see what is wrong with my code? Thanks!
Hi @chenyuqing
I tried train_v2.py and it looks fine to me. Maybe you should check your Keras backend configuration file; I think your "image_data_format" is not correct. My settings are:
{
"epsilon": 1e-07,
"image_data_format": "channels_last",
"floatx": "float32",
"backend": "tensorflow"
}
Make sure your settings are the same as mine.
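A quick way to compare a local configuration against the settings quoted above might look like the sketch below. It assumes the file lives at the usual default location, `~/.keras/keras.json`; the helper name `diff_keras_config` is made up for illustration and is not part of Keras.

```python
import json
from pathlib import Path

# Settings quoted in the comment above.
EXPECTED = {
    "epsilon": 1e-07,
    "image_data_format": "channels_last",
    "floatx": "float32",
    "backend": "tensorflow",
}


def diff_keras_config(path, expected=EXPECTED):
    """Return {key: (found, expected)} for every setting that differs."""
    found = json.loads(Path(path).read_text())
    return {
        key: (found.get(key), value)
        for key, value in expected.items()
        if found.get(key) != value
    }


if __name__ == "__main__":
    # Assumed default location of the Keras config file.
    config_path = Path.home() / ".keras" / "keras.json"
    if config_path.exists():
        print(diff_keras_config(config_path) or "settings match")
    else:
        print("no keras.json found at", config_path)
```

An empty result means the four settings match; any entry in the result shows the value found next to the value expected.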
Hi, it seems that this repo has been inactive for a while... (more than a year :worried:) Would you consider submitting your changes to this fork: https://github.com/Borda/keras-yolo3 ?
Hi! I'm having trouble because I can't get training to work. Which part of train_v2.py should I change to run it?
It seems that the versions of python, tensorflow and Keras are important.
You can find the following description in the repository https://github.com/qqwweee/keras-yolo3:
Python 3.5.2, Keras 2.1.5, tensorflow 1.6.0
I have also verified that it works with the following environment:
Python 3.6, Keras 2.2.4, tensorflow 1.14.0
Thank you for the reply. I matched the versions but it still doesn't work. Is there any place where I need to change the PATH other than lines 34, 35 and 41, 42 of train_v2.py? Also, is it okay if LOGS_PATH is empty at the time of the first training run?
When I first trained it, it didn't matter that the LOGS_PATH folder (the default is yolo_logs) was empty.
I ran the following command:
python train_v2.py --yolo_train_file 2007_train.txt --yolo_val_file nano
2007_train.txt was created using voc_annotation.py, as described at https://github.com/qqwweee/keras-yolo3
To check the training results, I ran the following commands:
conda install tensorboard -y
tensorboard --logdir=<yolo_logs' full path> --host 0.0.0.0
In a web browser, go to http://localhost:6006/
I'm sorry for the late reply. When I ran the above command as written, I got the following error. Also, what does nano specify?
File "train_v2.py", line 89, in __init__
images_choose = [self.val_images[i] for i in np.random.randint(0, len(self.val_images), 50)]
AttributeError: 'Yolo' object has no attribute 'val_images'
@shocora
What "nano" specifies
Specify "nano" if you do not want to specify a validation file or if one does not exist.
As you can see around the following line of code, train_v2.py splits the data into "training" and "validation" at a 9:1 ratio, regardless of whether a validation file is specified.
https://github.com/qqwweee/keras-yolo3/blob/f4a9c40f4615cdbb774942507ecad3af5f05c990/train_v2.py#L439
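For illustration, a 9:1 split of annotation lines could be sketched as follows. This is not the code from train_v2.py; the function name, the seed and the file names are hypothetical, and the linked line may do it differently.

```python
import random


def split_annotations(lines, val_fraction=0.1, seed=0):
    """Shuffle annotation lines and split them into (train, val) lists."""
    shuffled = list(lines)
    random.Random(seed).shuffle(shuffled)
    num_val = int(len(shuffled) * val_fraction)
    return shuffled[num_val:], shuffled[:num_val]


# Hypothetical annotation lines standing in for the contents of 2007_train.txt.
train, val = split_annotations(["img_%d.jpg" % i for i in range(100)])
print(len(train), len(val))  # 90 10
```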
Why the error occurred
I think the following line of code was not executed because the folder for temporary files was left undeleted, for example after an abnormal exit.
https://github.com/qqwweee/keras-yolo3/blob/f4a9c40f4615cdbb774942507ecad3af5f05c990/train_v2.py#L82
Procedure before executing the command
Before executing the command, you must delete the temporary folder and move the results folder.
- If a working folder (tmp_*) remains due to interruption, delete it.
- If the results folder (yolo_logs) remains, delete, rename or move it.
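The two steps above could be automated with something like the sketch below. The helper name `prepare_run` and the timestamped backup name are assumptions made for illustration; only the `tmp_*` pattern and the `yolo_logs` default come from this thread.

```python
import shutil
import time
from pathlib import Path


def prepare_run(workdir=".", logs_name="yolo_logs"):
    """Delete leftover tmp_* folders and move an existing logs folder aside."""
    workdir = Path(workdir)
    # Remove working folders left behind by an interrupted run.
    for leftover in workdir.glob("tmp_*"):
        if leftover.is_dir():
            shutil.rmtree(str(leftover))
    # Rename the old results folder instead of deleting it.
    logs = workdir / logs_name
    if logs.exists():
        stamp = time.strftime("%Y%m%d_%H%M%S")
        backup = workdir / "{}_{}".format(logs_name, stamp)
        logs.rename(backup)
        return backup
    return None
```

Calling `prepare_run()` from the folder that contains train_v2.py, before starting training, returns the backup path if a results folder was moved and `None` otherwise.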
@tfukumori Thank you, I was able to start training. However, it stopped with the following error.
"""
549 [Yolo loss: 36.249851]
Testing ...
[Yolo testing loss: 38.217436981201175]
Evaluate mAP
2020-11-11 11:59:25.702287: E tensorflow/stream_executor/cuda/cuda_event.cc:29] Error polling for event status: failed to query event: CUDA_ERROR_LAUNCH_FAILED: unspecified launch failure
2020-11-11 11:59:25.702377: E tensorflow/stream_executor/cuda/cuda_driver.cc:1032] could not synchronize on CUDA context: CUDA_ERROR_LAUNCH_FAILED: unspecified launch failure ::
2020-11-11 11:59:25.702561: F tensorflow/core/common_runtime/gpu/gpu_event_mgr.cc:273] Unexpected Event status:
"""
Also, how can I adjust the values on the horizontal axis of the training loss graph?
"CUDA" error.
I'm not sure about the "CUDA" error.
From the error message, it is possible that the GPU is not powerful enough, but I'm not sure.
If it's due to a lack of GPU performance, then running it on the CPU or reducing the batch size might solve the problem. (The trade-off is slower training.)
https://jp.mathworks.com/matlabcentral/answers/427234-what-is-the-cause-of-cuda_error_launch_failed
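One common way to try the CPU suggestion without changing the training code is to hide the GPUs from CUDA before TensorFlow is imported. This is a general technique, not something taken from train_v2.py, and whether it resolves this particular error is not confirmed.

```python
import os

# Hide all GPUs from CUDA so TensorFlow falls back to the CPU.
# This must run before tensorflow (or keras) is imported.
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"
```

Placing these lines at the very top of the training script, or setting the same variable in the shell before running it, should have the same effect.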
Adjust the value on the horizontal axis of training loss
If you mean to change the settings of the graph, I don't know.
If you mean the number of epochs, then it seems to vary with the number of images and the batch size.
https://github.com/qqwweee/keras-yolo3/blob/f4a9c40f4615cdbb774942507ecad3af5f05c990/train_v2.py#L149 https://github.com/qqwweee/keras-yolo3/blob/f4a9c40f4615cdbb774942507ecad3af5f05c990/train_v2.py#L199
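As a rough illustration of why the axis range changes, the number of training steps in one pass over the data is usually the image count divided by the batch size. The function below is a hypothetical sketch of that relationship, not the code at the linked lines, and the numbers are made up.

```python
def steps_per_epoch(num_images, batch_size):
    """Number of full batches in one pass over the images (at least 1)."""
    return max(1, num_images // batch_size)


print(steps_per_epoch(5000, 8))   # 625
print(steps_per_epoch(5000, 32))  # 156
```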
@tfukumori I was able to finish training in 3 days. Thanks.
Why is "tmp_pred_files" empty before and after training?
Also, when running yolo.py on full HD input, is it better to change the following numbers? https://github.com/qqwweee/keras-yolo3/blob/e6598d13c703029b2686bc2eb8d5c09badf42992/yolo.py#L28
I think mAP is usually between 0 and 1, but I get a value greater than or equal to 1.
I would appreciate it if you could tell me the cause.
Maybe that's because the value is multiplied by 100, as you can see below.
mAP * 100
https://github.com/qqwweee/keras-yolo3/blob/f4a9c40f4615cdbb774942507ecad3af5f05c990/train_v2.py#L419
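To show the effect of the scaling, here is a small sketch that averages per-class AP values and optionally reports the result as a percentage. The function name, class names and AP values are made up for illustration; this is not the calculation from train_v2.py.

```python
def mean_average_precision(ap_per_class, as_percent=True):
    """Average the per-class AP values; optionally scale to a percentage."""
    aps = list(ap_per_class.values())
    mean_ap = sum(aps) / len(aps)
    return mean_ap * 100 if as_percent else mean_ap


# Hypothetical per-class AP values, each between 0 and 1.
aps = {"car": 0.5, "person": 0.75}
print(mean_average_precision(aps))                    # 62.5
print(mean_average_precision(aps, as_percent=False))  # 0.625
```

Dividing a logged value by 100 gives the usual 0 to 1 figure.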
Is this number the result of multiplying by 100? Also, what is the standard for the mAP calculation method used here?
I think this will be helpful.
You can find it here: https://qiita.com/mdo4nt6n/items/08e11426e2fac8433fed