Training (add tensorboard debug, and mAP Calculation)

Open KUASWoodyLIN opened this issue 6 years ago • 16 comments

Provide useful debug information on TensorBoard

mAP scalars (image)

Images (image)

Distributions (image)

Histograms (image)

KUASWoodyLIN avatar Aug 06 '18 14:08 KUASWoodyLIN

Could you help me see what is wrong with my code? Thanks!

chenyuqing avatar Sep 22 '18 06:09 chenyuqing

Hi @chenyuqing

I tried train_v2.py, and it looks fine to me. Maybe you should check the Keras backend configuration file (link); I think your "image_data_format" is not correct. My settings are:

{
    "epsilon": 1e-07,
    "image_data_format": "channels_last",
    "floatx": "float32",
    "backend": "tensorflow"
}

Make sure your settings are the same as mine.
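If it helps, here is a minimal sketch for checking this programmatically. It assumes the configuration file is at ~/.keras/keras.json (the usual default unless KERAS_HOME is set). The names `EXPECTED` and `find_config_mismatches` are made up for this example and are not part of keras-yolo3.

```python
import json
from pathlib import Path

# Settings quoted in this thread; adjust if your setup differs.
EXPECTED = {
    "epsilon": 1e-07,
    "image_data_format": "channels_last",
    "floatx": "float32",
    "backend": "tensorflow",
}


def find_config_mismatches(config_path, expected=EXPECTED):
    """Return {key: (found, expected)} for every setting that differs."""
    with open(config_path, encoding="utf-8") as f:
        config = json.load(f)
    return {
        key: (config.get(key), want)
        for key, want in expected.items()
        if config.get(key) != want
    }


if __name__ == "__main__":
    # Assumed default location of the Keras config file.
    path = Path.home() / ".keras" / "keras.json"
    if path.exists():
        print(find_config_mismatches(path) or "settings match")
    else:
        print("{} not found".format(path))
```

An empty result means the file matches the settings above; otherwise each entry shows the value found next to the value expected.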

KUASWoodyLIN avatar Sep 25 '18 04:09 KUASWoodyLIN

Hi, it seems that this repo has been inactive for a while... (more than a year :worried:) Would you consider contributing your changes to this fork: https://github.com/Borda/keras-yolo3 ?

Borda avatar Aug 15 '19 09:08 Borda

Hi! I'm having trouble because I can't get training to run. Which part of train_v2.py should I change to run it?

shocora avatar Oct 28 '20 02:10 shocora

It seems that the versions of Python, TensorFlow and Keras matter.

The repository https://github.com/qqwweee/keras-yolo3 lists the following environment:

  • Python 3.5.2
  • Keras 2.1.5
  • tensorflow 1.6.0

I have also verified that it works with the following environment:

  • Python 3.6
  • Keras 2.2.4
  • tensorflow 1.14.0
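To compare an environment against the two combinations above, something like the following sketch could be used. The helper names `installed_versions` and `matches_known_good` are made up for this example, and the Python version is compared by major.minor only (so 3.5.2 is treated as 3.5).

```python
import sys

# Version combinations reported as working in this thread.
# Python is compared by major.minor only.
KNOWN_GOOD = [
    {"python": "3.5", "keras": "2.1.5", "tensorflow": "1.6.0"},
    {"python": "3.6", "keras": "2.2.4", "tensorflow": "1.14.0"},
]


def installed_versions():
    """Collect the versions of Python, Keras and TensorFlow that can be imported."""
    versions = {"python": "{}.{}".format(*sys.version_info[:2])}
    for name in ("keras", "tensorflow"):
        try:
            module = __import__(name)
            versions[name] = getattr(module, "__version__", "unknown")
        except ImportError:
            versions[name] = None  # package is not installed
    return versions


def matches_known_good(versions, known_good=KNOWN_GOOD):
    """Return True if the versions equal one of the combinations listed above."""
    return any(versions == combo for combo in known_good)
```

Running `matches_known_good(installed_versions())` inside the training environment tells you whether it matches one of the reported setups. Other combinations may also work; these are only the ones mentioned here.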

tfukumori avatar Oct 28 '20 03:10 tfukumori

Thank you for your reply. I matched the versions, but it still doesn't work. Is there anywhere to change the paths other than lines 34, 35 and 41, 42 of train_v2.py? Also, is it okay if LOGS_PATH is empty the first time I train?

shocora avatar Oct 29 '20 14:10 shocora

When I first trained it, it didn't matter that the LOGS_PATH folder (the default is yolo_logs) was empty.

I ran the following command:

python train_v2.py --yolo_train_file 2007_train.txt --yolo_val_file nano

2007_train.txt was created using voc_annotation.py, as described at https://github.com/qqwweee/keras-yolo3
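If you want to sanity-check the annotation file before training, here is a small sketch. It assumes the row format described in that README: one image per row, the image path first, then space-separated boxes written as x_min,y_min,x_max,y_max,class_id. The function name `parse_annotation_line` is made up for this example, and image paths containing spaces are not handled.

```python
def parse_annotation_line(line):
    """Split one annotation row into (image_path, list of box tuples).

    Each box tuple is (x_min, y_min, x_max, y_max, class_id).
    """
    parts = line.split()
    if not parts:
        raise ValueError("empty annotation line")
    image_path, boxes = parts[0], []
    for raw in parts[1:]:
        fields = raw.split(",")
        if len(fields) != 5:
            raise ValueError("expected 5 comma-separated values, got: " + raw)
        boxes.append(tuple(int(value) for value in fields))
    return image_path, boxes
```

Looping over the lines of 2007_train.txt with this function raises an error on the first malformed row, which is easier to debug than a failure in the middle of training.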

To check the training results, I ran the following commands:

conda install tensorboard -y
tensorboard --logdir=<yolo_logs' full path> --host 0.0.0.0

In a web browser, go to http://localhost:6006/

tfukumori avatar Oct 29 '20 23:10 tfukumori

Sorry for the late reply. When I ran the above command as written, I got the following error. Also, what does nano specify?

File "train_v2.py", line 89, in __init__
    images_choose = [self.val_images[i] for i in np.random.randint(0, len(self.val_images), 50)]
AttributeError: 'Yolo' object has no attribute 'val_images'

shocora avatar Nov 10 '20 12:11 shocora

@shocora

What "nano" specifies

Specify "nano" when you do not want to specify a validation file, or when one does not exist.

As the code around the following line shows, train_v2.py splits the data into training and validation sets in a 9:1 ratio, regardless of whether a validation file is specified.

https://github.com/qqwweee/keras-yolo3/blob/f4a9c40f4615cdbb774942507ecad3af5f05c990/train_v2.py#L439
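As an illustration of what a 9:1 split looks like, here is a minimal sketch. It is not the code from train_v2.py; the function name `split_train_val` and its parameters are made up for this example.

```python
import random


def split_train_val(lines, val_fraction=0.1, seed=None):
    """Shuffle annotation lines and return (train_lines, val_lines)."""
    shuffled = list(lines)
    random.Random(seed).shuffle(shuffled)
    num_val = int(len(shuffled) * val_fraction)
    num_train = len(shuffled) - num_val
    return shuffled[:num_train], shuffled[num_train:]


# 100 annotation lines end up as 90 for training and 10 for validation.
train_lines, val_lines = split_train_val(["line {}".format(i) for i in range(100)], seed=0)
print(len(train_lines), len(val_lines))  # 90 10
```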

Why the error occurred

I think the following lines of code were not executed because the folder for temporary files was left behind, for example after an abnormal exit.

https://github.com/qqwweee/keras-yolo3/blob/f4a9c40f4615cdbb774942507ecad3af5f05c990/train_v2.py#L82

Procedure before executing the command

Before executing the command, you must delete the temporary folder and move the results folder out of the way.

  • If a working folder (tmp_*) remains due to interruption, delete it.
  • If the results folder (yolo_logs) remains, delete, rename or move it.
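The two steps above can be sketched as a small helper. The folder names tmp_* and yolo_logs come from this thread; the function name `prepare_workdir` and the timestamped backup name are made up for this example, and the existing logs folder is renamed rather than deleted.

```python
import shutil
import time
from pathlib import Path


def prepare_workdir(workdir=".", logs_name="yolo_logs"):
    """Delete leftover tmp_* folders and move an existing logs folder aside.

    Returns (names of removed folders, path of the renamed logs folder or None).
    """
    workdir = Path(workdir)
    removed = []
    # Materialize the matches first so nothing is deleted while iterating.
    for tmp_dir in list(workdir.glob("tmp_*")):
        if tmp_dir.is_dir():
            shutil.rmtree(str(tmp_dir))
            removed.append(tmp_dir.name)
    backup = None
    logs = workdir / logs_name
    if logs.exists():
        # Rename instead of deleting so earlier results are kept.
        stamp = time.strftime("%Y%m%d_%H%M%S")
        backup = workdir / "{}_{}".format(logs_name, stamp)
        logs.rename(backup)
    return sorted(removed), backup
```

Call `prepare_workdir()` from the directory that contains train_v2.py before starting a new run.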

tfukumori avatar Nov 10 '20 22:11 tfukumori

@tfukumori Thank you, I was able to start training. However, it stopped with the following error.

"""
549 [Yolo loss: 36.249851]
Testing ...
[Yolo testing loss: 38.217436981201175]
Evaluate mAP
2020-11-11 11:59:25.702287: E tensorflow/stream_executor/cuda/cuda_event.cc:29] Error polling for event status: failed to query event: CUDA_ERROR_LAUNCH_FAILED: unspecified launch failure
2020-11-11 11:59:25.702377: E tensorflow/stream_executor/cuda/cuda_driver.cc:1032] could not synchronize on CUDA context: CUDA_ERROR_LAUNCH_FAILED: unspecified launch failure ::
2020-11-11 11:59:25.702561: F tensorflow/core/common_runtime/gpu/gpu_event_mgr.cc:273] Unexpected Event status:
"""

Also, how can I adjust the values on the horizontal axis of the training loss? (image)

shocora avatar Nov 11 '20 03:11 shocora

The CUDA error

I'm not sure about the CUDA error.

Judging from the error message, the GPU may not be powerful enough, but I'm not sure.

If the cause is insufficient GPU capacity, running on the CPU or reducing the batch size might solve the problem. (The trade-off is slower training.)

https://jp.mathworks.com/matlabcentral/answers/427234-what-is-the-cause-of-cuda_error_launch_failed
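For trying the CPU route, one common approach is to hide the GPUs through the CUDA_VISIBLE_DEVICES environment variable. This is a general CUDA/TensorFlow technique, not an option that train_v2.py is known to provide, and the helper name `hide_gpus_from_tensorflow` is made up for this sketch.

```python
import os


def hide_gpus_from_tensorflow():
    """Hide all CUDA devices so that TensorFlow falls back to the CPU.

    Call this before TensorFlow (or Keras) is imported for the first time.
    """
    os.environ["CUDA_VISIBLE_DEVICES"] = "-1"


hide_gpus_from_tensorflow()
# Import tensorflow / keras only after this point.
```

Placing this at the very top of the training script should have the same effect as setting the variable in the shell before launching it. Expect training to be much slower.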

Adjusting the values on the horizontal axis of the training loss

If you mean changing the graph settings, I don't know.

If you mean the number of epochs, it seems to vary with the number of images and the batch size.

https://github.com/qqwweee/keras-yolo3/blob/f4a9c40f4615cdbb774942507ecad3af5f05c990/train_v2.py#L149
https://github.com/qqwweee/keras-yolo3/blob/f4a9c40f4615cdbb774942507ecad3af5f05c990/train_v2.py#L199

tfukumori avatar Nov 11 '20 03:11 tfukumori

@tfukumori Training finished in 3 days. Thanks.

Why is "tmp_pred_files" empty before and after training?

Also, when running yolo.py on full HD input, is it better to change the following numbers? https://github.com/qqwweee/keras-yolo3/blob/e6598d13c703029b2686bc2eb8d5c09badf42992/yolo.py#L28

shocora avatar Nov 16 '20 15:11 shocora

I think mAP is usually between 0 and 1, but I get a value greater than or equal to 1. I would appreciate it if you could tell me the cause. (screenshot)

shocora avatar Nov 30 '20 11:11 shocora

I'm not certain, but it may be because the value is multiplied by 100 (mAP * 100), as you can see at the following line.

https://github.com/qqwweee/keras-yolo3/blob/f4a9c40f4615cdbb774942507ecad3af5f05c990/train_v2.py#L419
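To show how that scaling produces values above 1, here is a minimal sketch. It is not the code from train_v2.py; the function name and the AP values are made up for illustration.

```python
def mean_average_precision_percent(ap_per_class):
    """Average the per-class AP values (each in [0, 1]) and scale to a percentage."""
    if not ap_per_class:
        raise ValueError("need at least one class")
    mean_ap = sum(ap_per_class.values()) / len(ap_per_class)
    return mean_ap * 100


# Hypothetical AP values for two classes, for illustration only.
print(mean_average_precision_percent({"cat": 0.25, "dog": 0.75}))  # 50.0
```

Read this way, a value such as 50 on the chart would correspond to an mAP of 0.5 on the usual 0 to 1 scale.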

tfukumori avatar Dec 08 '20 08:12 tfukumori

Is this number the result of multiplying by 100? Also, which standard does the mAP calculation used here follow?

shocora avatar Dec 08 '20 14:12 shocora

I think this will be helpful.

You can find it here: https://qiita.com/mdo4nt6n/items/08e11426e2fac8433fed

tfukumori avatar Dec 08 '20 14:12 tfukumori