Provide useful debug information on tensorboard

mAP scalars

Images

Distributions

Histograms

Aug 06 '18 14:08 KUASWoodyLIN

Can you help me see what is wrong with my code? Thanks!

Sep 22 '18 06:09 chenyuqing

Hi @chenyuqing

I tried train_v2.py and it looks fine. Maybe you should check the Keras backend configuration file; I think your "image_data_format" is not correct. My settings are:

{
    "epsilon": 1e-07,
    "image_data_format": "channels_last",
    "floatx": "float32",
    "backend": "tensorflow"
}

Make sure your settings are the same as mine.
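To compare a local configuration against the settings quoted above, something like the following could be used. It is only a minimal sketch: it assumes the config file sits at the default location ~/.keras/keras.json, and the helper name diff_keras_config is made up for illustration.

```python
import json
import os

# Reference settings quoted from the comment above.
EXPECTED = {
    "epsilon": 1e-07,
    "image_data_format": "channels_last",
    "floatx": "float32",
    "backend": "tensorflow",
}


def diff_keras_config(path, expected=EXPECTED):
    """Return {key: (found, expected)} for every setting that differs."""
    with open(path) as f:
        found = json.load(f)
    return {
        key: (found.get(key), want)
        for key, want in expected.items()
        if found.get(key) != want
    }


if __name__ == "__main__":
    # Assumed default location of the Keras config file.
    config_path = os.path.expanduser("~/.keras/keras.json")
    if os.path.exists(config_path):
        for key, (got, want) in diff_keras_config(config_path).items():
            print("{}: found {!r}, expected {!r}".format(key, got, want))
    else:
        print("No config file at", config_path)
```

An empty result means the four settings match; any printed line names a key worth checking first, "image_data_format" in particular.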

Sep 25 '18 04:09 KUASWoodyLIN

Hi, it seems that this repo has been inactive for a while... (more than a year :worried:) Would you consider submitting your changes to this fork https://github.com/Borda/keras-yolo3 ?

Aug 15 '19 09:08 Borda

Hi! I'm having trouble because I can't get training to run. Which part of train_v2.py should I change to run it?

Oct 28 '20 02:10 shocora

Hi! I'm having trouble because I can't get training to run. Which part of train_v2.py should I change to run it?

It seems that the versions of Python, TensorFlow and Keras are important.

You can find the following description in the repository https://github.com/qqwweee/keras-yolo3

Python 3.5.2, Keras 2.1.5, tensorflow 1.6.0

I have also verified that it works with the following environment:

Python 3.6, Keras 2.2.4, tensorflow 1.14.0
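A quick way to see whether an environment matches one of the two combinations above is to read each package's `__version__`. This is a rough sketch, not part of the repository; the helper names and the KNOWN_GOOD table are introduced here for illustration only.

```python
import importlib

# Keras/tensorflow version combinations mentioned above.
KNOWN_GOOD = [
    {"keras": "2.1.5", "tensorflow": "1.6.0"},
    {"keras": "2.2.4", "tensorflow": "1.14.0"},
]


def installed_versions(names=("keras", "tensorflow")):
    """Import each package and read __version__; None if it cannot be imported."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
            versions[name] = getattr(module, "__version__", None)
        except Exception:  # missing package or a broken installation
            versions[name] = None
    return versions


def matches_known_good(installed, known_good=KNOWN_GOOD):
    """True if the installed versions equal one of the listed combinations."""
    return any(
        all(installed.get(pkg) == ver for pkg, ver in combo.items())
        for combo in known_good
    )


if __name__ == "__main__":
    found = installed_versions()
    print(found, "-> known-good combination:", matches_known_good(found))
```

A result of False does not prove the setup is broken; it only says the versions differ from the ones reported to work in this thread.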

Oct 28 '20 03:10 tfukumori

Thank you for the reply. I matched the versions, but it still doesn't work. Is there anywhere I need to change the PATH other than lines 34, 35 and 41, 42 of train_v2.py? Also, is it okay if LOGS_PATH is empty the first time I train?

Oct 29 '20 14:10 shocora

When I first trained it, it didn't matter that the LOGS_PATH folder (the default is yolo_logs) was empty.

I ran the following command:

python train_v2.py --yolo_train_file 2007_train.txt --yolo_val_file nano

2007_train.txt was created using voc_annotation.py, as described at https://github.com/qqwweee/keras-yolo3

To check the training results, I ran the following commands:

conda install tensorboard -y
tensorboard --logdir=<yolo_logs' full path> --host 0.0.0.0

In a web browser, go to http://localhost:6006/

Oct 29 '20 23:10 tfukumori

I'm sorry for the late reply. When I ran the above command as it is, I got the following error. Also, what does nano specify?

File "train_v2.py", line 89, in __init__
    images_choose = [self.val_images[i] for i in np.random.randint(0, len(self.val_images), 50)]
AttributeError: 'Yolo' object has no attribute 'val_images'

Nov 10 '20 12:11 shocora

@shocora

What "nano" specifies

Pass "nano" if you do not have a validation file, or if the file does not exist.

As you can see around the following line of code, train_v2.py splits the data into training and validation sets in a 9:1 ratio, regardless of whether a validation file is specified.

https://github.com/qqwweee/keras-yolo3/blob/f4a9c40f4615cdbb774942507ecad3af5f05c990/train_v2.py#L439

Why the error occurred

I think the following lines of code were not executed because the folder for temporary files was left behind, for example after an abnormal exit.

https://github.com/qqwweee/keras-yolo3/blob/f4a9c40f4615cdbb774942507ecad3af5f05c990/train_v2.py#L82

Procedure before executing the command

Before executing the command, you must delete the temporary folder and move the results folder out of the way.

If a working folder (tmp_*) remains due to interruption, delete it.
If the results folder (yolo_logs) remains, delete, rename or move it.
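The two cleanup steps above could be scripted roughly as follows. This is a sketch under assumptions: the function name prepare_run and the timestamped backup name are made up here, and the folder names (tmp_*, yolo_logs) are taken from this thread rather than checked against train_v2.py.

```python
import glob
import os
import shutil
import time


def prepare_run(base_dir=".", logs_name="yolo_logs"):
    """Delete leftover tmp_* folders and move an existing logs folder aside.

    Returns the new path of the logs folder, or None if there was none.
    """
    # Remove working folders left behind by an interrupted run.
    for path in glob.glob(os.path.join(base_dir, "tmp_*")):
        if os.path.isdir(path):
            shutil.rmtree(path)

    # Keep old results by renaming the folder with a timestamp suffix.
    logs_path = os.path.join(base_dir, logs_name)
    if os.path.isdir(logs_path):
        backup = "{}_{}".format(logs_path, time.strftime("%Y%m%d_%H%M%S"))
        shutil.move(logs_path, backup)
        return backup
    return None
```

Calling prepare_run() from the folder that contains train_v2.py would then clear the way for a new run. Note that it deletes every tmp_* folder it finds without asking, so check the folder first.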

Nov 10 '20 22:11 tfukumori

@tfukumori Thank you, I was able to start training. However, it stopped with the following error.

"""
549 [Yolo loss: 36.249851]
Testing ... [Yolo testing loss: 38.217436981201175]
Evaluate mAP
2020-11-11 11:59:25.702287: E tensorflow/stream_executor/cuda/cuda_event.cc:29] Error polling for event status: failed to query event: CUDA_ERROR_LAUNCH_FAILED: unspecified launch failure
2020-11-11 11:59:25.702377: E tensorflow/stream_executor/cuda/cuda_driver.cc:1032] could not synchronize on CUDA context: CUDA_ERROR_LAUNCH_FAILED: unspecified launch failure
::
2020-11-11 11:59:25.702561: F tensorflow/core/common_runtime/gpu/gpu_event_mgr.cc:273] Unexpected Event status:
"""

Also, how can I adjust the values on the horizontal axis of the training loss graph?

Nov 11 '20 03:11 shocora

"CUDA" error.

I'm not sure about the "CUDA" error.

From the error message, it is possible that the GPU is not powerful enough, but I'm not sure.

If it's due to a lack of GPU performance, then running on the CPU or reducing the batch size might solve the problem. (The trade-off is slower training.)

https://jp.mathworks.com/matlabcentral/answers/427234-what-is-the-cause-of-cuda_error_launch_failed
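One common way to try the CPU suggestion without touching the training code is to hide the GPU from the process before TensorFlow is imported. This is a generic sketch, not something taken from train_v2.py, and whether it avoids this particular error is not confirmed; the function name force_cpu is made up for illustration.

```python
import os


def force_cpu():
    """Hide all CUDA devices from this process.

    Must run before TensorFlow is imported, otherwise it has no effect.
    """
    os.environ["CUDA_VISIBLE_DEVICES"] = "-1"


force_cpu()
# import tensorflow as tf  # import only after the variable has been set
```

If training then runs to completion on the CPU (much more slowly), that would point at the GPU side, for example memory, driver or hardware, rather than at the training script.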

Adjust the value on the horizontal axis of training loss

If you mean changing the settings of the graph, I don't know.

If you mean the number of epochs, then it seems to vary with the number of images and the batch size.

https://github.com/qqwweee/keras-yolo3/blob/f4a9c40f4615cdbb774942507ecad3af5f05c990/train_v2.py#L149
https://github.com/qqwweee/keras-yolo3/blob/f4a9c40f4615cdbb774942507ecad3af5f05c990/train_v2.py#L199
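As a rough illustration of how the step count depends on the number of images and the batch size: the sketch below assumes one step per batch and the 9:1 training/validation split mentioned earlier in this thread. It is not taken from train_v2.py, so the actual axis values may be computed differently, and both function names are hypothetical.

```python
import math


def steps_per_epoch(num_images, batch_size):
    """Batches in one pass over the training split (assumed 9:1 split)."""
    num_train = num_images - num_images // 10  # 90% of the images
    return math.ceil(num_train / batch_size)


def total_steps(num_images, batch_size, epochs):
    """Total number of training steps over all epochs."""
    return epochs * steps_per_epoch(num_images, batch_size)


print(steps_per_epoch(1000, 8))  # 113
print(total_steps(1000, 8, 10))  # 1130
```

Under these assumptions, halving the batch size roughly doubles the number of steps per epoch.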

Nov 11 '20 03:11 tfukumori

@tfukumori I was able to finish training in 3 days. Thanks.

Why is "tmp_pred_files" empty before and after training?

Also, when running yolo.py on full HD input, is it better to change the following numbers? https://github.com/qqwweee/keras-yolo3/blob/e6598d13c703029b2686bc2eb8d5c09badf42992/yolo.py#L28

Nov 16 '20 15:11 shocora

I think mAP is usually between 0 and 1, but I get a value greater than or equal to 1. I would appreciate it if you could tell me the cause. (screenshot)

Nov 30 '20 11:11 shocora

I think mAP is usually between 0 and 1, but I get a value greater than or equal to 1. I would appreciate it if you could tell me the cause.

I don't know for sure, but maybe that's because the value is multiplied by 100 (mAP * 100), as you can see below.
https://github.com/qqwweee/keras-yolo3/blob/f4a9c40f4615cdbb774942507ecad3af5f05c990/train_v2.py#L419
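To show why a factor of 100 produces values above 1, here is a minimal sketch with made-up per-class AP values. It is not the code from train_v2.py, and the function name is chosen here for illustration.

```python
def mean_average_precision(ap_per_class, as_percent=False):
    """Average the per-class AP values (each between 0 and 1)."""
    values = list(ap_per_class.values())
    mean_ap = sum(values) / len(values)
    return mean_ap * 100 if as_percent else mean_ap


# Made-up AP values, for illustration only.
aps = {"car": 0.5, "person": 0.75}
print(mean_average_precision(aps))                   # 0.625
print(mean_average_precision(aps, as_percent=True))  # 62.5
```

Read as a percentage, a plotted value such as 62.5 corresponds to an mAP of 0.625 on the usual 0 to 1 scale.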

Dec 08 '20 08:12 tfukumori

Is this number the result of multiplying by 100? Also, which standard does the mAP calculation used here follow?

Dec 08 '20 14:12 shocora

Is this number the result of multiplying by 100? Also, which standard does the mAP calculation used here follow?

I think this will be helpful.

You can find it here: https://qiita.com/mdo4nt6n/items/08e11426e2fac8433fed

Dec 08 '20 14:12 tfukumori

Training (add tensorboard debug, and mAP Calculation)