pytorch-cpn

Training with other configurations.

Open mkocabas opened this issue 7 years ago • 64 comments

Hi @GengDavid,

Thanks for the great implementation. I'm eager to collaborate with you to test other configurations. I have 2 x 1080 and 2 x 1080ti. I can borrow more if needed. Looking forward to your response!

mkocabas avatar Jul 15 '18 09:07 mkocabas

Hi @mkocabas ,

Thanks for your interest in my implementation. There may be at least two configurations to be tested, ResNet-50+384x288 and ResNet-101+384x288. Which one do you prefer to test? Or do you want to test both of them?

I've modified the code a little, so please clone/pull the latest version before you run it. Please follow the README to configure the environment.

You can train a ResNet-50+384x288 model directly in the 384.288.model directory by running train.py. You may need to modify the batch size in config.py, and use -g to specify the number of GPUs you use. For example, you may set batch_size = 12 and run python3 train.py -g 2 when you use 2 x 1080 GPUs to train the model.

To train a ResNet-101+384x288 model, you need to set model='CPN101' in config.py, and then train it the same way.

If you have any questions, feel free to contact me. You can also mail me at [email protected] or [email protected].

GengDavid avatar Jul 15 '18 11:07 GengDavid
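For readers following along, here is a minimal sketch of how the two settings mentioned above (batch_size and model) and the -g flag might fit together. It is not the repository's actual config.py or train.py: the `Config` class, the `describe_run` helper, and the `CPN50` default name are all hypothetical and exist only for illustration.

```python
import argparse
from dataclasses import dataclass


@dataclass
class Config:
    # Hypothetical stand-in for config.py; only the two settings named in
    # the thread are modeled. 'CPN50' is an assumed default name.
    model: str = "CPN50"
    batch_size: int = 12


def parse_args(argv=None):
    # Mirrors the '-g' flag from the thread; the real script may define more options.
    parser = argparse.ArgumentParser(description="Illustrative training launcher")
    parser.add_argument("-g", dest="num_gpus", type=int, default=1,
                        help="number of GPUs to use")
    return parser.parse_args(argv)


def describe_run(cfg, argv=None):
    # Summarize what a run would use, without starting any training.
    args = parse_args(argv)
    return f"model={cfg.model} batch_size={cfg.batch_size} gpus={args.num_gpus}"


# Equivalent of setting model='CPN101', batch_size = 12 and passing '-g 2'.
print(describe_run(Config(model="CPN101", batch_size=12), ["-g", "2"]))
```

The sketch only summarizes the chosen settings; the actual training entry point is still python3 train.py -g 2 as described above.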

Cool, so I can start with ResNet-50+384x288. After that I can try ResNet-101.

I'll use 2 x 1080ti with the default hyperparameters as in config. Am I correct?

mkocabas avatar Jul 15 '18 11:07 mkocabas

@GengDavid we have a little problem. 1080tis have 11GB memory. batch_size=6 barely fits the memory. This means that we can train with batch_size=12 using 2 gpus. What do you think?

mkocabas avatar Jul 15 '18 12:07 mkocabas

If you are using 1080tis, I think you can set batch_size higher than 12 with 2 GPUs when training the ResNet-50+384x288 model.

GengDavid avatar Jul 15 '18 12:07 GengDavid

@mkocabas The ResNet-50+384x288 model with batch_size=12 takes about 8 GB of memory in my experiment.

GengDavid avatar Jul 15 '18 12:07 GengDavid

I'm consistently getting an OOM error, but let me check. I'll restart the computer; maybe some other processes are holding GPU memory. I'll inform you about the progress.

mkocabas avatar Jul 15 '18 12:07 mkocabas
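One way to check whether other processes are holding GPU memory before a run is to query nvidia-smi. The sketch below is an illustration, not part of the repository: the helper names are hypothetical, and the sample values are made up to show the shape of the output.

```python
import subprocess

# Standard nvidia-smi query flags; values are reported in MiB with 'nounits'.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text):
    """Parse 'index, used, total' rows (MiB) into a list of dicts."""
    rows = []
    for line in csv_text.strip().splitlines():
        index, used, total = (int(field) for field in line.split(","))
        rows.append({"index": index, "used_mib": used, "free_mib": total - used})
    return rows


def query_gpu_memory():
    """Run nvidia-smi and return per-GPU memory usage (requires an NVIDIA driver)."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_gpu_memory(out)


# Made-up sample in the shape the query prints: GPU 0 is nearly full, GPU 1 is idle.
sample = "0, 10950, 11178\n1, 12, 11178\n"
for gpu in parse_gpu_memory(sample):
    print(gpu)
```

If a GPU shows most of its memory in use before training starts, something else is occupying it, which would explain an out-of-memory error at a batch size that should otherwise fit.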

@GengDavid, restarting solved the problem. Thanks for pointing that out! I'll update this issue as training continues.

How many epochs did you train the 256x192 model for?

mkocabas avatar Jul 15 '18 13:07 mkocabas

@mkocabas About 25 epochs. I don't remember the exact figure.

GengDavid avatar Jul 15 '18 13:07 GengDavid

I see, so probably it'll take 4 days to converge.

mkocabas avatar Jul 15 '18 13:07 mkocabas

Fine, thanks.

GengDavid avatar Jul 15 '18 13:07 GengDavid

Epoch 6 (tested with GT bboxes)

 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets= 20 ] = 0.688
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets= 20 ] = 0.894
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets= 20 ] = 0.750
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.654
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.742
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 20 ] = 0.719
 Average Recall     (AR) @[ IoU=0.50      | area=   all | maxDets= 20 ] = 0.904
 Average Recall     (AR) @[ IoU=0.75      | area=   all | maxDets= 20 ] = 0.776
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.681
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.777

Epoch 13 (tested with GT bboxes)

 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets= 20 ] = 0.726
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets= 20 ] = 0.914
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets= 20 ] = 0.785
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.690
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.781
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 20 ] = 0.754
 Average Recall     (AR) @[ IoU=0.50      | area=   all | maxDets= 20 ] = 0.924
 Average Recall     (AR) @[ IoU=0.75      | area=   all | maxDets= 20 ] = 0.810
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.716
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.812

mkocabas avatar Jul 16 '18 19:07 mkocabas
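To compare checkpoints like the two above without reading the numbers by eye, the summary lines can be parsed into a dictionary. This is a minimal sketch that assumes the line layout shown in this thread; `parse_summary` is a hypothetical helper, not part of pycocotools or the repository.

```python
import re

# Matches one summary line, e.g.
# " Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets= 20 ] = 0.688"
LINE_RE = re.compile(
    r"Average (?:Precision|Recall)\s+\((AP|AR)\) @\[ IoU=([\d.:]+)\s*"
    r"\| area=\s*(\w+) \| maxDets=\s*(\d+) \] = ([\d.]+)"
)


def parse_summary(text):
    """Return {(metric, iou, area): value} for every summary line found in text."""
    metrics = {}
    for match in LINE_RE.finditer(text):
        kind, iou, area, _max_dets, value = match.groups()
        metrics[(kind, iou, area)] = float(value)
    return metrics


# Headline AP lines copied from the epoch 6 and epoch 13 results above.
epoch_6 = " Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets= 20 ] = 0.688"
epoch_13 = " Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets= 20 ] = 0.726"

key = ("AP", "0.50:0.95", "all")
gain = parse_summary(epoch_13)[key] - parse_summary(epoch_6)[key]
print(f"AP gain from epoch 6 to 13: {gain:.3f}")
```

The same function accepts a full ten-line block, so all AP and AR entries can be compared across epochs in one pass.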

@GengDavid do you have the weights from the 5th epoch of the ResNet50-256x192 model?

mkocabas avatar Jul 18 '18 06:07 mkocabas

Yes, I have saved the 5th-epoch pre-trained model. But I'm sorry to tell you that my code differs from the original paper in one respect, just as @Tiamo666 mentioned in issue #4.
The results seem very close, but I'm still going to modify the network and then re-test it.

GengDavid avatar Jul 18 '18 13:07 GengDavid

Yeah I saw the discussion. Please let me know about the results after modification. If you don't have enough GPUs, I can test the corrected model.

mkocabas avatar Jul 18 '18 13:07 mkocabas

I'll let you know the results, but it may take a while since I only have 1 x 1080 free to run the code. Maybe you can test the ResNet-50+384x288 model first.
Thanks!

GengDavid avatar Jul 18 '18 15:07 GengDavid

I've started to train the fixed ResNet-50+384x288 on a Titan V with batch_size=24.

mkocabas avatar Jul 18 '18 17:07 mkocabas

Hi @mkocabas, I've updated the ResNet-50+256*192 results. Have you got any results? Thanks.

GengDavid avatar Jul 26 '18 01:07 GengDavid

Hi, David, I've trained ResNet-50+384*288 with ground truth bboxes. The test result at epoch 32 is as follows:

 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets= 20 ] = 0.737
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets= 20 ] = 0.915
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets= 20 ] = 0.806
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.706
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.792
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 20 ] = 0.767
 Average Recall     (AR) @[ IoU=0.50      | area=   all | maxDets= 20 ] = 0.929
 Average Recall     (AR) @[ IoU=0.75      | area=   all | maxDets= 20 ] = 0.826
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.729
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.824

Due to network limitations, I could not download the person detection results on COCO, so I just used the ground truth.

Tiamo666 avatar Aug 13 '18 02:08 Tiamo666

@Tiamo666 Great job! Can you provide the pre-trained model so that I can test it with detection results? I think you can open a PR with a link to download the pre-trained model.

GengDavid avatar Aug 14 '18 07:08 GengDavid

@Tiamo666 Or if you do not want to open a PR, could you just provide a link to download the model? Google Drive, OneDrive, Dropbox and Baidu Yun are all fine.

GengDavid avatar Aug 14 '18 08:08 GengDavid

OK, I guess Baidu Yun is a good choice. I will share the pre-trained model there and send you the link as soon as I have uploaded it.

Tiamo666 avatar Aug 14 '18 10:08 Tiamo666

Hi, David, I've uploaded the model to Baidu Yun. Here is the link: https://pan.baidu.com/s/1fdy5_0HQm63QtlOzxKbpuw

Tiamo666 avatar Aug 15 '18 02:08 Tiamo666

Great! I'll test it and update the result later.

GengDavid avatar Aug 15 '18 05:08 GengDavid

@Tiamo666 I've updated the results.

GengDavid avatar Aug 15 '18 06:08 GengDavid

That's cool! I'll have time to train ResNet-101+384*288. I'll share the model after training finishes.

Tiamo666 avatar Aug 27 '18 06:08 Tiamo666

@Tiamo666 That's great! If you have any problem, feel free to contact me.

GengDavid avatar Aug 27 '18 07:08 GengDavid

Hi, David. I've uploaded the model of cpn384*288 with Resnet101 on Baidu Yun. Here is the link: https://pan.baidu.com/s/1toikUHSqHhHP3DkIOkNctA

Tiamo666 avatar Sep 06 '18 02:09 Tiamo666

@Tiamo666 Great! Thanks a lot. I'll update the results soon.

GengDavid avatar Sep 06 '18 08:09 GengDavid

Hello, David, I've just found that last week I trained with the old code, which has the "Color Normalized bug". I'm sorry about that. I could retrain the model this week.

Tiamo666 avatar Sep 10 '18 06:09 Tiamo666

@Tiamo666 Retraining is the better choice but may take more time. I think we can just fine-tune the trained model instead. This may affect the result a little but saves time. However, I currently do not have free GPUs to do this work. What do you think?

GengDavid avatar Sep 10 '18 07:09 GengDavid