pytorch-cpn Training with other configurations.

Hi @GengDavid,

Thanks for the great implementation. I'm eager to collaborate with you on testing other configurations. I have 2 x 1080 and 2 x 1080 Ti GPUs, and I can borrow more if needed. Looking forward to your response!

Jul 15 '18 09:07 mkocabas

Hi @mkocabas ,

Thanks for your interest in my implementation. There are at least two configurations to test: ResNet-50+384x288 and ResNet-101+384x288. Which one would you prefer to test? Or do you want to test both?

I've modified the code a little, so please clone/pull the latest version before running it. Please follow the README to configure the environment.

You can train a ResNet-50+384x288 model directly in the 384.288.model directory by running train.py. You may need to modify batch_size in config.py, and use -g to specify the number of GPUs you use. For example, you may set batch_size = 12 and run python3 train.py -g 2 when training the model on 2 x 1080 GPUs.

To train a ResNet-101+384x288 model, set model='CPN101' in config.py and then train it the same way.
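The batch_size / -g combination is easiest to reason about as a split of one global batch across GPUs: assuming batch_size in config.py is the total batch (which is how it is read later in this thread), batch_size = 12 with -g 2 puts 6 samples on each card. The sketch below illustrates only that arithmetic and is not the repository's train.py. The -g flag comes from the thread; the --num_gpus long name and the per_gpu_batch helper are hypothetical, and an even split (what torch.nn.DataParallel does when the batch divides cleanly) is assumed.

```python
import argparse


def parse_args(argv=None):
    # Hypothetical parser: only the short flag -g is mentioned in this thread;
    # the real train.py may name or handle it differently.
    parser = argparse.ArgumentParser(description="CPN training (sketch)")
    parser.add_argument("-g", "--num_gpus", type=int, default=1,
                        help="number of GPUs to train on")
    return parser.parse_args(argv)


def per_gpu_batch(batch_size, num_gpus):
    """Samples each GPU receives, assuming the global batch is split evenly."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be >= 1")
    if batch_size % num_gpus != 0:
        # An uneven split would leave one card with a smaller chunk.
        raise ValueError("batch_size should be divisible by num_gpus")
    return batch_size // num_gpus


if __name__ == "__main__":
    args = parse_args(["-g", "2"])
    batch_size = 12  # value suggested above for 2 x 1080
    print(per_gpu_batch(batch_size, args.num_gpus))
```

If a per-card batch of 6 is what barely fits in memory, raising the global batch means adding GPUs rather than raising batch_size alone.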

If you have any questions, feel free to contact me. You can also mail me at [email protected] or [email protected].

Jul 15 '18 11:07 GengDavid

Cool, so I can start with ResNet-50+384x288. After that I can try ResNet-101.

I'll use 2 x 1080 Ti with the default hyperparameters from config.py. Is that right?

Jul 15 '18 11:07 mkocabas

@GengDavid we have a small problem. 1080 Tis have 11 GB of memory, and batch_size=6 barely fits in memory. This means that we can train with batch_size=12 using 2 GPUs. What do you think?

Jul 15 '18 12:07 mkocabas

If you are using 1080 Tis, I think you can set batch_size larger than 12 with 2 GPUs when running the ResNet-50+384x288 model.

Jul 15 '18 12:07 GengDavid

@mkocabas The ResNet-50+384x288 model with batch_size=12 takes about 8 GB of memory in my experiment.

Jul 15 '18 12:07 GengDavid

I'm consistently getting OOM errors, but let me check. I'll restart the computer; maybe some processes are blocking the GPUs. I'll keep you informed about the progress.

Jul 15 '18 12:07 mkocabas

@GengDavid, restarting solved the problem. Thanks for pointing that out! I'll update this issue as training continues.

How many epochs did you train the 256x192 model for?

Jul 15 '18 13:07 mkocabas

@mkocabas About 25 epochs. I don't remember the exact number.

Jul 15 '18 13:07 GengDavid

I see, so probably it'll take 4 days to converge.
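The 4-day figure is just the number of epochs multiplied by the wall-clock time per epoch. A minimal sketch of that arithmetic, assuming the ~25 epochs mentioned above; estimate_days and implied_hours_per_epoch are hypothetical helper names, and the per-epoch time is something you would measure on your own hardware, not a number reported in this thread.

```python
def estimate_days(epochs, hours_per_epoch):
    """Wall-clock days needed to run `epochs` epochs at a fixed per-epoch time."""
    return epochs * hours_per_epoch / 24.0


def implied_hours_per_epoch(days, epochs):
    """Per-epoch time implied by a total budget of `days` for `epochs` epochs."""
    return days * 24.0 / epochs


if __name__ == "__main__":
    # ~25 epochs finishing in ~4 days corresponds to roughly 3.84 hours per epoch.
    print(implied_hours_per_epoch(4, 25))
    print(estimate_days(25, 3.84))
```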

Jul 15 '18 13:07 mkocabas

Fine, thanks.

Jul 15 '18 13:07 GengDavid

Epoch 6 (tested with GT bboxes)

 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets= 20 ] = 0.688
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets= 20 ] = 0.894
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets= 20 ] = 0.750
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.654
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.742
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 20 ] = 0.719
 Average Recall     (AR) @[ IoU=0.50      | area=   all | maxDets= 20 ] = 0.904
 Average Recall     (AR) @[ IoU=0.75      | area=   all | maxDets= 20 ] = 0.776
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.681
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.777

Epoch 13 (tested with GT bboxes)

 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets= 20 ] = 0.726
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets= 20 ] = 0.914
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets= 20 ] = 0.785
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.690
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.781
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 20 ] = 0.754
 Average Recall     (AR) @[ IoU=0.50      | area=   all | maxDets= 20 ] = 0.924
 Average Recall     (AR) @[ IoU=0.75      | area=   all | maxDets= 20 ] = 0.810
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.716
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.812
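To compare snapshots like the two above without reading them line by line, a small parser for the summary lines can help. This is a sketch that assumes the exact line layout printed above (the format of the pycocotools COCOeval summary); LINE_RE, parse_summary, EPOCH_6 and EPOCH_13 are hypothetical names, and the two excerpts are copied from the epoch 6 and epoch 13 results in this thread.

```python
import re

# One line of the COCOeval summary, e.g.
#  Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets= 20 ] = 0.688
LINE_RE = re.compile(
    r"\((AP|AR)\) @\[ IoU=(\S+)\s*\| area=\s*(\w+) \| maxDets=\s*(\d+) \] = (-?[\d.]+)"
)


def parse_summary(text):
    """Map (metric, iou, area) -> value for every summary line found in text."""
    results = {}
    for line in text.splitlines():
        match = LINE_RE.search(line)
        if match:
            metric, iou, area, _max_dets, value = match.groups()
            results[(metric, iou, area)] = float(value)
    return results


# Excerpts from the epoch 6 and epoch 13 summaries (GT bboxes).
EPOCH_6 = """
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets= 20 ] = 0.688
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 20 ] = 0.719
"""
EPOCH_13 = """
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets= 20 ] = 0.726
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 20 ] = 0.754
"""

if __name__ == "__main__":
    before, after = parse_summary(EPOCH_6), parse_summary(EPOCH_13)
    for key in sorted(after):
        delta = after[key] - before[key]
        print(key, f"{before[key]:.3f} -> {after[key]:.3f} ({delta:+.3f})")
```

Keying on (metric, IoU, area) keeps the medium/large breakdowns separate from the overall numbers when full summaries are pasted in.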

Jul 16 '18 19:07 mkocabas

@GengDavid do you have the weights of 5th epoch of ResNet50-256x192 model?

Jul 18 '18 06:07 mkocabas

Yes, I have saved the 5th-epoch model. But I'm sorry to say that there is a difference between my code and the original paper, as @Tiamo666 mentioned in issue #4.
The results seem very close, but I'm still going to modify the network and then re-test it.

Jul 18 '18 13:07 GengDavid

Yeah I saw the discussion. Please let me know about the results after modification. If you don't have enough GPUs, I can test the corrected model.

Jul 18 '18 13:07 mkocabas

I'll let you know the results, but it may take a while since I only have one 1080 free to run the code. Maybe you can test the ResNet-50+384x288 model first.
Thanks!

Jul 18 '18 15:07 GengDavid

I've started training the fixed ResNet-50+384x288 model on a Titan V with batch_size=24.

Jul 18 '18 17:07 mkocabas

Hi @mkocabas, I've updated the ResNet-50+256x192 results. Have you got any results yet? Thanks.

Jul 26 '18 01:07 GengDavid

Due to network limitations, I could not download the COCO person detection results, so I just used the ground-truth boxes.

Aug 13 '18 02:08 Tiamo666

@Tiamo666 Great job! Can you provide the pre-trained model so that I can test it with detection results? I think you could open a PR with a link to download the pre-trained model.

Aug 14 '18 07:08 GengDavid

@Tiamo666 Or if you do not want to open a PR, could you just provide a download link for the model? Google Drive, OneDrive, Dropbox and Baidu Yun are all fine.

Aug 14 '18 08:08 GengDavid

OK, I guess Baidu Yun is a good choice. I will share the pre-trained model there and send you the link as soon as it is uploaded.

Aug 14 '18 10:08 Tiamo666

Hi David, I've uploaded the model to Baidu Yun. Here is the link: https://pan.baidu.com/s/1fdy5_0HQm63QtlOzxKbpuw

Aug 15 '18 02:08 Tiamo666

Great! I'll test it and update the result later.

Aug 15 '18 05:08 GengDavid

@Tiamo666 I've updated the results.

Aug 15 '18 06:08 GengDavid

That's cool! I'll have time to train ResNet-101+384x288, and I'll share the model once training finishes.

Aug 27 '18 06:08 Tiamo666

@Tiamo666 That's great! If you have any problem, feel free to contact me.

Aug 27 '18 07:08 GengDavid

Hi David, I've uploaded the CPN 384x288 model with ResNet-101 to Baidu Yun. Here is the link: https://pan.baidu.com/s/1toikUHSqHhHP3DkIOkNctA

Sep 06 '18 02:09 Tiamo666

@Tiamo666 Great! Thanks a lot. I'll update the results soon.

Sep 06 '18 08:09 GengDavid

Hello David, I've just found that the model I trained last week used the old code, which has the "Color Normalized bug". Sorry about that; I can retrain the model this week.

Sep 10 '18 06:09 Tiamo666

@Tiamo666 Retraining is the better choice but may take more time. I think we could just fine-tune the trained model instead. This may affect the result a little but will save time. However, I currently do not have free GPUs for this. What do you think?

Sep 10 '18 07:09 GengDavid
