
[Feature] Support ViTPose

Annbless opened this pull request 2 years ago • 7 comments

Motivation

Merge the ViTPose variants code and pre-trained models into mmpose.

Modification

  1. Add a vit backbone model in mmpose/models/backbones. The __init__ file is modified accordingly.
  2. Add the config files and corresponding markdown files in the configs folder.
  3. Fix a bug in the registration of layer-wise learning rate decay.
  4. Add a 'resize_upsample4' input transformation in mmpose/models/heads/topdown_heatmap_simple_head.py to support the simple decoder in ViTPose. It has no influence on other models. A rough config sketch illustrating how these pieces fit together is given after this list.
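For context, here is a minimal, hypothetical sketch of how these pieces could appear in an mmpose 0.x top-down config. The backbone registry name ('ViT'), its hyper-parameters, and the head settings below are illustrative assumptions only; the config files added in this PR are the actual reference.

```python
# Hypothetical sketch of a ViTPose-style top-down config (mmpose 0.x conventions).
# The backbone type name 'ViT' and all values below are placeholders for illustration.
model = dict(
    type='TopDown',
    backbone=dict(
        type='ViT',                 # new backbone registered in mmpose/models/backbones
        img_size=(256, 192),
        patch_size=16,
        embed_dim=768,
        depth=12,
        num_heads=12,
    ),
    keypoint_head=dict(
        type='TopdownHeatmapSimpleHead',
        in_channels=768,
        out_channels=17,
        num_deconv_layers=0,                  # simple decoder: no deconvolution layers
        input_transform='resize_upsample4',   # new transform introduced by this PR
        extra=dict(final_conv_kernel=3),
        loss_keypoint=dict(type='JointsMSELoss', use_target_weight=True),
    ),
    train_cfg=dict(),
    test_cfg=dict(flip_test=True, post_process='default', shift_heatmap=True),
)
```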

BC-breaking (Optional)

No.

Use cases (Optional)

Checklist

Before PR:

  • [X] I have read and followed the workflow indicated in the CONTRIBUTING.md to create this PR.
  • [X] Pre-commit or linting tools indicated in CONTRIBUTING.md are used to fix the potential lint issues.
  • [X] Bug fixes are covered by unit tests, the case that causes the bug should be added in the unit tests.
  • [X] New functionalities are covered by complete unit tests. If not, please add more unit tests to ensure correctness.
  • [X] The documentation has been modified accordingly, including docstring or example tutorials.

After PR:

  • [x] CLA has been signed and all committers have signed the CLA in this PR.

Annbless avatar Jan 16 '23 09:01 Annbless

Thank you very much for your help! For now, there are lint issues in the code. Could you please install pre-commit hooks (see our docs) and run `pre-commit run --all-files` in your local repo? The lint issues will be fixed automatically.
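For reference, the usual sequence is roughly the following (assuming pre-commit is installed via pip):

```shell
# Install pre-commit, register the git hook, then run every hook over the whole repo once.
pip install -U pre-commit
pre-commit install
pre-commit run --all-files
```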

ly015 avatar Jan 16 '23 09:01 ly015

Codecov Report

:exclamation: No coverage uploaded for pull request base (dev-0.x@fd98b11). Patch has no changes to coverable lines.

:exclamation: Current head 22fbc7b differs from pull request most recent head 52ee52b. Consider uploading reports for the commit 52ee52b to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##             dev-0.x    #1937   +/-   ##
==========================================
  Coverage           ?   84.10%           
==========================================
  Files              ?      242           
  Lines              ?    21227           
  Branches           ?     3652           
==========================================
  Hits               ?    17853           
  Misses             ?     2450           
  Partials           ?      924           
| Flag | Coverage Δ |
| :--- | :--- |
| unittests | 84.01% <0.00%> (?) |

Flags with carried forward coverage won't be shown.


:umbrella: View full report in Codecov by Sentry.

codecov[bot] avatar Jan 16 '23 09:01 codecov[bot]

Thanks for your instructions. The lint check passes now. How can I upload the pre-trained weights and logs for the ViTPose variants? Can I provide the links via OneDrive or Google Drive?

Annbless avatar Jan 16 '23 09:01 Annbless

Thanks. Both OneDrive and Google Drive are welcome.

BTW, would you mind adding some unit tests? An example can be found at https://github.com/open-mmlab/mmpose/pull/1907/files#diff-dadc2075341a40335f28131ceaf3d0d415e5c316c54bb6b0a0741aeb002db24e

jin-s13 avatar Jan 16 '23 09:01 jin-s13

The unit test of ViTPose seems to have failed. For quick debugging, you can run the unit tests locally with `pytest tests/`.

ly015 avatar Jan 17 '23 02:01 ly015

Thanks a lot for your help! The files and configs have been updated. The pre-trained models and logs are available on OneDrive. The code not covered by the unit tests mostly lies in the weight-initialization part that loads the pre-trained models (for example, renaming tensors between the MAE pre-trained checkpoints and the backbone). We are wondering how we can cover these parts in the unit tests. Could we produce pseudo checkpoints via torch.save in the unit test to cover the renaming part? We have tested these parts by re-training the models for several epochs and found them to work well.
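To make the idea concrete, here is a rough sketch of what we have in mind. The backbone class name (ViT), its constructor arguments, and the MAE-style key names below are illustrative assumptions; the renaming logic in the PR's init_weights is the actual reference.

```python
import os
import tempfile

import torch


def test_vit_init_weights_from_pseudo_mae_checkpoint():
    # Build a tiny state dict that mimics the key layout of an MAE checkpoint.
    # The key names and tensor shapes here are assumptions for illustration.
    pseudo_state = {
        'patch_embed.proj.weight': torch.zeros(32, 3, 16, 16),
        'patch_embed.proj.bias': torch.zeros(32),
        'blocks.0.attn.qkv.weight': torch.zeros(96, 32),
    }
    with tempfile.TemporaryDirectory() as tmpdir:
        ckpt_path = os.path.join(tmpdir, 'pseudo_mae.pth')
        # Save a pseudo checkpoint so the test never depends on downloading real weights.
        torch.save({'model': pseudo_state}, ckpt_path)

        # Point init_weights at the pseudo checkpoint; this exercises the
        # key-renaming path without needing the full pre-trained model.
        from mmpose.models.backbones import ViT  # assumed class name added by this PR
        backbone = ViT(img_size=(32, 32), patch_size=16, embed_dim=32,
                       depth=1, num_heads=2)
        backbone.init_weights(pretrained=ckpt_path)
```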

Annbless avatar Jan 18 '23 07:01 Annbless

We also fixed some bugs caused by the updated NumPy version in the dataset files. Please check the recent commits. By the way, it seems that the current failed build is caused by an HTTP error.

Annbless avatar Jan 19 '23 15:01 Annbless

Hi @ly015, would you mind restarting the failed checks? I just checked the logs, and it seems that the pip installation caused the error. Thanks a lot.

Annbless avatar Feb 08 '23 01:02 Annbless

It seems that the current failure is related to the Docker version... Should I open a new PR against the dev-1.x branch and close this PR instead? Thanks a lot for your patience.

Annbless avatar Feb 08 '23 03:02 Annbless

Hi @ly015, it seems that the current failure occurs when loading the video in test_inference.py, where no frames are read from it. Is there anything we can do to help get this merged?

Thanks a lot.

Annbless avatar Feb 09 '23 01:02 Annbless

We will help check and fix the CI problem.

ly015 avatar Feb 09 '23 02:02 ly015

Hi, is there anything we can do to help fix the CI problem? Also, could we open a new PR based on the dev-1.x branch to merge the ViTPose variants into mmpose? Thanks for your response.

Annbless avatar Feb 25 '23 05:02 Annbless

I have trained these models using your code and the downloaded pretrained backbones. However, the results for some models do not match your reported numbers.

With classic decoder

| Arch | Input Size | AP | AP50 | AP75 | AR | AR50 |
| :--- | :---: | :---: | :---: | :---: | :---: | :---: |
| ViTPose-S | 256x192 | 0.737 | 0.905 | 0.813 | 0.790 | 0.943 |
| ViTPose-B | 256x192 | 0.751 | 0.905 | 0.823 | 0.803 | 0.944 |
| ViTPose-L | 256x192 | 0.777 | 0.915 | 0.850 | 0.828 | 0.953 |
| ViTPose-H | 256x192 | 0.785 | 0.914 | 0.853 | 0.835 | 0.951 |

With simple decoder

| Arch | Input Size | AP | AP50 | AP75 | AR | AR50 |
| :--- | :---: | :---: | :---: | :---: | :---: | :---: |
| ViTPose-S | 256x192 | 0.736 | 0.904 | 0.812 | 0.790 | 0.942 |
| ViTPose-B | 256x192 | 0.750 | 0.905 | 0.827 | 0.805 | 0.944 |
| ViTPose-L | 256x192 | 0.774 | 0.911 | 0.847 | 0.826 | 0.950 |
| ViTPose-H | 256x192 | 0.785 | 0.915 | 0.854 | 0.835 | 0.952 |

The validation accuracy is the same for all models.

I also noticed that the highest accuracy for the large and huge models occurs at around the 80th epoch. Maybe there is a problem with the optimizer? Have you validated the training process on this PR?

LareinaM avatar Mar 08 '23 05:03 LareinaM

Hi, we have re-trained the models and found that the performance drop is caused by differences between the transformer layers implemented in mmcv and timm. Would it be possible for us to use timm for the backbone implementation?

Annbless avatar Mar 20 '23 04:03 Annbless

Yes, you can use timm for the backbone implementation. There is a tutorial in MMDetection on how to use timm backbones through an MMClassification wrapper, which should also be applicable to MMPose: https://mmdetection.readthedocs.io/en/latest/tutorials/how_to.html#use-backbone-network-in-timm-through-mmclassification

The above tutorial is just for your reference. You can use any approach to integrate timm backbones in your implementation.
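For concreteness, a rough sketch of what the backbone section could look like with that wrapper. The model_name, channel numbers, and remaining settings are illustrative assumptions; the tutorial above and the mmcls/timm versions you use are the reference.

```python
# Hypothetical sketch: using a timm ViT through MMClassification's TIMMBackbone wrapper.
custom_imports = dict(imports=['mmcls.models'], allow_failed_imports=False)

model = dict(
    type='TopDown',
    backbone=dict(
        type='mmcls.TIMMBackbone',          # thin wrapper around timm.create_model
        model_name='vit_base_patch16_224',  # any timm ViT variant could be used here
        pretrained=True,
    ),
    keypoint_head=dict(
        type='TopdownHeatmapSimpleHead',
        in_channels=768,
        out_channels=17,
        loss_keypoint=dict(type='JointsMSELoss', use_target_weight=True),
    ),
    train_cfg=dict(),
    test_cfg=dict(flip_test=True),
)
```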

ly015 avatar Mar 20 '23 05:03 ly015

Hi there,

Thanks for your patience. We have uploaded a timm-based version of ViTPose.

The training logs are available here:

  • vitpose_base.log
  • vitpose_simple_base.log
  • vitpose_small.log
  • vitpose_simple_small.log

Annbless avatar Mar 23 '23 02:03 Annbless

Hi there,

Is there anything we can do to help get this PR merged? We are willing to provide more information.

Best,

Annbless avatar Apr 13 '23 05:04 Annbless