
[Feature] Support ViTPose

Annbless opened this pull request 2 years ago • 7 comments

Motivation

Merge the ViTPose variants code and pre-trained models into mmpose.

Modification

  1. Add a vit backbone model in mmpose/models/backbones. The __init__ file is modified accordingly.
  2. Add the config files and corresponding markdown files in the configs folder.
  3. Fix a bug in the registration of layer-wise learning rate decay.
  4. Add a 'resize_upsample4' input transformation in mmpose/models/heads/topdown_heatmap_simple_head.py to support the simple decoder in ViTPose. It has no influence on other models. A rough config sketch illustrating how these pieces fit together is given after this list.
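For context, here is a minimal, hypothetical sketch of how these pieces could appear in an mmpose 0.x top-down config. The backbone registry name ('ViT'), its hyper-parameters, and the head settings below are illustrative assumptions only; the config files added in this PR are the actual reference.

```python
# Hypothetical sketch of a ViTPose-style top-down config (mmpose 0.x conventions).
# The backbone type name 'ViT' and all values below are placeholders for illustration.
model = dict(
    type='TopDown',
    backbone=dict(
        type='ViT',                 # new backbone registered in mmpose/models/backbones
        img_size=(256, 192),
        patch_size=16,
        embed_dim=768,
        depth=12,
        num_heads=12,
    ),
    keypoint_head=dict(
        type='TopdownHeatmapSimpleHead',
        in_channels=768,
        out_channels=17,
        num_deconv_layers=0,                  # simple decoder: no deconvolution layers
        input_transform='resize_upsample4',   # new transform introduced by this PR
        extra=dict(final_conv_kernel=3),
        loss_keypoint=dict(type='JointsMSELoss', use_target_weight=True),
    ),
    train_cfg=dict(),
    test_cfg=dict(flip_test=True, post_process='default', shift_heatmap=True),
)
```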

BC-breaking (Optional)

No.

Use cases (Optional)

Checklist

Before PR:

  • [X] I have read and followed the workflow indicated in the CONTRIBUTING.md to create this PR.
  • [X] Pre-commit or linting tools indicated in CONTRIBUTING.md are used to fix the potential lint issues.
  • [X] Bug fixes are covered by unit tests, the case that causes the bug should be added in the unit tests.
  • [X] New functionalities are covered by complete unit tests. If not, please add more unit tests to ensure correctness.
  • [X] The documentation has been modified accordingly, including docstring or example tutorials.

After PR:

  • [x] CLA has been signed and all committers have signed the CLA in this PR.

Annbless avatar Jan 16 '23 09:01 Annbless

Thank you very much for your help! For now, there are lint issues in the code. Could you please install pre-commit hooks (see our docs) and run `pre-commit run --all-files` in your local repo? The lint issues will be fixed automatically.
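For reference, the usual sequence is roughly the following (assuming pre-commit is installed via pip):

```shell
# Install pre-commit, register the git hook, then run every hook over the whole repo once.
pip install -U pre-commit
pre-commit install
pre-commit run --all-files
```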

ly015 avatar Jan 16 '23 09:01 ly015

Codecov Report

:exclamation: No coverage uploaded for pull request base (dev-0.x@fd98b11). Patch has no changes to coverable lines.

:exclamation: Current head 22fbc7b differs from pull request most recent head 52ee52b. Consider uploading reports for the commit 52ee52b to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##             dev-0.x    #1937   +/-   ##
==========================================
  Coverage           ?   84.10%           
==========================================
  Files              ?      242           
  Lines              ?    21227           
  Branches           ?     3652           
==========================================
  Hits               ?    17853           
  Misses             ?     2450           
  Partials           ?      924           
| Flag | Coverage Δ |
| :--- | :--- |
| unittests | 84.01% <0.00%> (?) |

Flags with carried forward coverage won't be shown.


:umbrella: View full report in Codecov by Sentry.

codecov[bot] avatar Jan 16 '23 09:01 codecov[bot]

Thanks for your instructions. The lint check passes now. How can I upload the pre-trained weights and logs for the ViTPose variants? Can I provide the links via OneDrive or Google Drive?

Annbless avatar Jan 16 '23 09:01 Annbless

Thanks. Both OneDrive and Google Drive are welcome.

BTW, would you mind adding some unit tests? An example can be found at https://github.com/open-mmlab/mmpose/pull/1907/files#diff-dadc2075341a40335f28131ceaf3d0d415e5c316c54bb6b0a0741aeb002db24e

jin-s13 avatar Jan 16 '23 09:01 jin-s13

The unit test of ViTPose seems to have failed. For quick debugging, you can run the unit tests locally with `pytest tests/`.

ly015 avatar Jan 17 '23 02:01 ly015

Thanks a lot for your help! The files and configs have been updated. The pre-trained models and logs are available on OneDrive. The code not covered by the unit tests mostly lies in the weight-initialization part that loads the pre-trained models (for example, renaming tensors between the MAE pre-trained checkpoints and the backbone). We are wondering how we can cover these parts in the unit tests. Could we produce pseudo checkpoints via torch.save in the unit test to cover the renaming part? We have tested these parts by re-training the models for several epochs and found them to work well.
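To make the idea concrete, here is a rough sketch of what we have in mind. The backbone class name (ViT), its constructor arguments, and the MAE-style key names below are illustrative assumptions; the renaming logic in the PR's init_weights is the actual reference.

```python
import os
import tempfile

import torch


def test_vit_init_weights_from_pseudo_mae_checkpoint():
    # Build a tiny state dict that mimics the key layout of an MAE checkpoint.
    # The key names and tensor shapes here are assumptions for illustration.
    pseudo_state = {
        'patch_embed.proj.weight': torch.zeros(32, 3, 16, 16),
        'patch_embed.proj.bias': torch.zeros(32),
        'blocks.0.attn.qkv.weight': torch.zeros(96, 32),
    }
    with tempfile.TemporaryDirectory() as tmpdir:
        ckpt_path = os.path.join(tmpdir, 'pseudo_mae.pth')
        # Save a pseudo checkpoint so the test never depends on downloading real weights.
        torch.save({'model': pseudo_state}, ckpt_path)

        # Point init_weights at the pseudo checkpoint; this exercises the
        # key-renaming path without needing the full pre-trained model.
        from mmpose.models.backbones import ViT  # assumed class name added by this PR
        backbone = ViT(img_size=(32, 32), patch_size=16, embed_dim=32,
                       depth=1, num_heads=2)
        backbone.init_weights(pretrained=ckpt_path)
```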

Annbless avatar Jan 18 '23 07:01 Annbless

We also fixed some bugs caused by the updated NumPy version in the dataset files. Please check the recent commits. By the way, it seems that the current failed build is caused by an HTTP error.

Annbless avatar Jan 19 '23 15:01 Annbless

Hi @ly015, would you mind restarting the failed checks? I just checked the logs, and it seems that the pip installation caused the error. Thanks a lot.

Annbless avatar Feb 08 '23 01:02 Annbless

It seems that the current failure is related to the Docker version... Should I open a new PR against the dev-1.x branch and close this PR instead? Thanks a lot for your patience.

Annbless avatar Feb 08 '23 03:02 Annbless

Hi @ly015, it seems that the current failure occurs when loading the video in test_inference.py, where no frames are read from it. Is there anything we can do to help get this merged?

Thanks a lot.

Annbless avatar Feb 09 '23 01:02 Annbless

We will help check and fix the CI problem.

ly015 avatar Feb 09 '23 02:02 ly015

Hi, is there anything we can do to help fix the CI problem? Also, could we open a new PR based on the dev-1.x branch to merge the ViTPose variants into mmpose? Thanks for your response.

Annbless avatar Feb 25 '23 05:02 Annbless

I have trained these models using your code and the downloaded pretrained backbones. However, the results for some models do not match your reported numbers.

With classic decoder

| Arch | Input Size | AP | AP50 | AP75 | AR | AR50 |
| :--- | :---: | :---: | :---: | :---: | :---: | :---: |
| ViTPose-S | 256x192 | 0.737 | 0.905 | 0.813 | 0.790 | 0.943 |
| ViTPose-B | 256x192 | 0.751 | 0.905 | 0.823 | 0.803 | 0.944 |
| ViTPose-L | 256x192 | 0.777 | 0.915 | 0.850 | 0.828 | 0.953 |
| ViTPose-H | 256x192 | 0.785 | 0.914 | 0.853 | 0.835 | 0.951 |

With simple decoder

| Arch | Input Size | AP | AP50 | AP75 | AR | AR50 |
| :--- | :---: | :---: | :---: | :---: | :---: | :---: |
| ViTPose-S | 256x192 | 0.736 | 0.904 | 0.812 | 0.790 | 0.942 |
| ViTPose-B | 256x192 | 0.750 | 0.905 | 0.827 | 0.805 | 0.944 |
| ViTPose-L | 256x192 | 0.774 | 0.911 | 0.847 | 0.826 | 0.950 |
| ViTPose-H | 256x192 | 0.785 | 0.915 | 0.854 | 0.835 | 0.952 |

The validation accuracy is the same for all models.

I also noticed that the highest accuracy for the large and huge models occurs at around the 80th epoch. Maybe there is a problem with the optimizer? Have you validated the training process on this PR?

LareinaM avatar Mar 08 '23 05:03 LareinaM

Hi, we have re-trained the models and found that the performance drop is caused by differences between the transformer layers implemented in mmcv and timm. Would it be possible for us to use timm for the backbone implementation?

Annbless avatar Mar 20 '23 04:03 Annbless

Yes, you can use timm for the backbone implementation. There is a tutorial in MMDetection on how to use timm backbones through an MMClassification wrapper, which should also be applicable to MMPose: https://mmdetection.readthedocs.io/en/latest/tutorials/how_to.html#use-backbone-network-in-timm-through-mmclassification

The above tutorial is just for your reference. You can use any approach to integrate timm backbones in your implementation.
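For concreteness, a rough sketch of what the backbone section could look like with that wrapper. The model_name, channel numbers, and remaining settings are illustrative assumptions; the tutorial above and the mmcls/timm versions you use are the reference.

```python
# Hypothetical sketch: using a timm ViT through MMClassification's TIMMBackbone wrapper.
custom_imports = dict(imports=['mmcls.models'], allow_failed_imports=False)

model = dict(
    type='TopDown',
    backbone=dict(
        type='mmcls.TIMMBackbone',          # thin wrapper around timm.create_model
        model_name='vit_base_patch16_224',  # any timm ViT variant could be used here
        pretrained=True,
    ),
    keypoint_head=dict(
        type='TopdownHeatmapSimpleHead',
        in_channels=768,
        out_channels=17,
        loss_keypoint=dict(type='JointsMSELoss', use_target_weight=True),
    ),
    train_cfg=dict(),
    test_cfg=dict(flip_test=True),
)
```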

ly015 avatar Mar 20 '23 05:03 ly015

Hi there,

Thanks for your patience. We have uploaded a timm-based version of ViTPose.

The training logs are available here:

  • vitpose_base.log
  • vitpose_simple_base.log
  • vitpose_small.log
  • vitpose_simple_small.log

Annbless avatar Mar 23 '23 02:03 Annbless

Hi there,

Is there anything we can do to help get this PR merged? We are willing to provide more information.

Best,

Annbless avatar Apr 13 '23 05:04 Annbless