mmpose
[Feature] Support ViTPose
Motivation
Merge the ViTPose variants code and pre-trained models into mmpose.
Modification
- Add a ViT backbone model in mmpose/models/backbones. The `__init__` file is modified accordingly.
- Add the config files and corresponding markdown files in the configs folder.
- Fix a bug in the registration of layer-wise learning rate decay.
- Add a 'resize_upsample4' input transformation in mmpose/models/heads/topdown_heatmap_simple_head.py to support the simple decoder in ViTPose. It does not affect other models.
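For context, layer-wise learning-rate decay gives earlier transformer layers a smaller learning rate than later ones. The sketch below illustrates the idea only; the parameter naming scheme (`layers.{i}`) and the helper functions are assumptions for illustration, not mmpose's actual registration code.

```python
# Minimal sketch of layer-wise learning-rate decay for a ViT backbone.
# The 'layers.{i}' naming convention is an assumption for illustration.

def get_layer_id(name, num_layers):
    """Map a parameter name to a depth index (0 = embedding, deeper = later)."""
    if 'patch_embed' in name or 'pos_embed' in name:
        return 0
    if 'layers.' in name:  # e.g. 'backbone.layers.3.attn.qkv.weight'
        return int(name.split('layers.')[1].split('.')[0]) + 1
    return num_layers  # head and other params keep the full base lr


def build_param_groups(named_params, base_lr, decay_rate, num_layers):
    """One optimizer param group per parameter, with a depth-scaled lr."""
    groups = []
    for name, param in named_params:
        layer_id = get_layer_id(name, num_layers)
        scale = decay_rate ** (num_layers - layer_id)
        groups.append({'params': [param], 'lr': base_lr * scale})
    return groups
```

With `decay_rate=0.5` and two layers, the patch embedding gets 0.25x the base lr, layer 0 gets 0.5x, and the head gets the full base lr.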
BC-breaking (Optional)
No.
Use cases (Optional)
Checklist
Before PR:
- [X] I have read and followed the workflow indicated in the CONTRIBUTING.md to create this PR.
- [X] Pre-commit or linting tools indicated in CONTRIBUTING.md are used to fix the potential lint issues.
- [X] Bug fixes are covered by unit tests, the case that causes the bug should be added in the unit tests.
- [X] New functionalities are covered by complete unit tests. If not, please add more unit tests to ensure correctness.
- [X] The documentation has been modified accordingly, including docstring or example tutorials.
After PR:
- [x] CLA has been signed and all committers have signed the CLA in this PR.
Thank you very much for your help! For now, there are lint issues in the code. Could you please install pre-commit hooks (see our docs) and run `pre-commit run --all-files` in your local repo? The lint issues will be fixed automatically.
Codecov Report
:exclamation: No coverage uploaded for pull request base (dev-0.x@fd98b11). Click here to learn what that means. Patch has no changes to coverable lines.
:exclamation: Current head 22fbc7b differs from pull request most recent head 52ee52b. Consider uploading reports for the commit 52ee52b to get more accurate results.
Additional details and impacted files
```
@@            Coverage Diff             @@
##             dev-0.x    #1937   +/-   ##
==========================================
  Coverage           ?   84.10%
==========================================
  Files              ?      242
  Lines              ?    21227
  Branches           ?     3652
==========================================
  Hits               ?    17853
  Misses             ?     2450
  Partials           ?      924
```
Flag | Coverage Δ |
---|---|
unittests | 84.01% <0.00%> (?) |
Flags with carried forward coverage won't be shown. Click here to find out more.
:umbrella: View full report in Codecov by Sentry.
Thanks for your instructions. The code has been checked now. How can I upload the pre-trained weights and logs for the ViTPose variants? Can I provide the links via OneDrive or Google Drive?
Thanks. Both OneDrive and Google Drive are welcome.
BTW, would you mind adding some unit tests? An example can be found at https://github.com/open-mmlab/mmpose/pull/1907/files#diff-dadc2075341a40335f28131ceaf3d0d415e5c316c54bb6b0a0741aeb002db24e
The ViTPose unit test seems to have failed. For quick debugging, you can run the unit tests locally with `pytest tests/`.
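A minimal backbone unit test in that spirit typically builds the model, runs a forward pass, and checks the output shape. The sketch below uses a stand-in module (`TinyViT`), not the PR's actual backbone class or constructor signature, which are assumptions here.

```python
import torch
import torch.nn as nn


# Stand-in for the real backbone: a single patch-embedding conv, enough
# to illustrate the forward-shape check a unit test would perform.
class TinyViT(nn.Module):
    def __init__(self, embed_dim=32, patch_size=16):
        super().__init__()
        self.proj = nn.Conv2d(3, embed_dim, kernel_size=patch_size,
                              stride=patch_size)

    def forward(self, x):
        return self.proj(x)


def test_backbone_forward():
    model = TinyViT()
    x = torch.randn(1, 3, 256, 192)      # typical top-down input size
    out = model(x)
    assert out.shape == (1, 32, 16, 12)  # H/16 x W/16 feature map
```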
Thanks a lot for your help! The files and configs have been updated. The pre-trained models and logs are available on OneDrive. The code not covered by the unit tests mostly lies in the part that initializes weights from the pre-trained models (for example, renaming tensors between the MAE pre-trained models and the backbones). We are wondering how we can cover these parts in the unit tests. Can we produce pseudo checkpoints via `torch.save` in the unit tests to cover the renaming parts? We have tested these parts by re-training the models for several epochs and found that they work well.
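As a sketch of that pseudo-checkpoint idea: save a tiny hand-built state dict, load it back, and assert on the renamed keys. The `blocks.*` to `layers.*` mapping below is an illustrative assumption, not the actual MAE-to-mmpose key mapping.

```python
import os
import tempfile

import torch


def rename_mae_keys(state_dict):
    """Illustrative rename: MAE/timm-style 'blocks.N.' keys -> 'layers.N.'."""
    return {k.replace('blocks.', 'layers.'): v for k, v in state_dict.items()}


def test_rename_from_pseudo_checkpoint():
    # Save a tiny pseudo checkpoint instead of a real pre-trained model.
    pseudo = {
        'blocks.0.attn.qkv.weight': torch.zeros(4),
        'patch_embed.proj.weight': torch.zeros(4),
    }
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, 'pseudo.pth')
        torch.save(pseudo, path)
        loaded = torch.load(path)
    renamed = rename_mae_keys(loaded)
    assert 'layers.0.attn.qkv.weight' in renamed
    assert 'patch_embed.proj.weight' in renamed  # untouched keys survive
```

This keeps the test fast and self-contained, since no real pre-trained weights need to be downloaded.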
We also fixed some bugs caused by the updated NumPy version in the dataset files. Please check the recent commits. By the way, it seems that the current failed build is caused by an HTTP error.
Hi @ly015, would you mind restarting the failed checks? I just checked the logs, and it seems that the pip installation caused the error. Thanks a lot.
It seems that the current failure information is related to the docker version... Should I open a new PR for the dev-1.x branch and close this PR instead? Thanks a lot for your patience.
Hi @ly015, it seems that the current failure occurs when loading the video in test_inference.py, where no frames are detected after the command.
Is there anything we can do to help get this merged?
Thanks a lot.
We will help check and fix the CI problem.
Hi, is there anything we can do to help fix the CI problem? Besides, could we open a new PR based on the dev-1.x branch to merge the ViTPose variants into mmpose? Thanks for your response.
I have trained these models using your code and the downloaded pre-trained backbones. However, the results for some models do not match your reported numbers.
With classic decoder
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 |
---|---|---|---|---|---|---|
ViTPose-S | 256x192 | 0.737 | 0.905 | 0.813 | 0.790 | 0.943 |
ViTPose-B | 256x192 | 0.751 | 0.905 | 0.823 | 0.803 | 0.944 |
ViTPose-L | 256x192 | 0.777 | 0.915 | 0.850 | 0.828 | 0.953 |
ViTPose-H | 256x192 | 0.785 | 0.914 | 0.853 | 0.835 | 0.951 |
With simple decoder
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 |
---|---|---|---|---|---|---|
ViTPose-S | 256x192 | 0.736 | 0.904 | 0.812 | 0.790 | 0.942 |
ViTPose-B | 256x192 | 0.750 | 0.905 | 0.827 | 0.805 | 0.944 |
ViTPose-L | 256x192 | 0.774 | 0.911 | 0.847 | 0.826 | 0.950 |
ViTPose-H | 256x192 | 0.785 | 0.915 | 0.854 | 0.835 | 0.952 |
The validation accuracy is the same for all models.
I also noticed that the highest accuracy for the large and huge models occurs at around the 80th epoch. Maybe there is a problem with the optimizer? Have you validated the training process in this PR?
Hi, we have re-trained the models and found that the performance drop is caused by the difference between the transformer layers implemented in mmcv and timm. Would it be possible for us to use timm for the backbone implementation?
Yes, you can use timm for the backbone implementation. There is a tutorial in MMDetection on how to use timm backbones in MMDetection through an MMClassification wrapper, which should also be applicable for MMPose: https://mmdetection.readthedocs.io/en/latest/tutorials/how_to.html#use-backbone-network-in-timm-through-mmclassification
The above tutorial is just for your reference. You can use any approach to integrate timm backbones in your implementation.
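Following that tutorial, a config using a timm backbone through the MMClassification wrapper might look roughly like the sketch below. The `model_name` and the surrounding fields are illustrative assumptions, not this PR's actual config.

```python
# Hedged config sketch: a timm ViT backbone via the MMClassification
# TIMMBackbone wrapper, per the linked MMDetection tutorial.
custom_imports = dict(imports=['mmcls.models'], allow_failed_imports=False)

model = dict(
    backbone=dict(
        _delete_=True,                      # drop the base config's backbone
        type='mmcls.TIMMBackbone',
        model_name='vit_base_patch16_224',  # any timm model name (assumed)
        pretrained=True))
```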
Hi there,
Thanks for your patience. We have uploaded a timm version of ViTPose.
The training logs are available here:
- vitpose_base.log
- vitpose_simple_base.log
- vitpose_small.log
- vitpose_simple_small.log
Hi there,
Is there anything we can do to help get this PR merged? We are willing to provide more information.
Best,