Add ViTPose
What does this PR do?
This PR adds ViTPose as introduced in ViTPose: Simple Vision Transformer Baselines for Human Pose Estimation.
Here's a demo notebook - note that the API might change: https://colab.research.google.com/drive/15_3gjcC0wtKSH85k76zewt81eUJIEWWA?usp=sharing.
To do:
- [x] get rid of cv2 dependency (?)
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
@NielsRogge Hi Niels! Does this PR currently work properly? (I want to run some tests on it.)
Thanks for working on this @NielsRogge!
I can see there are a few unfinished bits, e.g. tests. Is there a particular piece of code you'd like me to look at from a maintainer's perspective?
For the issue description, there's a related model request here: #24915
@NielsRogge I'm unsubscribing atm, so that I don't get notifications on every new push. You just need to ping me again with my username when it's ready for review and I'll get notified
It would be great to have a first round of review as the PR is in a ready state. @amyeroberts
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
@amyeroberts this implementation requires preprocessing functions such as `cv2.warpAffine` and `cv2.GaussianBlur`, which do not have a non-cv2 equivalent implementation. Should we proceed with adding a `requires_cv2` dependency checker?
> which do not have a non-cv2 equivalent implementation
An affine transformation is just a linear one (plus a translation), so it should be easy for us to add a numpy equivalent. There's an example of how to implement `cv2.warpAffine` in numpy: https://theailearner.com/tag/cv2-warpaffine/
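To illustrate, a numpy-only equivalent could look like the sketch below. This is a minimal, hedged sketch (not the PR's actual code): it applies the inverse of the 2x3 affine matrix to the destination pixel grid and samples with nearest-neighbor interpolation, whereas `cv2.warpAffine` defaults to bilinear interpolation. The function name `warp_affine` is hypothetical.

```python
import numpy as np

def warp_affine(image, matrix, output_size):
    """Minimal nearest-neighbor stand-in for cv2.warpAffine.

    `matrix` is a 2x3 affine matrix mapping source -> destination;
    out-of-bounds pixels are filled with zeros.
    """
    out_h, out_w = output_size
    # Augment to 3x3 and invert, so we can map destination -> source.
    full = np.vstack([matrix, [0.0, 0.0, 1.0]])
    inv = np.linalg.inv(full)
    # Destination pixel grid in homogeneous (x, y, 1) coordinates.
    ys, xs = np.mgrid[0:out_h, 0:out_w]
    coords = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])
    src = inv @ coords
    # Nearest-neighbor sampling with bounds checking.
    src_x = np.round(src[0]).astype(int)
    src_y = np.round(src[1]).astype(int)
    valid = (
        (src_x >= 0) & (src_x < image.shape[1])
        & (src_y >= 0) & (src_y < image.shape[0])
    )
    out = np.zeros((out_h, out_w) + image.shape[2:], dtype=image.dtype)
    out_flat = out.reshape((out_h * out_w,) + image.shape[2:])
    out_flat[valid] = image[src_y[valid], src_x[valid]]
    return out
```

For the bilinear case cv2 uses by default, the sampling step would interpolate between the four surrounding source pixels instead of rounding.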
For gaussian blur, I think there's a scipy gaussian filter (`scipy.ndimage.gaussian_filter`) we could use.
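As a rough sketch of that substitution (a hedged comparison, not a verified drop-in: cv2 specifies a kernel size plus sigma, while scipy takes sigma and derives the kernel radius from `truncate`):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

# Roughly: cv2.GaussianBlur(img, ksize, sigmaX=s) ~ gaussian_filter(img, sigma=s).
# In scipy the effective kernel radius is about truncate * sigma rather than
# being set explicitly via ksize, so results match only approximately.
img = np.random.default_rng(0).random((8, 8)).astype(np.float32)
blurred = gaussian_filter(img, sigma=1.0)
```

Boundary handling also differs by default (scipy uses `mode="reflect"`), which may matter near crop edges in the pose-estimation preprocessing.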
Another option is adding the fast image processor equivalent for the model and using torchvision.
We should try to avoid adding another vision library dependency as much as possible.
Inference_with_ViTPose_for_human_pose_estimation_ipynb.ipynb.zip
@amyeroberts Hi Amy, here's an overall description of the major changes to make them easy to review (I skipped explaining minor changes like typos).
- https://github.com/huggingface/transformers/pull/30530/commits/1b439e20be2c07527ea89a54e4e0a749facad6be
- Made the function accept more variables
- https://github.com/huggingface/transformers/pull/30530/commits/bbd534c53a9afc335875d5a142372a3728303d7d
- Added more tests. Originally `prepare_img` used only the cat image, which is not appropriate for this model (since it is human pose estimation); changed it to a human image from COCO.
- Also made `post_process` return its result as a tensor.
IMO, we can add more test functions if needed.
cc @NielsRogge for an additional review.
@amyeroberts @NielsRogge Pinging as a reminder.
@NielsRogge Requesting a second review (I need answers to 2-3 questions).
Looking forward to this @SangbumChoi, thanks for doing this! @amyeroberts you're a great code reviewer.
@amyeroberts @NielsRogge Fortunately, all the comments from both reviews were clear enough for me to understand and address appropriately. (Also fixed ViT -> Vit.) Asking for a second, final review :)
The current CI failure is unrelated to this PR.
@amyeroberts Hi Amy, thanks again for the detailed review. I have addressed all the suggestions except for removing the `MoE` logic. We can remove it, but it is part of the original code's design. Do we really need to remove it? Otherwise it is good to go.
cc @qubvel @NielsRogge to request a PR review in Amy's place.
@SangbumChoi sorry for the delay, I will review this week!