transformers icon indicating copy to clipboard operation
transformers copied to clipboard

Add ViTPose

Open NielsRogge opened this issue 10 months ago • 19 comments

What does this PR do?

This PR adds ViTPose as introduced in ViTPose: Simple Vision Transformer Baselines for Human Pose Estimation.

Here's a demo notebook - note that the API might change: https://colab.research.google.com/drive/15_3gjcC0wtKSH85k76zewt81eUJIEWWA?usp=sharing.

To do:

  • [x] get rid of cv2 dependency (?)

NielsRogge avatar Apr 28 '24 20:04 NielsRogge

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@NielsRogge Hi Niels! Does this current PR works properly? (I want to do some test for this)

SangbumChoi avatar Apr 30 '24 01:04 SangbumChoi

Thanks for working on this @NielsRogge!

I can see there's a few bits unfinished e.g. tests. Is there a particular bit of code you'd like me to look at for a maintainers perspective?

For the issue description, there's a related model request is here: #24915

amyeroberts avatar May 08 '24 19:05 amyeroberts

@NielsRogge I'm unsubscribing atm, so that I don't get notifications on every new push. You just need to ping me again with my username when it's ready for review and I'll get notified

amyeroberts avatar May 14 '24 08:05 amyeroberts

It would be great to have a first round of review as the PR is in a ready state. @amyeroberts

NielsRogge avatar May 16 '24 06:05 NielsRogge

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] avatar Jun 28 '24 08:06 github-actions[bot]

@amyeroberts this implementation requires preprocessing functions such as cv2.warpAffine and cv2.GaussianBlur which do not have a non-cv2 equivalent implementation. Shoud we proceed with adding a requires_cv2 dependency checker?

NielsRogge avatar Jun 28 '24 13:06 NielsRogge

which do not have a non-cv2 equivalent implementation

An affine transformation is just a linear one, so it should be easy for us to add a numpy equivalent. OpenCV have an example on how to add in numpy: https://theailearner.com/tag/cv2-warpaffine/

For gaussian blur, I think there's a scipy guassian filter we could use.

Another equivalent is just adding the fast image processor equivalent for the model and using torchvision.

We should try to avoid adding another vision library dependency as much as possible

amyeroberts avatar Jun 28 '24 19:06 amyeroberts

Inference_with_ViTPose_for_human_pose_estimation_ipynb.ipynb.zip

@amyeroberts Hi amy this is the overall descriptions easy for you to look in major change (Skipped explaining minor changes like typo.)

  1. https://github.com/huggingface/transformers/pull/30530/commits/1b439e20be2c07527ea89a54e4e0a749facad6be
  • Made the function to accept more variable
  1. https://github.com/huggingface/transformers/pull/30530/commits/bbd534c53a9afc335875d5a142372a3728303d7d
  • Add more test, originally the prepare_img was just for the cat image which is not appropriate for this model (Since it is human pose estimation). Changed to human image in COCO.
  • Also made the output return of post_process result as tensor.

IMO, we can some more testing function if it is needed.

SangbumChoi avatar Aug 03 '24 05:08 SangbumChoi

cc. @NielsRogge For additional review request

SangbumChoi avatar Aug 03 '24 08:08 SangbumChoi

@amyeroberts @NielsRogge Ping for the reminder

SangbumChoi avatar Aug 12 '24 12:08 SangbumChoi

@NielsRogge Requesting for second review (Need some answer for 2-3 questions)

SangbumChoi avatar Aug 19 '24 01:08 SangbumChoi

Looking forward to this @SangbumChoi thanks for doing this @amyeroberts you're a great code reviewer

danbochman avatar Sep 04 '24 09:09 danbochman

@amyeroberts @NielsRogge Fortunately, every comments from both review was clear enough me to understand and modify to appropriate way. (Also fixed ViT -> Vit) Asking for second final review :)

SangbumChoi avatar Sep 06 '24 07:09 SangbumChoi

current CI fail is unrelated for this PR

SangbumChoi avatar Sep 13 '24 00:09 SangbumChoi

@amyeroberts Hi amy thanks again for the detail review. I have addressed all the suggestions except for removing MoE. We can remove the MoE logic but it is designed in the original code. Do we really need to remove it? Otherwise it is good to go.

SangbumChoi avatar Sep 25 '24 12:09 SangbumChoi

cc. @qubvel @NielsRogge For to request PR review instead of amy

SangbumChoi avatar Sep 30 '24 08:09 SangbumChoi

@SangbumChoi sorry for the delay, I will review this week!

qubvel avatar Sep 30 '24 11:09 qubvel