
New model support RTDETR

Open SangbumChoi opened this issue 1 year ago • 18 comments

What does this PR do?

This PR adds the new RT-DETR model; it is the completed version of https://github.com/huggingface/transformers/pull/27247

There are several TO DOs:

  • [X] resolve conflicts
  • [ ] weight files for the 7 other RT-DETR variants
  • [X] edit testing script
  • [ ] (optional) enable training

Before submitting

  • [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • [X] Did you read the contributor guideline, Pull Request section?
  • [ ] Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case.
  • [X] Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
  • [X] Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.

@amyeroberts @NielsRogge

SangbumChoi avatar Feb 17 '24 08:02 SangbumChoi

Looking good @SangbumChoi! Let us know when the PR is ready for review 🤗

amyeroberts avatar Feb 19 '24 18:02 amyeroberts

@amyeroberts Hi Amy! I think I can pass all the tests this week. As with my other recent PRs, I resolved the conversations that were simple or that I fixed per your comments. However, I did not resolve the conversations that may need your confirmation or that are TO DOs.

SangbumChoi avatar Mar 03 '24 12:03 SangbumChoi

@amyeroberts Hi Amy, finally all the mandatory tests for this model have passed. I think it is a good time to request a second review of the PR! Personally, I think there are 3 things to check:

  1. the deepcopy issue
  2. the postprocessing logic in the image test
  3. the ModelOutput / object-detection output ordering

Also, is it possible that local pytest and the CI tests give different results? Below is my local pytest output:

==================================== 55 passed, 52 skipped, 8 warnings in 39.38s =====================================

SangbumChoi avatar Mar 07 '24 14:03 SangbumChoi

========================================= 55 passed, 52 skipped, 8 warnings in 43.14s =========================================

However, an intermediate commit or suggestion introduced a slight difference in the conversion.

SangbumChoi avatar Mar 15 '24 07:03 SangbumChoi

It is weird that local pytest passes everything but CI seems to fail; I found out that the error is related to https://github.com/huggingface/transformers/pull/26849/files

(Turns out not merging the latest branch was the problem?)

SangbumChoi avatar Mar 19 '24 02:03 SangbumChoi

@amyeroberts Finally, I have passed all the tests. However, there are a few comments that are still not resolved; could you review it again?

SangbumChoi avatar Mar 19 '24 09:03 SangbumChoi

@amyeroberts Hi Amy! I think I have finally addressed all the changes. Thanks again! Now it is time for the real final review (you can also see the unresolved comments) :)

Here are some future TO DOs for other contributors, including myself, for developing RT-DETR:

  1. Adding more models besides the R50 backbone.
  2. Stabilizing the fine-tuning process (or possibly fixing some bugs in fine-tuning).

BTW, I think we can close the following PRs if we merge this into the main branch! https://github.com/huggingface/transformers/pull/27247 https://github.com/huggingface/transformers/issues/26742

SangbumChoi avatar Mar 27 '24 00:03 SangbumChoi

@amyeroberts ping again for reminder!

SangbumChoi avatar Apr 02 '24 12:04 SangbumChoi

@NielsRogge Could you do a first (final) review?

amyeroberts avatar Apr 09 '24 16:04 amyeroberts

@NielsRogge Thanks for the first final review. I have addressed all the suggestions! However, I did not change the image processor part, especially the code related to panoptic segmentation. It is true that RT-DETR does not currently use panoptic annotations. However, IMO removing all the panoptic-related code would mean removing all the "Copied from" links to the other DETR implementations, which I thought would make it hard to keep track of the originality. Do you still want to remove the panoptic-related parts?

SangbumChoi avatar Apr 12 '24 04:04 SangbumChoi

@NielsRogge Could you do second final review and share some thoughts about https://github.com/huggingface/transformers/pull/29077#issuecomment-2050962687 ?

SangbumChoi avatar Apr 17 '24 00:04 SangbumChoi

Sorry, I made a mistake: I thought I had assigned Amy for review, but apparently I removed the request.

NielsRogge avatar May 09 '24 19:05 NielsRogge

@jasonkit Do you also have the logits of the original RT-DETR for comparison?

SangbumChoi avatar May 10 '24 03:05 SangbumChoi

@SangbumChoi

Do you also have the logits of the original RT-DETR for comparison?

Unfortunately no, I want to ask you for that actually.

For my conversion, I just skipped the check (however, the output looks correct when compared to rtdetr_r50vd's result; there is just a difference in the ordering).

I can share my converted (NOT the original) model's output on rtdetr_r101vd here for your reference:

logits:
tensor([[-4.6162, -4.9189, -4.6656],
        [-4.4701, -4.4997, -4.9658],
        [-5.6641, -7.9000, -5.0725]])
        
boxes:
tensor([[0.7707, 0.4124, 0.4585],
        [0.2589, 0.5492, 0.4735],
        [0.1688, 0.1993, 0.2108]])

jasonkit avatar May 10 '24 03:05 jasonkit

@jasonkit Let me check on that part after work!

SangbumChoi avatar May 10 '24 03:05 SangbumChoi

@jasonkit This is the original model's result, which matches the prediction :)

(tensor([[[-4.6162, -4.9189, -4.6656],
          [-4.4701, -4.4997, -4.9659],
          [-5.6641, -7.9000, -5.0725]]], device='cuda:0',
        grad_fn=<SliceBackward0>),
 tensor([[[0.7707, 0.4124, 0.4585],
          [0.2589, 0.5492, 0.4735],
          [0.1688, 0.1993, 0.2108]]], device='cuda:0', grad_fn=<SliceBackward0>))

rt_detr.ipynb.zip

SangbumChoi avatar May 10 '24 11:05 SangbumChoi

@jasonkit Thanks for the help. I uploaded two R101 networks to https://huggingface.co/sbchoi 👍🏼

SangbumChoi avatar May 10 '24 11:05 SangbumChoi

@amyeroberts Hi Amy, jasonkit also helped improve this PR! FYI, fine-tuning on custom datasets has been further stabilized, and two other pretrained networks, R101 and R101-O365, have been added!

I think this is the final review request for this PR.

SangbumChoi avatar May 10 '24 11:05 SangbumChoi

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@NielsRogge Hi Niels!

Only thing left to do is update the checkpoints and have a final slow model tests run on these

Can you help me on this task?

SangbumChoi avatar May 17 '24 14:05 SangbumChoi

For anyone who wants to convert resnet18d and resnet34d:

RT-DETR has its own backbone configuration; see https://github.com/lyuwenyu/RT-DETR/blob/2b88d5d53bcbfbb70329bc9c007fdf7e76cf90dc/rtdetr_paddle/ppdet/modeling/backbones/resnet.py#L541. We could successfully convert the larger models (R50, R101) because the backbone is exactly the same. Unfortunately, the original RT-DETR with a smaller backbone is not exactly the same as the timm backbone pipeline: it requires an additional downsample layer in the first stage. (This is because the smaller models' block expansion is 1, whereas otherwise it is 4.) To make this compatible, we would need to change the following line in timm, https://github.com/huggingface/pytorch-image-models/blob/84cb225ecb33da1a985b9d297c0a4439b6f59aac/timm/models/resnet.py#L326, to also add a downsample layer when inplanes equals planes * block_fn.expansion. FYI, from https://github.com/lyuwenyu/RT-DETR/blob/2b88d5d53bcbfbb70329bc9c007fdf7e76cf90dc/rtdetr_paddle/ppdet/modeling/backbones/resnet.py#L406:

RT-DETR: shortcut = False if stage_idx == 0 else True
timm: shortcut = stage_idx != 0 or inplanes != planes * block_fn.expansion
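The mismatch can be sketched with two small boolean helpers (hypothetical names; the conditions paraphrase the two lines quoted above, under the assumption that Paddle's shortcut=False means "build a conv projection branch" for that block):

```python
def rtdetr_first_block_uses_projection(block_idx: int) -> bool:
    # PaddleDetection RT-DETR ResNet: the first block of every stage builds
    # a conv projection branch unconditionally (shortcut=False there).
    return block_idx == 0


def timm_first_block_uses_projection(block_idx: int, inplanes: int,
                                     planes: int, expansion: int) -> bool:
    # timm ResNet: the projection (downsample) branch is only built when the
    # residual shapes actually differ.
    return block_idx == 0 and inplanes != planes * expansion


# Bottleneck backbones (R50/R101): expansion == 4, so in the first stage
# inplanes (64) != planes * expansion (256) -> both build a projection.
assert rtdetr_first_block_uses_projection(0) == \
    timm_first_block_uses_projection(0, 64, 64, 4)

# BasicBlock backbones (resnet18d/resnet34d): expansion == 1 and
# inplanes == planes in the first stage -> RT-DETR builds a projection
# while timm keeps an identity shortcut, so the weights are incompatible.
assert rtdetr_first_block_uses_projection(0) != \
    timm_first_block_uses_projection(0, 64, 64, 1)
```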

SangbumChoi avatar May 22 '24 05:05 SangbumChoi

@amyeroberts Could you rerun this CI? I have rebased to the main including https://github.com/huggingface/transformers/pull/30932/files

SangbumChoi avatar May 22 '24 07:05 SangbumChoi

@SangbumChoi Sure! There are still some difficult clashes with library versions on our CI, so tests are failing on main. As soon as those are resolved, I'll rerun, and then hopefully everything will be green so we can merge 🤞

amyeroberts avatar May 22 '24 10:05 amyeroberts

Pinging @rwightman here regarding the Resnet-D variant which would be required here.

TL;DR: this new model (RT-DETR) leverages the AutoBackbone class (more specifically TimmBackbone) in the Transformers library to use ResNet-D models from timm. But as pointed out here, some checkpoints have a small modification to the ResNet backbone. I'm wondering whether you're aware of this modification and whether it is supported in timm (as I know there are various implementations of ResNet and its variants in timm).

NielsRogge avatar May 22 '24 11:05 NielsRogge

This is some information for @rwightman. In my personal understanding, the Paddle version of ResNet (https://github.com/PaddlePaddle/PaddleDetection/blob/61cef7a7fe45c009fca76a13c47dcc516d535df9/ppdet/modeling/backbones/resnet.py#L406) is not exactly the same as the official ResNet in the case of the smaller models, due to the expansion rate.

SangbumChoi avatar May 22 '24 11:05 SangbumChoi

@SangbumChoi @NielsRogge That Paddle ResNet supports the a/b/c/d variants. It looks like their D should match timm's D, with one possible exception noticed on a first pass: they might be using deformable convs by default, which I've avoided, since those usually need a custom kernel to be fast. If DCN is disabled, I think they'd match.

rwightman avatar May 22 '24 14:05 rwightman

@rwightman I think, unfortunately, the RT-DETR backbone with resnet34d and resnet18d is not compatible due to the issue mentioned above :(

SangbumChoi avatar May 23 '24 13:05 SangbumChoi

@NielsRogge Your comments https://github.com/huggingface/transformers/pull/29077#discussion_r1636075533 and https://github.com/huggingface/transformers/pull/29077#discussion_r1636081513 seem valid, so there might be changes to config.json in all the pretrained weights. (I will re-upload and ping you again!)

SangbumChoi avatar Jun 12 '24 09:06 SangbumChoi

@NielsRogge All done :)

SangbumChoi avatar Jun 12 '24 10:06 SangbumChoi

@amyeroberts Hi Amy, thanks for the comments. I just want to let you know that all the suggestions have been addressed in the commits.

FYI, modeling_rt_detr_resnet is derived from ResNet: modeling_resnet.py. The comment can also be updated if there is no backward-compatibility issue. Let me know if this is required in another PR!

@NielsRogge Thankfully, I think we don't need an additional update to the config.json of the models :)

SangbumChoi avatar Jun 14 '24 00:06 SangbumChoi