New model support RTDETR
What does this PR do?
This PR adds RT-DETR, a completed version of https://github.com/huggingface/transformers/pull/27247
There are several TO DOs:
- [X] resolve conflicts
- [ ] weight files for the other 7 RT-DETR variants
- [X] edit the testing script
- [ ] (optional) enable training
Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- [X] Did you read the contributor guideline, Pull Request section?
- [ ] Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case.
- [X] Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
- [X] Did you write any new necessary tests?
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.
@amyeroberts @NielsRogge
Looking good @SangbumChoi! Let us know when the PR is ready for review 🤗
@amyeroberts Hi Amy! I think I can pass all the tests this week. As with my other recent PRs, I resolved the conversations that were simple or already fixed per your comments. However, I did not resolve conversations that might need your confirmation or that are still TO DOs.
@amyeroberts Hi Amy, I have finally passed all the mandatory checks for this model. I think it is a good time to request a second review of the PR! Personally, I think there are 3 things to check:
- the deepcopy issue
- the post-processing logic in the image test
- the ModelOutput / ObjectDetectionOutput ordering
Also, is it possible for local pytest and the CI tests to differ? Below are my local pytest results:
==================================== 55 passed, 52 skipped, 8 warnings in 39.38s =====================================
========================================= 55 passed, 52 skipped, 8 warnings in 43.14s =========================================
However, intermediate commits and suggestions made a slight difference in the conversion.
It is strange that local pytest passes everything while CI seems to fail; I found out the error is related to https://github.com/huggingface/transformers/pull/26849/files
(Turns out not merging the latest branch was the problem?)
@amyeroberts Finally, I have passed all the tests. However, there are a few comments that are still not resolved; could you review it again?
@amyeroberts Hi Amy! I think I have finally made all the changes. Thanks again! Now it is time for the real final review (you can also see the unresolved comments) :)
Here are some future TO DOs for other contributors, including myself, when developing RT-DETR further:
- adding more models besides the R-50 backbone
- stabilizing the fine-tuning process (and possibly fixing some bugs in fine-tuning)
BTW, I think we can close the following PRs if we merge this into the main branch! https://github.com/huggingface/transformers/pull/27247 https://github.com/huggingface/transformers/issues/26742
@amyeroberts ping again for reminder!
@NielsRogge Could you do a first (final) review?
@NielsRogge Thanks for the first final review. I have addressed all the suggestions! However, I did not change the image processor part, especially the parts related to panoptic segmentation. It is true that RT-DETR does not currently use panoptic annotations. However, IMO removing all the panoptic-related code would mean removing the "copied from" links to the other DETR models, which would make it hard to keep track of the original source. Do you still want the panoptic-related parts removed?
@NielsRogge Could you do a second final review and share some thoughts about https://github.com/huggingface/transformers/pull/29077#issuecomment-2050962687 ?
Sorry, I made a mistake: I thought I had assigned Amy for review, but apparently I removed the request.
@jasonkit Do you also have logits for original RTDETR for comparison?
@SangbumChoi
Do you also have logits for original RTDETR for comparison?
Unfortunately no, I actually wanted to ask you for that.
For my conversion, I just skip the check (however, the output looks correct when compared to rtdetr_r50vd's result; there is just a difference in the ordering).
I can share my converted (NOT the original) model's output on rtdetr_r101vd here for your reference:
logits:
tensor([[-4.6162, -4.9189, -4.6656],
[-4.4701, -4.4997, -4.9658],
[-5.6641, -7.9000, -5.0725]])
boxes:
tensor([[0.7707, 0.4124, 0.4585],
[0.2589, 0.5492, 0.4735],
[0.1688, 0.1993, 0.2108]])
@jasonkit Let me check on that part after work!
@jasonkit This is the original result, which is the same as the prediction :)
(tensor([[[-4.6162, -4.9189, -4.6656],
[-4.4701, -4.4997, -4.9659],
[-5.6641, -7.9000, -5.0725]]], device='cuda:0',
grad_fn=<SliceBackward0>),
tensor([[[0.7707, 0.4124, 0.4585],
[0.2589, 0.5492, 0.4735],
[0.1688, 0.1993, 0.2108]]], device='cuda:0', grad_fn=<SliceBackward0>))
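For anyone repeating this comparison, a minimal pure-Python check (the values are copied from the logits posted in this thread; only one entry differs, by 1e-4):

```python
# Converted vs. original logits, copied from the outputs posted above.
converted = [[-4.6162, -4.9189, -4.6656],
             [-4.4701, -4.4997, -4.9658],
             [-5.6641, -7.9000, -5.0725]]
original = [[-4.6162, -4.9189, -4.6656],
            [-4.4701, -4.4997, -4.9659],
            [-5.6641, -7.9000, -5.0725]]

# Largest element-wise absolute difference between the two outputs.
max_abs_diff = max(abs(c - o)
                   for row_c, row_o in zip(converted, original)
                   for c, o in zip(row_c, row_o))

# A tolerance of 1e-3 comfortably absorbs conversion noise of this size.
print(max_abs_diff < 1e-3)  # True
```

In practice the same check is usually done with `torch.allclose` on the full output tensors, but the idea is identical.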
@jasonkit Thanks for the help! I uploaded the two R101 networks to https://huggingface.co/sbchoi 👍🏼
@amyeroberts Hi Amy, jasonkit also helped enhance this PR! FYI, fine-tuning quality on custom datasets has been stabilized, and two other pretrained networks, R101 and R101-O365, have been added!
I think this is the final review request for the PR.
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
@NielsRogge Hi niels!
Only thing left to do is update the checkpoints and have a final slow model tests run on these
Can you help me with this task?
For anyone who wants to convert resnet18d and resnet34d:
RT-DETR has its own backbone configuration (code: https://github.com/lyuwenyu/RT-DETR/blob/2b88d5d53bcbfbb70329bc9c007fdf7e76cf90dc/rtdetr_paddle/ppdet/modeling/backbones/resnet.py#L541). We could successfully convert the larger models (R50, R101) because their backbone is exactly the same. Unfortunately, the original RT-DETR with the smaller backbones is not exactly the same as the timm backbone pipeline: it requires an additional downsample layer in the first stage (because the smaller models' expansion is 1, whereas otherwise it is 4). To make this compatible, we would need to change the following line in timm, https://github.com/huggingface/pytorch-image-models/blob/84cb225ecb33da1a985b9d297c0a4439b6f59aac/timm/models/resnet.py#L326, to also add a downsample layer when inplanes equals planes * block_fn.expansion. FYI, compare https://github.com/lyuwenyu/RT-DETR/blob/2b88d5d53bcbfbb70329bc9c007fdf7e76cf90dc/rtdetr_paddle/ppdet/modeling/backbones/resnet.py#L406 — RT-DETR: `shortcut = False if stage_idx == 0 else True`; timm: `shortcut = stage_idx != 0 or inplanes != planes * block_fn.expansion`.
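The backbone mismatch above can be sketched in plain Python. This is illustrative only: the function names are hypothetical, and the conditions paraphrase the linked PaddleDetection and timm ResNet code rather than quoting it.

```python
# Hypothetical sketch of the shortcut/downsample mismatch described above.
# Function names are made up for illustration; the conditions paraphrase
# the linked PaddleDetection and timm ResNet code.

def rtdetr_uses_conv_shortcut(block_idx: int) -> bool:
    # Paddle RT-DETR backbone: the first block of a stage always gets a
    # projection (conv) shortcut, regardless of channel counts.
    return block_idx == 0

def timm_uses_downsample(stride: int, inplanes: int, planes: int,
                         expansion: int) -> bool:
    # timm ResNet: a downsample layer is only built when the spatial
    # stride or the channel count actually changes.
    return stride != 1 or inplanes != planes * expansion

# First stage of a BasicBlock model (resnet18d/34d): expansion == 1,
# stride == 1, inplanes == planes == 64. The two conditions disagree,
# which is why the smaller RT-DETR backbones cannot be loaded from
# timm as-is.
print(rtdetr_uses_conv_shortcut(0))        # True
print(timm_uses_downsample(1, 64, 64, 1))  # False

# Bottleneck models (expansion == 4) never hit this case in the first
# stage: 64 != 64 * 4, so timm also builds a downsample layer there.
print(timm_uses_downsample(1, 64, 64, 4))  # True
```

This also explains why R50/R101 (Bottleneck, expansion 4) convert cleanly while resnet18d/34d (BasicBlock, expansion 1) do not.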
@amyeroberts Could you rerun this CI? I have rebased onto main, including https://github.com/huggingface/transformers/pull/30932/files
@SangbumChoi Sure! There are still some difficult library-version clashes on our CI, so main is failing. As soon as those are resolved, I'll rerun, and then hopefully everything will be green so we can merge 🤞
Pinging @rwightman here regarding the Resnet-D variant which would be required here.
TLDR, this new model (RT-DETR) leverages the AutoBackbone class (more specifically TimmBackbone) in the Transformers library to use ResNet-D models from timm. But as pointed out here, some checkpoints have a small modification to the ResNet backbone. Wondering if you're aware of this modification, and whether it is supported in timm (as I know there are various implementations of ResNet and its variants in timm).
This is for the information of @rwightman. In my personal understanding, the Paddle version of ResNet, https://github.com/PaddlePaddle/PaddleDetection/blob/61cef7a7fe45c009fca76a13c47dcc516d535df9/ppdet/modeling/backbones/resnet.py#L406, is not exactly the same as the official ResNet for the smaller models, due to the expansion rate.
@SangbumChoi @NielsRogge That Paddle ResNet supports a/b/c/d variants. Looks like their D should match timm D with one possible exception noticed on the first pass, they might be using deformable convs by default and I've avoided that, usually need a custom kernel for those to be fast. If DCN is disabled I think they'd match.
@rwightman I think the RT-DETR backbone with resnet34d and resnet18d is unfortunately not compatible, due to the issue mentioned above :(
@NielsRogge Since your comments https://github.com/huggingface/transformers/pull/29077#discussion_r1636075533 and https://github.com/huggingface/transformers/pull/29077#discussion_r1636081513 seem valid, there might be a change to the config.json of all the pretrained weights (I will re-upload and ping you again!)
@NielsRogge All done :)
@amyeroberts Hi Amy, thanks for the comments. I just want to let you know that all the suggestions are done in the commit.
FYI, modeling_rt_detr_resnet is derived from ResNet: modeling_resnet.py. The "copied from" comments can also be updated if there is no backward compatibility issue. Let me know if this should go in another PR!
@NielsRogge Thankfully, I think we don't need an additional update to the config.json of the models :)