Deformable attention implementation
Deformable attention implementation. Fixes: https://github.com/pytorch/pytorch/issues/112827
:link: Helpful Links
:test_tube: See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/vision/9260
- :page_facing_up: Preview Python docs built from this PR
Note: Links to docs will display an error until the docs builds have been completed.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@NicolasHug Can you please review this PR?
Hi @Isalia20 , thanks for the PR.
This is a massive one :)
Can you share more about why this is needed? I shared my thought and questions on that before inhttps://github.com/pytorch/pytorch/issues/112827#issuecomment-1804905345. It seems like there are really solid implementations of deformable attention already, so we'd like to understand the added-value of having this natively in torchvision.
There are 2 main reasons why this would be a nice addition:
-
The deformable attention kernels are definitely solid, but they are outdated. When installing they give deprecation warnings for the kernels. So it's a matter of time before torch removes some of the deprecated features and these kernels stop working. As I remember when implementing this PR, one of them was due to usage of
.data<scalar_t>()for acccessing the data_ptr. In this PR that is replaced with the newdata_ptr()so no warnings are given. See: https://github.com/Isalia20/vision/blob/cc76d33e073f9ecf7a8af695c31708663c6a7fac/torchvision/csrc/ops/cuda/deform_attn_kernel.cu#L1691 https://github.com/facebookresearch/dinov3/blob/54694f7627fd815f62a5dcc82944ffa6153bbb76/dinov3/eval/segmentation/models/utils/ops/src/cuda/ms_deform_attn_cuda.cu#L142C42-L142C59 -
Another one is to have it directly in torchvision, so users don't need to additionally build the kernels themselves like it's done in the SOTA vision model like DINOv3: https://github.com/facebookresearch/dinov3/tree/main/dinov3/eval/segmentation/models/utils/ops/src/cuda
@NicolasHug Any updates on this?
@NicolasHug kind reminder for this. Would be great to get this in the 2.10 release
Hi @Isalia20 . I just want to be fully transparent to manage expectations: we won't be able to make this happen for 2.10. Reviewing this PR is a massive amount of work and the team bandwidth is low - especially for non-preproc/transforms stuff, which is what we're focusing on right now.
I am honestly still unsure of the value that having this in TV would bring. The implementation in mmvc seems to be maintained, so hopefully deprecation warnings aren't that big of a deal there?
@NicolasHug Got it thanks. mmcv's latest commit on main was 8 months ago, which means that project isn't really maintained. If this PR is too big, I can chop it up in smaller pieces to make it manageable to review. I have to think on how to really structure it through multiple PRs but if that helps I can do it
@NicolasHug should I split it into multiple smaller ones to make it easier to review?