vision Add DETR model

Related to https://github.com/pytorch/vision/issues/2707; working with @oke-aditya and @triple-Mu.

[ ] Backbone: ResNet50
[x] Transformer: Encoder + Decoder
[ ] Position embedding
[ ] Detection head & loss
[ ] Label assignment
[ ] Datapipes

Apr 29 '22 02:04 xiaohu2015

Sorry for the early poke at the PR, but I would like to know why we are not using the nn.TransformerEncoder layer, although I have yet to dive into the technical details.

May 09 '22 19:05 oke-aditya

The official code makes some modifications to nn.TransformerEncoder, and these may affect performance.
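To illustrate the kind of modification meant here, below is a minimal sketch. It assumes the change in question is the handling of positional embeddings: the reference DETR code re-adds the positional embedding to the attention queries and keys (but not the values) in every layer, whereas the stock nn.TransformerEncoderLayer has no such argument. The class name `PosEncoderLayer` and its default sizes are hypothetical and chosen only for this example; this is not the code in this PR.

```python
import torch
from torch import nn


class PosEncoderLayer(nn.Module):
    """Hypothetical post-norm encoder layer (illustration only).

    Unlike the stock nn.TransformerEncoderLayer, forward() accepts a
    positional embedding and adds it to the queries and keys on every call.
    """

    def __init__(self, d_model=256, nhead=8, dim_feedforward=2048, dropout=0.1):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, nhead, dropout=dropout)
        self.linear1 = nn.Linear(d_model, dim_feedforward)
        self.linear2 = nn.Linear(dim_feedforward, d_model)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, src, pos=None):
        # src and pos have shape (sequence_length, batch, d_model).
        # The positional embedding goes into queries and keys only;
        # the values stay free of positional information.
        q = k = src if pos is None else src + pos
        attn_out, _ = self.self_attn(q, k, value=src)
        src = self.norm1(src + self.dropout(attn_out))
        # Position-wise feed-forward block with a residual connection.
        ff = self.linear2(self.dropout(torch.relu(self.linear1(src))))
        return self.norm2(src + self.dropout(ff))


if __name__ == "__main__":
    layer = PosEncoderLayer(d_model=32, nhead=4, dim_feedforward=64).eval()
    tokens = torch.randn(10, 2, 32)     # e.g. flattened feature-map tokens
    pos_embed = torch.randn(10, 2, 32)  # placeholder positional embedding
    print(layer(tokens, pos_embed).shape)
```

With the stock layer, positional information would have to be added once to the input before the first layer, which changes what each attention layer sees; whether that matters for accuracy is exactly the open question in this thread.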

May 10 '22 02:05 xiaohu2015

@xiaohu2015 I hope you are well. It's been a while since this PR has seen any action. I wonder whether you plan to keep working on it slowly, or whether you think it's unlikely to happen in H2. No pressure. Thanks! :)

Sep 14 '22 14:09 datumbox

@datumbox @xiaohu2015 Is there any progress on this? If not, I would love to help!

Oct 27 '22 15:10 deepwilson

@deepwilson I believe this is up for grabs as @xiaohu2015 doesn't have the time to complete it now. It would be awesome if we can continue the work. :)

Oct 27 '22 15:10 datumbox