diffusers DiffusionDet: Diffusion models for object detection

Model/Pipeline/Scheduler description

Recent work that leverages diffusion models for the object detection task: https://arxiv.org/abs/2211.09788

Add the capability to run it through an HF diffusers pipeline and, if possible, also create benchmarks or comparisons on datasets like nuScenes.

Open source status

[X] The model implementation is available
[ ] The model weights are available (Only relevant if addition is not a scheduler).

Provide useful links for the implementation

No response

Nov 21 '22 03:11 345ishaan

Model weights seem to be available as well, no? https://github.com/ShoufaChen/DiffusionDet#models

Nov 21 '22 11:11 patrickvonplaten

@patrickvonplaten mind if I give this a try? This would be my first time contributing a model, so I might need a hand occasionally.

Nov 22 '22 08:11 vvvm23

Sorry, I forgot to assign the issue to myself when opening it, but I was actually planning to look into this.

Nov 22 '22 08:11 345ishaan

Go for it! 🤗

Nov 22 '22 11:11 vvvm23

I am happy to collaborate if you want :) I have done it in the past, and given that I will only be working outside office hours, things can move faster that way.

Nov 23 '22 08:11 345ishaan

Ordinarily I would say yes, but I don't think I can dedicate any time towards it until Tuesday at the earliest. So probably best for you to make a start yourself and if you have anything you want to hand off to me, I can chip in a bit 😅

Nov 23 '22 14:11 vvvm23

Also more than happy to help if needed :-)

Nov 29 '22 12:11 patrickvonplaten

@patrickvonplaten I plan to run through their code this weekend in inference mode. Let me know if you have a task checklist in mind that I should follow. Happy to split up the work if needed.

Nov 30 '22 04:11 345ishaan

Sure, I think the following would make sense:

Get the pipeline working with the original codebase
Add the core unet model to diffusers: a) first make sure the weights can be correctly loaded, b) then check the forward pass
Add remaining components
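
The "weights first, forward pass second" order from the list above can be sketched without any framework. This is only a minimal illustration under stated assumptions: `compare_checkpoints` and `max_abs_diff` are hypothetical helpers, and the parameter names and shapes are invented for the example rather than taken from the DiffusionDet checkpoint.

```python
from typing import Dict, List, Sequence, Tuple

Shape = Tuple[int, ...]


def compare_checkpoints(
    source: Dict[str, Shape], target: Dict[str, Shape]
) -> Dict[str, List[str]]:
    """Report parameter names that would prevent a clean weight load."""
    # Expected by the ported model but absent from the original checkpoint.
    missing = sorted(set(target) - set(source))
    # Present in the original checkpoint but unused by the ported model.
    unexpected = sorted(set(source) - set(target))
    # Present on both sides but with different shapes.
    mismatched = sorted(
        name for name in set(source) & set(target) if source[name] != target[name]
    )
    return {"missing": missing, "unexpected": unexpected, "shape_mismatch": mismatched}


def max_abs_diff(reference: Sequence[float], ported: Sequence[float]) -> float:
    """Largest element-wise gap between two flattened forward-pass outputs."""
    if len(reference) != len(ported):
        raise ValueError("outputs must have the same number of elements")
    return max((abs(r - p) for r, p in zip(reference, ported)), default=0.0)


# Hypothetical parameter names and shapes, for illustration only.
original = {"head.weight": (4, 256), "head.bias": (4,), "time_mlp.weight": (256, 64)}
ported = {"head.weight": (4, 256), "head.bias": (8,), "time_embed.weight": (256, 64)}

report = compare_checkpoints(original, ported)
print(report)

# Step b): once the report is empty, compare outputs on the same input.
print(max_abs_diff([0.10, 0.20, 0.30], [0.10, 0.25, 0.30]))
```

With real models, the shape dictionaries would come from the two state dicts and the output lists from running both implementations on the same input; an acceptable tolerance for the gap is a judgment call.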

Also happy to guide you through a PR :-)

Dec 01 '22 16:12 patrickvonplaten

I am able to run the original codebase here: https://colab.research.google.com/drive/1rA5SXuTx2pI6o7tWA6Ad5QRZn4a1ajMh#scrollTo=Sn5gWF3fhpf-

Dec 04 '22 00:12 345ishaan

@patrickvonplaten This work was tried with two types of encoder: CNN-based (ResNet-style) and Transformer-based (Swin Transformer). Do you prefer the Transformer-based one? Also, is there an HF implementation of Swin Transformer that I should refer to?

Dec 11 '22 04:12 345ishaan

I think leveraging an existing transformers implementation could make a lot of sense here (also cc @ShoufaChen as the author of DiffusionDet :-) )

And maybe @NielsRogge FYI

Dec 13 '22 17:12 patrickvonplaten

Yes we do have Swin implemented here -> https://github.com/huggingface/transformers/blob/main/src/transformers/models/swin/modeling_swin.py. So you can do from transformers import SwinModel

Dec 13 '22 17:12 NielsRogge

Hello everyone,

Thanks for your efforts in integrating DiffusionDet into the awesome diffusers.

We provide the Swin-Base model here: https://github.com/ShoufaChen/DiffusionDet/releases/download/v0.1/diffdet_coco_swinbase.pth.

Dec 14 '22 01:12 ShoufaChen

Hi, @345ishaan ,

May I ask about your progress on this integration?

I am glad to offer help.

Dec 16 '22 02:12 ShoufaChen

@ShoufaChen Sorry for the delay here. I have been very busy at work because of EOY launches (hopefully over by tomorrow). Last weekend, I was able to run your demo successfully in a standalone Colab. This Fri-Sun, I was planning to look into doing the following:

Create a diffusion pipeline for DiffusionDet.
Preload weights from the SwinTF encoder model into the one under huggingface.
Read and understand the detection decoder and try integrating it.
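
Preloading encoder weights into a different implementation usually comes down to renaming checkpoint keys. A minimal sketch of that step, assuming prefix-based renaming is enough: `remap_keys` is a hypothetical helper, and every prefix and key below is made up for illustration. The real names have to be read from the actual checkpoint and from the target model's state dict.

```python
from typing import Dict, List, Tuple, TypeVar

T = TypeVar("T")

# Hypothetical (old_prefix, new_prefix) pairs, not the real key names.
PREFIX_RULES: List[Tuple[str, str]] = [
    ("backbone.patch_embed.", "embeddings.patch_embeddings."),
    ("backbone.layers.", "encoder.layers."),
]


def remap_keys(
    state_dict: Dict[str, T], rules: List[Tuple[str, str]]
) -> Tuple[Dict[str, T], List[str]]:
    """Rename keys by the first matching prefix rule; collect keys no rule covers."""
    remapped: Dict[str, T] = {}
    skipped: List[str] = []
    for key, value in state_dict.items():
        for old, new in rules:
            if key.startswith(old):
                remapped[new + key[len(old):]] = value
                break
        else:
            # No rule matched, e.g. weights that do not belong to the encoder.
            skipped.append(key)
    return remapped, skipped


# Placeholder strings stand in for tensors so the sketch runs without a framework.
checkpoint = {
    "backbone.patch_embed.proj.weight": "tensor-0",
    "backbone.layers.0.blocks.0.norm1.weight": "tensor-1",
    "head.class_logits.weight": "tensor-2",
}
encoder_weights, leftover = remap_keys(checkpoint, PREFIX_RULES)
print(sorted(encoder_weights))
print(leftover)
```

Keeping the skipped keys visible makes it easier to see which weights still need a home, such as those of the detection decoder.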

Happy to work alongside you here, as I guess we can do it much faster with you involved. Please let me know what suits you. I am definitely interested in pushing it over the finish line.

Dec 16 '22 04:12 345ishaan

Hi, @345ishaan ,

You can leave the most challenging part to me since I think I am more familiar with DiffusionDet (as the author of this work).

Dec 16 '22 07:12 ShoufaChen

@patrickvonplaten @NielsRogge I am guessing transformers and diffusers are maintained as separate libraries, so I copied the SwinTransformer implementation linked above into src/diffusers/models. Any suggestions on whether I should avoid that?

Dec 19 '22 08:12 345ishaan

Hey @345ishaan,

We're trying to leverage transformers as much as possible. So for the image encoding, which is based on a SwinModel, please don't add the code to src/diffusers/models. Instead:

If the model can be used out of the box from timm or transformers, feel free to just import it directly in the pipeline (e.g. like we do here: https://github.com/huggingface/diffusers/blob/847daf25c7e461795932099c5097eb8ac489645c/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py#L265)
If the model requires some hacks/tweaks, please add it as a file to the pipeline folder as done here: https://github.com/huggingface/diffusers/blob/847daf25c7e461795932099c5097eb8ac489645c/src/diffusers/pipelines/alt_diffusion/modeling_roberta_series.py#L59

We only put actual diffusion models into src/diffusers - i.e. models that are called over and over again in the denoising process.
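
To make the "called over and over again" distinction concrete, here is a toy sketch in plain Python. It is not the diffusers API and not the DiffusionDet code: `ToyEncoder`, `ToyDenoiser` and `run_pipeline` are hypothetical stand-ins that only count how often each component runs.

```python
import random
from typing import List

Box = List[float]  # [cx, cy, w, h], normalized to [0, 1]


class ToyEncoder:
    """Stand-in for an image backbone such as Swin; counts its calls."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, image: List[float]) -> float:
        self.calls += 1
        return sum(image) / len(image)  # a single fake "feature"


class ToyDenoiser:
    """Stand-in for the model that refines noisy boxes at every step."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, boxes: List[Box], feature: float, step: int) -> List[Box]:
        self.calls += 1
        # Pull every coordinate halfway toward the fake feature value.
        return [[0.5 * (c + feature) for c in box] for box in boxes]


def run_pipeline(
    encoder: ToyEncoder,
    denoiser: ToyDenoiser,
    image: List[float],
    num_boxes: int = 3,
    num_steps: int = 4,
    seed: int = 0,
) -> List[Box]:
    rng = random.Random(seed)
    feature = encoder(image)  # computed once, outside the loop
    boxes = [[rng.random() for _ in range(4)] for _ in range(num_boxes)]  # random start
    for step in reversed(range(num_steps)):
        boxes = denoiser(boxes, feature, step)  # called once per denoising step
    return boxes


encoder = ToyEncoder()
denoiser = ToyDenoiser()
boxes = run_pipeline(encoder, denoiser, image=[0.2, 0.4, 0.6])
print(encoder.calls, denoiser.calls)
```

Under this reading, the component that runs once per image belongs with the pipeline, while the one inside the loop is the candidate for src/diffusers.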

Does this make sense?

Dec 20 '22 00:12 patrickvonplaten

Thanks for the explanation @patrickvonplaten, I will follow what you suggested.

Dec 24 '22 01:12 345ishaan