
DiffusionDet: Diffusion models for object detection

Open 345ishaan opened this issue 3 years ago • 19 comments

Model/Pipeline/Scheduler description

Recent work that leverages diffusion models for the object detection task: https://arxiv.org/abs/2211.09788

Add the capability to run it through an HF diffusers pipeline and, if possible, also create benchmarks or comparisons on datasets like nuScenes.

Open source status

  • [X] The model implementation is available
  • [ ] The model weights are available (Only relevant if addition is not a scheduler).

Provide useful links for the implementation

No response

345ishaan avatar Nov 21 '22 03:11 345ishaan

Model weights seem to be available as well, no? https://github.com/ShoufaChen/DiffusionDet#models

patrickvonplaten avatar Nov 21 '22 11:11 patrickvonplaten

@patrickvonplaten mind if I give this a try? This would be my first time contributing a model, so I might need a hand occasionally.

vvvm23 avatar Nov 22 '22 08:11 vvvm23

Sorry I forgot to assign the issue to myself while opening but was actually planning to look into this.

345ishaan avatar Nov 22 '22 08:11 345ishaan

Go for it! 🤗

vvvm23 avatar Nov 22 '22 11:11 vvvm23

I am happy to collaborate if you want :) I have done it in the past and, given that I will only be working outside office hours, things can move faster that way.

345ishaan avatar Nov 23 '22 08:11 345ishaan

Ordinarily I would say yes, but I don't think I can dedicate any time towards it until Tuesday at the earliest. So probably best for you to make a start yourself and if you have anything you want to hand off to me, I can chip in a bit 😅

vvvm23 avatar Nov 23 '22 14:11 vvvm23

Also more than happy to help if needed :-)

patrickvonplaten avatar Nov 29 '22 12:11 patrickvonplaten

@patrickvonplaten I plan to run through their code this weekend in inference mode. Let me know if you have a task checklist in mind that I should follow. Happy to split up the work if needed.

345ishaan avatar Nov 30 '22 04:11 345ishaan

Sure, I think the following would make sense:

  1. Get the pipeline working with the original codebase
  2. Add the core unet model to diffusers: a) first make sure the weights can be loaded correctly; b) then check the forward pass
  3. Add remaining components
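
Step 2 could be checked with something like the sketch below. It is only an illustration, not code from diffusers or DiffusionDet: `compare_keys` and `max_abs_diff` are hypothetical helpers, the parameter names are made up, and the plain lists stand in for real state dicts and output tensors.

```python
def compare_keys(expected_keys, checkpoint_keys):
    """Return (missing, unexpected) parameter names, each sorted."""
    expected, found = set(expected_keys), set(checkpoint_keys)
    missing = sorted(expected - found)      # the ported model expects these, checkpoint lacks them
    unexpected = sorted(found - expected)   # the checkpoint has these, ported model does not
    return missing, unexpected


def max_abs_diff(reference, candidate):
    """Largest element-wise absolute difference between two flat outputs."""
    if len(reference) != len(candidate):
        raise ValueError("outputs have different lengths")
    return max((abs(r - c) for r, c in zip(reference, candidate)), default=0.0)


# Toy stand-ins (hypothetical names and values).
ported_model_keys = ["head.cls.weight", "head.cls.bias", "head.box.weight"]
checkpoint_keys = ["head.cls.weight", "head.cls.bias", "head.reg.weight"]

missing, unexpected = compare_keys(ported_model_keys, checkpoint_keys)
print("missing:", missing)
print("unexpected:", unexpected)

# Step 2b: same input through both implementations, then compare outputs.
original_output = [0.10, 0.25, 0.90]
ported_output = [0.10, 0.25, 0.90005]
print("outputs match:", max_abs_diff(original_output, ported_output) < 1e-3)
```

Empty `missing` and `unexpected` lists would suggest the weights map cleanly; a small maximum difference on the same input would suggest the forward pass agrees. The tolerance of 1e-3 is an arbitrary choice for the sketch.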

Also happy to guide you through a PR :-)

patrickvonplaten avatar Dec 01 '22 16:12 patrickvonplaten

I am able to run the original codebase here: https://colab.research.google.com/drive/1rA5SXuTx2pI6o7tWA6Ad5QRZn4a1ajMh#scrollTo=Sn5gWF3fhpf-

345ishaan avatar Dec 04 '22 00:12 345ishaan

@patrickvonplaten This work has been tried with two types of encoder: CNN-based (ResNet-style) and Transformer-based (Swin Transformer). Do you prefer the transformer-based one? Also, is there an HF implementation of Swin Transformer that I should refer to?

345ishaan avatar Dec 11 '22 04:12 345ishaan

I think leveraging an existing transformers implementation could make a lot of sense here (also cc @ShoufaChen as the author of DiffusionDet :-) )

And maybe @NielsRogge FYI

patrickvonplaten avatar Dec 13 '22 17:12 patrickvonplaten

Yes we do have Swin implemented here -> https://github.com/huggingface/transformers/blob/main/src/transformers/models/swin/modeling_swin.py. So you can do from transformers import SwinModel
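
A minimal sketch of that import in use, assuming `torch` and `transformers` are installed. It builds a randomly initialised Swin from the library's default `SwinConfig` (so nothing is downloaded) and runs one dummy image through it. This is not the DiffusionDet backbone configuration, which is not given in this thread.

```python
import torch
from transformers import SwinConfig, SwinModel

# Randomly initialised Swin with the library's default configuration
# (an assumption for illustration; no pretrained weights are loaded).
config = SwinConfig()
model = SwinModel(config)
model.eval()

# Dummy batch: one image at the configured channel count and resolution.
pixel_values = torch.randn(
    1, config.num_channels, config.image_size, config.image_size
)

with torch.no_grad():
    outputs = model(pixel_values)

# Sequence of patch features: (batch, num_patches, hidden_size)
print(outputs.last_hidden_state.shape)
```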

NielsRogge avatar Dec 13 '22 17:12 NielsRogge

Hello everyone,

Thanks for your efforts in integrating DiffusionDet into the awesome diffusers library.

We provide the Swin-Base model here: https://github.com/ShoufaChen/DiffusionDet/releases/download/v0.1/diffdet_coco_swinbase.pth.

ShoufaChen avatar Dec 14 '22 01:12 ShoufaChen

Hi, @345ishaan ,

May I ask about your progress on this integration?

I am glad to offer help.

ShoufaChen avatar Dec 16 '22 02:12 ShoufaChen

@ShoufaChen Sorry for the delay here. I have been very busy at work because of EOY launches (hopefully over by tomorrow). Last weekend, I was able to run your demo successfully in a standalone colab. This Fri-Sun, I was planning to look into doing the following:

  1. Create a diffusion pipeline for DiffusionDet.
  2. Preload weights from the Swin Transformer encoder model into the Hugging Face implementation.
  3. Read and understand the detection decoder and try integrating it.
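
For step 2, porting weights between two implementations usually comes down to renaming checkpoint keys. A rough sketch of that idea follows; `remap_keys` is a hypothetical helper, and the key names and prefix map are invented for illustration, not the real DiffusionDet or transformers parameter names.

```python
def remap_keys(state_dict, prefix_map):
    """Return a copy of state_dict with leading prefixes renamed.

    The first matching prefix in prefix_map wins; keys with no match are kept unchanged.
    """
    remapped = {}
    for key, value in state_dict.items():
        new_key = key
        for old_prefix, new_prefix in prefix_map.items():
            if key.startswith(old_prefix):
                new_key = new_prefix + key[len(old_prefix):]
                break
        remapped[new_key] = value
    return remapped


# Hypothetical checkpoint contents; strings stand in for weight tensors.
original = {
    "backbone.patch_embed.proj.weight": "w0",
    "backbone.layers.0.norm.weight": "w1",
    "head.bias": "b",
}
# Hypothetical old-prefix -> new-prefix mapping.
prefix_map = {
    "backbone.patch_embed.proj.": "encoder.embeddings.projection.",
    "backbone.layers.": "encoder.stages.",
}

converted = remap_keys(original, prefix_map)
for name in converted:
    print(name)
```

The real mapping would have to be worked out by listing the keys of both models side by side.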

Happy to work alongside you here, as I guess we can do it much faster with you involved. Please let me know what suits you. I am definitely interested in pushing it over the finish line.

345ishaan avatar Dec 16 '22 04:12 345ishaan

Hi, @345ishaan ,

You can leave the most challenging part to me since I think I am more familiar with DiffusionDet (as the author of this work).

ShoufaChen avatar Dec 16 '22 07:12 ShoufaChen

@patrickvonplaten @NielsRogge I am guessing transformers and diffusers are maintained as separate libraries, so I branched out the SwinTransformer implementation linked above into src/diffusers/models. Any suggestions on whether I should avoid that?

345ishaan avatar Dec 19 '22 08:12 345ishaan

Hey @345ishaan,

We're trying to leverage transformers as much as possible. So for the image encoding, which is based on a SwinModel, please don't add the code to src/diffusers/models. Instead:

  • If the model can be used out of the box from timm or transformers, feel free to just directly import it in the pipeline (e.g. like we do here: https://github.com/huggingface/diffusers/blob/847daf25c7e461795932099c5097eb8ac489645c/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py#L265)
  • If the model requires some hacks/tweaks, please add it as a file to the pipeline folder as done here: https://github.com/huggingface/diffusers/blob/847daf25c7e461795932099c5097eb8ac489645c/src/diffusers/pipelines/alt_diffusion/modeling_roberta_series.py#L59

We only put actual diffusion models into src/diffusers - i.e. models that are called over and over again in the denoising process.
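
The distinction can be pictured with a toy sketch in plain Python. Everything here is hypothetical (`CountingModule`, `ToyDetectionPipeline`, the lambda stand-ins) and it is not the diffusers pipeline API; it only shows that the image encoder runs once per image while the denoising model runs once per step.

```python
class CountingModule:
    """Toy callable that records how many times it was invoked."""

    def __init__(self, fn):
        self.fn = fn
        self.calls = 0

    def __call__(self, *args):
        self.calls += 1
        return self.fn(*args)


class ToyDetectionPipeline:
    """Hypothetical pipeline: encode the image once, then refine boxes step by step."""

    def __init__(self, image_encoder, denoiser):
        self.image_encoder = image_encoder  # would be imported, e.g. from transformers
        self.denoiser = denoiser            # the model called repeatedly while denoising

    def __call__(self, image, boxes, num_steps):
        features = self.image_encoder(image)     # runs once per image
        for step in reversed(range(num_steps)):  # runs once per denoising step
            boxes = self.denoiser(features, boxes, step)
        return boxes


# Stand-in computations with no real meaning; only the call counts matter.
encoder = CountingModule(lambda image: sum(image) / len(image))
denoiser = CountingModule(
    lambda features, boxes, step: [0.5 * (b + features) for b in boxes]
)

pipeline = ToyDetectionPipeline(encoder, denoiser)
result = pipeline(image=[0.2, 0.4, 0.6], boxes=[1.0, 0.0], num_steps=4)
print("encoder calls:", encoder.calls)
print("denoiser calls:", denoiser.calls)
```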

Does this make sense?

patrickvonplaten avatar Dec 20 '22 00:12 patrickvonplaten

Thanks for the explanation @patrickvonplaten, will follow what you suggested.

345ishaan avatar Dec 24 '22 01:12 345ishaan