DiffusionDet: Diffusion models for object detection
Model/Pipeline/Scheduler description
Recent work that leverages diffusion models for the object detection task. https://arxiv.org/abs/2211.09788
Add the capability to run it through an HF diffusers pipeline and, if possible, also create benchmarks or comparisons on datasets like nuScenes.
Open source status
- [X] The model implementation is available
- [ ] The model weights are available (Only relevant if addition is not a scheduler).
Provide useful links for the implementation
No response
Model weights seem to be available as well, no? https://github.com/ShoufaChen/DiffusionDet#models
@patrickvonplaten mind if I give this a try? This would be my first time contributing a model, so I might need a hand occasionally.
Sorry I forgot to assign the issue to myself while opening but was actually planning to look into this.
Go for it! 🤗
I am happy to collaborate if you want :) I have done it in the past, and given that I will only be working outside office hours, things can move faster that way.
Ordinarily I would say yes, but I don't think I can dedicate any time towards it until Tuesday at the earliest. So probably best for you to make a start yourself and if you have anything you want to hand off to me, I can chip in a bit 😅
Also more than happy to help if needed :-)
@patrickvonplaten I plan to run through their code this weekend in inference mode. Let me know if you have a task checklist in mind which I should follow. Happy to split up the work if needed.
Sure, I think the following would make sense:
- Get the pipeline working with the original codebase
- Add the core UNet model to `diffusers`:
  a) First make sure weights can be correctly loaded
  b) Then check the forward pass
- Add the remaining components
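The conversion steps (a) and (b) above can be sketched roughly as below. The key-renaming rule and module names here are hypothetical placeholders, not the actual DiffusionDet or diffusers names; the point is the check-weights-then-check-forward-pass workflow.

```python
import torch

def rename_key(key: str) -> str:
    # Hypothetical mapping from the original repo's naming to the ported model's.
    return key.replace("backbone.", "encoder.")

def convert_state_dict(original_sd: dict) -> dict:
    # a) Carry every tensor over under the new key, values untouched.
    return {rename_key(k): v for k, v in original_sd.items()}

original_sd = {"backbone.conv.weight": torch.randn(4, 3, 3, 3)}
converted_sd = convert_state_dict(original_sd)
assert converted_sd["encoder.conv.weight"].shape == (4, 3, 3, 3)

# b) Forward-pass parity: run both models on the same input and compare.
# Two identical linear layers stand in for the original and ported models.
torch.manual_seed(0)
x = torch.randn(1, 8)
reference = torch.nn.Linear(8, 8)
ported = torch.nn.Linear(8, 8)
ported.load_state_dict(reference.state_dict())
with torch.no_grad():
    assert torch.allclose(reference(x), ported(x), atol=1e-5)
```

Doing (a) and (b) separately makes it easy to tell a key-mapping bug apart from an architecture mismatch.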
Also happy to guide you through a PR :-)
I am able to run the original codebase here: https://colab.research.google.com/drive/1rA5SXuTx2pI6o7tWA6Ad5QRZn4a1ajMh#scrollTo=Sn5gWF3fhpf-
@patrickvonplaten This work tried two types of encoders: CNN-based (ResNet style) and Transformer-based (Swin Transformer). Do you prefer the transformer-based one? Also, is there an HF implementation of Swin Transformer which I should refer to?
I think leveraging an existing transformers implementation could make a lot of sense here (also cc @ShoufaChen as the author of DiffusionDet :-) )
And maybe @NielsRogge FYI
Yes, we do have Swin implemented here -> https://github.com/huggingface/transformers/blob/main/src/transformers/models/swin/modeling_swin.py. So you can do `from transformers import SwinModel`.
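A minimal sketch of using the transformers Swin implementation as the image encoder, as suggested above. A randomly initialized model is built from the default config here to keep the example offline; in practice you would use `SwinModel.from_pretrained(...)` with a released checkpoint.

```python
import torch
from transformers import SwinConfig, SwinModel

# Default config: 224x224 input, patch size 4 (randomly initialized weights).
model = SwinModel(SwinConfig())
model.eval()

pixel_values = torch.randn(1, 3, 224, 224)  # dummy image batch
with torch.no_grad():
    outputs = model(pixel_values)

# (batch, num_patches, hidden_dim) features that a detection decoder could consume
print(outputs.last_hidden_state.shape)
```

For detection, the intermediate stage outputs (via `output_hidden_states=True`) are typically more useful than just the final stage, since decoders usually want multi-scale features.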
Hello everyone,
Thanks for your efforts in integrating DiffusionDet into the awesome diffusers library.
We provide a Swin-Base model here: https://github.com/ShoufaChen/DiffusionDet/releases/download/v0.1/diffdet_coco_swinbase.pth.
Hi, @345ishaan ,
May I ask about your progress on this integration?
I am glad to offer help.
@ShoufaChen Sorry for the delay here. I am very busy at work because of EOY launches (hopefully over by tomorrow). Last weekend, I was able to run your demo successfully in a standalone colab. This Fri-Sun, I was planning to look into the following:
- create a diffusion pipeline for DiffusionDet.
- Load the pretrained Swin Transformer encoder weights into the Hugging Face implementation.
- Read and understand the detection decoder and try integrating it.
Happy to work alongside you here, as I guess we can move much faster with you involved. Please let me know what suits you. I am definitely interested in pushing this to the finish line.
Hi, @345ishaan ,
You can leave the most challenging part to me since I think I am more familiar with DiffusionDet (as the author of this work).
@patrickvonplaten @NielsRogge I am guessing transformers and diffusers are maintained as separate libraries, so I branched out the SwinTransformer implementation linked above into src/diffusers/models. Any suggestions on whether I should avoid this?
Hey @345ishaan,
We're trying to leverage transformers as much as possible. So for the image encoding, which is based on a SwinModel, please don't add the code to src/diffusers/models; instead:
- If the model can be used out of the box from timm or transformers, feel free to just directly import it in the pipeline (e.g. like we do here: https://github.com/huggingface/diffusers/blob/847daf25c7e461795932099c5097eb8ac489645c/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py#L265)
- If the model requires some hacks/tweaks, please add it as a file to the pipeline folder as done here: https://github.com/huggingface/diffusers/blob/847daf25c7e461795932099c5097eb8ac489645c/src/diffusers/pipelines/alt_diffusion/modeling_roberta_series.py#L59
We only put actual diffusion models into src/diffusers - i.e. models that are called over and over again in the denoising process.
Does this make sense?
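The layout suggested above can be sketched as follows: the pipeline receives the encoder as a dependency (imported from transformers at load time) rather than vendoring its code into src/diffusers/models. `DiffusionDetPipeline` and its arguments are hypothetical names for illustration only; stand-in modules show the wiring.

```python
import torch
from torch import nn

class DiffusionDetPipeline:
    def __init__(self, image_encoder: nn.Module, decoder: nn.Module):
        # image_encoder would be e.g. a SwinModel imported from transformers;
        # only the denoising components would live in src/diffusers.
        self.image_encoder = image_encoder
        self.decoder = decoder

    @torch.no_grad()
    def __call__(self, pixel_values: torch.Tensor) -> torch.Tensor:
        features = self.image_encoder(pixel_values)
        return self.decoder(features)

# Stand-in modules just to show the composition pattern.
pipe = DiffusionDetPipeline(nn.Identity(), nn.Linear(8, 4))
out = pipe(torch.randn(2, 8))
print(out.shape)  # (2, 4)
```

Keeping the encoder out of src/diffusers means it gets bug fixes and checkpoints from transformers for free, while the pipeline only owns the diffusion-specific pieces.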
Thanks for the explanation @patrickvonplaten, I will follow what you suggested.