Marigold - monocular depth estimation pipeline
Model/Pipeline/Scheduler description
Marigold depth is the current diffusion-based state-of-the-art monocular depth estimation pipeline (image-to-depth). It is derived from Stable Diffusion and fine-tuned with synthetic data. Marigold can zero-shot transfer to unseen data, offering usable results under the most challenging conditions in the wild.
This issue is the point of discussion regarding the pipeline status and future development. Recently, it has been integrated (https://github.com/huggingface/diffusers/pull/6249) as a community pipeline into diffusers.
Open source status
- [X] The model implementation is available.
- [X] The model weights are available (Only relevant if addition is not a scheduler).
Provide useful links for the implementation
Authors: @markkua @toshas @ShengyuH @nandometzger @rcdaudt @prs-eth
Model weights: https://huggingface.co/Bingxin/Marigold
I see this as extending `diffusers` in the form of a pipeline to support new tasks other than classical image generation. For example, in this case, we have "depth estimation", which is a critical computer vision problem with lots of real-world applicability.
The https://github.com/prs-eth/Marigold repository also has more than 1k stars, which is promising.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
@yiyixuxu curious to know your thoughts.
Interesting! Community pipeline first?
We already have it under community
:-)
let's keep an eye on it then!
We are brewing a pull request to integrate our new Marigold-LCM model and possibly a pipeline. Additional modalities (normals) are also in the making.
A few questions regarding planning the pipeline's growth and lifecycle.
1
The new Marigold-LCM pipeline will use the same code as the original but with a different set of defaults and expected ranges of values. We are thinking of adding a new file, called `marigold_lcm_depth_estimation.py`, and in there subclassing a new `MarigoldLcmDepthPipeline` class from the `MarigoldPipeline` class located currently in the `marigold_depth_estimation.py` file. In this new class, we would then override the `__call__` method with new defaults of kwargs, and add extra asserts regarding the actual passed values. Does this seem like a good solution, or are we better off making pipelines not dependent on each other?
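A minimal sketch of the subclassing plan described above. The base class here is a stand-in stub so the snippet runs on its own; the parameter names `denoising_steps` and `ensemble_size`, the default values, and the `MAX_DENOISING_STEPS` bound are illustrative assumptions, not the final API. The checks raise `ValueError` instead of using bare asserts so they still run under `python -O`.

```python
class MarigoldPipeline:
    """Stand-in stub for the existing depth pipeline (assumption: the real
    class runs a diffusion model; this one only echoes its settings)."""

    def __call__(self, input_image, denoising_steps=10, ensemble_size=10, **kwargs):
        # Return the resolved settings instead of running inference.
        return {
            "denoising_steps": denoising_steps,
            "ensemble_size": ensemble_size,
            **kwargs,
        }


class MarigoldLcmDepthPipeline(MarigoldPipeline):
    """Reuses the parent's code path with LCM-oriented defaults and range checks."""

    MAX_DENOISING_STEPS = 4  # hypothetical upper bound for the LCM variant

    def __call__(self, input_image, denoising_steps=1, ensemble_size=1, **kwargs):
        # Validate the values actually passed before delegating to the parent.
        if not 1 <= denoising_steps <= self.MAX_DENOISING_STEPS:
            raise ValueError(
                f"denoising_steps must be in [1, {self.MAX_DENOISING_STEPS}], "
                f"got {denoising_steps}"
            )
        if ensemble_size < 1:
            raise ValueError(f"ensemble_size must be >= 1, got {ensemble_size}")
        return super().__call__(
            input_image,
            denoising_steps=denoising_steps,
            ensemble_size=ensemble_size,
            **kwargs,
        )
```

With this layout, the LCM class changes only the defaults and the accepted ranges; any fix to the parent's inference code is inherited automatically, which is also the main coupling risk the question raises.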
2
I also noticed that we have made a couple of bad choices when initially committing the Marigold depth pipeline:
- we called the depth pipeline `MarigoldPipeline`. It should have been `MarigoldDepthPipeline`.
- we called the file with the depth pipeline `marigold_depth_estimation.py`, which is reflected in how the pipeline is instantiated using the `from_pretrained` method. We should have called the file `marigold_depth_pipeline.py` instead.
I want to perform these renaming actions (while also keeping in mind the LCM pipeline) to make it nice and extensible for the normals and other modalities pipelines. Would you recommend just renaming them all? Is it possible to maintain backward compatibility?
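One possible way to soften the class rename, sketched under assumptions: keep the old name as a thin subclass that warns on construction. The constructor bodies below are placeholders, and this covers only the class name; it does not address the file name that users pass when loading the community pipeline.

```python
import warnings


class MarigoldDepthPipeline:
    """Placeholder for the renamed pipeline class (constructor is illustrative)."""

    def __init__(self, *args, **kwargs):
        self.init_args = args
        self.init_kwargs = kwargs


class MarigoldPipeline(MarigoldDepthPipeline):
    """Legacy name kept as an alias that emits a deprecation warning."""

    def __init__(self, *args, **kwargs):
        warnings.warn(
            "MarigoldPipeline is deprecated; use MarigoldDepthPipeline instead.",
            FutureWarning,
            stacklevel=2,
        )
        super().__init__(*args, **kwargs)
```

Existing code that constructs `MarigoldPipeline` keeps working and gets a visible warning, while new code can target `MarigoldDepthPipeline` directly.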
Renaming will be a little bit tricky because it will not be backwards-compatible. I think it's okay as is because the first version of Marigold deals with depth anyway.
Your plan with 1 sounds rock-solid to me! Keep us posted.
My two cents here is that Marigold should be added to the core now. I like it a lot, and with LCM it should be fast.
The model has almost 40k downloads on Hugging Face, so it's very popular, and it can be used right out of the box with diffusers, while the other depth-map preprocessors require installing `controlnet_aux` and sometimes cause problems because they're not part of this repo.
Personally I use it a lot, especially with diff-diff, which benefits the most. A ControlNet trained with Marigold would probably be a lot better than the current one, but Marigold also works with the existing ones.
@toshas I have one suggestion: could you please upload the fp16 variants so people don't have to download the full-precision weights if they don't use them?
Really nice work!
Yeah agreed now. It's time for a little graduation.
@yiyixuxu let's do an issue for this so that community can pick it up!
yes yes I created an issue here https://github.com/huggingface/diffusers/issues/7522