threestudio
Proposal: using multiple guidances in training
I'm planning on implementing a new class of guidance which can ease the use of multiple different guidances in training, for example zero123 SDS + DeepFloyd-IF SDS.
- a `MultipleGuidance` which takes the configs of multiple guidances and initializes them. Each guidance has a suffix string which is appended to the keys of its returned dict, so that we can distinguish returned values of the same name from different guidances (e.g., `loss_sds_zero123` and `loss_sds_if`).
- a `CombinedGuidance` which inherits `MultipleGuidance`. It evaluates every guidance at each iteration.
- an `IterativeGuidance` which inherits `MultipleGuidance`. It only evaluates some of the guidances at each iteration, for example zero123 at even steps and IF at odd steps.
Any suggestions?
That sounds interesting. I think there is another way to implement this from the perspective of the diffusion model. We can implement a mixture of different diffusion models, namely A, B, and C using the following equation:
`pred_noise = lambda_A * pred_noise_A + lambda_B * pred_noise_B + lambda_C * pred_noise_C`
By doing this, we can keep the loss terms unchanged. This method is also applicable to DU (dataset update), and it is possible to implement a VSD version of this mixture of diffusion models.
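For illustration, the noise mixture above could be computed as follows (NumPy arrays stand in for the actual noise-prediction tensors, and the helper name is made up; whether the weights should be normalized to sum to 1 is a design choice this sketch leaves open):

```python
import numpy as np


def mix_pred_noise(pred_noises, weights):
    """Weighted mixture of noise predictions from several diffusion models.

    pred_noises: list of arrays, each the epsilon predicted by one model
                 (e.g. zero123, DeepFloyd-IF) at the same x_t and t
    weights: list of scalars (lambda_A, lambda_B, ...)
    """
    assert len(pred_noises) == len(weights)
    mixed = np.zeros_like(pred_noises[0])
    for eps, w in zip(pred_noises, weights):
        # accumulate lambda_i * pred_noise_i
        mixed = mixed + w * eps
    return mixed
```

The downstream SDS loss then consumes the single mixed `pred_noise`, which is why the loss terms can stay unchanged.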
Another option is to provide a system that integrates different guidances, given the perspective that each guidance is actually equivalent to a loss. So I guess it is also a chance for us to incorporate common losses into systems.
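If each guidance is viewed as just another loss, the system side could reduce to a weighted sum over named loss terms. A hypothetical helper (the key names and weight scheme are assumptions for illustration):

```python
def total_loss(loss_terms, weights):
    """Sum weighted loss terms produced by different guidances.

    loss_terms: dict of suffixed losses, e.g. {"loss_sds_zero123": ..., "loss_sds_if": ...}
    weights: dict mapping the same keys to scalar lambdas
    """
    # each guidance contributes one or more named terms; the system just
    # weights and sums them, so adding a guidance never changes this code
    return sum(weights[name] * value for name, value in loss_terms.items())
```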
We have a version of this implemented in zero123, because we take guidance from ref + zero123. I expanded that to take guidance from ref + SD/DF + zero123. You could take a look, it's not as modular as having a separate MixtureGuidance class though.
@bennyguo Hi, when are you going to release this PR? I am really looking forward to this feature. It is also very relevant to my recent work, Magic123, which uses SDS plus Zero123-SDS guidance for image-to-3D generation. Once this feature is launched, Magic123 can be seamlessly integrated into this repo. Magic123 achieves outstanding performance in single-image-to-3D generation using both 2D and pseudo-3D (zero123) diffusion guidance.
@voletiv Very interesting. Looks like you have a similar idea to our work. I will check your repo when I am free. You can also take a quick look at our project webpage: https://guochengqian.github.io/project/magic123 We put it on arXiv a month ago.
@guochengqian Actually my intention behind this was to reproduce Magic123 😂 Good work! I did a quick adaptation to support training with the zero123 guidance and the SD guidance at the same time, but did not get very good results. Are you interested in implementing Magic123 in threestudio? I can share my branch as a starting point.
Cool! I can take a look, but I'm not sure whether I'll have time to do it. I am busy with another ongoing project at Snap. Anyway, please share your branch. It might only take me a few hours to get it right. I will also share my code, based on stable-dreamfusion, today or tomorrow on this GitHub repo: https://github.com/guochengqian/Magic123/issues/1
@guochengqian Cheers on the code release! I might want to read your code first and try to reproduce it in threestudio :) I'll let you know if I have any problems with the implementation.
We have a version of this implemented in zero123, because we take guidance from ref + zero123. I expanded that to take guidance from ref + SD/DF + zero123. You could take a look, it's not as modular as having a separate MixtureGuidance class though.
Cool! Where can I see this?