Proposal: using multiple guidances in training

Open bennyguo opened this issue 1 year ago • 8 comments

I'm planning on implementing a new class of guidance which can ease the use of multiple different guidances in training, for example zero123 SDS + DeepFloyd-IF SDS.

  • a MultipleGuidance which takes the configs of multiple guidances and initializes them. Each guidance has a suffix string that is appended to the keys of its returned dict, so that we can distinguish returned values of the same name from different guidances (e.g., loss_sds_zero123 and loss_sds_if).
  • a CombinedGuidance which inherits from MultipleGuidance. It evaluates every guidance at each iteration.
  • an IterativeGuidance which inherits from MultipleGuidance. It only evaluates some of the guidances at each iteration. For example, zero123 at even steps and IF at odd steps.
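To make the proposal concrete, here is a minimal sketch of the three classes in plain Python. It is only an illustration under assumptions, not threestudio code: a guidance is modelled as any callable that returns a dict, the `active_suffixes` hook and the constructor signature are hypothetical, and `fake_zero123` / `fake_if` are stand-ins for real guidance modules.

```python
from typing import Callable, Dict, List

# Assumption: a "guidance" is any callable returning a dict of named values.
Guidance = Callable[..., Dict[str, float]]


class MultipleGuidance:
    """Holds several guidances and suffixes the keys of their outputs."""

    def __init__(self, guidances: Dict[str, Guidance]) -> None:
        # Maps a suffix string (e.g. "zero123") to a guidance callable.
        self.guidances = guidances

    def active_suffixes(self, step: int) -> List[str]:
        # Hypothetical hook: subclasses decide which guidances run at `step`.
        raise NotImplementedError

    def __call__(self, step: int, *args, **kwargs) -> Dict[str, float]:
        out: Dict[str, float] = {}
        for suffix in self.active_suffixes(step):
            result = self.guidances[suffix](*args, **kwargs)
            for key, value in result.items():
                # "loss_sds" from the zero123 guidance becomes "loss_sds_zero123".
                out[f"{key}_{suffix}"] = value
        return out


class CombinedGuidance(MultipleGuidance):
    """Evaluates every guidance at each iteration."""

    def active_suffixes(self, step: int) -> List[str]:
        return list(self.guidances)


class IterativeGuidance(MultipleGuidance):
    """Evaluates one guidance per iteration, cycling in insertion order."""

    def active_suffixes(self, step: int) -> List[str]:
        names = list(self.guidances)
        return [names[step % len(names)]]


# Stand-in guidances for illustration only; real ones would return tensors.
def fake_zero123(rgb):
    return {"loss_sds": 0.5}


def fake_if(rgb):
    return {"loss_sds": 0.25}


guidances = {"zero123": fake_zero123, "if": fake_if}
combined = CombinedGuidance(guidances)
iterative = IterativeGuidance(guidances)
```

With two guidances registered in this order, `iterative` runs zero123 at even steps and IF at odd steps, while `combined` returns both suffixed losses at every step.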

Any suggestions?

bennyguo avatar Jul 05 '23 05:07 bennyguo

That sounds interesting. I think there is another way to implement this, from the perspective of the diffusion model. We can implement a mixture of different diffusion models, say A, B, and C, using the following equation:

pred_noise = lambda_A * pred_noise_A + lambda_B * pred_noise_B + lambda_C * pred_noise_C

By doing this, we can keep the loss terms unchanged, since only the predicted noise fed into them changes. This method is also applicable to DU (dataset update), and it is possible to implement a VSD version of this mixture diffusion model.
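As a rough sketch of the mixture idea, the weighted sum can be written as a small helper. The function name `mix_noise_predictions` is hypothetical, and the sketch assumes all predictions share the same shape and noise parameterization. It relies only on scalar multiplication and addition, so it should work for tensors or arrays as well as for the plain floats used here.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def mix_noise_predictions(preds: Sequence[T], weights: Sequence[float]) -> T:
    """Return sum_i weights[i] * preds[i] (hypothetical helper)."""
    if not preds or len(preds) != len(weights):
        raise ValueError("need one weight per prediction, and at least one")
    mixed = weights[0] * preds[0]
    for weight, pred in zip(weights[1:], preds[1:]):
        mixed = mixed + weight * pred
    return mixed


# Floats stand in for the noise predictions of models A, B, and C.
pred_noise = mix_noise_predictions([1.0, 2.0, 4.0], [0.5, 0.25, 0.25])
```

Whether the weights should sum to 1 is left open here; the comment above does not say, so treat it as a tuning choice.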

DSaurus avatar Jul 05 '23 07:07 DSaurus

Another option is to provide a system that integrates different guidances, on the view that each guidance is effectively equivalent to a loss. So I guess it is also a chance for us to incorporate common losses into systems.

thuliu-yt16 avatar Jul 05 '23 10:07 thuliu-yt16

We have a version of this implemented in zero123, because we take guidance from ref + zero123. I expanded that to take guidance from ref + SD/DF + zero123. You could take a look, though it's not as modular as having a separate MixtureGuidance class.

voletiv avatar Jul 05 '23 14:07 voletiv

@bennyguo Hi, when are you going to release this PR? I am really looking forward to this feature. It is also very relevant to my recent work, Magic123, which uses SDS plus Zero123-SDS guidance for image-to-3D generation. Once this feature is launched, Magic123 can be seamlessly integrated into this repo. Magic123 achieves outstanding performance in single image-to-3D generation using both 2D and pseudo-3D (zero123) diffusion guidance.

@voletiv Very interesting. Looks like you have a similar idea to our work. I will check your repo when I am free. You can also take a quick look at our project webpage: https://guochengqian.github.io/project/magic123 We posted it on arXiv one month ago.

guochengqian avatar Jul 28 '23 03:07 guochengqian

@guochengqian Actually, my intention behind this was to reproduce Magic123 😂 Good work! I did a quick adaptation to support training with the zero123 guidance and the SD guidance at the same time, but did not get very good results. Are you interested in implementing Magic123 in threestudio? I can share my branch as a starting point.

bennyguo avatar Jul 28 '23 11:07 bennyguo

Cool! I can take a look, but I'm not sure whether I will have time to do it. I am busy with another ongoing project at Snap. Anyway, please share your branch. It might only take me a few hours to get it right. I will also share my code, based on stable dreamfusion, today or tomorrow in this GitHub repo: https://github.com/guochengqian/Magic123/issues/1

guochengqian avatar Jul 28 '23 15:07 guochengqian

@guochengqian Cheers on the code release! I might want to read your code first and try to reproduce it in threestudio :) I'll let you know if I have any questions about the implementation.

bennyguo avatar Aug 03 '23 03:08 bennyguo

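We have a version of this implemented in Zero123, because we take guidance from ref + Zero123. I expanded that to take guidance from ref + SD/DF + Zero123. You could take a look, though it's not as modular as having a separate MixtureGuidance class.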

Cool! Where can I see this?

834810269 avatar Mar 25 '24 08:03 834810269