threestudio
Is it possible to use a pretrained model fine-tuned with LoRA?
Same as the title: what if I want to use a model fine-tuned with LoRA to generate the reference image? Does the paper support something like that?
Not quite sure what you mean. Do you mean using the lora-tuned model to generate images in prolificdreamer?
Yes. Most of the methods rely on a frozen T2I model, so I wonder if we can use a version fine-tuned with LoRA? Thanks for the reply.
To be specific, I mean the model used in prolificdreamer, referenced by the config path system.guidance.pretrained_model_name_or_path.
Let me clarify. There are actually two different things related to what I just said. Maybe I am talking about one of them and you are referring to the other.
- Run the prolificdreamer pipeline without any change. From the pipeline, we get a lora-tuned model trained on the specific prompt and conditioned on camera pose, and we can sample images from this model with t2i schedulers such as DPM-Solver. This is supported in threestudio; just add system.visualize_samples=True.
- Replace the model that should be lora-trained during optimization with a model that has already been fine-tuned with lora. In this case, if the model is completely frozen, it should not work, because the model is supposed to estimate the distribution of the current UNDER-OPTIMIZED rendered images rather than NEARLY PERFECT rendered images. If the model is still trained during the pipeline, such as loading some weights from a ControlNet and continuing to train with vsd, I guess it could work to some extent, but I am not very sure what the input should look like. In this case, we would need a function to load lora weights, which is very easy to implement.
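For intuition, loading lora weights is just merging a low-rank additive delta into the frozen base weights: W' = W + (alpha / r) * B @ A per target matrix. A minimal pure-Python sketch of that arithmetic (toy 2x2 matrices; the function names are illustrative, not threestudio's actual API):

```python
# Sketch of a LoRA weight merge: W' = W + (alpha / rank) * (B @ A).
# Toy dimensions; real LoRA deltas target attention/projection matrices.

def matmul(B, A):
    """Multiply two matrices given as nested lists."""
    return [[sum(B[i][k] * A[k][j] for k in range(len(A)))
             for j in range(len(A[0]))] for i in range(len(B))]

def merge_lora(W, A, B, alpha, rank):
    """Return W + (alpha / rank) * B @ A, leaving the frozen W untouched."""
    scale = alpha / rank
    delta = matmul(B, A)
    return [[W[i][j] + scale * delta[i][j]
             for j in range(len(W[0]))] for i in range(len(W))]

# A frozen 2x2 base weight and a rank-1 LoRA (B: 2x1, A: 1x2).
W = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0], [2.0]]
A = [[0.5, 0.5]]
W_merged = merge_lora(W, A, B, alpha=1.0, rank=1)
print(W_merged)  # [[1.5, 0.5], [1.0, 2.0]]
```

Since the merged result is just another dense weight matrix, a lora-merged checkpoint can be treated exactly like any other frozen base model.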
From your description, the T2I model has already been fine-tuned with LoRA? But what was it fine-tuned for? Are there three parts of weights: the T2I model, the pretrained LoRA, and the optimization-target LoRA?
Or is pipe + pipe_lora the T2I system mentioned in the paper? And does pipe_lora refer to a Hugging Face full model?
I'm a little confused, and new to this project. If I need to take more time to read the paper or the project, please tell me. Thanks for your help!
Hi, @krNeko9t.
In the StableDiffusionVSDGuidance, "pipe" represents the frozen T2I base model, while "pipe_lora" represents the frozen T2I model with an additional unfrozen LORA1 (for 3D generation). So if you want to utilize a pretrained model fine-tuned with LoRA, your base model becomes T2I + LORA2 (your lora model). Note that LORA1 (for 3D generation) and LORA2 (your lora model) are completely distinct. However, threestudio currently only supports a single T2I model without any additional modules as the base model, so you'll need to implement some code to support your T2I + LORA2. One possible solution, as suggested by @thuliu-yt16, is as follows:
- Implement a new pipe to load a T2I + LORA2 model as the base model. Both T2I and LORA2 should be frozen.
- Implement a new pipe_lora to enable training a (T2I+LORA2)+LORA1 model. T2I and LORA2 need to be frozen, while LORA1 should be trained.
- If splitting and training two LORA models is challenging, you have the option to incorporate an alternative module such as a ControlNet or a UNet (mentioned in prolificdreamer). In this case, your new pipe_lora would become (T2I+LORA2)+ControlNet/UNet; the effect of the ControlNet/UNet is similar to that of LORA1.
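In weight space, the first two steps amount to stacking two low-rank deltas on the same base matrix: the merged (T2I+LORA2) weights stay frozen, and only LORA1's factors would receive gradients during VSD. A toy sketch of the composition (all names and numbers hypothetical):

```python
def lora_delta(B, A, alpha, rank):
    """(alpha / rank) * B @ A as nested lists."""
    scale = alpha / rank
    return [[scale * sum(B[i][k] * A[k][j] for k in range(len(A)))
             for j in range(len(A[0]))] for i in range(len(B))]

def effective_weight(W, deltas):
    """Frozen base W plus the elementwise sum of all LoRA deltas."""
    out = [row[:] for row in W]
    for d in deltas:
        for i in range(len(out)):
            for j in range(len(out[0])):
                out[i][j] += d[i][j]
    return out

W = [[1.0, 0.0], [0.0, 1.0]]                           # frozen T2I weight
d2 = lora_delta([[1.0], [0.0]], [[0.2, 0.0]], 1.0, 1)  # frozen LORA2 (your lora)
d1 = lora_delta([[0.0], [1.0]], [[0.0, 0.3]], 1.0, 1)  # trainable LORA1 (VSD)

# pipe's effective weight: base + LORA2.
W_pipe = effective_weight(W, [d2])
# pipe_lora's effective weight: base + LORA2 + LORA1.
W_pipe_lora = effective_weight(W, [d2, d1])
```

The key point is that d2 enters both pipes identically and is never updated, while d1 exists only in pipe_lora and is the only part the VSD objective trains.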
Can I ask another question: why are there two SD models used in the prolific system? The original paper doesn't seem to mention this. What's the benefit of this strategy?
The SD model 2-1-base/2-1 is in eps-prediction/v-prediction mode, respectively. The authors said that using a v-prediction model for the lora works better. You can definitely try both; I think there is no big difference.
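For context, the two parameterizations carry the same information: with x_t = a*x0 + s*eps and a^2 + s^2 = 1, the v-target is v = a*eps - s*x0 (Salimans & Ho, "Progressive Distillation"), and either prediction can be recovered from the other given x_t. A quick numeric check:

```python
import math

# eps- vs v-prediction interconversion:
#   x_t = a*x0 + s*eps,  v = a*eps - s*x0,  with a^2 + s^2 = 1.
t = 0.3
a, s = math.cos(t), math.sin(t)   # any point on the unit circle
x0, eps = 0.7, -0.4

x_t = a * x0 + s * eps
v = a * eps - s * x0

# Inversion: eps = s*x_t + a*v, x0 = a*x_t - s*v.
eps_rec = s * x_t + a * v
x0_rec = a * x_t - s * v
print(abs(eps_rec - eps) < 1e-12, abs(x0_rec - x0) < 1e-12)  # True True
```

So the choice is about which target the network regresses during training, not about expressive power; that is why swapping 2-1-base for 2-1 mostly just changes training behavior.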