Prolific Dreamer V Prediction

Open jaidevshriram opened this issue 1 year ago • 1 comments

Hey! I see that the LoRA model for ProlificDreamer uses a v-prediction model and that, as noted in issues #141 and #99, this produces better results. Would using a v-prediction model for pretrained_model improve results further? Anecdotally, I preferred the images sampled from stable-diffusion-2-1 over those from stable-diffusion-2-1-base, so this seems plausible. I tried to run this experiment myself but hit this assert:

https://github.com/threestudio-project/threestudio/blob/1de1f157aaf34307bb65c05c6bba9a46a03efdcc/threestudio/models/guidance/stable_diffusion_vsd_guidance.py#L504-L505

Do you have any insight on making it work for 2-1? Also, since the 2-1 model operates at 768x768, I expected the render to be resized before being passed to the LoRA model, but that does not seem to happen. Is there a reason why?

Thanks!!

jaidevshriram avatar Jun 25 '23 22:06 jaidevshriram

Hi! I haven't tried this myself, but I think you should convert noise_pred_pretrain from velocity to epsilon before this:

https://github.com/threestudio-project/threestudio/blob/74ff18a3810cdea39a5341ef8652d4fdc3ffde9f/threestudio/models/guidance/stable_diffusion_vsd_guidance.py#L498-L501

You may refer to this snippet for the conversion:

https://github.com/threestudio-project/threestudio/blob/74ff18a3810cdea39a5341ef8652d4fdc3ffde9f/threestudio/models/guidance/stable_diffusion_vsd_guidance.py#L510-L519
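For readers following along, the velocity-to-epsilon conversion can be sketched as below. This is a minimal illustration, not the threestudio code: the function names and the `alpha_cumprod_t` argument are assumptions made for the example. It assumes the standard v-prediction parameterization, x_t = alpha_t * x_0 + sigma_t * eps and v = alpha_t * eps - sigma_t * x_0, with alpha_t = sqrt(alphas_cumprod[t]) and sigma_t = sqrt(1 - alphas_cumprod[t]). Substituting and using alpha_t^2 + sigma_t^2 = 1 gives eps = alpha_t * v + sigma_t * x_t.

```python
# Hypothetical helpers for illustration; names are not from threestudio.

def epsilon_to_velocity(eps, x_0, alpha_cumprod_t):
    """Build the v-prediction target from noise and the clean sample."""
    alpha_t = alpha_cumprod_t ** 0.5
    sigma_t = (1 - alpha_cumprod_t) ** 0.5
    return alpha_t * eps - sigma_t * x_0


def velocity_to_epsilon(v, x_t, alpha_cumprod_t):
    """Recover the noise (epsilon) prediction from a velocity prediction.

    v               : model output under v-prediction
    x_t             : the noisy sample that was fed to the model
    alpha_cumprod_t : cumulative alpha product at timestep t
    """
    alpha_t = alpha_cumprod_t ** 0.5
    sigma_t = (1 - alpha_cumprod_t) ** 0.5
    return alpha_t * v + sigma_t * x_t


# Round-trip check on scalars: noise a sample, form v, then recover eps.
alpha_cumprod_t = 0.64                      # so alpha_t = 0.8, sigma_t = 0.6
x_0, eps = 2.0, -1.0
x_t = alpha_cumprod_t ** 0.5 * x_0 + (1 - alpha_cumprod_t) ** 0.5 * eps
v = epsilon_to_velocity(eps, x_0, alpha_cumprod_t)
eps_recovered = velocity_to_epsilon(v, x_t, alpha_cumprod_t)
```

Because the functions use only arithmetic operators, the same code should also work on PyTorch tensors, provided `alpha_cumprod_t` is reshaped to broadcast against the latents (for example to shape (B, 1, 1, 1)). That broadcasting step is left out here.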

bennyguo avatar Jun 26 '23 04:06 bennyguo

Hi @jaidevshriram, did you get it to work with stable-diffusion-2-1 instead of stable-diffusion-2-1-base as the pretrained_model? If so, could you share the code changes that were needed? Thanks :)

NKadvil avatar Aug 17 '23 13:08 NKadvil