threestudio
Prolific Dreamer V Prediction
Hey! I see that the LoRA model for Prolific Dreamer uses the v-prediction model and, as stated in previous issues #141 and #99, it produces better results. I'm wondering if using the v-prediction model for pretrained_model would further improve results? Anecdotally, I preferred the images sampled from stable-diffusion-2-1 compared to stable-diffusion-2-1-base, so this makes sense. I tried to run this experiment myself but got caught in this assert:
https://github.com/threestudio-project/threestudio/blob/1de1f157aaf34307bb65c05c6bba9a46a03efdcc/threestudio/models/guidance/stable_diffusion_vsd_guidance.py#L504-L505
Do you have any insight on making it work for 2-1? Further, since the 2-1 model operates at 768x768, I expected the render to be resized when passed to the LoRA model, but this was not the case. Is there a reason why?
Thanks!!
Hi! I haven't tried this myself but I think you should convert noise_pred_pretrain from velocity to epsilon before this:
https://github.com/threestudio-project/threestudio/blob/74ff18a3810cdea39a5341ef8652d4fdc3ffde9f/threestudio/models/guidance/stable_diffusion_vsd_guidance.py#L498-L501
You may refer to this snippet for the conversion:
https://github.com/threestudio-project/threestudio/blob/74ff18a3810cdea39a5341ef8652d4fdc3ffde9f/threestudio/models/guidance/stable_diffusion_vsd_guidance.py#L510-L519
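For anyone reading along, the velocity-to-epsilon conversion suggested above can be sketched roughly as follows. This is a minimal sketch and not the threestudio code: the function name velocity_to_epsilon and its arguments are hypothetical, and it assumes the standard v-parameterization, v = alpha_t * eps - sigma_t * x0 with x_t = alpha_t * x0 + sigma_t * eps, where alpha_t = sqrt(alphas_cumprod[t]) and sigma_t = sqrt(1 - alphas_cumprod[t]). Under those assumptions, eps = alpha_t * v + sigma_t * x_t.

```python
import math


def velocity_to_epsilon(v_pred, x_t, alpha_cumprod_t):
    """Convert a v-prediction into an epsilon (noise) prediction.

    Hypothetical helper for illustration only. Assumes:
        x_t = alpha_t * x0 + sigma_t * eps
        v   = alpha_t * eps - sigma_t * x0
    with alpha_t = sqrt(alpha_cumprod_t), sigma_t = sqrt(1 - alpha_cumprod_t).

    v_pred and x_t can be plain floats or array/tensor types that support
    multiplication by a Python float; alpha_cumprod_t is a single float here.
    For a batch with per-sample timesteps you would need broadcastable
    tensors instead of a scalar.
    """
    alpha_t = math.sqrt(alpha_cumprod_t)
    sigma_t = math.sqrt(1.0 - alpha_cumprod_t)
    return alpha_t * v_pred + sigma_t * x_t


# Scalar round-trip check: build x_t and v from a known x0 and eps,
# then confirm the conversion recovers eps.
x0, eps, alpha_cumprod_t = 0.3, -1.2, 0.64
alpha_t = math.sqrt(alpha_cumprod_t)
sigma_t = math.sqrt(1.0 - alpha_cumprod_t)
x_t = alpha_t * x0 + sigma_t * eps
v = alpha_t * eps - sigma_t * x0
recovered_eps = velocity_to_epsilon(v, x_t, alpha_cumprod_t)
assert math.isclose(recovered_eps, eps, rel_tol=1e-9)
```

In the guidance code this would presumably be applied to the pretrained model's output for the noisy latents at the sampled timestep, before classifier-free guidance is combined, but I have not verified that against the repository.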
Hi @jaidevshriram, did you get it to work using stable-diffusion-2-1 instead of stable-diffusion-2-1-base for pretrained_model? If so, could you share what changes were needed in the code? Thanks :)