stable-dreamfusion
Question about the function of the model.
Maybe I'm fundamentally misunderstanding something about the whole process, but if you can direct the model to generate images from specific viewpoints, why do you need so many different scripts and functions? What I mean is: why not generate the images from the desired angles, save them to a folder, and feed them to the NeRF as you would in traditional NeRF scene reconstruction? Or is there something special about this technique that produces better results than a traditional NeRF pipeline? Perhaps that's exactly what you and the paper's model are doing and I'm just not seeing it. Sorry if this sounds callous; I'm genuinely curious, which is why I'm asking. It just seems more complicated than it needs to be.
@Shikamaru5 Hi, this is because it's hard to generate 3D-view-consistent multi-view images from a 2D text-to-image model. For example, you cannot ask it to render the same pineapple from the exact angles you want. The essence of SDS (Score Distillation Sampling) is a randomized training process that gradually guides the NeRF toward the target object. It works much like what a diffusion model does during sampling, and you can find some good tutorials here.
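To make that concrete, here is a minimal sketch of one SDS training step in PyTorch. It is not the repo's actual code: `nerf`, `unet`, `alphas_cumprod`, and `text_emb` are hypothetical names standing in for a differentiable NeRF renderer and a frozen text-to-image diffusion model, and the sketch works in pixel space for simplicity (Stable Diffusion would do this in latent space).

```python
# A minimal sketch of one SDS step; all names here are assumptions,
# not the repo's API. The diffusion model stays frozen throughout.
import torch

def sds_step(nerf, unet, alphas_cumprod, text_emb, camera, optimizer):
    optimizer.zero_grad()

    # 1. Render the current NeRF from a randomly sampled camera pose.
    rgb = nerf.render(camera)  # (1, 3, H, W), requires grad w.r.t. NeRF params

    # 2. Diffuse the rendering: pick a random timestep and add matching noise.
    t = torch.randint(20, 980, (1,), device=rgb.device)
    noise = torch.randn_like(rgb)
    a_t = alphas_cumprod[t].view(-1, 1, 1, 1)
    noisy = a_t.sqrt() * rgb + (1.0 - a_t).sqrt() * noise

    # 3. Ask the frozen diffusion model to predict the noise, given the text.
    with torch.no_grad():
        noise_pred = unet(noisy, t, text_emb)

    # 4. SDS gradient: the residual (noise_pred - noise) points the rendering
    #    toward images the diffusion model considers likely for this prompt.
    #    The U-Net Jacobian is skipped by treating the prediction as constant.
    w = 1.0 - a_t
    grad = w * (noise_pred - noise)
    rgb.backward(gradient=grad)  # only the NeRF parameters receive gradients
    optimizer.step()
```

Repeating this step with a new random camera each iteration is what lets a purely 2D model carve out a 3D-consistent object: every view must independently look plausible to the diffusion model.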
Oh I see. I do know what a diffusion model does, thanks; I just figured there was something else going on. What I don't understand, I suppose, is: are you making the NeRF work like a diffusion model? Or is the model generating many different images of the same subject and discarding the ones that aren't close to the desired result (i.e., that have a high loss), while randomly initializing the camera and lighting at different positions, so that the process eventually yields a decent 3D model? Perhaps this is also why the actual DreamFusion may get better results: if what they're stating is true, Imagen may be able to generate a few images of the target object from different angles. There's an interesting technique called open-set grounded text-to-image generation, or GLIGEN, that I think has the potential to create images from different angles; I just haven't tried it yet. When I get the image-generating model I'm training to work, I'll see whether that technique yields any positive results here. If we could specify different angles, I'd think this process would be much easier.
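For reference, the random camera and lighting initialization mentioned above is indeed part of DreamFusion-style training: a fresh pose and light are sampled every iteration rather than fixed in advance. A sketch of what that sampling can look like is below; the ranges are illustrative, not the repo's exact settings.

```python
# A hypothetical per-iteration view sampler; ranges are illustrative only.
import numpy as np

def sample_random_view(radius_range=(1.0, 1.5)):
    # Spherical coordinates: random elevation, azimuth, and distance.
    elevation = np.random.uniform(-10.0, 60.0)   # degrees above the horizon
    azimuth = np.random.uniform(0.0, 360.0)      # full turn around the object
    radius = np.random.uniform(*radius_range)

    theta = np.deg2rad(90.0 - elevation)         # polar angle from +z
    phi = np.deg2rad(azimuth)
    eye = radius * np.array([np.sin(theta) * np.cos(phi),
                             np.sin(theta) * np.sin(phi),
                             np.cos(theta)])

    # Lighting is randomized too: a direction jittered around the camera.
    light_dir = eye / np.linalg.norm(eye) + 0.1 * np.random.randn(3)
    light_dir /= np.linalg.norm(light_dir)

    return eye, light_dir  # look-at target is the origin
```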