ControlNet
The pose of generated characters can't currently be controlled in a fine-grained way. Adding this would allow for 3D animation to be generated as well.
Describe the solution you'd like
Implement ControlNet, specifically the pose and segmentation models.
Describe alternatives you've considered
Do some sort of image-to-image (or model-to-model) approach.
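As a rough illustration (not code from this repo), wiring in the ControlNet pose model through the diffusers library could look like the sketch below; for the SDS guidance in nerf/sd.py, the ControlNet conditioning would have to be fed into the noise-prediction step rather than a full txt2img pipeline. The model names and the placeholder pose image are assumptions.

```python
# Rough sketch, not repo code: loading the ControlNet pose model with diffusers.
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-openpose", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

# Placeholder: an OpenPose-style skeleton rendered from the current camera view.
pose_image = Image.open("pose.png")

image = pipe("a photo of a person, full body", image=pose_image).images[0]
```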
@neverix Yes, this is interesting. I guess the pose model could be used to synthesize better humans by initializing a skeleton and rendering it from different views as a prior. However, it is quite task-specific, and I may try it later in other projects.
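For reference, a hypothetical sketch of that skeleton-as-a-prior idea: keep one set of 3D joints and project them from whatever camera the current view uses, so the pose conditioning image stays consistent across azimuths. The joint layout, camera model, and function name below are illustrative assumptions, not code from this repo.

```python
import numpy as np

def project_skeleton(joints_3d, azimuth_deg, radius=2.5, focal=500.0, res=512):
    """Project 3D joints to 2D pixels for a camera circling the origin."""
    a = np.deg2rad(azimuth_deg)
    cam = np.array([radius * np.sin(a), 0.0, radius * np.cos(a)])  # camera position
    forward = -cam / np.linalg.norm(cam)                           # look at origin
    right = np.cross([0.0, 1.0, 0.0], forward)
    right /= np.linalg.norm(right)
    up = np.cross(forward, right)
    R = np.stack([right, up, forward])          # world -> camera rotation (rows)
    pts = (joints_3d - cam) @ R.T               # joints in camera space
    uv = focal * pts[:, :2] / pts[:, 2:3]       # pinhole projection
    return uv + res / 2                         # center in a res x res image

# Toy joints (head, neck, shoulders) projected from two views:
joints = np.array([[0.0, 0.6, 0.0], [0.0, 0.4, 0.0], [0.2, 0.4, 0.0], [-0.2, 0.4, 0.0]])
uv_front = project_skeleton(joints, azimuth_deg=0)
uv_side = project_skeleton(joints, azimuth_deg=90)
```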
@ashawkey does dreamfusion have a way to use img2img instead of txt2img? Then you could manually create a predefined ControlNet image that shows the object from all directions, and feed that to dreamfusion.
The basic approach of rendering a colored mesh would also be useful for rendering from segmentation or applying img2img. If I implemented it, I would add a hook that users can plug that functionality into.
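A hypothetical shape for such a hook (the class, keys, and names are made up for illustration; this is not an existing API in the repo):

```python
# Hypothetical hook interface: the trainer would call every registered hook with
# the rendered buffers, and a hook can return extra conditioning images (e.g. a
# segmentation map for ControlNet-seg, or an init image for img2img).
from typing import Callable, Dict, List
import torch

RenderHook = Callable[[Dict[str, torch.Tensor]], Dict[str, torch.Tensor]]

class GuidanceHooks:
    def __init__(self) -> None:
        self._hooks: List[RenderHook] = []

    def register(self, hook: RenderHook) -> None:
        self._hooks.append(hook)

    def __call__(self, render_outputs: Dict[str, torch.Tensor]) -> Dict[str, torch.Tensor]:
        extras: Dict[str, torch.Tensor] = {}
        for hook in self._hooks:
            extras.update(hook(render_outputs))
        return extras

# Example user hook: turn the rendered alpha mask into a crude two-class
# segmentation image ("weights_sum" is an assumed key, for illustration only).
def seg_hook(render_outputs: Dict[str, torch.Tensor]) -> Dict[str, torch.Tensor]:
    mask = render_outputs["weights_sum"]            # [H, W] opacity
    return {"seg_cond": mask.unsqueeze(0).repeat(3, 1, 1)}
```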
@ashawkey I just wanted to add how big a deal something like this would be: being able to generate real 3D characters and models using Stable Diffusion would be huge.
It's already possible, just not using ControlNet
The biggest problem with generating good results is that Stable Diffusion does not output the same object with different prompts.
If I understand correctly, words like "front view", "side view", etc. are added to the text prompt to generate images from other views. But if you run e.g. python nerf/sd.py --seed 100 "armchair front view" and "armchair side view", you get two totally different armchair pictures. Also, most pictures tagged "side view" do not actually show the side view of the object but mostly a front view instead.
I could be wrong.
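For context, the view-dependent prompting the comment above refers to boils down to something like the sketch below: pick a suffix from the camera azimuth/elevation and append it to the base prompt. The thresholds and function name here are illustrative, not the repo's exact values.

```python
def view_prompt(prompt: str, azimuth_deg: float, elevation_deg: float) -> str:
    """Append a view suffix chosen from the camera pose (illustrative thresholds)."""
    if elevation_deg > 60:
        view = "overhead view"
    elif abs(azimuth_deg) < 45:
        view = "front view"
    elif abs(azimuth_deg) > 135:
        view = "back view"
    else:
        view = "side view"
    return f"{prompt}, {view}"

# e.g. view_prompt("armchair", azimuth_deg=90, elevation_deg=10) -> "armchair, side view"
```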
Yes, ControlNet grounding could potentially help solve this. But the signal is already quite good, and there are other ways to make progress on this.
How is it already possible without ControlNet?
Well, you are looking at the repo that makes it possible.
I tried ControlNet but got this:
Just like with vanilla NeRF, it is likely misconvergence.
Hi, I met the same problem. Did you solve it?