
[REQUEST] Custom trained model to support cryptomatte and depth pass

Open KaruroChori opened this issue 2 years ago • 7 comments

It would be great if we could use cryptomatte and depth passes generated from a rendering engine, e.g. Blender, and use their combined information to inform the final "rendering" via ControlNet. This would be somewhat similar to a combination of the depth and segmentation maps as they are currently implemented.

KaruroChori avatar Feb 16 '23 18:02 KaruroChori

Edit: this was a reply to a post later removed.

In theory it would be easy to generate training images automatically. Blender has good Python API coverage, and after some initial setup every step can be automated. We could prepare a small but diverse set of scenes (the Blender Foundation makes many available). For each scene we would define a set of positions and orientations for the camera to move through, enable the two passes we are interested in plus the normal full render, and profit.
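A minimal sketch of what that automation could look like with Blender's bpy API (the target object, camera ring, and output paths below are placeholders, and the cryptomatte pass flag assumes a recent Blender release):

```python
# Minimal sketch: enable depth + cryptomatte passes and render a ring of views.
# Run inside Blender, e.g.: blender scene.blend --background --python this_script.py
import math
import bpy

scene = bpy.context.scene
view_layer = bpy.context.view_layer

# Enable the passes we care about on the active view layer.
view_layer.use_pass_z = True                    # depth pass
view_layer.use_pass_cryptomatte_object = True   # cryptomatte (recent Blender)

# Cryptomatte layers need a multilayer EXR to survive on disk.
scene.render.image_settings.file_format = 'OPEN_EXR_MULTILAYER'
scene.render.resolution_x = 512
scene.render.resolution_y = 512

camera = scene.camera
target = bpy.data.objects["Target"]   # hypothetical object the camera orbits

# Render from a ring of camera positions around the target.
for i in range(8):
    angle = 2 * math.pi * i / 8
    camera.location = (5 * math.cos(angle), 5 * math.sin(angle), 2)
    # Aim the camera at the target (-Z is Blender's camera view axis).
    direction = target.location - camera.location
    camera.rotation_euler = direction.to_track_quat('-Z', 'Y').to_euler()
    scene.render.filepath = f"//renders/view_{i:03d}"
    bpy.ops.render.render(write_still=True)
```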

KaruroChori avatar Feb 16 '23 21:02 KaruroChori

> Edit: this was a reply to a post later removed.
>
> In theory it would be easy to generate training images automatically. Blender has good Python API coverage, and after some initial setup every step can be automated. We could prepare a small but diverse set of scenes (the Blender Foundation makes many available). For each scene we would define a set of positions and orientations for the camera to move through, enable the two passes we are interested in plus the normal full render, and profit.

[image attachment: t1 (1)]

All we need is 200k of these examples.

Njasa2k avatar Feb 16 '23 21:02 Njasa2k

512x512? It is feasible, even more so with a few OptiX-capable cards. Eevee actually got Cryptomatte support a few years ago, so we could avoid Cycles and speed up the rendering process quite a bit.
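Switching the dataset renders to Eevee would be a one-line change in the same script (engine identifier as in recent Blender releases):

```python
import bpy

# Eevee writes Cryptomatte layers in recent Blender releases (around 2.92+),
# so the dataset renders can skip Cycles' path-tracing cost entirely.
bpy.context.scene.render.engine = 'BLENDER_EEVEE'
```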

The main concern would be tagging the final images.

KaruroChori avatar Feb 16 '23 21:02 KaruroChori

> 512x512? It is feasible, even more so with a few OptiX-capable cards. Eevee actually got Cryptomatte support a few years ago, so we could avoid Cycles.
>
> The main concern would be tagging the final images.

Automated captioning by BLIP or whatever?
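As a rough sketch, automated captioning with a public BLIP checkpoint via Hugging Face transformers could look like this (the model id and file path are only examples):

```python
# Hedged sketch: caption one render with BLIP (Hugging Face transformers).
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

image = Image.open("renders/view_000.png").convert("RGB")
inputs = processor(images=image, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=30)
print(processor.decode(out[0], skip_special_tokens=True))
```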

Njasa2k avatar Feb 16 '23 21:02 Njasa2k

I don't have any experience with it, but it seems good from what I have seen.

KaruroChori avatar Feb 16 '23 21:02 KaruroChori

Wouldn't it be possible to have a more detailed caption? "FF0000: gray car, 00FF00: glass, 0000FF: parking lot"

toyxyz avatar Feb 17 '23 04:02 toyxyz

Basically a material list exported from Blender with at least the albedo and the material label? I do not have access to my main workstation at the moment, but next week I would like to see what is feasible in this respect. We also need to cope with the limitations of the text model used by Stable Diffusion (CLIP's short token context would constrain long material lists), and I am not sure that will be easy.
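One way such a structured caption could be assembled, sketched against Blender's Python API (object and material names here are hypothetical, and the hex IDs in the suggestion above would come from however the matte visualization maps them):

```python
# Hypothetical sketch: derive a "label: material" caption from the scene,
# one entry per mesh object. Cryptomatte IDs are hashes of these same names,
# so the names key both the matte layers and the caption.
import bpy

entries = []
for obj in bpy.context.scene.objects:
    if obj.type == 'MESH' and obj.active_material:
        entries.append(f"{obj.name}: {obj.active_material.name}")

caption = ", ".join(entries)
print(caption)  # e.g. "Car: gray_paint, Window: glass, Lot: asphalt"
```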

KaruroChori avatar Feb 17 '23 11:02 KaruroChori

Also see the "double control" discussion here: https://github.com/lllyasviel/ControlNet/discussions/30

geroldmeisinger avatar Sep 17 '23 10:09 geroldmeisinger