
[REQUEST] Custom trained model to support cryptomatte and depth pass

Open KaruroChori opened this issue 2 years ago • 7 comments

It would be great if we could use cryptomatte and depth passes generated from a rendering engine, e.g. Blender, and use their combined information to inform the final "rendering" via ControlNet. This would be somewhat similar to a combination of the depth and segmentation maps as they are currently implemented.

KaruroChori avatar Feb 16 '23 18:02 KaruroChori

Edit: this was a reply to a post later removed.

In theory it would be easy to generate training images automatically. Blender has good Python API coverage, and after some initial setup every step can be automated. We could prepare a small but diverse set of scenes (the Blender Foundation makes many available). For each scene we would define a set of positions and orientations for the camera to move through, enable the two passes we are interested in plus the normal full render, and profit.
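A minimal sketch of what that automation could look like with Blender's bpy API (the target object, camera ring, and output paths below are placeholders, and the cryptomatte pass flag assumes a recent Blender release):

```python
# Minimal sketch: enable depth + cryptomatte passes and render a ring of views.
# Run inside Blender, e.g.: blender scene.blend --background --python this_script.py
import math
import bpy

scene = bpy.context.scene
view_layer = bpy.context.view_layer

# Enable the passes we care about on the active view layer.
view_layer.use_pass_z = True                    # depth pass
view_layer.use_pass_cryptomatte_object = True   # cryptomatte (recent Blender)

# Cryptomatte layers need a multilayer EXR to survive on disk.
scene.render.image_settings.file_format = 'OPEN_EXR_MULTILAYER'
scene.render.resolution_x = 512
scene.render.resolution_y = 512

camera = scene.camera
target = bpy.data.objects["Target"]   # hypothetical object the camera orbits

# Render from a ring of camera positions around the target.
for i in range(8):
    angle = 2 * math.pi * i / 8
    camera.location = (5 * math.cos(angle), 5 * math.sin(angle), 2)
    # Aim the camera at the target (-Z is Blender's camera view axis).
    direction = target.location - camera.location
    camera.rotation_euler = direction.to_track_quat('-Z', 'Y').to_euler()
    scene.render.filepath = f"//renders/view_{i:03d}"
    bpy.ops.render.render(write_still=True)
```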

KaruroChori avatar Feb 16 '23 21:02 KaruroChori

> Edit: this was a reply to a post later removed.
>
> In theory it would be easy to generate training images automatically. Blender has good Python API coverage, and after some initial setup every step can be automated. We could prepare a small but diverse set of scenes (the Blender Foundation makes many available). For each scene we would define a set of positions and orientations for the camera to move through, enable the two passes we are interested in plus the normal full render, and profit.

[image attachment: t1 (1)]

All we need is 200k of these examples.

Njasa2k avatar Feb 16 '23 21:02 Njasa2k

512x512? It is feasible, even more so with a few OptiX-capable cards. Eevee actually got Cryptomatte support a few years ago, so we could avoid Cycles and speed up the rendering process quite a bit.
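Switching the dataset renders to Eevee would be a one-line change in the same script (engine identifier as in recent Blender releases):

```python
import bpy

# Eevee writes Cryptomatte layers in recent Blender releases (around 2.92+),
# so the dataset renders can skip Cycles' path-tracing cost entirely.
bpy.context.scene.render.engine = 'BLENDER_EEVEE'
```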

The main concern would be tagging the final images.

KaruroChori avatar Feb 16 '23 21:02 KaruroChori

> 512x512? It is feasible, even more so with a few OptiX-capable cards. Eevee actually got Cryptomatte support a few years ago, so we could avoid Cycles.
>
> The main concern would be tagging the final images.

Automated captioning by BLIP or whatever?
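As a rough sketch, automated captioning with a public BLIP checkpoint via Hugging Face transformers could look like this (the model id and file path are only examples):

```python
# Hedged sketch: caption one render with BLIP (Hugging Face transformers).
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

image = Image.open("renders/view_000.png").convert("RGB")
inputs = processor(images=image, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=30)
print(processor.decode(out[0], skip_special_tokens=True))
```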

Njasa2k avatar Feb 16 '23 21:02 Njasa2k

I don't have any experience with it, but it seems good from what I have seen.

KaruroChori avatar Feb 16 '23 21:02 KaruroChori

Wouldn't it be possible to have a more detailed caption? "FF0000: gray car, 00FF00: glass, 0000FF: parking lot"

toyxyz avatar Feb 17 '23 04:02 toyxyz

Basically a material list exported from Blender with at least the albedo and the material label? I do not have access to my main workstation at the moment, but next week I would like to see what is feasible in this respect. We also need to cope with the limitations of the text model used by Stable Diffusion (CLIP's short token context would constrain long material lists), and I am not sure that will be easy.
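One way such a structured caption could be assembled, sketched against Blender's Python API (object and material names here are hypothetical, and the hex IDs in the suggestion above would come from however the matte visualization maps them):

```python
# Hypothetical sketch: derive a "label: material" caption from the scene,
# one entry per mesh object. Cryptomatte IDs are hashes of these same names,
# so the names key both the matte layers and the caption.
import bpy

entries = []
for obj in bpy.context.scene.objects:
    if obj.type == 'MESH' and obj.active_material:
        entries.append(f"{obj.name}: {obj.active_material.name}")

caption = ", ".join(entries)
print(caption)  # e.g. "Car: gray_paint, Window: glass, Lot: asphalt"
```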

KaruroChori avatar Feb 17 '23 11:02 KaruroChori

Also see the "double control" discussion here: https://github.com/lllyasviel/ControlNet/discussions/30

geroldmeisinger avatar Sep 17 '23 10:09 geroldmeisinger