The type of control input of segmentation ControlNet?
Hi! Thanks for this awesome work!
I read through your paper on arXiv, but I am a little confused about the segmentation ControlNet.
Specifically, what is the type of the control input (i.e., the segmentation)? Is it an ordinary segmentation map, in which each pixel holds a class label that is either one-hot encoded or an integer? Or is it a color image, in which the class label of each pixel is color-coded?
When I read the paper, I presumed that an ordinary segmentation map is used directly as the control input (without preprocessing). However, I saw in some community news that artists changed the colors of the segmentation image to change the generated content. That makes me suspect that the control input of the segmentation ControlNet is a color-coded segmentation image, and that the model was also trained on color-coded segmentation images. But this does not make sense to me, since some ADE20K labels such as "wall" and "road" have visually similar color codes (#787878 and #8C8C8C) even though they have little semantic similarity.
I've dug into the code a little bit. It seems the input control is color-coded segmentation.
https://github.com/lllyasviel/ControlNet/blob/d249f5bfc66c7af9b3102dccc2162c6d17270748/gradio_seg2image.py#L29
https://github.com/lllyasviel/ControlNet/blob/d249f5bfc66c7af9b3102dccc2162c6d17270748/annotator/uniformer/__init__.py#L22
But this may be a bug, and we can observe its effect: when we draw a slightly complex scene directly with color codes, some classes get mixed up because their colors are similar.
See the color-coded segmentation image below.

Left: Hovel (#FF00FF). Right: Bus (#FF00F5).
And here are the samples generated with the prompt "hovel and bus, masterpiece, high quality":

So we can see that the model does confuse "bus" with "hovel".
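To put a number on how close these color codes are, here is a minimal sketch that computes the Euclidean distance in RGB space for the two pairs mentioned in this issue. The helper names `hex_to_rgb` and `rgb_distance` are my own, and Euclidean RGB distance is only a rough proxy for what the network actually perceives.

```python
import math

# Color codes exactly as quoted in this issue.
COLORS = {
    "wall": "#787878",
    "road": "#8C8C8C",
    "hovel": "#FF00FF",
    "bus": "#FF00F5",
}


def hex_to_rgb(code):
    """Convert a '#RRGGBB' string to an (R, G, B) tuple of ints."""
    code = code.lstrip("#")
    return tuple(int(code[i:i + 2], 16) for i in (0, 2, 4))


def rgb_distance(a, b):
    """Euclidean distance between two '#RRGGBB' colors in RGB space."""
    return math.dist(hex_to_rgb(a), hex_to_rgb(b))


wall_road = rgb_distance(COLORS["wall"], COLORS["road"])
hovel_bus = rgb_distance(COLORS["hovel"], COLORS["bus"])
# Largest possible distance (black vs. white), for scale.
max_distance = rgb_distance("#000000", "#FFFFFF")

print(f"wall vs road : {wall_road:.2f}")
print(f"hovel vs bus : {hovel_bus:.2f}")
print(f"black vs white: {max_distance:.2f}")
```

Hovel and bus differ only in the blue channel, by 10 out of 255, so the two regions are nearly indistinguishable as pixel values.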
To fix this problem, I think an ad-hoc way could be: train an embedding matrix for the ADE20K classes with an embedding dimension of just 3, and then map a discrete segmentation map to a feature map in which each pixel holds the embedding of its label.
see here https://github.com/lllyasviel/ControlNet/issues/172
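The embedding lookup proposed above could look roughly like the following sketch. It is not code from the ControlNet repository: `NUM_CLASSES = 150` assumes the ADE20K scene-parsing label set, the class ids 0 and 1 are placeholders rather than real ADE20K indices, and the table is randomly initialized here, whereas in the actual proposal it would be trained together with the model.

```python
import numpy as np

NUM_CLASSES = 150  # assumption: ADE20K scene-parsing label set
EMBED_DIM = 3      # same channel count as an RGB control image

rng = np.random.default_rng(0)
# Stand-in for a learned embedding matrix: one 3-d vector per class.
# Random values are used here only so the sketch runs end to end.
embedding = rng.normal(size=(NUM_CLASSES, EMBED_DIM)).astype(np.float32)


def embed_labels(label_map, table):
    """Map an (H, W) integer label map to an (H, W, D) feature map."""
    label_map = np.asarray(label_map)
    if label_map.min() < 0 or label_map.max() >= table.shape[0]:
        raise ValueError("label map contains an out-of-range class id")
    # Integer-array indexing looks up one embedding row per pixel.
    return table[label_map]


# Toy 4x6 label map: left half is class 0, right half is class 1.
# These ids are placeholders, not real ADE20K indices.
label_map = np.zeros((4, 6), dtype=np.int64)
label_map[:, 3:] = 1

control = embed_labels(label_map, embedding)
print(control.shape)
```

The difference from a fixed palette is that the per-class vectors are free parameters, so training could in principle push confusable classes apart instead of leaving them 10 units away from each other.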