
Training a ControlNet to generate furnished room -> empty room (and vice versa): improvement has plateaued

Open · whydna opened this issue 1 year ago · 5 comments

I'm working on a project to take images of furnished rooms and remove all the furniture. I've got a large dataset of image pairs. I'm not using any preprocessing on the images so as to allow the model to preserve details of the original image (wall color, floor material, etc.).

I've trained on an RTX 4090 for about 5 days, and I'm no longer seeing any improvement (see examples below).
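One way to make "no longer seeing any improvement" concrete is to compare the average of the most recent logged losses against the window before it. This is a minimal sketch, assuming a validation loss is logged at regular intervals; `has_plateaued`, `window`, and `min_rel_improvement` are hypothetical names, not part of the ControlNet training code. Diffusion training losses are noisy, so comparing samples generated from a fixed set of validation inputs and seeds is usually a more reliable signal than the loss curve alone.

```python
from typing import Sequence


def has_plateaued(losses: Sequence[float], window: int = 5,
                  min_rel_improvement: float = 0.01) -> bool:
    """Return True if the mean of the last `window` losses is less than
    `min_rel_improvement` (relative) below the mean of the preceding window."""
    if len(losses) < 2 * window:
        return False  # not enough history to judge yet
    previous = sum(losses[-2 * window:-window]) / window
    recent = sum(losses[-window:]) / window
    if previous <= 0:
        return True  # relative change is undefined; treat as flat
    return (previous - recent) / previous < min_rel_improvement


# Hypothetical validation losses logged every N steps (illustrative numbers only)
history = [0.30, 0.25, 0.21, 0.18, 0.16, 0.155,
           0.154, 0.154, 0.153, 0.154, 0.153, 0.153]
print(has_plateaued(history, window=3))
```

The thresholds are arbitrary and would need tuning to how noisy the logged loss actually is.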

I'm looking for tips on where to go from here.

  • Does it just need to be trained longer?
  • Do I need to adjust the learning rate?
  • Should I spend more time cleaning the dataset? A small percentage of it is probably bad; in one of the examples below, the target image is dark.
  • Should I preprocess the images to simplify the task (e.g., MLSD)? That would lose the details of the original, but it might at least produce a better final image.
  • Perhaps ControlNet isn't the right architecture for this, and I should use pix2pix instead?
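For the dataset-cleaning question, a cheap first pass is to flag pairs whose target image is unusually dark, like the dark target mentioned above. This is a minimal sketch, assuming pixels have already been loaded as 8-bit RGB tuples (for example with an image library such as Pillow); `filter_dark_pairs` and the `min_luma=40.0` cutoff are hypothetical choices to tune on real data. Brightness is measured with the Rec. 601 luma weights.

```python
from typing import Iterable, List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # 8-bit (R, G, B)


def mean_luminance(pixels: Iterable[Pixel]) -> float:
    """Mean Rec. 601 luma (0-255) over an iterable of RGB pixels."""
    total = 0.0
    count = 0
    for r, g, b in pixels:
        total += 0.299 * r + 0.587 * g + 0.114 * b
        count += 1
    return total / count if count else 0.0


def filter_dark_pairs(pairs: Sequence[Tuple[str, Iterable[Pixel]]],
                      min_luma: float = 40.0) -> Tuple[List[str], List[str]]:
    """Split pair IDs into (kept, dropped) based on the target image's mean luma."""
    kept: List[str] = []
    dropped: List[str] = []
    for pair_id, target_pixels in pairs:
        if mean_luminance(target_pixels) >= min_luma:
            kept.append(pair_id)
        else:
            dropped.append(pair_id)  # candidate for manual review
    return kept, dropped
```

Dropped pairs are better reviewed by hand than deleted outright, since some legitimately dim rooms may fall under the cutoff.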

Thanks for the help!

Example 1

Source: [image: Screenshot 2024-03-15 at 10 18 12 AM]

Target: [image: Screenshot 2024-03-15 at 10 21 32 AM]

Model Result: [image: Screenshot 2024-03-15 at 10 17 59 AM]

Example 2

Source: [image: Screenshot 2024-03-15 at 10 20 30 AM]

Target: [image: Screenshot 2024-03-15 at 10 21 54 AM]

Model Result: [image: Screenshot 2024-03-15 at 10 20 41 AM]

First Training Run

[image: Screenshot 2024-03-15 at 10 29 44 AM]

Second Training Run

[image: Screenshot 2024-03-15 at 10 30 41 AM]

whydna · Mar 15 '24 14:03

How large was your dataset?

dereksun105 · Jun 11 '24 03:06

ControlNet is not the right architecture for this; play around with inpainting methods instead.

innat-asj · Jul 14 '24 09:07

@innat-asj Can you elaborate a bit? Thanks!

whydna · Jul 14 '24 12:07

This is only my understanding of the architecture: ControlNet isn't required for removal operations, because there is nothing to control. Instead, there are a few options we could try for removing the objects.

I tried MI-GAN out of the box with the provided checkpoint, and it's promising. If it could be trained for this specific task, it would likely do even better. I also tried LaMa and MAT, but I found MI-GAN better in terms of simplicity and performance.
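Because the dataset already consists of furnished/empty pairs, inpainting masks for fine-tuning a model such as MI-GAN could in principle be derived by differencing each pair. This is a minimal sketch, assuming the two images in a pair are pixel-aligned (same camera position and framing); `diff_mask`, `dilate`, and `threshold=30` are hypothetical, and lighting changes between the two shots will also show up in the difference. Images are plain nested lists of RGB tuples to keep the example dependency-free.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]
RGBImage = List[List[Pixel]]  # rows of RGB pixels; source and target share a size
Mask = List[List[int]]        # 1 = region to inpaint, 0 = keep


def diff_mask(source: RGBImage, target: RGBImage, threshold: int = 30) -> Mask:
    """Mark pixels where any channel differs by more than `threshold`."""
    return [
        [1 if max(abs(a - b) for a, b in zip(ps, pt)) > threshold else 0
         for ps, pt in zip(row_s, row_t)]
        for row_s, row_t in zip(source, target)
    ]


def dilate(mask: Mask, radius: int = 1) -> Mask:
    """Binary dilation with a (2*radius+1)-square window, so the mask
    extends slightly past the detected region (e.g., object edges)."""
    h = len(mask)
    w = len(mask[0]) if h else 0
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w:
                        out[ny][nx] = 1
    return out
```

The dilated mask, with the furnished image as input and the empty image as ground truth, would then need converting into whatever format the chosen inpainting training code expects.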

Lastly, reversing this process won't work for going from an empty room to a furnished room, because that direction requires generating new content. In that case, ControlNet will be required.

innat-asj · Jul 14 '24 14:07