textual_inversion
Artifacts arising from 256x256 data
Due to GPU constraints (RTX 3070, 8 GB VRAM), I lowered my training image dimensions to 256x256, half the 512x512 standard. Working with face photos, a problem showed up once training started: the results come back with doubled/tripled faces, i.e. two to three people per image.
I know the AUTOMATIC1111 repo has a process for scaling the seed and a "high-res fix". Is something similar possible during the embedding training process?
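For reference, this is roughly how I lowered the resolution. I'm assuming the usual finetune config layout here (the config path and the `data.params.*.params.size` keys), so adjust if your copy of the repo differs:

```python
# Rough sketch: override the training image size before launching main.py.
# Assumes the LDM-style config layout (data.params.train/validation.params.size);
# the config path and keys may differ in your checkout.
from omegaconf import OmegaConf

cfg = OmegaConf.load("configs/latent-diffusion/txt2img-1p4B-finetune.yaml")

# Halve the 512x512 default to fit in 8 GB of VRAM.
cfg.data.params.train.params.size = 256
cfg.data.params.validation.params.size = 256

OmegaConf.save(cfg, "configs/latent-diffusion/txt2img-1p4B-finetune-256.yaml")
```

I then pass the new config via `--base` when starting training.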
I'm not sure I understood the problem and where you are seeing it. Is the encoder-decoder part (i.e. the reconstruction images in the log dir) creating additional faces in images with more than 1 person? Are you getting a random number of people in images produced with the learned embedding? Can you post some examples?
I unfortunately can't share pictures as I don't want to post my face, but the reconstructed images are fine. The issue is with the samples and samples_scaled outputs: both indeed show 2-3 people, even though my input photos are only of myself.
A minor fix is using 384x384 with close-up pictures of the face. Shots taken from farther away seem more prone to these issues. But any ideas why the artifacts appear in the same way they do when Stable Diffusion's output dimensions are pushed too far past 512?
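In case it helps anyone hitting the same thing, this is roughly the preprocessing I mean. It's just a plain PIL center-crop and resize; the folder names are placeholders, and a proper face detector would give tighter crops:

```python
# Rough sketch: center-crop source photos to a square and resize to 384x384
# so the training set is mostly tight face crops.
# Folder names are placeholders for illustration only.
from pathlib import Path
from PIL import Image

SRC = Path("raw_photos")       # placeholder input folder
DST = Path("training_384")     # placeholder output folder
DST.mkdir(exist_ok=True)

for img_path in SRC.glob("*.jpg"):
    img = Image.open(img_path).convert("RGB")
    w, h = img.size
    side = min(w, h)
    left, top = (w - side) // 2, (h - side) // 2
    img = img.crop((left, top, left + side, top + side))
    img = img.resize((384, 384), Image.LANCZOS)
    img.save(DST / img_path.name)
```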
Sorry, seems like I completely missed your follow-up here. Do you still need help with this issue?
Thanks for checking. I will be fine for now :] Truly it's a matter of waiting for optimizations to roll out at this stage.
You could take a look at https://github.com/AUTOMATIC1111/stable-diffusion-webui. They have an alternative implementation and plenty of optimizations. I think I saw someone say they managed to get it working on a 6 GB card.