
one shot training results are extremely poor, I cannot reproduce the results of the paper

Open xiaqingsun opened this issue 1 year ago • 5 comments

When I train with a picture from the paper, my results with the prompt "a painting of city, art by ch_fg4; ch_fg-neg" are as follows: [images]

but the results in the paper look like this: [image]

My training parameters are as follows: [images]

Is there something wrong with my training process causing this poor result?

Thanks!

xiaqingsun avatar Feb 13 '23 06:02 xiaqingsun

I would try it with fewer tokens and no initialisation text. Could also try lower CFG.

What might be an issue is the fact reconstruction is broken and there's been no update to fix it, though I don't know. I'll give it a shot later and try a few things, see if I can get anything vaguely similar.

RainehDaze avatar Feb 15 '23 08:02 RainehDaze

Hm, I wonder if it's the lack of reconstruction? I checked the paper, and it was pretty clear: three positive tokens, three negative tokens, a 0.0025 learning rate, and a mysterious gamma that I think must be CFG, as it was set to 5; 2-8k steps. So I set it up to train that way, along with a comparison run using a much higher learning rate, EMA below 1, and 6 negative tokens, and it's been kind of nonfunctional. The one with the really high learning rate sort of got something:

[image]
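To keep the settings straight, here is the configuration described above collected in one place; note that treating the paper's "gamma" as CFG scale is my guess, not something the paper states:

```python
# Settings as read from the paper (per the discussion in this thread).
paper_settings = {
    "positive_tokens": 3,
    "negative_tokens": 3,
    "learning_rate": 0.0025,
    "cfg_scale": 5,         # the paper's "gamma", assumed here to be CFG
    "steps": (2000, 8000),  # "2-8k steps"
}

print(paper_settings)
```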

But nothing on the paper results. Those all seem to generate generic cities, e.g.: [image: rlinestyle2-2700]

Graphing the loss and looking at the vectors (using https://github.com/Zyin055/Inspect-Embedding-Training), it's definitely learning something, but I have no clue what: [images: rlinestyle2-5900-loss, rlinestyle2-5900-vector]
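For anyone who wants a quick sanity check without the linked script, a rough sketch of the kind of inspection it does is below. The vectors here are made up; a real A1111 embedding checkpoint holds one 768-dim tensor per token, and a norm that keeps growing across checkpoints suggests the embedding is drifting rather than converging:

```python
import math

def vector_norm(vec):
    """L2 norm of a single embedding vector."""
    return math.sqrt(sum(x * x for x in vec))

# Fake per-checkpoint vectors, keyed by training step (illustrative only).
checkpoints = {
    1000: [0.1, -0.2, 0.05],
    3000: [0.4, -0.6, 0.3],
    5900: [0.9, -1.1, 0.7],
}

for step, vec in sorted(checkpoints.items()):
    print(step, round(vector_norm(vec), 3))
```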

RainehDaze avatar Feb 15 '23 10:02 RainehDaze

Setting the learning rate to 0.003 would be better; 0.0003 is too small.

IrisRainbowNeko avatar Apr 24 '23 05:04 IrisRainbowNeko

You can also try DreamArtist++, an improved version with LoRA added, for better results: HCP-Diffusion

IrisRainbowNeko avatar Apr 24 '23 05:04 IrisRainbowNeko

This might be a shot in the dark, but if you still have the embeddings, could you try using the negative embedding in the positive prompt and the positive embedding in the negative prompt?

Recently I found that embeds of mine that previously looked bad actually worked pretty well in reverse, capturing elements of the training images much better than the non-reversed prompt did. I don't know why this has happened several times, or whether I made a bad edit to the script. As far as I can tell, all I've changed is that the script no longer uses the entire positive prompt as the negative prompt (with only the trained embedding swapped for the negative version); instead it now uses only the negative embedding, without the positive prompt from the prompt template.
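A minimal sketch of the swap being suggested, using the embedding names from the prompt earlier in this thread (the strings are just illustrative of which field each embedding goes into):

```python
# Embedding names from the prompt used earlier in this thread.
pos_emb = "ch_fg4"     # trained positive embedding
neg_emb = "ch_fg-neg"  # trained negative embedding

# Normal usage: positive embedding in the prompt, negative in the negative prompt.
prompt = f"a painting of city, art by {pos_emb}"
negative_prompt = neg_emb

# Reversed test: swap the two embeddings between the prompt fields.
reversed_prompt = f"a painting of city, art by {neg_emb}"
reversed_negative_prompt = pos_emb

print(reversed_prompt)
print(reversed_negative_prompt)
```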

torridgristle avatar Jun 04 '23 18:06 torridgristle