Adalberto
@JANGSOONMYUN you have to modify the code to support your data; I suggest you take a look at the text-to-image script [here](https://github.com/huggingface/diffusers/blob/main/examples/text_to_image/train_text_to_image.py)
@ulmewennberg I'm also working on this myself; from the inpainting pipeline, it seems noise is only applied to the image latents.
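For reference, this is roughly how I read that part of the training step (a minimal sketch, not the exact pipeline code; the dummy shapes and scheduler setup are just for illustration):

```python
import torch
from diffusers import DDPMScheduler

# Dummy tensors for illustration: batch 1, 4 latent channels, 64x64 latents
latents = torch.randn(1, 4, 64, 64)
mask = torch.randint(0, 2, (1, 1, 64, 64)).float()  # binary mask in latent resolution
masked_image_latents = latents * (1 - mask)          # latents of the masked image
timesteps = torch.randint(0, 1000, (1,))

noise_scheduler = DDPMScheduler(num_train_timesteps=1000)

# Noise is added only to the image latents; the mask and the
# masked-image latents are concatenated un-noised as conditioning.
noise = torch.randn_like(latents)
noisy_latents = noise_scheduler.add_noise(latents, noise, timesteps)

# Inpainting UNet input: 4 noisy + 1 mask + 4 masked-image channels
model_input = torch.cat([noisy_latents, mask, masked_image_latents], dim=1)
```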
I made a pull request [here](https://github.com/huggingface/diffusers/pull/1091), but it's not passing the tests. Can someone guide me on how to fix it? It's my first contribution, so I'm not sure how to do it right...
Hi everyone, I haven't tested it extensively, but I got some interesting results, like this one:  It's still not perfect, but I'll see what I can do to improve. @opetliak...
Oh, I think I found the problem: while most of the parameters were trainable, the embed_tokens were not. Now it converges faster, thanks.
I ran a test setting requires_grad for the embed_tokens and lm_head, and the result was this... (the green line is with Unsloth). They don't exactly match, but they got closer.
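In case anyone wants to reproduce it, this is roughly what I did (a minimal sketch; the model id is just an example, and the parameter names match Llama-style models, so other architectures may differ):

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")  # example model

# Enable gradients for the input embeddings and the LM head
for name, param in model.named_parameters():
    if "embed_tokens" in name or "lm_head" in name:
        param.requires_grad = True
```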
YaRN improves on this with "NTK-by-parts" interpolation, which selectively scales dimensions based on their frequency. Looking at the Unsloth code, I believe all we need is to set `trust_remote_code=True`...
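Something like this, if I'm reading it right (the model id is just a placeholder; use whichever YaRN checkpoint you're testing):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "NousResearch/Yarn-Llama-2-7b-64k"  # placeholder YaRN checkpoint

# trust_remote_code lets transformers load the custom modeling code
# shipped with the checkpoint, which implements the YaRN scaling
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
```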
The training uses random masks, which may cause it to learn a bit more slowly. For me it worked well with more steps, around 500-1000, but it could be different...
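For anyone curious what I mean by random masks, something along these lines (a rough sketch; the actual script may mix different mask shapes):

```python
import torch

def random_mask(height, width, generator=None):
    # One random rectangle per image; 1 = region to inpaint, 0 = keep
    mask = torch.zeros(1, 1, height, width)
    h = torch.randint(height // 4, height // 2 + 1, (1,), generator=generator).item()
    w = torch.randint(width // 4, width // 2 + 1, (1,), generator=generator).item()
    top = torch.randint(0, height - h + 1, (1,), generator=generator).item()
    left = torch.randint(0, width - w + 1, (1,), generator=generator).item()
    mask[:, :, top:top + h, left:left + w] = 1.0
    return mask
```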
Hey @belonel, could you elaborate on the masks you used? I believe the training for inpainting takes longer because of the...
@belonel these masks look pretty good; they must be much better for training the model than the random ones.