custom-diffusion
What's the purpose of these lines?
Thanks for your amazing work! I have one quick question: what is the purpose of these lines in the modified CrossAttention's forward function? It seems that you disable the gradient of the first token in the embedding. Can you explain a bit?
Thanks!
Hi,
Since the start-of-sentence token is always fixed, I noticed a small improvement when detaching it during training. I believe this helps the model better associate the "V* category" with the target image, which improves generation for inference-time prompts.
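For anyone curious what "detaching" looks like in practice, here is a minimal sketch (not the repo's actual code; the class and layer names are hypothetical): the start-of-sentence token's embedding is cut out of the autograd graph before the key/value projections, so no gradient flows through it.

```python
import torch
import torch.nn as nn


class CrossAttentionSketch(nn.Module):
    """Hypothetical sketch of a cross-attention projection step that
    detaches the first (start-of-sentence) context token."""

    def __init__(self, dim, context_dim):
        super().__init__()
        self.to_q = nn.Linear(dim, dim, bias=False)
        self.to_k = nn.Linear(context_dim, dim, bias=False)
        self.to_v = nn.Linear(context_dim, dim, bias=False)

    def forward(self, x, context):
        q = self.to_q(x)
        # The SOS token is identical for every prompt, so blocking its
        # gradient pushes learning onto the remaining tokens (e.g. the
        # "V*" modifier token).
        detached = torch.cat([context[:, :1].detach(), context[:, 1:]], dim=1)
        k = self.to_k(detached)
        v = self.to_v(detached)
        return q, k, v
```

After a backward pass, the context embedding receives zero gradient at position 0 but normal gradients everywhere else.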
Thanks.
Thanks! Another possible issue I spotted is here: the code always assumes --freeze_model is 'crossattn_kv', so if I set this argument to 'crossattn', this line disregards it.
Ohh yeah. Thanks so much for catching it!! I have corrected it now.
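For readers following along, the distinction the flag controls could be sketched like this (a hedged illustration, not the repo's actual code; the `attn2.to_k`/`attn2.to_v` parameter names follow the common Stable Diffusion naming convention and are an assumption here):

```python
import torch.nn as nn


def freeze_params(model, freeze_model="crossattn_kv"):
    """Hypothetical sketch of the corrected logic: which cross-attention
    weights remain trainable depends on --freeze_model, rather than
    hard-coding the 'crossattn_kv' case."""
    for name, param in model.named_parameters():
        if freeze_model == "crossattn_kv":
            # Only the key/value projections of cross-attention train.
            param.requires_grad = "attn2.to_k" in name or "attn2.to_v" in name
        elif freeze_model == "crossattn":
            # All cross-attention weights train.
            param.requires_grad = "attn2" in name
        else:
            raise ValueError(f"unknown freeze_model: {freeze_model}")


class TinyBlock(nn.Module):
    """Toy module whose parameter names mimic the assumed convention."""

    def __init__(self):
        super().__init__()
        self.attn2 = nn.ModuleDict(
            {"to_k": nn.Linear(4, 4), "to_q": nn.Linear(4, 4)}
        )
```

With 'crossattn_kv', only the key projection above stays trainable; with 'crossattn', the query projection becomes trainable as well.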