Ethan Smith
I'm finding some of my processes are able to load the model states while others fail to do so. Edit: in my case I realized I am going from one...
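For reference, roughly the pattern involved (a minimal sketch assuming a Hugging Face Accelerate setup; the checkpoint path is a placeholder): every rank waits until saving is done before any rank reads the state back.

```python
# Sketch, assuming Accelerate: synchronize ranks before restoring state,
# so no process tries to read a checkpoint another process is still writing.
from accelerate import Accelerator

accelerator = Accelerator()
# ... build model/optimizer and accelerator.prepare(...) them, then:

# Block until every process reaches this point.
accelerator.wait_for_everyone()
accelerator.load_state("checkpoints/step_1000")  # placeholder path
```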
I think the error suggests a vanishing gradient, but it's strange that I don't see it when using fp16 or full precision
I did see some comments in the docs about how num_proc=None could help, and that outputting numpy arrays can also help, but it seems quite odd that it's now dropping down to 1 it/s...
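For what it's worth, this is my reading of those two suggestions (a sketch; the dataset, column names, and `tokenize` function are placeholders, not my actual pipeline):

```python
# Sketch of the two suggestions from the datasets docs:
# run .map() without multiprocessing and return numpy arrays.
import numpy as np
from datasets import load_dataset

ds = load_dataset("imdb", split="train")  # placeholder dataset

def tokenize(batch):
    # Returning numpy arrays avoids slow Python-object conversion
    # when the results are written back to Arrow.
    return {"length": np.array([len(t) for t in batch["text"]])}

# num_proc=None runs map in the main process only.
ds = ds.map(tokenize, batched=True, num_proc=None)
```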
Is there any reason I would see this if training a single model? It only occurs with fp16; bf16 and fp32 do not result in this error
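Since it only shows up under fp16, one way to confirm it's an overflow/underflow rather than a code bug is to scan the gradients for non-finite values after backward. A minimal sketch, assuming a standard torch AMP loop (the loop itself is shown as comments):

```python
# Sketch: check gradients for NaN/inf under fp16 to confirm a range issue.
import torch

def check_grads(model: torch.nn.Module, step: int) -> None:
    for name, p in model.named_parameters():
        if p.grad is not None and not torch.isfinite(p.grad).all():
            print(f"step {step}: non-finite grad in {name}")

# Inside a typical AMP training loop:
# scaler = torch.cuda.amp.GradScaler()
# with torch.autocast("cuda", dtype=torch.float16):
#     loss = model(batch).loss
# scaler.scale(loss).backward()
# scaler.unscale_(optimizer)   # grads back at true scale here
# check_grads(model, step)
# scaler.step(optimizer); scaler.update()
```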
Hey @artykov1511, see UnCLIPPipeline in diffusers, which uses the same methodology of projection onto timestep embeddings and extra context tokens :)
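The idea looks roughly like this (an illustrative sketch, not diffusers' actual modules; dimensions and names are placeholders): project the conditioning embedding once onto the timestep embedding and once into extra tokens that get concatenated with the text context.

```python
# Illustrative sketch of the unCLIP-style conditioning pattern.
import torch
import torch.nn as nn

class EmbeddingProjection(nn.Module):
    def __init__(self, emb_dim=768, time_dim=1280, ctx_dim=768, n_tokens=4):
        super().__init__()
        self.to_time = nn.Linear(emb_dim, time_dim)        # added to t-embedding
        self.to_tokens = nn.Linear(emb_dim, n_tokens * ctx_dim)
        self.n_tokens, self.ctx_dim = n_tokens, ctx_dim

    def forward(self, cond_emb, time_emb, text_ctx):
        # cond_emb: (B, emb_dim), time_emb: (B, time_dim), text_ctx: (B, L, ctx_dim)
        time_emb = time_emb + self.to_time(cond_emb)
        extra = self.to_tokens(cond_emb).view(-1, self.n_tokens, self.ctx_dim)
        # Extra tokens sit alongside the text context, so cross-attention
        # can attend to them like additional "words".
        context = torch.cat([extra, text_ctx], dim=1)
        return time_emb, context
```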
@muellerzr Thank you
The functions they use search through all named attention layers of the model and make the modifications as needed for self-attention and cross-attention, so I should think that shouldn't...
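The pattern, as I understand it (a sketch; the actual patch logic is a placeholder), relies on the diffusers naming convention where `attn1` is self-attention and `attn2` is cross-attention inside each transformer block:

```python
# Sketch: walk the UNet's named modules and modify attention layers.
def patch_attention(unet):
    for name, module in unet.named_modules():
        if name.endswith("attn1"):
            # self-attention: modify as needed, e.g. wrap module.forward
            pass
        elif name.endswith("attn2"):
            # cross-attention: modify as needed
            pass
```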
Nevermind, got it working! I didn't realize that the prompt that goes into the text encoder has to be the new one. I'll be trying your repo as well afterwards
@liujianzhi @dbolya I am having a similar issue running on an A100. My baseline time is 25 it/s on float16; at a 50% ratio my time only gets up to about 25.5 it/s...
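Roughly how I'm measuring (a sketch; the model id and prompt are placeholders for my actual setup, and I'm assuming the tomesd `apply_patch` entry point):

```python
# Sketch: patch the pipeline with token merging at a 50% ratio and
# compare wall-clock throughput against the unpatched baseline.
import time
import torch
import tomesd
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16  # placeholder
).to("cuda")

tomesd.apply_patch(pipe, ratio=0.5)  # merge ~50% of tokens

torch.cuda.synchronize()
start = time.perf_counter()
pipe("a photo of an astronaut", num_inference_steps=50)
torch.cuda.synchronize()
print(f"{50 / (time.perf_counter() - start):.1f} it/s")
```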
Thank you Daniel, makes sense!