apply amp bf16 on textual inversion
Hi, @patrickvonplaten @patil-suraj. I tried using mixed_precision to accelerate fine-tuning but found that the time cost is the same for fp32 and bf16. The accelerator only handles the text encoder, while the UNet model takes most of the time, so I used AMP bf16 to optimize the UNet as well (see the sketch below). The speed-up is notable (around 1.55x) and the generated images are also acceptable. If you are interested in this idea, we can discuss this patch further.
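For reference, here is a minimal sketch of what wrapping the UNet forward pass in bf16 autocast could look like. This is an illustration under assumptions, not the exact code from the PR; `unet`, `noisy_latents`, `timesteps`, and `encoder_hidden_states` are hypothetical stand-ins for the usual textual-inversion training-loop variables:

```python
import torch

# Sketch (assumption, not the PR's exact diff): run only the frozen UNet's
# forward pass under bf16 autocast while the rest of the loop stays in fp32.
# `unet`, `noisy_latents`, `timesteps`, and `encoder_hidden_states` are
# hypothetical placeholders for the textual-inversion loop variables.
with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
    noise_pred = unet(noisy_latents, timesteps, encoder_hidden_states).sample
```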
The documentation is not available anymore as the PR was closed or merged.
@patil-suraj
Ideally I'd really like to have accelerate handle all of this. Currently I don't fully understand why we cannot have accelerate change the precision. E.g. the following works:
```python
from diffusers import AutoencoderKL
from accelerate import Accelerator
import torch

print(torch.cuda.is_available())

accelerator = Accelerator()
model = AutoencoderKL()

# Freeze the VAE parameters before preparing the model.
for param in model.parameters():
    param.requires_grad = False

model = accelerator.prepare(model)
print(model.device)
```
@patil-suraj @patrickvonplaten Thanks for your comments! The second approach is great, and I have applied it in this PR. Could you please help review it? Thanks!
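For context, a hedged sketch of what letting accelerate own the bf16 policy might look like. This is an assumption about the approach being discussed, not the PR diff itself; `unet` and `optimizer` are hypothetical placeholders:

```python
from accelerate import Accelerator

# Sketch (assumption): let Accelerate manage the bf16 autocast context
# instead of calling torch.autocast directly in the training script.
accelerator = Accelerator(mixed_precision="bf16")
unet, optimizer = accelerator.prepare(unet, optimizer)  # hypothetical objects

with accelerator.autocast():
    noise_pred = unet(noisy_latents, timesteps, encoder_hidden_states).sample
```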
@kding1 this is the PR we mentioned, upstreaming BF16 common practices to textual inversion, which will also benefit the Intel 4th-generation Xeon platform.