
apply amp bf16 on textual inversion

Open · jiqing-feng opened this issue 2 years ago · 1 comment

Hi, @patrickvonplaten @patil-suraj. I tried to use mixed_precision to accelerate fine-tuning but found that the time costs are the same between fp32 and bf16. The accelerator only applies to the text encoder, while the unet model takes most of the time, so I use AMP bf16 to optimize the unet. The performance gain is remarkable (around 1.55x speed-up) and the generated images are also acceptable. If you are interested in this idea, we can discuss this patch further.
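
To make the idea concrete, here is a minimal sketch of wrapping only the unet forward pass in a bf16 autocast region; the tiny UNet config, tensor shapes, and variable names below are illustrative placeholders, not code from the actual PR:

import torch
from diffusers import UNet2DConditionModel

# A tiny, randomly initialized UNet so the sketch runs quickly;
# a real training script would load pretrained weights instead.
unet = UNet2DConditionModel(
    sample_size=32,
    in_channels=4,
    out_channels=4,
    layers_per_block=1,
    block_out_channels=(32, 64),
    down_block_types=("DownBlock2D", "CrossAttnDownBlock2D"),
    up_block_types=("CrossAttnUpBlock2D", "UpBlock2D"),
    cross_attention_dim=32,
)

# Placeholder inputs standing in for one training batch.
noisy_latents = torch.randn(1, 4, 32, 32)
timesteps = torch.tensor([10])
encoder_hidden_states = torch.randn(1, 77, 32)

# Only the unet forward runs under bf16 autocast; the loss, backward
# pass, and optimizer step would stay in fp32.
device_type = "cuda" if torch.cuda.is_available() else "cpu"
with torch.autocast(device_type=device_type, dtype=torch.bfloat16):
    model_pred = unet(noisy_latents, timesteps, encoder_hidden_states).sample

print(model_pred.dtype)  # torch.bfloat16 inside the autocast region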

jiqing-feng avatar Nov 29 '22 08:11 jiqing-feng


@patil-suraj

Ideally I'd really like accelerate to handle all of this. Currently I don't fully understand why we can't have accelerate change the precision. E.g. the following works:

from diffusers import AutoencoderKL
from accelerate import Accelerator
import torch

print(torch.cuda.is_available())

accelerator = Accelerator()

model = AutoencoderKL()

# Freeze the weights, as textual inversion does for the VAE.
for param in model.parameters():
    param.requires_grad = False

# prepare() returns the model placed on accelerator.device;
# keep the return value rather than discarding it.
model = accelerator.prepare(model)

print(model.device)
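
If accelerate is meant to own the precision, one possible shape for that is sketched below; this is an assumption about the intended usage, built on Accelerator(mixed_precision="bf16") together with the accelerator.autocast() context manager, and the dummy input is illustrative:

import torch
from accelerate import Accelerator
from diffusers import AutoencoderKL

# Let accelerate own the precision: with mixed_precision="bf16",
# accelerator.autocast() opens a bf16 autocast region on the right device.
accelerator = Accelerator(mixed_precision="bf16")

model = AutoencoderKL()
model.requires_grad_(False)
model = accelerator.prepare(model)

# Dummy image batch; the default AutoencoderKL config expects 3 channels.
sample = torch.randn(1, 3, 32, 32, device=accelerator.device)

with accelerator.autocast():
    latents = model.encode(sample).latent_dist.sample()

print(latents.dtype)  # torch.bfloat16 inside the autocast region

This would keep explicit torch.autocast calls out of the training script entirely.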

patrickvonplaten avatar Dec 05 '22 13:12 patrickvonplaten

@patil-suraj @patrickvonplaten Thanks for your comments! The second approach is great, and I have applied it in this PR. Could you please help review it? Thanks!

jiqing-feng avatar Dec 07 '22 04:12 jiqing-feng

@kding1 this is the PR we mentioned, upstreaming common BF16 practices to textual inversion, which will also benefit the Intel 4th-generation Xeon platform.

yao-matrix avatar Dec 12 '22 07:12 yao-matrix