
Tencent Hunyuan Team: add HunyuanDiT related updates

Open gnobitab opened this issue 9 months ago • 2 comments

This PR does the following:

  1. Created HunyuanDiTPipeline in src/diffusers/pipelines/hunyuandit/ and HunyuanDiT2DModel in ./src/diffusers/models/transformers/.
  2. To support HunyuanDiT2DModel, added HunyuanDiTBlock and helper functions in src/diffusers/models/attention.py.
  3. Uploaded the safetensors model to my Hugging Face account: XCLiu/HunyuanDiT-0523
  4. Verified that the output of the migrated model and code matches our original repo (https://github.com/Tencent/HunyuanDiT), including different resolutions and batch sizes > 1.
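The equivalence check in item 4 could be sketched like this (a minimal sketch with stand-in tensors and a hypothetical `outputs_match` helper; the real test compares the diffusers pipeline's output against the original Tencent repo's output on the same seed and inputs):

```python
import torch

def outputs_match(out_a, out_b, atol=1e-5):
    """Return True when two model outputs agree elementwise within atol."""
    return out_a.shape == out_b.shape and torch.allclose(out_a, out_b, atol=atol)

torch.manual_seed(0)
original = torch.randn(2, 4, 128, 128)  # stand-in: output from the Tencent repo
migrated = original.clone()             # stand-in: output from this PR's pipeline
print(outputs_match(original, migrated))  # → True
```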

In this branch, you can run HunyuanDiT in FP32 with:

python3 test_hunyuan_dit.py

which contains the following code:

import torch
from diffusers import HunyuanDiTPipeline

# Load the migrated checkpoint in FP32 (FP16 support is still a TODO).
pipe = HunyuanDiTPipeline.from_pretrained("XCLiu/HunyuanDiT-0523", torch_dtype=torch.float32)
pipe.to("cuda")

# NOTE: HunyuanDiT supports both Chinese and English inputs
prompt = "一个宇航员在骑马"  # "An astronaut riding a horse"
image = pipe(height=1024, width=1024, prompt=prompt).images[0]

image.save("./img.png")

Dependency: the timm package may be required.

TODO list:

  1. FP16 support: I didn't change the use_fp16 parameter in HunyuanDiTPipeline.__call__(). The reason is that BertModel does not run correctly in FP16. In our repo we only cast the diffusion transformer to FP16; there is probably a cleaner way to support FP16 here.
  2. Simplify and refactor the HunyuanDiTBlock related codes in src/diffusers/pipelines/hunyuandit/pipeline_hunyuandit.py.
  3. Refactor the pipeline and HunyuanDiT2DModel to diffusers style.
  4. Documentation.

Thank you so much! I'll be around and happy to help with everything.

cc: @sayakpaul @yiyixuxu

gnobitab · May 23 '24 09:05