
Tencent Hunyuan Team: add HunyuanDiT related updates

Open gnobitab opened this issue 9 months ago • 2 comments

This PR does the following:

  1. Created HunyuanDiTPipeline in src/diffusers/pipelines/hunyuandit/ and HunyuanDiT2DModel in ./src/diffusers/models/transformers/.
  2. To support HunyuanDiT2DModel, added HunyuanDiTBlock and helper functions in src/diffusers/models/attention.py.
  3. Uploaded the safetensors model to my Hugging Face account: XCLiu/HunyuanDiT-0523
  4. Verified that the output of the migrated model and code matches our original repo (https://github.com/Tencent/HunyuanDiT), including different resolutions and batch sizes > 1.
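The equivalence check in item 4 could be sketched like this (a minimal sketch with stand-in tensors and a hypothetical `outputs_match` helper; the real test compares the diffusers pipeline's output against the original Tencent repo's output on the same seed and inputs):

```python
import torch

def outputs_match(out_a, out_b, atol=1e-5):
    """Return True when two model outputs agree elementwise within atol."""
    return out_a.shape == out_b.shape and torch.allclose(out_a, out_b, atol=atol)

torch.manual_seed(0)
original = torch.randn(2, 4, 128, 128)  # stand-in: output from the Tencent repo
migrated = original.clone()             # stand-in: output from this PR's pipeline
print(outputs_match(original, migrated))  # → True
```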

In this branch, you can run HunyuanDiT in FP32 with:

python3 test_hunyuan_dit.py

which contains the following code:

import torch
from diffusers import HunyuanDiTPipeline

# Load the migrated checkpoint in FP32 (FP16 support is still a TODO).
pipe = HunyuanDiTPipeline.from_pretrained("XCLiu/HunyuanDiT-0523", torch_dtype=torch.float32)
pipe.to("cuda")

# NOTE: HunyuanDiT supports both Chinese and English inputs
prompt = "一个宇航员在骑马"  # "An astronaut riding a horse"
image = pipe(height=1024, width=1024, prompt=prompt).images[0]

image.save("./img.png")

Dependency: the timm package may be required.

TODO list:

  1. FP16 support: I didn't change the use_fp16 parameter in HunyuanDiTPipeline.__call__(). The reason is that BertModel does not run correctly in FP16. In our repo we only cast the diffusion transformer to FP16; there is probably a cleaner way to support FP16 here.
  2. Simplify and refactor the HunyuanDiTBlock related codes in src/diffusers/pipelines/hunyuandit/pipeline_hunyuandit.py.
  3. Refactor the pipeline and HunyuanDiT2DModel to diffusers style.
  4. Documentation.

Thank you so much! I'll be around and happy to help with everything.

cc: @sayakpaul @yiyixuxu

gnobitab · May 23 '24 09:05