Tencent Hunyuan Team: add HunyuanDiT related updates
This PR does the following:

- Created `HunyuanDiTPipeline` in `src/diffusers/pipelines/hunyuandit/` and `HunyuanDiT2DModel` in `src/diffusers/models/transformers/`.
- To support `HunyuanDiT2DModel`, added `HunyuanDiTBlock` and helper functions in `src/diffusers/models/attention.py`.
- Uploaded the safetensors model to my Hugging Face account: `XCLiu/HunyuanDiT-0523`.
- Tested that the output of the migrated model + code matches our repo (https://github.com/Tencent/HunyuanDiT). Also tested different resolutions and batch sizes > 1 and confirmed they work correctly (a hedged sketch of that kind of check follows the usage example below).
In this branch, you can run HunyuanDiT in FP32 with `python3 test_hunyuan_dit.py`, which contains the following code:
```python
import torch
from diffusers import HunyuanDiTPipeline

pipe = HunyuanDiTPipeline.from_pretrained("XCLiu/HunyuanDiT-0523", torch_dtype=torch.float32)
pipe.to("cuda")

# NOTE: HunyuanDiT supports both Chinese and English prompts
prompt = "一个宇航员在骑马"  # "An astronaut riding a horse"
# prompt = "An astronaut riding a horse"
image = pipe(height=1024, width=1024, prompt=prompt).images[0]
image.save("./img.png")
```
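
For reference, the resolution / batch-size check mentioned in the list above can be sketched as follows. This is a minimal sketch, assuming the pipeline accepts a list of prompts and a `generator` argument like other diffusers pipelines; the exact call signature may change during the refactor.

```python
import torch
from diffusers import HunyuanDiTPipeline

# Minimal sketch of a resolution / batch-size check (assumes standard diffusers
# pipeline arguments: a list of prompts, height/width, and a generator for a fixed seed).
pipe = HunyuanDiTPipeline.from_pretrained("XCLiu/HunyuanDiT-0523", torch_dtype=torch.float32)
pipe.to("cuda")

prompts = ["An astronaut riding a horse", "一个宇航员在骑马"]  # batch size 2, English and Chinese
generator = torch.Generator("cuda").manual_seed(0)  # fixed seed so runs can be compared

images = pipe(prompt=prompts, height=768, width=1280, generator=generator).images
for i, img in enumerate(images):
    img.save(f"./img_{i}.png")
```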
Dependency: possibly the `timm` package.
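
If `timm` does turn out to be required, a soft-import guard along these lines keeps it optional (a plain-Python sketch for illustration, not the diffusers-internal import utilities):

```python
# Sketch of a soft dependency guard for timm; diffusers has its own import
# utilities, so this plain try/except is only illustrative.
try:
    import timm  # noqa: F401

    _timm_available = True
except ImportError:
    _timm_available = False


def require_timm():
    if not _timm_available:
        raise ImportError("This model requires the `timm` package: pip install timm")
```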
TODO list:

- FP16 support: I didn't change the `use_fp16` parameter in `HunyuanDiTPipeline.__call__()`, because `BertModel` does not support FP16 quantization. In our repo we only cast the diffusion transformer to FP16; there is probably a smarter way to support FP16 here (see the sketch after this list).
- Simplify and refactor the `HunyuanDiTBlock`-related code in `src/diffusers/pipelines/hunyuandit/pipeline_hunyuandit.py`.
- Refactor the pipeline and `HunyuanDiT2DModel` to the diffusers style.
- Documentation.
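
As an interim idea for the FP16 item above (a sketch only, not what our repo does): PyTorch autocast runs eligible ops in half precision while keeping all weights, including the `BertModel` text encoder, in FP32, so nothing has to be quantized. Whether it matches the quality and speed of casting only the diffusion transformer is untested here.

```python
import torch
from diffusers import HunyuanDiTPipeline

# Sketch: keep every module (including the BertModel text encoder) in FP32 and let
# torch.autocast execute eligible ops in FP16 on the fly. This avoids quantizing
# any weights; results are not verified against the repo's FP16 path.
pipe = HunyuanDiTPipeline.from_pretrained("XCLiu/HunyuanDiT-0523", torch_dtype=torch.float32)
pipe.to("cuda")

prompt = "An astronaut riding a horse"
with torch.autocast("cuda", dtype=torch.float16):
    image = pipe(height=1024, width=1024, prompt=prompt).images[0]
image.save("./img_autocast.png")
```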
Thank you so much! I'll be around to help with everything.
cc: @sayakpaul @yiyixuxu