ai-toolkit
HiDream LoRA inference not working as expected
I trained a LoRA on HiDream. The validation outputs generated during training looked good, but the images from my own inference script do not.
I am using the following code:
import os
import yaml
import argparse
import torch
from optimum.quanto import freeze, qfloat8, quantize
from transformers import PreTrainedTokenizerFast, LlamaForCausalLM
from diffusers import UniPCMultistepScheduler, HiDreamImagePipeline


def main(prompt_file: str, lora_path: str, char_prompt: str, output_dir: str):
    os.makedirs(output_dir, exist_ok=True)

    # Load prompts
    with open(prompt_file, "r", encoding="utf-8") as f:
        prompts = [line.strip() for line in f if line.strip()]

    tokenizer_4 = PreTrainedTokenizerFast.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")
    text_encoder_4 = LlamaForCausalLM.from_pretrained(
        "meta-llama/Llama-3.1-8B-Instruct",
        output_hidden_states=True,
        output_attentions=True,
        torch_dtype=torch.float16,  # <-- change here
    )

    pipe = HiDreamImagePipeline.from_pretrained(
        "HiDream-ai/HiDream-I1-Full",
        tokenizer_4=tokenizer_4,
        text_encoder_4=text_encoder_4,
        torch_dtype=torch.float16,
    ).to("cuda", torch.bfloat16)

    pipe.load_lora_weights(lora_path)
    pipe.fuse_lora(lora_scale=1)
    pipe.to("cuda")

    quantize(pipe.transformer, weights=qfloat8)
    freeze(pipe.transformer)

    # Generate and save images
    for idx, prompt in enumerate(prompts):
        prompt = f"{char_prompt} {prompt}"
        print(f"PROMPT:{prompt}")
        image = pipe(
            prompt,
            height=1024,
            width=1024,
            guidance_scale=4.5,
            num_inference_steps=50,
        ).images[0]
        image_path = os.path.join(output_dir, f"image_{idx+1}.png")
        image.save(image_path)
        print(f"Saved: {image_path}")
Validation outputs
Inference outputs
It seems to me that load_lora_weights may not be supported for HiDream in the diffusers implementation. Is there another way I can run inference with this LoRA?
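One way to narrow this down, if load_lora_weights runs without an error but the style does not show up, is to look at how the tensors in the saved LoRA file are named. This is only a sketch: the key names below are made up for illustration and are not taken from an actual ai-toolkit or diffusers checkpoint. The real names would come from the .safetensors file itself (for example via safetensors' safe_open, as noted in the comment).

```python
from collections import Counter


def summarize_lora_keys(keys, depth=1):
    """Count tensor names by their first `depth` dot-separated components."""
    counts = Counter(".".join(key.split(".")[:depth]) for key in keys)
    return dict(counts)


# Hypothetical key names, for illustration only. With a real file you could
# collect them like this (requires the safetensors package):
#   from safetensors import safe_open
#   with safe_open(lora_path, framework="pt") as f:
#       keys = list(f.keys())
example_keys = [
    "transformer.blocks.0.attn.to_q.lora_A.weight",
    "transformer.blocks.0.attn.to_q.lora_B.weight",
    "diffusion_model.blocks.0.attn.to_q.lora_A.weight",
]

summary = summarize_lora_keys(example_keys, depth=1)
for prefix, count in summary.items():
    print(f"{prefix}: {count} tensors")
```

If the prefixes in the file differ from what the loader expects, that would be one possible explanation for weights that load without error but have no visible effect. It is not the only one.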
@omrastogi how did you train the lora for hidream?
Hi @joeyism
Here is the configuration I used to train the LoRA.
job: extension
config:
  name: VoxStyle_Hidream
  process:
    - type: sd_trainer
      training_folder: output/
      performance_log_every: 1000
      device: cuda:0
      network:
        type: lora
        linear: 64
        linear_alpha: 64
      save:
        dtype: bfloat16
        save_every: 1000
        max_step_saves_to_keep: 1
        push_to_hub: false
      datasets:
        - folder_path: /mnt/data/om/lora_dataset/VoxMachina
          caption_ext: txt
          caption_dropout_rate: 0.0
          shuffle_tokens: false
          cache_latents_to_disk: true
          resolution:
            - 512
            - 768
            - 1024
      train:
        batch_size: 1
        steps: 5000
        gradient_accumulation_steps: 1
        train_unet: true
        train_text_encoder: false
        gradient_checkpointing: true
        noise_scheduler: flowmatch
        timestep_type: shift
        optimizer: adamw8bit
        lr: 1e-5
        ema_config:
          use_ema: true
          ema_decay: 0.99
        dtype: bf16
      model:
        name_or_path: HiDream-ai/HiDream-I1-Full
        extras_name_or_path: "HiDream-ai/HiDream-I1-Full"
        arch: "hidream"
        quantize: true
        quantize_te: true
        model_kwargs:
          llama_model_path: "unsloth/Meta-Llama-3.1-8B-Instruct"
      sample:
        sampler: flowmatch
        sample_every: 100
        width: 1024
        height: 1024
        prompts:
          - In voxStyle, a western-anime fusion with cel-shaded, dull lighting, expressive characters, detailed 2D backgrounds and mature fantasy tone, A young man with short light brown hair and a serious expression. He is wearing a dark coat with a white shirt and a gray tie. The background is dark with green and gold swirls. The lighting is soft and diffused, creating a gentle glow on his face. The man is centered in the image, with the background slightly out of focus.
          - In voxStyle, a western-anime fusion with cel-shaded, dull lighting, expressive characters, detailed 2D backgrounds and mature fantasy tone, An elf woman with platinum blonde hair, pointed ears, and a white and gold off-shoulder dress is seated at a formal dining table. She is turned slightly to her right, covering her mouth with one hand as if whispering or reacting discreetly. The table is set with formal cutlery and a folded napkin on a plate.
          - In voxStyle, a western-anime fusion with cel-shaded, dull lighting, expressive characters, detailed 2D backgrounds and mature fantasy tone, A woman with short black hair and a white fur tail, wearing a blue and black outfit with a brown belt. She has pointed ears and a confident expression. She is standing in a dimly lit room with a dark curtain in the background. The lighting is soft and warm, casting gentle shadows. The woman has a slender physique and is gesturing with her right hand, as if making a gesture.
          - In voxStyle, a western-anime fusion with cel-shaded, dull lighting, expressive characters, detailed 2D backgrounds and mature fantasy tone, Two young women peeking out from behind a dark curtain. The woman on the left has light skin, brown hair, and a white bandage wrapped around her head. She has a small smile and is looking directly at the viewer. The other woman has dark skin and brown hair. Both women have large, expressive eyes. The background is simple and dark, with vertical black stripes. The lighting is soft and diffused, casting gentle shadows.
        neg: ''
        seed: 42
        walk_seed: true
        guidance_scale: 4
        sample_steps: 25
meta:
  name: '[name]'
  version: '1.0'
@joeyism, any ideas on how to run inference with the LoRA weights?
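Separately from the LoRA loading question, it may be worth ruling out a plain settings mismatch, since the training samples and the inference script do not use the same sampling parameters. The sketch below compares the values from the sample section of the config with those passed to pipe(...) in the script. Mapping sample_steps to num_inference_steps is my assumption about how the two correspond, and diff_settings is just a helper written for this example.

```python
# Values copied from the `sample:` section of the training config.
validation = {
    "guidance_scale": 4,
    "num_inference_steps": 25,  # `sample_steps` in the config (assumed equivalent)
    "width": 1024,
    "height": 1024,
    "seed": 42,
}

# Values passed to pipe(...) in the inference script; no seed is set there.
inference = {
    "guidance_scale": 4.5,
    "num_inference_steps": 50,
    "width": 1024,
    "height": 1024,
    "seed": None,
}


def diff_settings(a, b):
    """Return {name: (value_in_a, value_in_b)} for every setting that differs."""
    names = sorted(set(a) | set(b))
    return {n: (a.get(n), b.get(n)) for n in names if a.get(n) != b.get(n)}


mismatches = diff_settings(validation, inference)
for name, (val, inf) in mismatches.items():
    print(f"{name}: validation={val} inference={inf}")
```

Matching guidance scale, step count and seed would not by itself explain a missing style, but it removes one variable when comparing the two sets of images. The script also imports UniPCMultistepScheduler without using it, while the config samples with flowmatch, so the scheduler is another setting to keep consistent.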