StableCascade
The results of LoRA are very bad, where did I go wrong?
I have trained LoRA several times on faces, but it seems the model can't learn anything.
My dataset is as follows:
data.tar
|- 0000.jpg
|- 0000.txt ("a photo of a woman [ohwx]")
|- 0001.jpg
|- 0001.txt ("a photo of a woman [ohwx]")
...
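For reference, a minimal sketch of how a tar like this can be packed from a folder of image/caption pairs (the folder and output paths below are placeholders, not from the training code):

```python
# Pack image/caption pairs into a tar with matching stems (0000.jpg + 0000.txt, ...).
# Paths are placeholders; adjust to your own layout.
import os
import tarfile

src_dir = "data/ohwx"      # contains 0000.jpg, 0000.txt, 0001.jpg, ...
out_tar = "data/ohwx.tar"

with tarfile.open(out_tar, "w") as tar:
    for name in sorted(os.listdir(src_dir)):
        if name.endswith(".jpg"):
            stem = os.path.splitext(name)[0]
            tar.add(os.path.join(src_dir, name), arcname=f"{stem}.jpg")
            tar.add(os.path.join(src_dir, stem + ".txt"), arcname=f"{stem}.txt")
```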
Here is my config:
```yaml
experiment_id: stage_c_1b_lora_ohwx
checkpoint_path: output
output_path: output
model_version: 1B
dtype: bfloat16
# WandB
# wandb_project: StableCascade
# wandb_entity: wandb_username
# TRAINING PARAMS
lr: 1.0e-6
batch_size: 1
image_size: 1024
multi_aspect_ratio: [1/1, 1/2, 1/3, 2/3, 3/4, 1/5, 2/5, 3/5, 4/5, 1/6, 5/6, 9/16]
grad_accum_steps: 1
updates: 2800
backup_every: 20000
save_every: 200
warmup_updates: 1
# use_fsdp: True -> FSDP doesn't work at the moment for LoRA
use_fsdp: False
# GDF
# adaptive_loss_weight: True
# LoRA specific
module_filters: ['.attn']
rank: 256
train_tokens:
- ['[ohwx]', '^woman</w>']
# ema_start_iters: 5000
# ema_iters: 100
# ema_beta: 0.9
webdataset_path: file:data/ohwx.tar
effnet_checkpoint_path: models/effnet_encoder.safetensors
previewer_checkpoint_path: models/previewer.safetensors
generator_checkpoint_path: models/stage_c_lite_bf16.safetensors
```
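A quick sanity check of the tar can rule out data problems before training: confirm that images and captions pair up and that the trigger token is in every caption. This is a sketch assuming the `webdataset` package; the path matches `webdataset_path` above.

```python
# Sketch: read the tar back and check every caption carries the trigger token.
# Assumes the `webdataset` package (pip install webdataset).
import webdataset as wds

ds = wds.WebDataset("data/ohwx.tar").decode("pil").to_tuple("jpg", "txt")

for i, (image, caption) in enumerate(ds):
    assert "[ohwx]" in caption, f"sample {i} is missing the trigger token: {caption!r}"
    print(i, image.size, caption)
```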
Are you using a single card for training? When I train on a single card, I get an error about insufficient storage space, even though I still have plenty of space left.
@wen020 Yes, I train on a 4090. I think training needs more than 24 GB, otherwise it runs out of memory (OOM).
My config works on a 4090 if you switch to a smaller model like the 1B. The only problem I have is the LoRA result, which is bad.
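In case it helps tell the two apart, here is a quick generic check of free GPU memory versus free disk space (not part of the training code):

```python
# Sketch: print free GPU memory and free disk space to see which one is actually running out.
import shutil
import torch

free_b, total_b = torch.cuda.mem_get_info()
print(f"GPU memory: {free_b / 2**30:.1f} GiB free of {total_b / 2**30:.1f} GiB")

disk = shutil.disk_usage(".")
print(f"Disk: {disk.free / 2**30:.1f} GiB free of {disk.total / 2**30:.1f} GiB")
```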
I used this data (https://huggingface.co/dome272/stable-cascade/blob/main/fernando.tar) to fine-tune the LoRA and the result is OK (see below). I immediately started training on my own data.
prompt: cinematic photo of a dog [fernando] wearing a space suit
The results of LoRA are very bad with my custom dataset
@wen020 yes, especially on faces :(((
Hey, we never tried the 1B model with LoRAs. We only used the 3.6B, so I can only give feedback on that. The 1B model is very undertrained.
@dome272 Actually, LoRA training on the 3.6B model is still bad, especially on faces. Have you ever tried to train faces with LoRA?
@quocanh34 I have also trained a LoRA on a 4090. How long does it take you to train 40000 steps? It costs me about 3 hours.
The result of training a style LoRA is also bad.
@quocanh34 How do you train a LoRA on the 3.6B model with a 4090 (24 GB)?
@wen020 I can only train the 1B model on a 4090; otherwise it causes OOM.
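For what it's worth, a rough way to relate the `updates` in the config to passes over the dataset (the dataset size below is a placeholder; the other numbers come from the config above):

```python
# Sketch: how many passes over the dataset the configured update count amounts to.
num_images = 20          # placeholder: number of images in data.tar
batch_size = 1           # from the config above
grad_accum_steps = 1     # from the config above
updates = 2800           # from the config above

images_seen = updates * batch_size * grad_accum_steps
print(f"{images_seen} samples seen ~= {images_seen / num_images:.0f} passes over {num_images} images")
```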