
The results of LoRA are very bad, where did I go wrong?

Open apluka34 opened this issue 1 year ago • 12 comments

I have trained a LoRA several times on faces, but it seems the model can't learn anything.

My dataset is as follows:

```
data.tar
|- 0000.jpg
|- 0000.txt ("a photo of a woman [ohwx]")
|- 0001.jpg
|- 0001.txt ("a photo of a woman [ohwx]")
...
```
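For reference, a minimal sketch of how such an archive can be packed (assuming a flat webdataset-style tar with paired .jpg/.txt files; the source folder name is just a placeholder):

```python
# Minimal sketch: pack paired .jpg/.txt files into a flat tar that a
# webdataset-style loader can read. "ohwx_images" is a placeholder folder.
import tarfile
from pathlib import Path

src = Path("ohwx_images")  # contains 0000.jpg, 0000.txt, 0001.jpg, ...
with tarfile.open("data/ohwx.tar", "w") as tar:
    for img in sorted(src.glob("*.jpg")):
        tar.add(img, arcname=img.name)  # image
        tar.add(img.with_suffix(".txt"), arcname=img.with_suffix(".txt").name)  # caption
```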

Here is my config:

```yaml
experiment_id: stage_c_1b_lora_ohwx
checkpoint_path: output
output_path: output
model_version: 1B
dtype: bfloat16

# WandB
# wandb_project: StableCascade
# wandb_entity: wandb_username

# TRAINING PARAMS
lr: 1.0e-6
batch_size: 1
image_size: 1024
multi_aspect_ratio: [1/1, 1/2, 1/3, 2/3, 3/4, 1/5, 2/5, 3/5, 4/5, 1/6, 5/6, 9/16]
grad_accum_steps: 1
updates: 2800
backup_every: 20000
save_every: 200
warmup_updates: 1
# use_fsdp: True -> FSDP doesn't work at the moment for LoRA
use_fsdp: False

# GDF
# adaptive_loss_weight: True

# LoRA specific
module_filters: ['.attn']
rank: 256
train_tokens:
  - ['[ohwx]', '^woman</w>']

# ema_start_iters: 5000
# ema_iters: 100
# ema_beta: 0.9

webdataset_path: file:data/ohwx.tar
effnet_checkpoint_path: models/effnet_encoder.safetensors
previewer_checkpoint_path: models/previewer.safetensors
generator_checkpoint_path: models/stage_c_lite_bf16.safetensors
```
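As I understand it (this is an assumption about the training code, not taken from it), the `train_tokens` entry registers the placeholder `[ohwx]` in the tokenizer and initialises its embedding from the existing token matched by the regex `^woman</w>`. Roughly, something like the sketch below, using a stand-in CLIP text encoder rather than the one Stage C actually uses:

```python
import torch
from transformers import CLIPTextModel, CLIPTokenizer

# "openai/clip-vit-large-patch14" is only a stand-in for illustration;
# Stage C uses its own CLIP text model.
model_id = "openai/clip-vit-large-patch14"
tokenizer = CLIPTokenizer.from_pretrained(model_id)
text_model = CLIPTextModel.from_pretrained(model_id)

# Register the placeholder token and grow the embedding matrix by one row.
tokenizer.add_tokens(["[ohwx]"])
text_model.resize_token_embeddings(len(tokenizer))

# Initialise the new row from the existing "woman" embedding
# (what the '^woman</w>' regex in train_tokens appears to point at).
src_id = tokenizer("woman", add_special_tokens=False).input_ids[0]
new_id = tokenizer.convert_tokens_to_ids("[ohwx]")
with torch.no_grad():
    emb = text_model.get_input_embeddings()
    emb.weight[new_id] = emb.weight[src_id].clone()
```

If that mapping is right, the captions need to contain the literal `[ohwx]` string exactly as listed in `train_tokens`, otherwise the new embedding never gets used.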

apluka34 · Feb 18 '24 00:02

Are you using a single card for training? When I use a single card for training, I get an error about insufficient storage space, even though I still have a lot of space left.

wen020 · Feb 25 '24 12:02

@wen020 Yes, I use a 4090 to train. I think the training requires more than 24 GB; otherwise it causes OOM.
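A quick way to confirm it is GPU memory rather than disk is to check free VRAM right before launching (a generic PyTorch snippet, not part of the StableCascade scripts):

```python
import torch

# Report free vs. total memory on the current CUDA device (values in bytes).
free, total = torch.cuda.mem_get_info()
print(f"free: {free / 1024**3:.1f} GiB / total: {total / 1024**3:.1f} GiB")
```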

apluka34 · Feb 25 '24 14:02

My config works on a 4090, just switch to a small model like the 1B. The only problem I have is that the LoRA result is bad.

apluka34 · Feb 25 '24 14:02

I used this data (https://huggingface.co/dome272/stable-cascade/blob/main/fernando.tar) to fine-tune the LoRA and found the result is OK (results below). I immediately started training on my own data. Prompt: cinematic photo of a dog [fernando] wearing a space suit dog

wen020 · Feb 26 '24 10:02

The results of LoRA are very bad with my custom dataset.

wen020 · Feb 27 '24 09:02

@wen020 yes, especially on faces :(((

apluka34 · Feb 27 '24 10:02

Hey, we never tried out the 1B model with LoRAs. We only used the 3.6B, so I can only give feedback on that. The 1B model is very undertrained.

dome272 · Feb 27 '24 11:02

@dome272 Actually, the 3.6B model trained with LoRA is still bad, especially on faces. Have you ever tried to train faces with LoRA?

apluka34 · Feb 27 '24 14:02

@quocanh34 I have also trained the LoRA on a 4090. How long does it take you to train 40,000 steps? It takes me 3 hours.

wen020 · Feb 27 '24 15:02

The result of training a style LoRA is also bad.

wen020 · Feb 27 '24 15:02

@quocanh34 How do you train a LoRA on the 3.6B model with a 4090 (24 GB)?

wen020 · Mar 04 '24 09:03

@wen020 I can only train the 1B model on a 4090; otherwise it causes OOM.

apluka34 · Mar 04 '24 19:03