
KeyError: 'pytorch-lightning_version'

Open suimuc opened this issue 11 months ago • 3 comments

When I resume from the previous checkpoint, the code fails with "KeyError: 'pytorch-lightning_version'". I only added ckpt_path="/home/shuchenweng/zhj/project/DiffSynth-Studio/wan_training/lightning_logs/version_0/checkpoints/epoch=0-step=5000.ckpt" to the call: trainer.fit(model, dataloader, ckpt_path="/home/shuchenweng/zhj/project/DiffSynth-Studio/wan_training/lightning_logs/version_0/checkpoints/epoch=0-step=5000.ckpt")
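A minimal sketch of a commonly reported workaround for this error, assuming the checkpoint is a plain file loadable with torch.load: the KeyError means the saved dict lacks the 'pytorch-lightning_version' key that Lightning's resume path reads, so one can inject it before resuming. The version string "2.0.0" below is a placeholder, not taken from the issue.

```python
def patch_checkpoint(ckpt: dict, version: str = "2.0.0") -> dict:
    """Add the 'pytorch-lightning_version' key Lightning expects, if missing.

    In practice the dict would come from torch.load(ckpt_path) and be written
    back with torch.save(ckpt, ckpt_path); shown here on a plain dict.
    """
    ckpt.setdefault("pytorch-lightning_version", version)
    return ckpt

# Example on an in-memory checkpoint dict (real checkpoints hold more keys):
ckpt = patch_checkpoint({"state_dict": {}})
```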

suimuc avatar Mar 04 '25 13:03 suimuc

@suimuc Please use --pretrained_lora_path xxx in your terminal.

Artiprocher avatar Mar 04 '25 13:03 Artiprocher

> @suimuc Please use --pretrained_lora_path xxx in your terminal.

There is no such option.

[screenshot]

suimuc avatar Mar 05 '25 05:03 suimuc

This is my script:

```shell
CUDA_VISIBLE_DEVICES="0,1,2,3,4,5,6,7" python examples/wanvideo/train_wan_t2v_ours_audio.py \
  --task train \
  --train_architecture full \
  --dataset_path data/example_dataset \
  --output_path ./wan_training \
  --dit_path "/home/shuchenweng/zhj/model-weight/Wan2.1-T2V-1.3B/diffusion_pytorch_model.safetensors" \
  --max_epochs 10 \
  --learning_rate 2e-5 \
  --training_strategy deepspeed_stage_2 \
  --use_gradient_checkpointing
```
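Since the script has no resume option in its argument list, one possible fix is to wire a resume flag through to trainer.fit. This is only a sketch; the flag name --resume_ckpt_path is an assumption, not an existing DiffSynth-Studio option.

```python
import argparse

def parse_args(argv=None):
    """Parse training arguments; only the hypothetical resume flag is shown."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--resume_ckpt_path", type=str, default=None,
                        help="Lightning checkpoint to resume training from")
    return parser.parse_args(argv)

# In the training entry point the value would then be forwarded:
# trainer.fit(model, dataloader, ckpt_path=args.resume_ckpt_path)
```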

Training from the pretrained weights starts successfully:

[screenshot]

However, when I try to resume from the checkpoint of the interrupted training run, it fails. I only added ckpt_path="/home/shuchenweng/zhj/project/DiffSynth-Studio/wan_training/lightning_logs/version_0/checkpoints/epoch=0-step=5000.ckpt" to the trainer.fit call:

[screenshot]

I am using the deepspeed_stage_2 training strategy.
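One detail worth noting here, sketched under the assumption that the default Lightning/DeepSpeed checkpointing is in use: with the deepspeed_stage_2 strategy, Lightning saves the checkpoint as a directory of sharded optimizer/model states rather than a single .ckpt file, and resuming it as a plain file can fail. PyTorch Lightning ships a converter to consolidate the shards into one fp32 checkpoint; the helper and paths below are illustrative placeholders.

```python
import os

def consolidated_path(ckpt_dir: str) -> str:
    """Choose an output file name next to the sharded checkpoint directory."""
    return os.path.join(os.path.dirname(ckpt_dir.rstrip("/")),
                        "consolidated.ckpt")

# Real consolidation (requires pytorch_lightning and deepspeed installed):
# from pytorch_lightning.utilities.deepspeed import (
#     convert_zero_checkpoint_to_fp32_state_dict)
# convert_zero_checkpoint_to_fp32_state_dict(
#     ckpt_dir, consolidated_path(ckpt_dir))
```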

suimuc avatar Mar 05 '25 05:03 suimuc