
Question about resuming CausalVAE training

Open ZhikangNiu opened this issue 10 months ago • 1 comments

When I try to resume training CausalVAE on our own dataset with the script below, it always fails with the following error:

_pickle.UnpicklingError: invalid load key, '\xbb'.

Here is my script:

python opensora/train/train_causalvae.py \
    --exp_name "ucf" \
    --batch_size 1 \
    --precision bf16 \
    --max_steps 40000 \
    --save_steps 100 \
    --output_dir results/causalvae_ \
    --video_path /home/v-zhikangniu/Open-Sora-Plan/data/MSRVTT \
    --video_num_frames 17 \
    --resolution 256 \
    --sample_rate 1 \
    --n_nodes 1 \
    --devices 1 \
    --num_workers 8 \
    --model_config scripts/causalvae/release.json \
    --resume_from_checkpoint /home/v-zhikangniu/Open-Sora-Plan/checkpoint_v1/17x256x256/diffusion_pytorch_model.safetensors

I'm sure I'm running the latest code.

ZhikangNiu avatar Apr 12 '24 09:04 ZhikangNiu

The resume_from_checkpoint parameter should only be set to the path of a checkpoint file (XXXX.ckpt) written by PyTorch Lightning. What you probably want instead is the load_from_checkpoint parameter, which takes the path of the directory that contains config.json and the model file (the model file can be in either the HF format or the PL format).

qqingzheng avatar Apr 12 '24 09:04 qqingzheng
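
The error is likely a pickle-based loader being handed a .safetensors file, which is not a pickle. A safetensors file starts with an 8-byte little-endian header length followed by a JSON header, so its first byte can be something like '\xbb'. The sketch below is one way to check which of the two parameters a path fits before launching training. The helper names looks_like_safetensors and suggest_checkpoint_argument are hypothetical and are not part of Open-Sora-Plan. The only Open-Sora-Plan behaviour assumed is what the reply above describes.

```python
import json
import struct
from pathlib import Path


def looks_like_safetensors(path):
    """Return True if the file starts with a safetensors-style header.

    A safetensors file begins with an 8-byte little-endian unsigned
    integer N, followed by N bytes of UTF-8 JSON describing the tensors.
    """
    path = Path(path)
    size = path.stat().st_size
    with path.open("rb") as f:
        prefix = f.read(8)
        if len(prefix) < 8:
            return False
        (header_len,) = struct.unpack("<Q", prefix)
        if header_len == 0 or header_len > size - 8:
            return False
        try:
            header = json.loads(f.read(header_len).decode("utf-8"))
        except (UnicodeDecodeError, json.JSONDecodeError):
            return False
    return isinstance(header, dict)


def suggest_checkpoint_argument(path):
    """Hypothetical helper: name the parameter that matches this path.

    Assumption (taken from the reply above): resume_from_checkpoint expects
    a PyTorch Lightning .ckpt file, while load_from_checkpoint expects the
    directory holding config.json and the model file.
    """
    path = Path(path)
    if path.is_dir():
        if (path / "config.json").is_file():
            return "load_from_checkpoint"
        return "unknown"
    if looks_like_safetensors(path):
        # The weights file itself is not resumable; pass its parent directory.
        return "load_from_checkpoint"
    if path.suffix == ".ckpt":
        return "resume_from_checkpoint"
    return "unknown"
```

Calling suggest_checkpoint_argument on the diffusion_pytorch_model.safetensors file from the script above should return "load_from_checkpoint", meaning the 17x256x256 directory is what should be passed, not the file itself.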