LlamaGen
VQ-VAE ckpt optimizer states?
Hello! Thank you for the clean, user-friendly codebase!
I'm trying to finetune the VQ-VAE tokenizer and noticed that some keys appear to be missing from the pretrained checkpoint listed on Hugging Face: "optimizer", "discriminator", and "optimizer_disc". See here:
command:
torchrun --nnodes=1 --nproc_per_node=1 -m tokenizer.tokenizer_image.vq_train --finetune --disc-start 0 --vq-ckpt ./pretrained_models/vq_ds16_c2i.pt --dataset imagenet --data-path /home/julian/images --cloud-save-path ./training-save-dir --global-batch-size 8
output:
| distributed init (rank 0): env://
[2024-06-13 09:12:10] Experiment directory created at results_tokenizer_image/000-VQ-16
[2024-06-13 09:12:10] Experiment directory created in cloud at ./training-save-dir/2024-06-13-09-12-10/000-VQ-16/checkpoints
[2024-06-13 09:12:10] Namespace(data_path='/home/julian/images', data_face_path=None, cloud_save_path='./training-save-dir', no_local_save=False, vq_model='VQ-16', vq_ckpt='./pretrained_models/vq_ds16_c2i.pt', finetune=True, ema=False, codebook_size=16384, codebook_embed_dim=8, codebook_l2_norm=True, codebook_weight=1.0, entropy_loss_ratio=0.0, commit_loss_beta=0.25, reconstruction_weight=1.0, reconstruction_loss='l2', perceptual_weight=1.0, disc_weight=0.5, disc_start=0, disc_type='patchgan', disc_loss='hinge', gen_loss='hinge', compile=False, dropout_p=0.0, results_dir='results_tokenizer_image', dataset='imagenet', image_size=256, epochs=40, lr=0.0001, weight_decay=0.05, beta1=0.9, beta2=0.95, max_grad_norm=1.0, global_batch_size=8, global_seed=0, num_workers=16, log_every=100, ckpt_every=5000, gradient_accumulation_steps=1, mixed_precision='bf16', rank=0, world_size=1, gpu=0, dist_url='env://', distributed=True, dist_backend='nccl')
[2024-06-13 09:12:10] Starting rank=0, seed=0, world_size=1.
[2024-06-13 09:12:12] VQ Model Parameters: 71,883,403
loaded pretrained LPIPS loss from /home/julian/LlamaGen/tokenizer/tokenizer_image/cache/vgg.pth
[2024-06-13 09:12:22] Discriminator Parameters: 2,765,633
[2024-06-13 09:12:32] Dataset contains 691,040 images (/home/julian/images)
[rank0]: Traceback (most recent call last):
[rank0]: File "/opt/conda/lib/python3.10/runpy.py", line 196, in _run_module_as_main
[rank0]: return _run_code(code, main_globals, None,
[rank0]: File "/opt/conda/lib/python3.10/runpy.py", line 86, in _run_code
[rank0]: exec(code, run_globals)
[rank0]: File "/home/julian/LlamaGen/tokenizer/tokenizer_image/vq_train.py", line 316, in <module>
[rank0]: main(args)
[rank0]: File "/home/julian/LlamaGen/tokenizer/tokenizer_image/vq_train.py", line 146, in main
[rank0]: optimizer.load_state_dict(checkpoint["optimizer"])
[rank0]: KeyError: 'optimizer'
Should the Hugging Face checkpoints be updated to include these keys?
Thanks again