
Error when converting Zero-3 checkpoint: PytorchStreamReader failed reading zip archive: not a ZIP archive

Open · 12344143213 opened this issue 9 months ago · 1 comment

Body: Hello! I encountered an error while trying to convert my Zero-3 fine-tuned checkpoint using the zero_to_fp32.py script. The error message is:

PytorchStreamReader failed reading zip archive: not a ZIP archive

Steps to Reproduce:

1. Fine-tuned a model with DeepSpeed ZeRO-3 (config attached).
2. Generated checkpoint files in the checkpoint-40/pytorch_model/ directory.
3. Ran the conversion script: python zero_to_fp32.py ./checkpoint-40/ ./output/ --safe_serialization
4. Received the error about the ZIP archive format.


(CogVideo-main) dell@dell-DSS8440:~/hzy/my_video/checkpoint-40$ python zero_to_fp32.py \
"/home/dell/hzy/my_video/checkpoint-40/" \
"/home/dell/hzy/my_video/output/" \
--tag "pytorch_model" \
--safe_serialization

[2025-03-18 15:42:32,765] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Processing zero checkpoint '/home/dell/hzy/my_video/checkpoint-40/pytorch_model'
Loading checkpoint shards: 0%| | 0/8 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/dell/hzy/my_video/checkpoint-40/zero_to_fp32.py", line 755, in <module>
    convert_zero_checkpoint_to_fp32_state_dict(args.checkpoint_dir,
  File "/home/dell/hzy/my_video/checkpoint-40/zero_to_fp32.py", line 632, in convert_zero_checkpoint_to_fp32_state_dict
    state_dict = get_fp32_state_dict_from_zero_checkpoint(checkpoint_dir,
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/dell/hzy/my_video/checkpoint-40/zero_to_fp32.py", line 591, in get_fp32_state_dict_from_zero_checkpoint
    state_dict = _get_fp32_state_dict_from_zero_checkpoint(ds_checkpoint_dir, exclude_frozen_parameters)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/dell/hzy/my_video/checkpoint-40/zero_to_fp32.py", line 199, in _get_fp32_state_dict_from_zero_checkpoint
    zero_stage, world_size, fp32_flat_groups = parse_optim_states(optim_files, ds_checkpoint_dir)
                                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/dell/hzy/my_video/checkpoint-40/zero_to_fp32.py", line 152, in parse_optim_states
    state_dict = torch.load(f, map_location=device, mmap=True, weights_only=False)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/dell/anaconda3/envs/CogVideo-main/lib/python3.11/site-packages/torch/serialization.py", line 1326, in load
    with _open_zipfile_reader(opened_file) as opened_zipfile:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/dell/anaconda3/envs/CogVideo-main/lib/python3.11/site-packages/torch/serialization.py", line 671, in __init__
    super().__init__(torch._C.PyTorchFileReader(name_or_buffer))
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: PytorchStreamReader failed reading zip archive: not a ZIP archive

12344143213 · Mar 18 '25 07:03

This seems a bit odd. I don't know why it would try to read a ZIP archive, and your parameters appear to be fine. I suggest checking the DeepSpeed repo to see if there are any related issues.

OleehyO · Mar 21 '25 02:03
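
Some background that may help narrow this down: since PyTorch 1.6, torch.save writes a zip-based container by default, so torch.load opening a zip reader is expected, and the error means one of the files could not be parsed as a zip archive. The thread does not establish why. A minimal sketch for checking the shard files on disk without loading them follows. The function names, the "*.pt" pattern, and the listed causes (empty file, truncated write, Git LFS pointer) are assumptions for illustration, not a confirmed diagnosis, and only the Python standard library is used.

```python
import zipfile
from pathlib import Path

# Assumed markers: the zip local-file header and the first line of a Git LFS pointer file.
ZIP_MAGIC = b"PK\x03\x04"
LFS_MAGIC = b"version https://git-lfs.github.com/spec/"


def classify_checkpoint_file(path):
    """Return a short label describing what the file looks like on disk."""
    path = Path(path)
    if path.stat().st_size == 0:
        return "empty"
    with path.open("rb") as fh:
        head = fh.read(64)
    if head.startswith(LFS_MAGIC):
        return "git-lfs pointer"
    # is_zipfile looks for the end-of-central-directory record near the end of the file.
    if zipfile.is_zipfile(path):
        return "zip archive"
    # Starts like a zip but has no central directory: likely an incomplete write or copy.
    if head.startswith(ZIP_MAGIC):
        return "truncated zip"
    return "not a zip archive"


def scan_checkpoint_dir(ckpt_dir, pattern="*.pt"):
    """Classify every file matching `pattern` directly inside `ckpt_dir`."""
    files = sorted(Path(ckpt_dir).glob(pattern))
    return {p.name: classify_checkpoint_file(p) for p in files}
```

Calling scan_checkpoint_dir("/home/dell/hzy/my_video/checkpoint-40/pytorch_model") would return one label per file. Any file not labelled "zip archive" is a candidate for the one torch.load rejects. Comparing file sizes across ranks is a cheap extra check, since an interrupted save often leaves one shard much smaller than the others.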