Error when converting ZeRO-3 checkpoint: PytorchStreamReader failed reading zip archive: not a ZIP archive
Body: Hello! I encountered an error while trying to convert my ZeRO-3 fine-tuned checkpoint using the zero_to_fp32.py script. The error message is:
PytorchStreamReader failed reading zip archive: not a ZIP archive
Steps to Reproduce:
1. Fine-tuned a model with DeepSpeed ZeRO-3 (config attached).
2. Generated checkpoint files in the checkpoint-40/pytorch_model/ directory.
3. Ran the conversion script:
python zero_to_fp32.py ./checkpoint-40/ ./output/ --safe_serialization
4. Received the error about the ZIP archive format.
(CogVideo-main) dell@dell-DSS8440:~/hzy/my_video/checkpoint-40$ python zero_to_fp32.py \
    "/home/dell/hzy/my_video/checkpoint-40/" \
    "/home/dell/hzy/my_video/output/" \
    --tag "pytorch_model" \
    --safe_serialization
[2025-03-18 15:42:32,765] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Processing zero checkpoint '/home/dell/hzy/my_video/checkpoint-40/pytorch_model'
Loading checkpoint shards: 0%| | 0/8 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/home/dell/hzy/my_video/checkpoint-40/zero_to_fp32.py", line 755, in
This seems a bit odd. I don't know why it would try to read a ZIP archive, and your parameters appear to be fine. I suggest checking the DeepSpeed repo to see whether there are any related issues.
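One thing that may be worth checking first: checkpoints written by torch.save (the default format since PyTorch 1.6) are ZIP archives, so this error can appear when a shard file is truncated, empty, or is a Git LFS pointer rather than the real file. Below is a minimal diagnostic sketch, not part of DeepSpeed. The helper names `classify_checkpoint_file` and `scan_checkpoint_dir` are hypothetical, and the `*.pt` pattern is an assumption about how the shard files are named.

```python
import zipfile
from pathlib import Path


def classify_checkpoint_file(path):
    """Return a rough label for one checkpoint file (hypothetical helper)."""
    path = Path(path)
    if path.stat().st_size == 0:
        return "empty"
    with path.open("rb") as f:
        head = f.read(64)
    # Git LFS pointer files are small text files beginning with this line.
    if head.startswith(b"version https://git-lfs"):
        return "git-lfs-pointer"
    # is_zipfile looks for the ZIP end-of-central-directory record,
    # which a truncated or partially written file normally lacks.
    if zipfile.is_zipfile(str(path)):
        return "zip"
    return "not-zip"


def scan_checkpoint_dir(directory, pattern="*.pt"):
    """Map each matching file name in `directory` to its label."""
    files = sorted(Path(directory).glob(pattern))
    return {p.name: classify_checkpoint_file(p) for p in files}
```

For example, `scan_checkpoint_dir("/home/dell/hzy/my_video/checkpoint-40/pytorch_model")` returns a dict of file names to labels. Any file that is not labelled "zip" would be a candidate for re-saving or re-copying, although "not-zip" on its own does not prove the file is unusable, since older PyTorch versions wrote a non-ZIP format.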