galai

RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory

Open sebasmos opened this issue 3 years ago • 4 comments

Hi, I installed and imported galai successfully on Ubuntu 21.01. I installed galai as follows:

conda create -n galia python=3.9
conda activate galia
pip install git+https://github.com/paperswithcode/galai

and the code I tested is:

import galai as gal
model = gal.load_model(name = 'mini', num_gpus = 1)
model.generate("Lecture 1: The Ising Model\n\n", new_doc=True, top_p=0.7, max_length=200)

However, the mini model fails with the following error:

Traceback (most recent call last):
  File "mini.py", line 3, in <module>
    model = gal.load_model("standard")
  File "/home/sebasmos/Desktop/AnpassenNN//galia/galai/galai/__init__.py", line 41, in load_model
    model._load_checkpoint(checkpoint_path=get_checkpoint_path(name))
  File "/home/sebasmos/Desktop/AnpassenNN//galia/galai/galai/model.py", line 69, in _load_checkpoint
    offload_state_dict=True
  File "/home/sebasmos/anaconda3/envs/yolo/lib/python3.7/site-packages/accelerate/big_modeling.py", line 372, in load_checkpoint_and_dispatch
    offload_state_dict=offload_state_dict,
  File "/home/sebasmos/anaconda3/envs/yolo/lib/python3.7/site-packages/accelerate/utils/modeling.py", line 679, in load_checkpoint_in_model
    checkpoint = torch.load(checkpoint_file)
  File "/home/sebasmos/anaconda3/envs/yolo/lib/python3.7/site-packages/torch/serialization.py", line 705, in load
    with _open_zipfile_reader(opened_file) as opened_zipfile:
  File "/home/sebasmos/anaconda3/envs/yolo/lib/python3.7/site-packages/torch/serialization.py", line 243, in __init__
    super(_open_zipfile_reader, self).__init__(torch._C.PyTorchFileReader(name_or_buffer))
RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory

sebasmos avatar Nov 16 '22 16:11 sebasmos

Same problem here. In my case the error occurs on: model = gal.load_model("standard")

dionator avatar Nov 17 '22 08:11 dionator

You should check under ~/.cache/galactica/standard.pt/. The downloaded zipped checkpoint must have been corrupted. Just delete it and try again.
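A minimal sketch of that check, for anyone who would rather test the file before deleting it. It assumes the cached checkpoint is a single file in PyTorch's zip-based format and that it lives at the path mentioned above; the helper names (galactica_checkpoint_path, is_corrupt_checkpoint, remove_if_corrupt) are made up for illustration and are not part of galai. The "failed finding central directory" message points to a zip archive whose trailing index is missing, which is what a truncated download looks like, and zipfile.is_zipfile detects exactly that.

```python
import zipfile
from pathlib import Path


def galactica_checkpoint_path(name: str = "standard") -> Path:
    """Hypothetical helper: the cache location mentioned in this thread.

    Assumes galai stores checkpoints as ~/.cache/galactica/<name>.pt.
    Path.home() also resolves on Windows (typically C:\\Users\\<user>).
    """
    return Path.home() / ".cache" / "galactica" / f"{name}.pt"


def is_corrupt_checkpoint(path: Path) -> bool:
    """Return True if `path` exists but is not a readable zip archive."""
    if not path.is_file():
        return False  # nothing downloaded yet, so nothing to call corrupt
    if not zipfile.is_zipfile(path):
        return True  # central directory missing, e.g. a truncated download
    with zipfile.ZipFile(path) as archive:
        # testzip() re-reads every member and checks CRCs; slow on large files.
        return archive.testzip() is not None


def remove_if_corrupt(path: Path) -> bool:
    """Delete the file if it looks corrupt; return True when it was deleted."""
    if is_corrupt_checkpoint(path):
        path.unlink()
        return True
    return False
```

Calling remove_if_corrupt(galactica_checkpoint_path("standard")) and then re-running gal.load_model("standard") should trigger a fresh download if the old file was the problem. A checkpoint saved in the older non-zip format would be flagged as corrupt by this sketch, so it only fits the zip-based case shown in the traceback.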

lsiksous avatar Nov 17 '22 13:11 lsiksous

@lsiksous thanks for the hint. Any idea where the cache folder would be on Windows?

dionator avatar Nov 17 '22 17:11 dionator

@lsiksous, you're onto something. I forced a download of a different model size (changed "standard" to "mini") and it works. Running the code with all other model sizes works as well. This does seem to suggest that the originally downloaded model is the problem ("standard" in my case, since I just copied the sample code from the repo's README).

dionator avatar Nov 17 '22 19:11 dionator

Hi all, in galai 1.1.0 we switched to transformers for checkpoint management. See https://huggingface.co/docs/transformers/installation#cache-setup for information about where the cache is located and how to change it. Closing this for now as it seems to be due to file corruption, as mentioned above. Please reopen if you still have any issues.
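For a quick way to see where that cache probably ends up on a given machine, here is a rough sketch. It follows the priority order as described on the linked page (an explicit cache variable first, then HF_HOME, then XDG_CACHE_HOME, then the default under the home directory). The function name hf_hub_cache_dir is made up for illustration, the exact variable names have changed between library versions, and the library's own resolution logic remains authoritative.

```python
import os
from pathlib import Path


def hf_hub_cache_dir(env=None) -> Path:
    """Best-effort guess at the Hugging Face hub cache directory.

    `env` defaults to os.environ; pass a dict to try other settings.
    This is an illustrative approximation, not the library's own code.
    """
    env = os.environ if env is None else env

    # 1. An explicit cache directory wins.
    for var in ("HF_HUB_CACHE", "TRANSFORMERS_CACHE"):
        if env.get(var):
            return Path(env[var]).expanduser()

    # 2. HF_HOME holds the hub cache in its "hub" subdirectory.
    if env.get("HF_HOME"):
        return Path(env["HF_HOME"]).expanduser() / "hub"

    # 3. XDG_CACHE_HOME + /huggingface.
    if env.get("XDG_CACHE_HOME"):
        return Path(env["XDG_CACHE_HOME"]).expanduser() / "huggingface" / "hub"

    # 4. Default: ~/.cache/huggingface/hub (under C:\Users\<user> on Windows).
    return Path.home() / ".cache" / "huggingface" / "hub"
```

Printing hf_hub_cache_dir() shows the directory to inspect or clear if a download was interrupted.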

mkardas avatar Dec 09 '22 10:12 mkardas