llama.cpp
convert-pth-to-ggml.py failed with RuntimeError
Hi there, I downloaded my LLaMA weights via BitTorrent and tried to convert the 7B model to ggml FP16 format:
$ python convert-pth-to-ggml.py models/7B/ 1
normalizer.cc(51) LOG(INFO) precompiled_charsmap is empty. use identity normalization.
{'dim': 4096, 'multiple_of': 256, 'n_heads': 32, 'n_layers': 32, 'norm_eps': 1e-06, 'vocab_size': 32000}
n_parts = 1
Processing part 0
Traceback (most recent call last):
File "/Users/fzxu/Documents/code/llama.cpp/convert-pth-to-ggml.py", line 89, in <module>
model = torch.load(fname_model, map_location="cpu")
File "/opt/anaconda3/envs/llama.cpp/lib/python3.10/site-packages/torch/serialization.py", line 712, in load
return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
File "/opt/anaconda3/envs/llama.cpp/lib/python3.10/site-packages/torch/serialization.py", line 1049, in _load
result = unpickler.load()
File "/opt/anaconda3/envs/llama.cpp/lib/python3.10/site-packages/torch/serialization.py", line 1019, in persistent_load
load_tensor(dtype, nbytes, key, _maybe_decode_ascii(location))
File "/opt/anaconda3/envs/llama.cpp/lib/python3.10/site-packages/torch/serialization.py", line 997, in load_tensor
storage = zip_file.get_storage_from_record(name, numel, torch._UntypedStorage).storage()._untyped()
RuntimeError: PytorchStreamReader failed reading file data/27: invalid header or archive is corrupted
Does this mean my downloaded copy of the model weights is corrupted? Or am I missing something? I have filed a request with Meta, and hopefully I can try again with data from the official download source.
what is "data/27" file, that is within your models/7B folder? you downloaded the wrong thing
Here's the file structure of my downloaded model:
$ ls ./models
7B tokenizer.model tokenizer_checklist.chk
$ ls ./models/7B
checklist.chk consolidated.00.pth params.json
There isn't a directory called data, so this looks normal to me. As for the data/27 file, it seems to be part of the internal file structure of the .pth file, which appears to be a zip archive (a guess from reading the PyTorch serialization code: https://github.com/pytorch/pytorch/blob/master/torch/serialization.py#L1112).
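That guess can be checked without torch at all: a .pth file saved in the current zip-based format opens with Python's standard zipfile module, and the data/N entries are the serialized tensor storages. A minimal sketch (the checkpoint path in the example is hypothetical):

```python
import zipfile

def inspect_pth(path):
    """List the internal records of a zip-based .pth checkpoint and
    CRC-check every entry; returns the first corrupt entry name, or None."""
    with zipfile.ZipFile(path) as zf:
        for name in zf.namelist():
            print(name)  # internal records such as .../data/27
        # testzip() reads each entry and verifies its stored CRC
        return zf.testzip()

# Example (hypothetical path):
# bad = inspect_pth("models/7B/consolidated.00.pth")
# print("first corrupt entry:", bad)
```

If the archive header itself is damaged, ZipFile raises zipfile.BadZipFile before any listing happens, which would be consistent with the "invalid header or archive is corrupted" error in the traceback above.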
@KevinXuxuxu Can you post the hashes of the downloaded files?
on Linux:
sha256sum ./models/7B/*
on macOS:
shasum -a 256 ./models/7B/*
My hashes are:
7935c843a25ae265d60bf4543b90bfd91c4911b728412b5c1d5cff42a3cd5645 ./models/7B/checklist.chk
700df0d3013b703a806d2ae7f1bfb8e59814e3d06ae78be0c66368a50059f33d ./models/7B/consolidated.00.pth
7e89e242ddc0dd6f060b43ca219ce8b3e8f08959a72cb3c0855df8bb04d46265 ./models/7B/params.json
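The same check can also be done from Python on any platform; a small sketch that streams the file so multi-GB checkpoints don't need to fit in RAM:

```python
import hashlib

def sha256_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Example (hypothetical path):
# print(sha256_file("models/7B/consolidated.00.pth"))
```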
@prusnak Thanks for providing the shasum for my validation!
$ shasum -a 256 ./models/7B/*
7935c843a25ae265d60bf4543b90bfd91c4911b728412b5c1d5cff42a3cd5645 ./models/7B/checklist.chk
008cfbd68936367b15a311494c8c8259c4902dbb461896ae767084372cdfa3fc ./models/7B/consolidated.00.pth
7e89e242ddc0dd6f060b43ca219ce8b3e8f08959a72cb3c0855df8bb04d46265 ./models/7B/params.json
Indeed, my consolidated.00.pth file is corrupted. May I ask how you got the data? From the official Meta download or BitTorrent?
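Besides comparing against hashes posted by others, the checklist.chk that ships next to the weights can catch this locally. Assuming it is standard md5sum output ("<hex digest>  <filename>" per line, which appears to be the case here), a sketch:

```python
import hashlib
import os

def verify_checklist(checklist_path):
    """Check each listed file's MD5 against the checklist; assumes
    md5sum-style lines: '<hex digest>  <filename>'."""
    base = os.path.dirname(checklist_path)
    all_ok = True
    with open(checklist_path) as f:
        for line in f:
            parts = line.split()
            if len(parts) != 2:
                continue  # skip blank or malformed lines
            expected, name = parts
            h = hashlib.md5()
            with open(os.path.join(base, name), "rb") as g:
                for chunk in iter(lambda: g.read(1 << 20), b""):
                    h.update(chunk)
            ok = h.hexdigest() == expected
            all_ok = all_ok and ok
            print(f"{name}: {'OK' if ok else 'FAILED'}")
    return all_ok

# Example (hypothetical path):
# verify_checklist("models/7B/checklist.chk")
```

On Linux, `md5sum -c checklist.chk` run inside the model directory should perform the equivalent check.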
Closing this issue while I try to get a correct version of the model weights.
@prusnak Can you provide hashes for the 13B files?
For anyone who has doubts about their data, try using https://github.com/cocktailpeanut/dalai, which downloads the weights for you; they seem to come from a reliable source.
Can you please provide a link to download the LLaMA files?