
While converting the llama-13b weights, I get this error: RuntimeError: Internal: unk is not defined.

Open Ahtesham00 opened this issue 1 year ago • 1 comment

I am using Linux (Ubuntu), with the following virtual environment:

accelerate==0.18.0
certifi==2022.12.7
charset-normalizer==3.1.0
cmake==3.26.3
filelock==3.12.0
huggingface-hub==0.13.4
idna==3.4
Jinja2==3.1.2
lit==16.0.1
MarkupSafe==2.1.2
mpmath==1.3.0
networkx==3.1
numpy==1.24.2
nvidia-cublas-cu11==11.10.3.66
nvidia-cuda-cupti-cu11==11.7.101
nvidia-cuda-nvrtc-cu11==11.7.99
nvidia-cuda-runtime-cu11==11.7.99
nvidia-cudnn-cu11==8.5.0.96
nvidia-cufft-cu11==10.9.0.58
nvidia-curand-cu11==10.2.10.91
nvidia-cusolver-cu11==11.4.0.1
nvidia-cusparse-cu11==11.7.4.91
nvidia-nccl-cu11==2.14.3
nvidia-nvtx-cu11==11.7.91
packaging==23.1
psutil==5.9.5
PyYAML==6.0
regex==2023.3.23
requests==2.28.2
sentencepiece==0.1.98
sympy==1.11.1
tokenizers==0.13.3
torch==2.0.0
tqdm==4.65.0
transformers==4.28.1
triton==2.0.0
typing_extensions==4.5.0
urllib3==1.26.15
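For context, the conversion script in transformers 4.28 is typically invoked as below. The paths here are placeholders, not the exact ones used in this report; note that `--input_dir` must be the directory that contains `tokenizer.model` next to the `13B/` weight folder, since that is the file sentencepiece tries to load in the failing call:

```shell
# Placeholder paths; adjust to your checkout and download locations.
python /home/transformers/src/transformers/models/llama/convert_llama_weights_to_hf.py \
    --input_dir /path/to/llama-weights \
    --model_size 13B \
    --output_dir /home/test-converted
```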

It should have generated the converted weights, but instead it produces this error:

Loading the checkpoint in a Llama model.
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████| 41/41 [00:17<00:00, 2.35it/s]
Saving in the Transformers format.
Saving a LlamaTokenizerFast to /home/test-converted.
Traceback (most recent call last):
  File "/home/transformers/src/transformers/models/llama/convert_llama_weights_to_hf.py", line 278, in <module>
    main()
  File "/home/transformers/src/transformers/models/llama/convert_llama_weights_to_hf.py", line 274, in main
    write_tokenizer(args.output_dir, spm_path)
  File "/home/transformers/src/transformers/models/llama/convert_llama_weights_to_hf.py", line 248, in write_tokenizer
    tokenizer = tokenizer_class(input_tokenizer_path)
  File "/home/myenv/lib/python3.10/site-packages/transformers/models/llama/tokenization_llama_fast.py", line 89, in __init__
    super().__init__(
  File "/home/myenv/lib/python3.10/site-packages/transformers/tokenization_utils_fast.py", line 117, in __init__
    slow_tokenizer = self.slow_tokenizer_class(*args, **kwargs)
  File "/home/myenv/lib/python3.10/site-packages/transformers/models/llama/tokenization_llama.py", line 96, in __init__
    self.sp_model.Load(vocab_file)
  File "/home/myenv/lib/python3.10/site-packages/sentencepiece/__init__.py", line 905, in Load
    return self.LoadFromFile(model_file)
  File "/home/myenv/lib/python3.10/site-packages/sentencepiece/__init__.py", line 310, in LoadFromFile
    return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
RuntimeError: Internal: unk is not defined.
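The failing call is sentencepiece's `LoadFromFile`, which suggests the `tokenizer.model` file the script was handed is not a valid SentencePiece model — commonly because it is missing, truncated, or is a git-lfs pointer file left behind by a clone without `git lfs pull`. A quick stdlib-only sanity check (the path and the 100 KB size threshold are assumptions; a real LLaMA `tokenizer.model` is a few hundred KB) is:

```python
import os

# Placeholder path: point this at the tokenizer.model that ships with the
# original LLaMA weights (it should sit in --input_dir, next to the 13B/ folder).
model_path = "/path/to/llama-weights/tokenizer.model"

def looks_like_lfs_pointer(path):
    """Detect a git-lfs pointer text file left behind instead of the real binary."""
    if not os.path.isfile(path):
        return False
    with open(path, "rb") as f:
        head = f.read(64)
    return head.startswith(b"version https://git-lfs")

if not os.path.isfile(model_path):
    print("tokenizer.model missing")
elif looks_like_lfs_pointer(model_path):
    print("tokenizer.model is a git-lfs pointer, not the real model; run `git lfs pull`")
elif os.path.getsize(model_path) < 100_000:
    print("tokenizer.model is suspiciously small; re-download it")
else:
    print("tokenizer.model looks plausible")
```

If this reports a problem, re-downloading the original `tokenizer.model` and rerunning the conversion is the first thing to try.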

Ahtesham00 avatar Apr 19 '23 19:04 Ahtesham00

Have you solved it? I met the same error.

merlinarer avatar May 28 '23 07:05 merlinarer