RecursionError when converting LLaMA model to ONNX
Describe the bug
RecursionError: maximum recursion depth exceeded while getting the str of an object
Expected behavior
I want to convert a LLaMA model to ONNX and then benchmark it in DeepSparse.
Environment
conda create -n sparseml_main python=3.9
conda activate sparseml_main
git clone https://github.com/neuralmagic/sparseml
pip install -e "sparseml[transformers]"
pip install deepsparse
pip install sentencepiece
To Reproduce
First download the model using
huggingface-cli download baffo32/decapoda-research-llama-7B-hf --local-dir llama-7B
Then convert using
sparseml.export --task text-generation llama-7B
Errors
2024-01-26 16:49:06 sparseml.export.export INFO Starting export for transformers model...
2024-01-26 16:49:06 sparseml.export.export INFO Creating model for the export...
2024-01-26 16:49:06 sparseml.transformers.integration_helper_functions WARNING trust_remote_code is set to False. It is possible, that the model will not be loaded correctly.
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thouroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
Traceback (most recent call last):
File "/home/sliu01/miniconda3/envs/sparseml_main/bin/sparseml.export", line 8, in <module>
sys.exit(main())
File "/home/sliu01/miniconda3/envs/sparseml_main/lib/python3.9/site-packages/click/core.py", line 1157, in __call__
return self.main(*args, **kwargs)
File "/home/sliu01/miniconda3/envs/sparseml_main/lib/python3.9/site-packages/click/core.py", line 1078, in main
rv = self.invoke(ctx)
File "/home/sliu01/miniconda3/envs/sparseml_main/lib/python3.9/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/sliu01/miniconda3/envs/sparseml_main/lib/python3.9/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
File "/gpfs/work2/0/prjs0761/OWL-new/sparseml/src/sparseml/export/export.py", line 432, in main
export(
File "/gpfs/work2/0/prjs0761/OWL-new/sparseml/src/sparseml/export/export.py", line 212, in export
model, loaded_model_kwargs = helper_functions.create_model(
File "/gpfs/work2/0/prjs0761/OWL-new/sparseml/src/sparseml/transformers/integration_helper_functions.py", line 101, in create_model
tokenizer = initialize_tokenizer(source_path, sequence_length, task)
File "/gpfs/work2/0/prjs0761/OWL-new/sparseml/src/sparseml/transformers/utils/initializers.py", line 76, in initialize_tokenizer
tokenizer = AutoTokenizer.from_pretrained(
File "/home/sliu01/miniconda3/envs/sparseml_main/lib/python3.9/site-packages/transformers/models/auto/tokenization_auto.py", line 751, in from_pretrained
return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
File "/home/sliu01/miniconda3/envs/sparseml_main/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 2017, in from_pretrained
return cls._from_pretrained(
File "/home/sliu01/miniconda3/envs/sparseml_main/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 2249, in _from_pretrained
tokenizer = cls(*init_inputs, **init_kwargs)
File "/home/sliu01/miniconda3/envs/sparseml_main/lib/python3.9/site-packages/transformers/models/llama/tokenization_llama_fast.py", line 134, in __init__
self.update_post_processor()
File "/home/sliu01/miniconda3/envs/sparseml_main/lib/python3.9/site-packages/transformers/models/llama/tokenization_llama_fast.py", line 147, in update_post_processor
bos_token_id = self.bos_token_id
File "/home/sliu01/miniconda3/envs/sparseml_main/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 1172, in bos_token_id
return self.convert_tokens_to_ids(self.bos_token)
File "/home/sliu01/miniconda3/envs/sparseml_main/lib/python3.9/site-packages/transformers/tokenization_utils_fast.py", line 329, in convert_tokens_to_ids
return self._convert_token_to_id_with_added_voc(tokens)
File "/home/sliu01/miniconda3/envs/sparseml_main/lib/python3.9/site-packages/transformers/tokenization_utils_fast.py", line 336, in _convert_token_to_id_with_added_voc
return self.unk_token_id
File "/home/sliu01/miniconda3/envs/sparseml_main/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 1191, in unk_token_id
return self.convert_tokens_to_ids(self.unk_token)
File "/home/sliu01/miniconda3/envs/sparseml_main/lib/python3.9/site-packages/transformers/tokenization_utils_fast.py", line 329, in convert_tokens_to_ids
return self._convert_token_to_id_with_added_voc(tokens)
File "/home/sliu01/miniconda3/envs/sparseml_main/lib/python3.9/site-packages/transformers/tokenization_utils_fast.py", line 336, in _convert_token_to_id_with_added_voc
return self.unk_token_id
''''' the three frames above (unk_token_id → convert_tokens_to_ids → _convert_token_to_id_with_added_voc) repeat until the recursion limit is reached '''''
File "/home/sliu01/miniconda3/envs/sparseml_main/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 1191, in unk_token_id
return self.convert_tokens_to_ids(self.unk_token)
File "/home/sliu01/miniconda3/envs/sparseml_main/lib/python3.9/site-packages/transformers/tokenization_utils_fast.py", line 329, in convert_tokens_to_ids
return self._convert_token_to_id_with_added_voc(tokens)
File "/home/sliu01/miniconda3/envs/sparseml_main/lib/python3.9/site-packages/transformers/tokenization_utils_fast.py", line 336, in _convert_token_to_id_with_added_voc
return self.unk_token_id
File "/home/sliu01/miniconda3/envs/sparseml_main/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 1191, in unk_token_id
return self.convert_tokens_to_ids(self.unk_token)
File "/home/sliu01/miniconda3/envs/sparseml_main/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 1057, in unk_token
return str(self._unk_token)
RecursionError: maximum recursion depth exceeded while getting the str of an object
Additional context
I also tried installing sparseml via pip install sparseml
and converting with
sparseml.transformers.export_onnx
but the same error occurs.
Hey @luuyin, it seems this model has issues with its tokenizer setup, which might have been covered up by installing sentencepiece. I would recommend trying a Llama model that is configured to work well with native transformers.
If I simply try to use the model as-is in native transformers, it fails in the same way, so this isn't a sparseml-specific issue:
from transformers import pipeline
pipe = pipeline("text-generation", model="llama-7B")
prompt = "How many helicopters can a human eat in one sitting?"
outputs = pipe(prompt)
print(outputs[0]["generated_text"])
output:
File "/home/mgoin/venvs/test-fail/lib/python3.10/site-packages/transformers/tokenization_utils_fast.py", line 336, in _convert_token_to_id_with_added_voc
return self.unk_token_id
File "/home/mgoin/venvs/test-fail/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 1191, in unk_token_id
return self.convert_tokens_to_ids(self.unk_token)
File "/home/mgoin/venvs/test-fail/lib/python3.10/site-packages/transformers/tokenization_utils_fast.py", line 329, in convert_tokens_to_ids
return self._convert_token_to_id_with_added_voc(tokens)
File "/home/mgoin/venvs/test-fail/lib/python3.10/site-packages/transformers/tokenization_utils_fast.py", line 336, in _convert_token_to_id_with_added_voc
return self.unk_token_id
File "/home/mgoin/venvs/test-fail/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 1191, in unk_token_id
return self.convert_tokens_to_ids(self.unk_token)
File "/home/mgoin/venvs/test-fail/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 1057, in unk_token
return str(self._unk_token)
''''' the same frames repeat until the recursion limit is reached '''''
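The cycle in both tracebacks is unk_token_id → convert_tokens_to_ids → _convert_token_to_id_with_added_voc → unk_token_id. A stripped-down sketch of that failure mode (a hypothetical toy class, not the actual transformers code): when the unk token is missing from the vocabulary, the fallback lookup routes straight back to unk_token_id and never terminates.

```python
class BrokenTokenizer:
    """Toy model (hypothetical, not the transformers code) of the mutual
    recursion in the traceback: resolving unk_token_id requires converting
    the unk token to an id, but if that token is missing from the vocab the
    converter falls back to unk_token_id, and the two bounce forever."""

    def __init__(self):
        self.vocab = {"<s>": 1, "</s>": 2}  # note: no unk entry
        self.unk_token = ""                 # misconfigured, as in the issue

    @property
    def unk_token_id(self):
        return self.convert_token_to_id(self.unk_token)

    def convert_token_to_id(self, token):
        if token in self.vocab:
            return self.vocab[token]
        return self.unk_token_id  # unknown token -> back to unk_token_id

try:
    BrokenTokenizer().unk_token_id
except RecursionError as err:
    print(f"RecursionError: {err}")
```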
If the sentencepiece lib is not installed, I get this error instead:
ValueError: Couldn't instantiate the backend tokenizer from one of:
(1) a `tokenizers` library serialization file,
(2) a slow tokenizer instance to convert or
(3) an equivalent slow tokenizer class to instantiate and convert.
You need to have sentencepiece installed to convert a slow tokenizer to a fast one.
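One hedged way to spot such a broken checkpoint before exporting is to sanity-check the special tokens in its tokenizer_config.json. The helper below is hypothetical (not part of sparseml or transformers); it only flags empty or missing special-token entries, which is the kind of misconfiguration that can trigger the loop above.

```python
def find_misconfigured_special_tokens(tokenizer_config: dict) -> list:
    """Return the special-token keys whose values look broken (empty or
    missing). Hypothetical sanity check: in badly converted LLaMA
    checkpoints the special tokens in tokenizer_config.json are often
    empty strings, which can drive the fast tokenizer into the
    unk_token_id recursion seen in the tracebacks above."""
    suspicious = []
    for key in ("bos_token", "eos_token", "unk_token"):
        value = tokenizer_config.get(key)
        # Newer configs store special tokens as dicts like {"content": "<s>", ...}
        if isinstance(value, dict):
            value = value.get("content")
        if not value:
            suspicious.append(key)
    return suspicious

# Example: a config shaped like the broken export vs. a healthy one
broken = {"bos_token": "", "eos_token": "", "unk_token": ""}
healthy = {"bos_token": "<s>", "eos_token": "</s>", "unk_token": "<unk>"}
print(find_misconfigured_special_tokens(broken))   # ['bos_token', 'eos_token', 'unk_token']
print(find_misconfigured_special_tokens(healthy))  # []
```

In practice you would load the dict with json.load from llama-7B/tokenizer_config.json before exporting.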
@luuyin I have validated that the flow works as expected with a properly-formed Llama 2 model; I used NousResearch/Llama-2-7b-hf for this.
Initial setup:
python3 -m venv ~/venvs/test-fail
source ~/venvs/test-fail/bin/activate
pip install sentencepiece -e "sparseml[transformers]"
huggingface-cli download baffo32/decapoda-research-llama-7B-hf --local-dir llama-7B
sparseml.export --task text-generation llama-7B
This fails because of the decapoda-research model. If I replace it with https://huggingface.co/NousResearch/Llama-2-7b-hf, it works as expected:
huggingface-cli download NousResearch/Llama-2-7b-hf --local-dir llama-7B
sparseml.export --task text-generation llama-7B
Thank you for your prompt response, Michael Goin. Indeed, with the model you suggested the conversion to ONNX succeeds, with only a minor warning:
Attempting to validate an in-memory ONNX model with size > 2000000000 bytes. validate_onnx skipped, as large ONNX models cannot be validated in-memory. To validate this model, save it to disk and call validate_onnx on the file path.
However, in the next step, when I tried to benchmark the converted model using deepsparse.benchmark llama-7B/deployment/model.onnx --sequence_length 2048, I got the following error:
2024-01-26 19:20:34 deepsparse.benchmark.helpers INFO Thread pinning to cores enabled
2024-01-26 19:20:34 deepsparse.benchmark.benchmark_model INFO Found model with KV cache support. Benchmarking the autoregressive model with input_ids_length: 1 and sequence length: 2048.
2024-01-26 19:20:34 deepsparse.benchmark.benchmark_model INFO Benchmarking Engine: deepsparse with internal KV cache management
DeepSparse, Copyright 2021-present / Neuralmagic, Inc. version: 1.6.1 COMMUNITY | (eff4f95d) (release) (optimized) (system=avx512_vnni, binary=avx512)
DeepSparse, Copyright 2021-present / Neuralmagic, Inc. version: 1.6.1 (eff4f95d) (release) (optimized) (system=avx512_vnni, binary=avx512)
Date: 01-26-2024 @ 19:20:35 CET
OS: Linux gcn17.local.snellius.surf.nl 4.18.0-372.80.1.el8_6.x86_64 #1 SMP Fri Nov 3 14:30:16 EDT 2023
Arch: x86_64
CPU: GenuineIntel
Vendor: Intel
Cores/sockets/threads: [72, 2, 72]
Available cores/sockets/threads: [36, 1, 36]
L1 cache size data/instruction: 48k/32k
L2 cache size: 1.25Mb
L3 cache size: 54Mb
Total memory: 503.518G
Free memory: 387.075G
Thread: 0x14734509b440
Assertion at ./src/include/wand/engine/execution/planner.hpp:121
Backtrace:
0# 0x000014723b0313c3:
[440fb6c34c8b25925a2c026a004489e94c89f641b9010000004c89e7e87cb224]
[02585a84db75084c89e7e8aecbffff4c89e7e876b2240248833d8e552c020074]
1# 0x000014723b033c18:
[e4892402b901000000ba79000000488d352b8895fe488d3d19e58cfee857d7ff]
[ff4889c3c5f877e9edfaffff4889c3e912fbffff4889c3c5f877e9f7faffff48]
2# wand::engine::compiler::compiler::execution_graph_to_linear_order(wand::engine::execution::graph&&) const in /home/sliu01/miniconda3/envs/sparseml_main/lib/python3.9/site-packages/deepsparse/avx512/libonnxruntime.so.1.15.1
3# wand::engine::compiler::compiler::compile(wand::engine::execution::graph&&) const in /home/sliu01/miniconda3/envs/sparseml_main/lib/python3.9/site-packages/deepsparse/avx512/libonnxruntime.so.1.15.1
4# wand::engine::compiler::compiler::compile(wand::engine::compute::compute_graph&&) const in /home/sliu01/miniconda3/envs/sparseml_main/lib/python3.9/site-packages/deepsparse/avx512/libonnxruntime.so.1.15.1
5# wand::engine::compiler::compiler::compile(wand::engine::intake::graph&&) const in /home/sliu01/miniconda3/envs/sparseml_main/lib/python3.9/site-packages/deepsparse/avx512/libonnxruntime.so.1.15.1
6# 0x000014723a1f3c97:
[ff488bbc24c80000004885ff7405e88673ebff4c89ea4889de4c89f7e8e88a08]
[0349c7042400000000bff0000000e8a6390003488b1567a581ff488d4810c5f1]
7# 0x000014723a1fa2e3:
[850801000083400801488dbbb00000004d89e04c89e94c89f24889dee85c97ff]
[ff488b45b84989c64885c0741f488b1d79c60f034885db0f85c00000008b4008]
8# 0x000014723a23bc10:
[5648488b464c53415450524889fa488bbde0fcffff488985f8fcffffe8efe4fb]
[ff488bbd08feffff4883c4204885ff7405e8eaf3e6ff488bbd18fdffff4885ff]
9# 0x000014723a204ee5:
[0000488dbc2490000000488d4c24404c8b44241848897c2420488b33e8ba5a03]
[00488b4308c5f9efc0c5f96f942490000000c5f97f8424900000004889442430]
10# 0x000014723a2062c0:
[4154c4e1f96eca4989fcc4e3f122c6014883ec104889e6c5f97f0424e81feaff]
[ff4883c4104c89e0415cc3cccccccccc488b07c5fa6f06c5fa7f00c5fa6f4e10]
11# 0x000014723aa90a48:
[02e982ba7e02cccc415741564155415455534883ec38488b0648897c2410ff50]
[28488b4424104c8b7008488b184c89f04829d84889c248c1f80548c1fa034885]
12# 0x000014723aa921e2:
[488b742470488dbc24700100004c89f24889c148897c241848890424e84de8ff]
[ff488b4310488b0bc5f9efc0c5f96fb42470010000488b530848894424104989]
13# 0x000014723aa98293:
[488d8550feffff4889c7488985f8fcffff4889c3c5fa7f8538feffffe8dc9cff]
[ff4883bd50feffff000f847803000041b879010000488d0da99ae5fe4889de31]
14# 0x000014723aa9ac0e:
[89f94c8b0bff75c04c89f24c89ee488bbd58ffffffffb548ffffff50e8b1d2ff]
[ff4883c42048837d800074b6488bb558ffffff41b825020000488d0df47fe6fe]
15# 0x000014723a0e0ff4:
[ffc5f97f8590feffff488d4838488d85c0feffff5048898550feffffe89b999b]
[004883bd60ffffff00585a0f849b000000418bbf74090000488bb578feffff41]
16# 0x000014723a0ed768:
[0fb68540fcffff488b9578fcffff4889de4c89ff89c1898560fcffffe89734ff]
[ff4883bdd0fdffff000f84e10000008bbb7409000041b81a0600004c89fe488d]
17# 0x000014723a0abf79:
[836f080175e1ebbf0f1f8000000000488b442430488b7c2438488b30e8c60304]
[00488b44245048894424304885c00f84eafcffff488b7c2438e8d97b9c00e9c7]
18# 0x000014723a0c0284:
[83c4184c89e05b5d415c415dc30f1f800000000031d24c89ee4889efe81bb2fe]
[ff488b7c24084989c44885c075c648893b4883c4184c89e05b5d415c415dc348]
19# deepsparse::ort_engine::init(wand::arch_t const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int, std::shared_ptr<wand::parallel::scheduler_factory_t>, std::optional<NMExecutionProviderEngineParams> const&, std::optional<NMExecutionProviderBenchmarkParams> const&) in /home/sliu01/miniconda3/envs/sparseml_main/lib/python3.9/site-packages/deepsparse/avx512/libdeepsparse.so
20# 0x000014723d3a9932:
[89e94d89f88b8d2cffffff4c89f7488b9530ffffff488b352a050d00e8dd690c]
[00488b7da8595e4885ff7405e8fd640c00488b7d804885ff0f8494000000807d]
21# 0x000014723d3aab3e:
[8b4d104c8d4d804d89f06a00488bb5f8feffff4489e24889c74989c5e871ebff]
[ff488b7d88585a4885ff7405e8f1520c004c89ffe8e9570c00e9b5fcffff0f1f]
22# 0x000014723d3ea8fa:
[feffff41574c8b8538feffff4d89f1488b8d58feffff8bb550feffffe825fefb]
[ff488bbd78feffff4989c4585a4885ff7405e82f55080080bd18ffffff007432]
23# 0x000014723d3b5aa7:
[30ffff4c8dac24c0010000488b78404c89eee8e2a50b00498b04244c89e7ff50]
[304989c74c89ef4889442478e8b832ffff4983ff010f85760200004983c4684c]
24# cfunction_call at /usr/local/src/conda/python-3.9.18/Objects/methodobject.c:543
25# _PyObject_MakeTpCall at /usr/local/src/conda/python-3.9.18/Objects/call.c:191
26# method_vectorcall at /usr/local/src/conda/python-3.9.18/Include/cpython/abstract.h:116
27# method_vectorcall at /usr/local/src/conda/python-3.9.18/Include/cpython/abstract.h:103
28# method_vectorcall at /usr/local/src/conda/python-3.9.18/Objects/classobject.c:83
29# slot_tp_init at /usr/local/src/conda/python-3.9.18/Objects/typeobject.c:6974
30# type_call at /usr/local/src/conda/python-3.9.18/Objects/typeobject.c:1028
31# pybind11_meta_call in /home/sliu01/miniconda3/envs/sparseml_main/lib/python3.9/site-packages/onnx/onnx_cpp2py_export.cpython-39-x86_64-linux-gnu.so
32# _PyObject_MakeTpCall at /usr/local/src/conda/python-3.9.18/Objects/call.c:191
Please email a copy of this stack trace and any additional information to: [email protected]
Could you please help me look into this? Thanks!
This is definitely an unexpected error; looking into it now.
I was able to recreate this using your exact setup, specifically with deepsparse==1.6.1. We've had a lot of changes land for 1.7, which is releasing soon, and the export and benchmark run fine on our nightly build (or from source).
To set that up, make sure to uninstall sparseml, sparsezoo, and deepsparse, then do pip install sparseml-nightly[transformers] deepsparse-nightly,
or install both from source (not just sparseml). With that environment, I was able to export and run fine.
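A quick, stdlib-only way to confirm which builds actually ended up in the environment after the uninstall/reinstall (package names taken from the comment above; in a correct nightly setup only the -nightly names should resolve):

```python
import importlib.metadata as md

# Report installed versions of the stable and nightly packages; anything
# missing raises PackageNotFoundError, which we report instead.
for pkg in ("sparseml", "deepsparse", "sparseml-nightly", "deepsparse-nightly"):
    try:
        print(f"{pkg}=={md.version(pkg)}")
    except md.PackageNotFoundError:
        print(f"{pkg}: not installed")
```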
Hi @luuyin A heads up that 1.7 recently went out. We hope this can address the issue you faced. Thank you! Jeannie / Neural Magic