The scripts/convert.py script fails for a few reasons
System Info
- Python 3.12.3
- macOS (Apple M3)
Environment/Platform
- [ ] Website/web-app
- [ ] Browser extension
- [X] Server-side (e.g., Node.js, Deno, Bun)
- [ ] Desktop app (e.g., Electron)
- [X] Other (e.g., VSCode extension)
Description
I am trying to run the model locally, since running a remote model in Node.js doesn't appear to work.
First I tried to install the requirements for https://github.com/xenova/transformers.js/blob/main/scripts/convert.py (which is linked in the README):
$ python3 -m pip install -r requirements.txt
Collecting transformers==4.33.2 (from transformers[torch]==4.33.2->-r requirements.txt (line 1))
Using cached transformers-4.33.2-py3-none-any.whl.metadata (119 kB)
ERROR: Could not find a version that satisfies the requirement onnxruntime<1.16.0 (from versions: 1.17.0, 1.17.1, 1.17.3, 1.18.0, 1.18.1)
ERROR: No matching distribution found for onnxruntime<1.16.0
So onnxruntime<1.16.0 no longer seems to be available on PyPI.
Can you update that script?
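(For what it's worth, recent pip releases have an experimental `index` subcommand that confirms which versions PyPI actually offers; this is just a diagnostic, not something from the repo's instructions:)
$ python3 -m pip index versions onnxruntime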
Second, I tried installing the latest versions of everything instead, by making this my requirements.txt:
transformers
onnxruntime
optimum
tqdm
onnx
But after I ran this:
$ python3 -m pip install -r requirements.txt
... successful installation stuff...
$ python3 -m convert --quantize --task summarization --model_id bart-large-cnn
I got an error:
TypeError: quantize_dynamic() got an unexpected keyword argument 'optimize_model'
Full stack trace:
Framework not specified. Using pt to export the model.
The task `text2text-generation` was manually specified, and past key values will not be reused in the decoding. if needed, please pass `--task text2text-generation-with-past` to export using the past key values.
Using the export variant default. Available variants are:
- default: The default ONNX variant.
Some non-default generation parameters are set in the model config. These should go into a GenerationConfig file (https://huggingface.co/docs/transformers/generation_strategies#save-a-custom-decoding-strategy-with-your-model) instead. This warning will be raised to an exception in v4.41.
Non-default generation parameters: {'max_length': 142, 'min_length': 56, 'early_stopping': True, 'num_beams': 4, 'length_penalty': 2.0, 'no_repeat_ngram_size': 3, 'forced_bos_token_id': 0, 'forced_eos_token_id': 2}
***** Exporting submodel 1/2: BartEncoder *****
Using framework PyTorch: 2.3.1
Overriding 1 configuration item(s)
- use_cache -> False
./venv/lib/python3.12/site-packages/transformers/models/bart/modeling_bart.py:247: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if attn_weights.size() != (bsz * self.num_heads, tgt_len, src_len):
./venv/lib/python3.12/site-packages/transformers/models/bart/modeling_bart.py:254: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if attention_mask.size() != (bsz, 1, tgt_len, src_len):
./venv/lib/python3.12/site-packages/transformers/models/bart/modeling_bart.py:286: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if attn_output.size() != (bsz * self.num_heads, tgt_len, self.head_dim):
***** Exporting submodel 2/2: BartForConditionalGeneration *****
Using framework PyTorch: 2.3.1
Overriding 1 configuration item(s)
- use_cache -> False
./venv/lib/python3.12/site-packages/transformers/modeling_attn_mask_utils.py:86: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if input_shape[-1] > 1 or self.sliding_window is not None:
./venv/lib/python3.12/site-packages/transformers/modeling_attn_mask_utils.py:162: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if past_key_values_length > 0:
Post-processing the exported models...
Weight deduplication check in the ONNX export requires accelerate. Please install accelerate to run it.
Validating ONNX model models/bart-large-cnn/encoder_model.onnx...
-[✓] ONNX model output names match reference model (last_hidden_state)
- Validating ONNX Model output "last_hidden_state":
-[✓] (2, 16, 1024) matches (2, 16, 1024)
-[✓] all values close (atol: 1e-05)
Validating ONNX model models/bart-large-cnn/decoder_model.onnx...
-[✓] ONNX model output names match reference model (logits)
- Validating ONNX Model output "logits":
-[✓] (2, 16, 50264) matches (2, 16, 50264)
-[x] values not close enough, max diff: 7.82012939453125e-05 (atol: 1e-05)
The ONNX export succeeded with the warning: The maximum absolute difference between the output of the reference model and the ONNX exported model is not within the set tolerance 1e-05:
- logits: max diff = 7.82012939453125e-05.
The exported model was saved at: models/bart-large-cnn
Quantizing: 0%| | 0/2 [00:00<?, ?it/s]
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "./convert.py", line 545, in <module>
main()
File "./convert.py", line 521, in main
quantize([
File "./convert.py", line 294, in quantize
quantize_dynamic(
TypeError: quantize_dynamic() got an unexpected keyword argument 'optimize_model'
It only seems to have output these files:
So then when I run my Node.js script (full script code at the bottom of the question in the SO link above), I get:
Error:
`local_files_only=true` or `env.allowRemoteModels=false` and file was not found locally at "./import/language/tibetan/models/bart-large-cnn/onnx/encoder_model_quantized.onnx".
Presumably that file is missing because the quantization step crashed before writing it. How do I get this working?
Reproduction
As described above.
- Try to install `requirements.txt` from the `convert.py` script linked in the README. It fails.
- Try to install the latest pip packages, and run the `convert` script. It also fails.
I actually resolved the issue by updating Optimum to the latest version and keeping all other packages in `requirements.txt` the same.
- pip install -r requirements.txt
- pip install --upgrade optimum
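In case it helps anyone debugging the same thing, here's a quick standard-library check of which versions pip actually resolved (the package names are just the ones from the requirements.txt above):

```python
from importlib.metadata import version

# Print the versions pip actually resolved; whether the optimize_model
# failure appears depends on which onnxruntime release got installed.
for pkg in ("transformers", "onnxruntime", "optimum", "tqdm", "onnx"):
    print(pkg, version(pkg))
```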
I also get the error
TypeError: quantize_dynamic() got an unexpected keyword argument 'optimize_model'
The `optimize_model` argument was removed in https://github.com/microsoft/onnxruntime/pull/16422 (merged June 21, 2023).
(I am using onnxruntime 1.18.1, the current latest version.)
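For anyone patching `scripts/convert.py` locally in the meantime, here's a minimal sketch of the call with that keyword dropped. This assumes onnxruntime >= 1.16, and the paths and quantization options are illustrative placeholders, not the script's exact values:

```python
from onnxruntime.quantization import QuantType, quantize_dynamic

# onnxruntime >= 1.16 removed the `optimize_model` keyword from
# quantize_dynamic (microsoft/onnxruntime#16422), so the call in
# scripts/convert.py just needs that argument removed.
# Paths and options below are illustrative placeholders.
quantize_dynamic(
    model_input="models/bart-large-cnn/encoder_model.onnx",
    model_output="models/bart-large-cnn/encoder_model_quantized.onnx",
    per_channel=True,
    reduce_range=True,
    weight_type=QuantType.QUInt8,
)
```

As far as I can tell, newer onnxruntime expects any pre-quantization graph optimization to be run as a separate, explicit step rather than through that flag.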
I just tried the v3 branch and upgraded onnxruntime to 1.18.1. I have no problem with the command `python -m scripts.convert --quantize --model_id bert-base-uncased` on Windows.