The scripts/convert.py script fails for a few reasons
System Info
- Python 3.12.3
- macOS (Apple M3)
Environment/Platform
- [ ] Website/web-app
- [ ] Browser extension
- [X] Server-side (e.g., Node.js, Deno, Bun)
- [ ] Desktop app (e.g., Electron)
- [X] Other (e.g., VSCode extension)
Description
I am trying to run the model locally, since running a remote model in Node.js doesn't appear to work.
First I tried to install the requirements for https://github.com/xenova/transformers.js/blob/main/scripts/convert.py (which is linked in the README):
$ python3 -m pip install -r requirements.txt
Collecting transformers==4.33.2 (from transformers[torch]==4.33.2->-r requirements.txt (line 1))
Using cached transformers-4.33.2-py3-none-any.whl.metadata (119 kB)
ERROR: Could not find a version that satisfies the requirement onnxruntime<1.16.0 (from versions: 1.17.0, 1.17.1, 1.17.3, 1.18.0, 1.18.1)
ERROR: No matching distribution found for onnxruntime<1.16.0
So onnxruntime<1.16.0 no longer seems to be available on PyPI.
Can you update that script?
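(For what it's worth, recent pip releases have an experimental `index` subcommand that confirms which versions PyPI actually offers; this is just a diagnostic, not something from the repo's instructions:)
$ python3 -m pip index versions onnxruntime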
Second, I tried installing the latest versions of everything instead, by making this my requirements.txt:
transformers
onnxruntime
optimum
tqdm
onnx
But after I ran this:
$ python3 -m pip install -r requirements.txt
... successful installation stuff...
$ python3 -m convert --quantize --task summarization --model_id bart-large-cnn
I got an error:
TypeError: quantize_dynamic() got an unexpected keyword argument 'optimize_model'
Full stack trace:
Framework not specified. Using pt to export the model.
The task `text2text-generation` was manually specified, and past key values will not be reused in the decoding. if needed, please pass `--task text2text-generation-with-past` to export using the past key values.
Using the export variant default. Available variants are:
- default: The default ONNX variant.
Some non-default generation parameters are set in the model config. These should go into a GenerationConfig file (https://huggingface.co/docs/transformers/generation_strategies#save-a-custom-decoding-strategy-with-your-model) instead. This warning will be raised to an exception in v4.41.
Non-default generation parameters: {'max_length': 142, 'min_length': 56, 'early_stopping': True, 'num_beams': 4, 'length_penalty': 2.0, 'no_repeat_ngram_size': 3, 'forced_bos_token_id': 0, 'forced_eos_token_id': 2}
***** Exporting submodel 1/2: BartEncoder *****
Using framework PyTorch: 2.3.1
Overriding 1 configuration item(s)
- use_cache -> False
./venv/lib/python3.12/site-packages/transformers/models/bart/modeling_bart.py:247: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if attn_weights.size() != (bsz * self.num_heads, tgt_len, src_len):
./venv/lib/python3.12/site-packages/transformers/models/bart/modeling_bart.py:254: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if attention_mask.size() != (bsz, 1, tgt_len, src_len):
./venv/lib/python3.12/site-packages/transformers/models/bart/modeling_bart.py:286: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if attn_output.size() != (bsz * self.num_heads, tgt_len, self.head_dim):
***** Exporting submodel 2/2: BartForConditionalGeneration *****
Using framework PyTorch: 2.3.1
Overriding 1 configuration item(s)
- use_cache -> False
./venv/lib/python3.12/site-packages/transformers/modeling_attn_mask_utils.py:86: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if input_shape[-1] > 1 or self.sliding_window is not None:
./venv/lib/python3.12/site-packages/transformers/modeling_attn_mask_utils.py:162: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if past_key_values_length > 0:
Post-processing the exported models...
Weight deduplication check in the ONNX export requires accelerate. Please install accelerate to run it.
Validating ONNX model models/bart-large-cnn/encoder_model.onnx...
-[✓] ONNX model output names match reference model (last_hidden_state)
- Validating ONNX Model output "last_hidden_state":
-[✓] (2, 16, 1024) matches (2, 16, 1024)
-[✓] all values close (atol: 1e-05)
Validating ONNX model models/bart-large-cnn/decoder_model.onnx...
-[✓] ONNX model output names match reference model (logits)
- Validating ONNX Model output "logits":
-[✓] (2, 16, 50264) matches (2, 16, 50264)
-[x] values not close enough, max diff: 7.82012939453125e-05 (atol: 1e-05)
The ONNX export succeeded with the warning: The maximum absolute difference between the output of the reference model and the ONNX exported model is not within the set tolerance 1e-05:
- logits: max diff = 7.82012939453125e-05.
The exported model was saved at: models/bart-large-cnn
Quantizing: 0%| | 0/2 [00:00<?, ?it/s]
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "./convert.py", line 545, in <module>
main()
File "./convert.py", line 521, in main
quantize([
File "./convert.py", line 294, in quantize
quantize_dynamic(
TypeError: quantize_dynamic() got an unexpected keyword argument 'optimize_model'
It only seems to have output these files:
So then when I run my Node.js script (full script code at the bottom of the question in the SO link above), I get:
Error:
`local_files_only=true` or `env.allowRemoteModels=false` and file was not found locally at "./import/language/tibetan/models/bart-large-cnn/onnx/encoder_model_quantized.onnx".
Presumably that file is missing because the quantization step crashed before writing it. How do I get this working?
Reproduction
As described above.
- Try to install `requirements.txt` from the `convert.py` script linked in the README. It fails.
- Try to install the latest pip packages, and run the `convert` script. It also fails.
I actually resolved the issue by updating Optimum to the latest version and keeping all other packages in `requirements.txt` the same.
- pip install -r requirements.txt
- pip install --upgrade optimum
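In case it helps anyone debugging the same thing, here's a quick standard-library check of which versions pip actually resolved (the package names are just the ones from the requirements.txt above):

```python
from importlib.metadata import version

# Print the versions pip actually resolved; whether the optimize_model
# failure appears depends on which onnxruntime release got installed.
for pkg in ("transformers", "onnxruntime", "optimum", "tqdm", "onnx"):
    print(pkg, version(pkg))
```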
I also get the error
TypeError: quantize_dynamic() got an unexpected keyword argument 'optimize_model'
The `optimize_model` argument was removed in https://github.com/microsoft/onnxruntime/pull/16422 (merged June 21, 2023).
(I am using onnxruntime 1.18.1, the current latest version.)
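For anyone patching `scripts/convert.py` locally in the meantime, here's a minimal sketch of the call with that keyword dropped. This assumes onnxruntime >= 1.16, and the paths and quantization options are illustrative placeholders, not the script's exact values:

```python
from onnxruntime.quantization import QuantType, quantize_dynamic

# onnxruntime >= 1.16 removed the `optimize_model` keyword from
# quantize_dynamic (microsoft/onnxruntime#16422), so the call in
# scripts/convert.py just needs that argument removed.
# Paths and options below are illustrative placeholders.
quantize_dynamic(
    model_input="models/bart-large-cnn/encoder_model.onnx",
    model_output="models/bart-large-cnn/encoder_model_quantized.onnx",
    per_channel=True,
    reduce_range=True,
    weight_type=QuantType.QUInt8,
)
```

As far as I can tell, newer onnxruntime expects any pre-quantization graph optimization to be run as a separate, explicit step rather than through that flag.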
I just tried the v3 branch and upgraded onnxruntime to 1.18.1. I have no problem with the command `python -m scripts.convert --quantize --model_id bert-base-uncased` on Windows.