Exporting TinyLlama to ONNX

usagi87 opened this issue 6 months ago • 5 comments

I'm trying to replicate the TinyLlama export using the code at https://github.com/vitoplantamura/OnnxStream/blob/master/assets/LLM.md, but I'm unable to. What versions of the libraries were used?

usagi87 avatar Jul 05 '25 21:07 usagi87

hi,

when you run the script, what error do you get?

Vito

vitoplantamura avatar Jul 06 '25 07:07 vitoplantamura

When using the up-to-date versions of the libraries used for exporting:

ValueError: The past_key_values should be either a Cache object or None.

When using the version mentioned in the model's config.json (transformers==4.34.0), the export works but gives these kinds of warnings:

/home/env/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py:392: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attn_output.size() != (bsz, self.num_heads, q_len, self.head_dim):
In-place op on output of tensor.shape. See https://pytorch.org/docs/main/onnx.html#avoid-inplace-operations-when-using-tensor-shape-in-tracing-mode
In-place op on output of tensor.shape. See https://pytorch.org/docs/main/onnx.html#avoid-inplace-operations-when-using-tensor-shape-in-tracing-mode
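
For reference, this warning just means that a Python-level shape check got frozen during tracing. A minimal standalone reproduction (illustrative only, not code from LLM.md):

```python
import torch

def f(x):
    # Python branch on a tensor shape: under tracing, this comparison is
    # evaluated once with the example input and frozen into the graph as a
    # constant -- hence the TracerWarning about converting a tensor to a
    # Python boolean. This is the same pattern as the warned line in
    # modeling_llama.py.
    if x.size() != (2, 3):
        x = x * 2
    return x + 1

traced = torch.jit.trace(f, torch.ones(2, 3))  # emits the TracerWarning
```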

Using onnx2txt works, but when trying the result I usually get errors while loading the weights: "unexpected shape of output", "wrong data type of input (not implemented)".

And when trying to use onnxsim I always get: onnx.onnx_cpp2py_export.checker.ValidationError: model with IR version >= 3 must specify opset_import for ONNX
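
The missing opset declaration the checker complains about can be confirmed by inspecting the exported file; a minimal sketch, with a placeholder file name:

```python
import onnx

# Load only the graph structure (skip external weight files, if the export
# produced any). "tinyllama.onnx" is a placeholder for the actual file.
model = onnx.load("tinyllama.onnx", load_external_data=False)
print("IR version:", model.ir_version)
print("Opset imports:", [(op.domain, op.version) for op in model.opset_import])
```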

usagi87 avatar Jul 06 '25 12:07 usagi87

Mmh, I tried to export the ONNX file with transformers==4.46.3 and it gives this error: please report a bug to PyTorch. We don't have an op for aten::view

In these cases I create a virtual environment and install the relevant packages as they were at a certain date, which in this case is October 1, 2023. I take the package versions from pypi.org. For example, the version of transformers at that date is 4.34.0 (as in your last test). Make sure to install the versions of these packages that were available on October 1, 2023: transformers, torch, onnx.
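
For example, something like this (the transformers pin comes from your test; the torch and onnx pins are my best reading of the pypi.org release history around that date, so verify them there):

```bash
python3 -m venv tinyllama-export
source tinyllama-export/bin/activate

# transformers==4.34.0 is confirmed above; torch==2.0.1 and onnx==1.14.1 are
# assumptions based on what pypi.org lists as current around October 1, 2023.
pip install transformers==4.34.0 torch==2.0.1 onnx==1.14.1
```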

Some notes:

  1. I would use the old versions of these packages only to export the onnx file. For onnxsim_large_model (step 2/3) and onnx2txt (step 3/3) you can use the latest versions: I see no reason why they shouldn't work.

  2. Use onnxsim_large_model instead of onnxsim directly.

  3. The TracerWarning you mentioned can be ignored.

  4. Make sure to download and load the model in fp32 precision (see the sketch after this list), as specified here: https://github.com/vitoplantamura/OnnxStream/issues/89#issuecomment-2508806558
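
For point 4, a minimal sketch of the fp32 load (the checkpoint name is just an example, not necessarily the one from LLM.md; use the TinyLlama checkpoint you are actually exporting):

```python
import torch
from transformers import AutoModelForCausalLM

# torch_dtype=torch.float32 forces fp32 weights even if the checkpoint
# ships in fp16/bf16. The model ID below is an example.
model = AutoModelForCausalLM.from_pretrained(
    "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    torch_dtype=torch.float32,
)
model.eval()
```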

Let me know,

Vito

vitoplantamura avatar Jul 06 '25 19:07 vitoplantamura

I was able to successfully export TinyLlama, but it only worked when running onnxsim_large_model before onnx2txt; otherwise it wouldn't work. Also, llm worked only when using --no-fp16.

usagi87 avatar Jul 07 '25 03:07 usagi87

Yes, onnxsim_large_model must be used before converting from ONNX to the OnnxStream-compatible file format.

Regarding --no-fp16, your architecture apparently does not support FP16 arithmetic/conversion instructions.

Vito

vitoplantamura avatar Jul 07 '25 06:07 vitoplantamura