
Error when exporting ONNX for quantized OPT-125M model

Open · RealJustinNi opened this issue 11 months ago · 4 comments

While quantizing the OPT-125M model and exporting it to ONNX, I encountered the following issue:

Traceback (most recent call last):
  File "/home/zhaojun/2025proj/brevitas/src/brevitas_examples/llm/main.py", line 518, in <module>
    main(args)
  File "/home/zhaojun/2025proj/brevitas/src/brevitas_examples/llm/main.py", line 322, in main
    model_export(model, calibration_loader[0], args)
  File "/home/zhaojun/2025proj/brevitas/src/brevitas_examples/llm/main.py", line 69, in model_export
    onnx_export_from_model(
  File "/home/zhaojun/anaconda3/envs/brevitas/lib/python3.10/site-packages/optimum/exporters/onnx/convert.py", line 1176, in onnx_export_from_model
    _, onnx_outputs = export_models(
  File "/home/zhaojun/anaconda3/envs/brevitas/lib/python3.10/site-packages/optimum/exporters/onnx/convert.py", line 762, in export_models
    export(
  File "/home/zhaojun/anaconda3/envs/brevitas/lib/python3.10/site-packages/optimum/exporters/onnx/convert.py", line 866, in export
    export_output = export_pytorch(
  File "/home/zhaojun/anaconda3/envs/brevitas/lib/python3.10/site-packages/optimum/exporters/onnx/convert.py", line 550, in export_pytorch
    check_dummy_inputs_are_allowed(model, dummy_inputs)
  File "/home/zhaojun/anaconda3/envs/brevitas/lib/python3.10/site-packages/optimum/exporters/utils.py", line 704, in check_dummy_inputs_are_allowed
    raise ValueError(
ValueError: Config dummy inputs are not a subset of the model inputs: {'input_ids', 'attention_mask', 'past_key_values', 'position_ids'} vs {'input_ids', 'attention_mask', 'past_key_values'}
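
For context, the check that fails here compares the dummy input names generated for the export task against the parameter names of the model's forward signature. A rough, self-contained sketch of what it verifies (an illustration only, not optimum's actual code):

import inspect

def dummy_inputs_are_allowed(model, dummy_input_names):
    # Every generated dummy input must be a named argument of model.forward
    forward_params = set(inspect.signature(model.forward).parameters)
    unexpected = set(dummy_input_names) - forward_params
    if unexpected:
        raise ValueError(
            "Config dummy inputs are not a subset of the model inputs: "
            f"{set(dummy_input_names)} vs {forward_params}")

In this case the config generated {'input_ids', 'attention_mask', 'past_key_values', 'position_ids'}, while the model's forward only exposes {'input_ids', 'attention_mask', 'past_key_values'}, so 'position_ids' trips the check.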

To Reproduce

My script is as follows:

python main.py --model "models--facebook--opt-125m" \
--seed 42 \
--dataset wikitext2 \
--weight-bit-width 8 \
--weight-param-method stats \
--weight-scale-precision po2_scale \
--weight-quant-type sym \
--weight-quant-format int \
--weight-quant-granularity per_channel \
--quantize-last-layer \
--act-calibration \
--ln-affine-merge \
--bias-corr \
--export-target onnx_qcdq \
--checkpoint-name opt125m_int8.pt \
--export-prefix opt125m_a8w8 \
--seqlen 512 \
--fuse-sequences \
--eval \
#--gptq \

During the process, I also hit a problem where the model library name could not be inferred, so I forced it to "transformers".
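
The exact change isn't shown in the issue, but one way to force the library name is to pass it explicitly to optimum's exporter instead of letting it be inferred from the local checkpoint path (a minimal sketch; export_dir is a placeholder name):

onnx_export_from_model(
    model,
    export_dir,                       # placeholder for the output directory
    task="text-generation-with-past",
    do_validation=False,
    library_name="transformers")      # forced, since inference from the local path fails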

If known:

  • Brevitas version: 0.11.0
  • PyTorch version: 2.5.1+cu118

RealJustinNi avatar Feb 14 '25 08:02 RealJustinNi

The LLM entrypoint you used is not meant to work out of the box for all transformer models; they vary too widely in functionality and features to be supported by a single script.

I will try to have a look into this, but the error doesn't even seem to come from Brevitas; rather, something is happening in the optimum ONNX exporter that we leverage.

Could I ask you to set up a simple script that uses the same optimum ONNX export function, but with a fully floating-point model?

Giuseppe5 avatar Feb 14 '25 09:02 Giuseppe5

Hello, I modified the code in brevitas/src/brevitas_examples/llm/main.py: before the quantization code, I added the following call to export the floating-point ONNX model directly.

print('Exporting FP32 Model...')
with torch.no_grad():
    # onnx_export_from_model is the optimum exporter that main.py already uses
    onnx_export_from_model(
        model,
        f"./{args.export_prefix}",
        task="text-generation-with-past",
        do_validation=False,
        library_name="transformers")

This step executed successfully, and I can visualize the ONNX model and inspect the entire graph.

RealJustinNi avatar Feb 16 '25 09:02 RealJustinNi

When I changed --weight-quant-granularity to per_group, the ONNX export failed again with the following error: RuntimeError: Module <class 'brevitas.proxy.groupwise_int_parameter_quant.GroupwiseWeightQuantProxyFromInjector'> not supported for export. I suspect the problem lies in the _set_export_handler function in brevitas/export/manager.py, because the handler it returned was None.
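
A small diagnostic that can help confirm this (a sketch; model is the quantized model from main.py, and the filter simply relies on the ProxyFromInjector suffix visible in the error message):

proxies = sorted({type(m).__name__ for m in model.modules()
                  if type(m).__name__.endswith('ProxyFromInjector')})
print(proxies)
# With per_group weights this list includes GroupwiseWeightQuantProxyFromInjector,
# the class for which _set_export_handler fails to find a registered handler.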

RealJustinNi avatar Feb 16 '25 09:02 RealJustinNi

Hello, I modified the code in brevitas/src/brevitas_examples/llm/main.py: before the quantization code, I added the following call to export the floating-point ONNX model directly.

Thanks for checking that export works in floating point; that's already a good starting point. I will try to make some time to look into this, but I can't guarantee when, unfortunately.

When I changed --weight-quant-granularity to per_group

The reason for this is that the latest ONNX version supported by the torch exporter has no per_group quantization support. We have some workarounds for implementing it if needed, but they mean adding reshape operators that might be annoying to handle downstream.
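
To make the reshape workaround concrete, here is a plain PyTorch sketch of the idea (illustration only, not Brevitas export code; names and sizes are arbitrary): a per-group quantized [out_features, in_features] weight is viewed as [out_features * n_groups, group_size] and quantized per row, which standard per-channel QuantizeLinear/DequantizeLinear can express, then reshaped back.

import torch

def fake_quant_per_group(w: torch.Tensor, group_size: int = 64, bits: int = 8):
    out_f, in_f = w.shape
    n_groups = in_f // group_size
    # One row per (output channel, group): per-group becomes per-channel on this view
    wg = w.reshape(out_f * n_groups, group_size)
    qmax = 2 ** (bits - 1) - 1
    scale = wg.abs().amax(dim=1, keepdim=True).clamp_min(1e-8) / qmax
    q = torch.clamp(torch.round(wg / scale), -qmax - 1, qmax)
    # Dequantize and restore the original layout; in an ONNX graph the two reshapes
    # are the extra operators mentioned above
    return (q * scale).reshape(out_f, in_f)

w_q = fake_quant_per_group(torch.randn(768, 768), group_size=64)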

If this is interesting or important for you, I can give you some pointers on how to add custom export support. In the meantime, I'll discuss internally whether we can provide a temporary solution while we wait for torch to add compatibility.

Giuseppe5 avatar Feb 18 '25 07:02 Giuseppe5