Issue when converting Exaone 3.0 7.8B model
System Info
optimum==1.24.0
Python 3.12.4
Who can help?
Hi,
When trying to convert this model to ONNX format: https://huggingface.co/LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct
With this code:
from transformers import AutoModelForCausalLM, AutoConfig
from optimum.exporters.onnx import onnx_export_from_model
from optimum.exporters.onnx.config import TextDecoderWithPositionIdsOnnxConfig
from optimum.utils import NormalizedTextConfig

class ExaoneOnnxConfig(TextDecoderWithPositionIdsOnnxConfig):
    DEFAULT_ONNX_OPSET = 19
    NORMALIZED_CONFIG_CLASS = NormalizedTextConfig

model_id = "C:\\huggingface\\EXAONE-3.0-7.8B-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
config = AutoConfig.from_pretrained(model_id, trust_remote_code=True)

# At this point, we could override some submodules, forward methods, weights, etc. from the model.
onnx_config = ExaoneOnnxConfig(
    config=config,
    task="text-generation",
    use_past=True,
    use_past_in_inputs=True,
)
custom_onnx_configs = {
    "model": onnx_config
}
onnx_export_from_model(model, custom_onnx_configs=custom_onnx_configs, output="ex_onnx/", task="text-generation")
I'm getting this error:
File "C:\Users\amd\.cache\huggingface\modules\transformers_modules\EXAONE-3.0-7.8B-Instruct\modeling_exaone.py", line 850, in forward
key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\amd\miniconda3\envs\exaone\Lib\site-packages\transformers\cache_utils.py", line 449, in update
self.key_cache[layer_idx] = torch.cat([self.key_cache[layer_idx], key_states], dim=-2)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Sizes of tensors must match except in dimension 2. Expected size 32 but got size 8 for tensor number 1 in the list.
I think the issue is the difference between num_attention_heads and num_key_value_heads in the model config:
"num_attention_heads": 32,
"num_key_value_heads": 8,
Is there a way to configure the export?
Information
- [x] The official example scripts
- [ ] My own modified scripts
Tasks
- [ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)
Reproduction (minimal, reproducible, runnable)
Can be reproduced with the publicly available model.
Expected behavior
Without use_past_in_inputs=True, the model is exported normally.
The issue seems to be this line: https://github.com/huggingface/optimum/blob/main/optimum/utils/input_generators.py#L657
It uses self.num_attention_heads instead of num_key_value_heads.
It seems like Exaone and Llama share the same input/output pattern:
- https://github.com/huggingface/transformers/blob/v4.49.0/src/transformers/models/llama/modeling_llama.py#L787
- https://huggingface.co/LGAI-EXAONE/EXAONE-3.5-2.4B-Instruct/blob/main/modeling_exaone.py#L1013
I was able to export Exaone by passing LlamaOnnxConfig:
# transformers==4.47.1
# optimum==1.24.0
import transformers
import optimum.exporters

model_name = "LGAI-EXAONE/EXAONE-3.5-2.4B-Instruct"
model = transformers.AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code=True,
)
custom_onnx_configs = {
    "model": optimum.exporters.onnx.model_configs.LlamaOnnxConfig(
        config=model.config,
        task="text-generation",
    )
}
optimum.exporters.onnx.onnx_export_from_model(
    model=model,
    task="text-generation",
    output="./hidad",
    opset=17,
    custom_onnx_configs=custom_onnx_configs,
)
Output Log
Found different candidate ONNX initializers (likely duplicate) for the tied weights:
lm_head.weight: {'onnx::MatMul_11641'}
transformer.wte.weight: {'transformer.wte.weight'}
-[x] values not close enough, max diff: 2.956390380859375e-05 (atol: 1e-05)
The ONNX export succeeded with the warning: The maximum absolute difference between the output of the reference model and the ONNX exported model is not within the set tolerance 1e-05:
- logits: max diff = 2.956390380859375e-05.
The exported model was saved at: hidad
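As a quick sanity check on the export (a sketch; it assumes the exported file is ./hidad/model.onnx, optimum's default file name), the graph inputs and outputs can be listed with onnxruntime. When exporting with use_past, this makes it easy to see whether the past key/value inputs carry num_key_value_heads on the head axis.

import onnxruntime as ort

# Print the exported graph's inputs and outputs with their (symbolic) shapes.
session = ort.InferenceSession("./hidad/model.onnx", providers=["CPUExecutionProvider"])
for inp in session.get_inputs():
    print("input:", inp.name, inp.shape)
for out in session.get_outputs():
    print("output:", out.name, out.shape)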
Maybe you were missing DUMMY_INPUT_GENERATOR_CLASSES and DUMMY_PKV_GENERATOR_CLASS?
https://github.com/huggingface/optimum/blob/c2259ea78691f789fb9e1849ce27a13117609b0d/optimum/exporters/onnx/model_configs.py#L347-L352
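For reference, a sketch of what a dedicated Exaone config could look like with those attributes set, mirroring how the Llama/Mistral configs in optimum handle GQA. This is untested against EXAONE; the generator classes and normalized-config arguments are taken from those configs, not from an official Exaone config.

from optimum.exporters.onnx.config import TextDecoderWithPositionIdsOnnxConfig
from optimum.utils import (
    DummyTextInputGenerator,
    MistralDummyPastKeyValuesGenerator,
    NormalizedTextConfig,
)

class ExaoneOnnxConfig(TextDecoderWithPositionIdsOnnxConfig):
    # GQA-aware dummy inputs: this past key/values generator reads
    # num_key_value_heads from the normalized config instead of num_attention_heads.
    DUMMY_INPUT_GENERATOR_CLASSES = (DummyTextInputGenerator, MistralDummyPastKeyValuesGenerator)
    DUMMY_PKV_GENERATOR_CLASS = MistralDummyPastKeyValuesGenerator
    DEFAULT_ONNX_OPSET = 19  # as in the original snippet
    NORMALIZED_CONFIG_CLASS = NormalizedTextConfig.with_args(
        num_key_value_heads="num_key_value_heads", allow_new=True
    )

With this, the original export call with use_past=True and use_past_in_inputs=True should build dummy past key/values with 8 heads instead of 32.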
Hi @jl749,
Thanks for the response. That seems to work for exporting; however, when I try to use onnxruntime-genai for model inference, I'm running into this error:
genai\examples\csharp\HelloPhi\bin\x64\Debug_DirectML\net6.0\runtimes\win-x64\native\2025-03-16 22:40:36.9555206 [E:onnxruntime:onnxruntime-genai, sequential_executor.cc:572 onnxruntime::ExecuteKernel] Non-zero status code returned while running DmlFusedNode_6_81 node. Name:'DmlFusedNode_6_81' Status Message: onnxruntime\core\framework\execution_frame.cc:173 onnxruntime::IExecutionFrame::GetOrCreateNodeOutputMLValue shape && tensor.Shape() == *shape was false. OrtValue shape verification failed. Current shape:{1,8,500,80} Requested shape:{1,8,518,80}
Stacktrace:
onnxruntime\onnxruntime\core\framework\op_kernel.cc(82): onnxruntime!onnxruntime::OpKernelContext::OutputMLValue+0x117
Any clues for finding the cause?