Issue when converting Exaone 3.0 7.8B model
System Info
optimum==1.24.0
Python 3.12.4
Who can help?
Hi,
When trying to convert this model to ONNX format: https://huggingface.co/LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct
With this code:
from transformers import AutoModelForCausalLM, AutoConfig
from optimum.exporters.onnx import onnx_export_from_model
from optimum.exporters.onnx.config import TextDecoderWithPositionIdsOnnxConfig
from optimum.utils import NormalizedTextConfig

class ExaoneOnnxConfig(TextDecoderWithPositionIdsOnnxConfig):
    DEFAULT_ONNX_OPSET = 19
    NORMALIZED_CONFIG_CLASS = NormalizedTextConfig

model_id = "C:\\huggingface\\EXAONE-3.0-7.8B-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
config = AutoConfig.from_pretrained(model_id, trust_remote_code=True)

# At this point, we could override some submodules, forward methods, weights, etc. from the model.
onnx_config = ExaoneOnnxConfig(
    config=config,
    task="text-generation",
    use_past=True,
    use_past_in_inputs=True,
)
custom_onnx_configs = {
    "model": onnx_config
}
onnx_export_from_model(model, custom_onnx_configs=custom_onnx_configs, output="ex_onnx/", task="text-generation")
I'm getting this error:
File "C:\Users\amd\.cache\huggingface\modules\transformers_modules\EXAONE-3.0-7.8B-Instruct\modeling_exaone.py", line 850, in forward
key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\amd\miniconda3\envs\exaone\Lib\site-packages\transformers\cache_utils.py", line 449, in update
self.key_cache[layer_idx] = torch.cat([self.key_cache[layer_idx], key_states], dim=-2)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Sizes of tensors must match except in dimension 2. Expected size 32 but got size 8 for tensor number 1 in the list.
I think the issue is the difference between num_attention_heads and num_key_value_heads in the model config:
"num_attention_heads": 32,
"num_key_value_heads": 8,
Is there a way to configure the export?
Information
- [x] The official example scripts
- [ ] My own modified scripts
Tasks
- [ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)
Reproduction (minimal, reproducible, runnable)
Can be reproduced with the publicly available model.
Expected behavior
Without use_past_in_inputs=True, the model is exported normally.
The issue seems to be this line: https://github.com/huggingface/optimum/blob/main/optimum/utils/input_generators.py#L657
It uses self.num_attention_heads instead of num_key_value_heads.
It seems like Exaone and Llama share the same input/output pattern:
- https://github.com/huggingface/transformers/blob/v4.49.0/src/transformers/models/llama/modeling_llama.py#L787
- https://huggingface.co/LGAI-EXAONE/EXAONE-3.5-2.4B-Instruct/blob/main/modeling_exaone.py#L1013
I was able to export Exaone by passing LlamaOnnxConfig:
# transformers==4.47.1
# optimum==1.24.0
import transformers
import optimum.exporters

model_name = "LGAI-EXAONE/EXAONE-3.5-2.4B-Instruct"
model = transformers.AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code=True,
)
custom_onnx_configs = {
    "model": optimum.exporters.onnx.model_configs.LlamaOnnxConfig(
        config=model.config,
        task="text-generation",
    )
}
optimum.exporters.onnx.onnx_export_from_model(
    model=model,
    task="text-generation",
    output="./hidad",
    opset=17,
    custom_onnx_configs=custom_onnx_configs,
)
Output Log
Found different candidate ONNX initializers (likely duplicate) for the tied weights:
lm_head.weight: {'onnx::MatMul_11641'}
transformer.wte.weight: {'transformer.wte.weight'}
-[x] values not close enough, max diff: 2.956390380859375e-05 (atol: 1e-05)
The ONNX export succeeded with the warning: The maximum absolute difference between the output of the reference model and the ONNX exported model is not within the set tolerance 1e-05:
- logits: max diff = 2.956390380859375e-05.
The exported model was saved at: hidad
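As a quick sanity check on the export (a sketch; it assumes the exported file is ./hidad/model.onnx, optimum's default file name), the graph inputs and outputs can be listed with onnxruntime. When exporting with use_past, this makes it easy to see whether the past key/value inputs carry num_key_value_heads on the head axis.

import onnxruntime as ort

# Print the exported graph's inputs and outputs with their (symbolic) shapes.
session = ort.InferenceSession("./hidad/model.onnx", providers=["CPUExecutionProvider"])
for inp in session.get_inputs():
    print("input:", inp.name, inp.shape)
for out in session.get_outputs():
    print("output:", out.name, out.shape)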
Maybe you were missing DUMMY_INPUT_GENERATOR_CLASSES and DUMMY_PKV_GENERATOR_CLASS?
https://github.com/huggingface/optimum/blob/c2259ea78691f789fb9e1849ce27a13117609b0d/optimum/exporters/onnx/model_configs.py#L347-L352
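For reference, a sketch of what a dedicated Exaone config could look like with those attributes set, mirroring how the Llama/Mistral configs in optimum handle GQA. This is untested against EXAONE; the generator classes and normalized-config arguments are taken from those configs, not from an official Exaone config.

from optimum.exporters.onnx.config import TextDecoderWithPositionIdsOnnxConfig
from optimum.utils import (
    DummyTextInputGenerator,
    MistralDummyPastKeyValuesGenerator,
    NormalizedTextConfig,
)

class ExaoneOnnxConfig(TextDecoderWithPositionIdsOnnxConfig):
    # GQA-aware dummy inputs: this past key/values generator reads
    # num_key_value_heads from the normalized config instead of num_attention_heads.
    DUMMY_INPUT_GENERATOR_CLASSES = (DummyTextInputGenerator, MistralDummyPastKeyValuesGenerator)
    DUMMY_PKV_GENERATOR_CLASS = MistralDummyPastKeyValuesGenerator
    DEFAULT_ONNX_OPSET = 19  # as in the original snippet
    NORMALIZED_CONFIG_CLASS = NormalizedTextConfig.with_args(
        num_key_value_heads="num_key_value_heads", allow_new=True
    )

With this, the original export call with use_past=True and use_past_in_inputs=True should build dummy past key/values with 8 heads instead of 32.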
Hi @jl749,
Thanks for the response. That seems to work for exporting; however, when I try to use onnxruntime-genai for model inference, I'm running into this error:
genai\examples\csharp\HelloPhi\bin\x64\Debug_DirectML\net6.0\runtimes\win-x64\native\2025-03-16 22:40:36.9555206 [E:onnxruntime:onnxruntime-genai, sequential_executor.cc:572 onnxruntime::ExecuteKernel] Non-zero status code returned while running DmlFusedNode_6_81 node. Name:'DmlFusedNode_6_81' Status Message: onnxruntime\core\framework\execution_frame.cc:173 onnxruntime::IExecutionFrame::GetOrCreateNodeOutputMLValue shape && tensor.Shape() == *shape was false. OrtValue shape verification failed. Current shape:{1,8,500,80} Requested shape:{1,8,518,80}
Stacktrace:
onnxruntime\onnxruntime\core\framework\op_kernel.cc(82): onnxruntime!onnxruntime::OpKernelContext::OutputMLValue+0x117
Any clues for finding the cause?