onnxruntime
[WIP] `AddCastNode` in `insert_cast_transformer.cc` sets invalid dtype `-1`
Analyzing the issue where `AddCastNode` in `insert_cast_transformer.cc` sets an invalid dtype of -1.
Initial investigation plan:
- [x] Examined the AddCastNode function in insert_cast_transformer.cc
- [x] Identified the issue is with the `to_type` parameter being set to -1
- [x] Verified that TensorProto_DataType_FLOAT (1) and TensorProto_DataType_FLOAT16 (10) are valid enum values
- [x] Located the call sites in ApplyImpl function that call AddCastNode
- [ ] Create a minimal reproduction test
- [ ] Identify the root cause of the -1 value
- [ ] Implement the fix
- [ ] Run tests to verify the fix
The error occurs when processing fp16 models with the CPU execution provider, specifically with RandomNormalLike operations. Cast node creation fails because the `to` attribute has an invalid value of -1.
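To make the failure mode concrete, here is a minimal standalone sketch of the kind of check that tells the two expected Cast targets apart from the invalid -1. It is not code from `insert_cast_transformer.cc`: the `ElemType` enum and the `IsSupportedCastTarget` helper are hypothetical names, and only the numeric values FLOAT (1) and FLOAT16 (10) come from the checklist above. A guard like this only detects the bad value; it does not explain where the -1 comes from.

```cpp
#include <cstdint>

// Hypothetical mirror of the two TensorProto_DataType values named above.
// The real enum lives in the ONNX protobuf definitions and has many more entries.
enum class ElemType : std::int32_t {
  kFloat = 1,     // TensorProto_DataType_FLOAT
  kFloat16 = 10,  // TensorProto_DataType_FLOAT16
};

// Hypothetical helper: returns true only when `to_type` is one of the two
// Cast targets discussed in this PR (fp32 or fp16). Anything else,
// including -1, is rejected.
bool IsSupportedCastTarget(std::int64_t to_type) {
  return to_type == static_cast<std::int64_t>(ElemType::kFloat) ||
         to_type == static_cast<std::int64_t>(ElemType::kFloat16);
}
```

Under these assumptions, `IsSupportedCastTarget(1)` and `IsSupportedCastTarget(10)` hold, while `IsSupportedCastTarget(-1)` does not.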
Next steps: Create a focused test and identify where the -1 value originates.
Fixes #16449.
@copilot try again
I've analyzed the issue more deeply. After reverting my validation approach, I'm investigating the root cause where the -1 value originates. The issue appears to be specific to RandomNormalLike operations with fp16 models on CPU. I found a test model that reproduces the problem and am now tracing through the code paths to identify where the invalid value is being generated rather than just validating against it.