[CPU EP] GatherND crashes with division by zero when batch dimensions mismatch between input and indices
Issue description
Passing incompatible batch dimensions between the input (data) and indices tensors (3 vs 2 in this example, given batch_dims=1) should fail rather than crash.
{
"op_type": "GatherND",
"version": 12,
"batch_dims": 1,
"data": [[0,1,2],[10,11,12],[20,21,22]],
"indices": [[1],[2]],
"output": [1,7],
"T": "float32",
}
Expected: Status failure
Actual: Fatal division by zero.
Note that passing an input whose batch dimension is 2 works (since it then matches the indices), and a batch dimension of 1 works too (ORT appears to either broadcast or clamp the input).
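For reference, the ONNX GatherND spec requires the first batch_dims dimensions of data and indices to match exactly. A minimal standalone sketch of that constraint check (not ORT code; shapes hard-coded from the repro above):

#include <cstdint>
#include <iostream>
#include <vector>

// True when the leading batch_dims dimensions of data and indices agree,
// which is what the ONNX GatherND spec requires.
static bool BatchDimsMatch(const std::vector<int64_t>& data_shape,
                           const std::vector<int64_t>& indices_shape,
                           int64_t batch_dims) {
  if (batch_dims < 0 ||
      batch_dims > static_cast<int64_t>(data_shape.size()) ||
      batch_dims > static_cast<int64_t>(indices_shape.size())) {
    return false;
  }
  for (int64_t i = 0; i < batch_dims; ++i) {
    if (data_shape[i] != indices_shape[i]) return false;
  }
  return true;
}

int main() {
  // Repro shapes: data is [3,3], indices is [2,1], batch_dims is 1.
  std::cout << std::boolalpha << BatchDimsMatch({3, 3}, {2, 1}, 1) << "\n";  // prints false
}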
Stack:
> onnxruntime.dll!onnxruntime::GatherNDBase::PrepareForCompute::__l2::<lambda>(__int64 slice_idx) Line 85 C++
onnxruntime.dll!onnxruntime::GatherNDBase::PrepareForCompute::__l2::<lambda>(__int64 first, __int64 last) Line 111 C++
onnxruntime.dll!std::invoke<void <lambda>(__int64, __int64) &,__int64,__int64>(onnxruntime::GatherNDBase::PrepareForCompute::__l2::void <lambda>(__int64, __int64) & _Obj, __int64 && _Arg1, __int64 && <_Args2_0>) Line 1601 C++
onnxruntime.dll!std::_Invoker_ret<void>::_Call<void <lambda>(__int64, __int64) &,__int64,__int64>(onnxruntime::GatherNDBase::PrepareForCompute::__l2::void <lambda>(__int64, __int64) & _Func, __int64 && <_Vals_0>, __int64 && <_Vals_1>) Line 661 C++
onnxruntime.dll!std::_Func_impl_no_alloc<void <lambda>(__int64, __int64),void,__int64,__int64>::_Do_call(__int64 && <_Args_0>, __int64 && <_Args_1>) Line 821 C++
onnxruntime.dll!std::_Func_class<void,__int64,__int64>::operator()(__int64 <_Args_0>, __int64 <_Args_1>) Line 862 C++
onnxruntime.dll!onnxruntime::concurrency::ThreadPool::ParallelFor(__int64 n, const onnxruntime::TensorOpCost & c, const std::function<void __cdecl(__int64,__int64)> & f) Line 622 C++
onnxruntime.dll!onnxruntime::concurrency::ThreadPool::TryParallelFor(onnxruntime::concurrency::ThreadPool * tp, __int64 total, const onnxruntime::TensorOpCost & cost_per_unit, const std::function<void __cdecl(__int64,__int64)> & fn) Line 704 C++
onnxruntime.dll!onnxruntime::concurrency::ThreadPool::TryParallelFor(onnxruntime::concurrency::ThreadPool * tp, __int64 total, double cost_per_unit, const std::function<void __cdecl(__int64,__int64)> & fn) Line 252 C++
onnxruntime.dll!onnxruntime::GatherNDBase::PrepareForCompute<__int64>(const onnxruntime::TensorShape & input_shape, const onnxruntime::Tensor * indices_tensor, const __int64 bytes_per_value, onnxruntime::GatherNDBase::Prepare & p, onnxruntime::concurrency::ThreadPool * tp) Line 106 C++
onnxruntime.dll!onnxruntime::GatherND::Compute(onnxruntime::OpKernelContext * context) Line 171 C++
To reproduce
onnxruntime_perf_test.exe -I -r 1 -e cpu gatherNdCrash.onnx
Urgency
Not blocking, but this should be added to ORT's fuzzing test cases, since embedding an ONNX model in another document could crash the user process. As a mitigation, untrusted models can be validated before being passed to the ORT backend (see the sketch below).
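A rough sketch of such a pre-validation step, assuming the application links against the ONNX protobuf definitions and that the data/indices shapes are declared on the graph inputs; the helper names (CollectInputShapes, ValidateGatherNdBatchDims) are hypothetical:

#include <cstdint>
#include <cstdio>
#include <fstream>
#include <string>
#include <unordered_map>
#include <vector>
#include "onnx/onnx_pb.h"  // generated ONNX protobuf header; exact path depends on the build

// Collect static shapes of graph inputs (-1 marks a dynamic dimension).
static std::unordered_map<std::string, std::vector<int64_t>>
CollectInputShapes(const onnx::GraphProto& graph) {
  std::unordered_map<std::string, std::vector<int64_t>> shapes;
  for (const auto& vi : graph.input()) {
    std::vector<int64_t> dims;
    for (const auto& d : vi.type().tensor_type().shape().dim()) {
      dims.push_back(d.has_dim_value() ? d.dim_value() : -1);
    }
    shapes[vi.name()] = std::move(dims);
  }
  return shapes;
}

// Reject models whose GatherND nodes have mismatched batch dimensions
// between data (input 0) and indices (input 1).
static bool ValidateGatherNdBatchDims(const onnx::ModelProto& model) {
  const auto shapes = CollectInputShapes(model.graph());
  for (const auto& node : model.graph().node()) {
    if (node.op_type() != "GatherND" || node.input_size() < 2) continue;
    int64_t batch_dims = 0;
    for (const auto& attr : node.attribute()) {
      if (attr.name() == "batch_dims") batch_dims = attr.i();
    }
    const auto data_it = shapes.find(node.input(0));
    const auto idx_it = shapes.find(node.input(1));
    if (data_it == shapes.end() || idx_it == shapes.end()) continue;  // shape not declared
    const auto& data = data_it->second;
    const auto& idx = idx_it->second;
    if (batch_dims > static_cast<int64_t>(data.size()) ||
        batch_dims > static_cast<int64_t>(idx.size())) return false;
    for (int64_t i = 0; i < batch_dims; ++i) {
      if (data[i] > 0 && idx[i] > 0 && data[i] != idx[i]) return false;
    }
  }
  return true;
}

int main() {
  std::ifstream in("gatherNdCrash.onnx", std::ios::binary);
  onnx::ModelProto model;
  if (!model.ParseFromIstream(&in) || !ValidateGatherNdBatchDims(model)) {
    std::fprintf(stderr, "rejecting model before handing it to ORT\n");
    return 1;
  }
  return 0;
}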
Platform
Windows
OS Version
Windows 11
ONNX Runtime Installation
Built from Source
ONNX Runtime Version or Commit ID
e76bd2f5e98dda71b96e93d23ca275ca8a3eec47
ONNX Runtime API
C++
Architecture
X64
Execution Provider
Default CPU
Execution Provider Library Version
No response
This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.
If I have time ⏳, I'll do the minimum fix myself to at least return a bad Status.
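For reference, a sketch of the kind of guard that could go near the top of GatherNDBase::PrepareForCompute (parameter names taken from the stack above); ORT_RETURN_IF_NOT, batch_dims_, and the Status return are assumptions about the current internals, not a tested patch:

// Placed before the slice-size division so a mismatched model yields a
// Status failure instead of a divide-by-zero crash.
const auto& indices_shape = indices_tensor->Shape();
ORT_RETURN_IF_NOT(static_cast<size_t>(batch_dims_) <= input_shape.NumDimensions() &&
                      static_cast<size_t>(batch_dims_) <= indices_shape.NumDimensions(),
                  "batch_dims must not exceed the rank of data or indices");
for (int64_t i = 0; i < batch_dims_; ++i) {
  ORT_RETURN_IF_NOT(input_shape[i] == indices_shape[i],
                    "Batch dimension mismatch between data and indices at axis ", i,
                    ": ", input_shape[i], " vs ", indices_shape[i]);
}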