Fix CUDA GatherND batch dimension validation regression
Fixes a regression where GatherND operations failed with CUDAExecutionProvider but worked correctly with CPUExecutionProvider. The CUDA run failed with this error:
gather_nd.cc:30 CheckBatchDimensionsMatch Batch dimensions differ at index 0: 1 != 3, tensor indices: 0, 1
Root Cause
The CUDA implementation had an additional CheckBatchDimensionsMatch validation that enforced strict matching of batch dimensions between input and indices tensors. This validation was not present in the CPU implementation, creating inconsistent behavior between execution providers.
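To illustrate why leading dimensions such as 1 and 3 need not match, here is a minimal Python sketch. It is not ONNX Runtime code: the function name and the example shapes are hypothetical, and it assumes the ONNX GatherND definition, in which only the leading batch_dims dimensions of the input and indices tensors are treated as batch dimensions.

```python
def mismatched_batch_dims(data_shape, indices_shape, batch_dims=0):
    """Return the positions, among the first `batch_dims` dimensions only,
    where the two shapes differ. Hypothetical helper for illustration."""
    return [
        i
        for i in range(batch_dims)
        if data_shape[i] != indices_shape[i]
    ]


# Assumed example shapes: leading dims 1 and 3, as in the error message.
data_shape = (1, 2, 4)
indices_shape = (3, 2)

# With batch_dims=0 there are no batch dimensions, so nothing is compared.
no_batch = mismatched_batch_dims(data_shape, indices_shape, batch_dims=0)

# With batch_dims=1 the leading dimension is a batch dimension and differs.
one_batch = mismatched_batch_dims(data_shape, indices_shape, batch_dims=1)
```

Under these assumptions, a comparison of leading dimensions is only meaningful when batch_dims is greater than 0, which is consistent with the CPU provider accepting such inputs.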
Solution
Removed the overly restrictive batch dimension validation from the CUDA implementation to align with CPU behavior. The CPU implementation has been working correctly without this validation, demonstrating that it's safe to remove.
Changes
- onnxruntime/core/providers/cuda/tensor/gather_nd.cc: Removed the CheckBatchDimensionsMatch call that was causing the regression
- onnxruntime/test/providers/cpu/tensor/gather_nd_op_test.cc: Added regression test GatherND_flexible_input_shapes_regression to prevent this issue from recurring
Testing
The added test case validates that GatherND works correctly with flexible input shapes when using the default batch_dims=0, ensuring this regression doesn't happen again.
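For readers who want to see what "flexible input shapes" means at batch_dims=0, the following is a rough pure-Python reference of GatherND semantics on nested lists. It is a sketch under the assumption of the ONNX operator definition, not the code of the added test; the function name and the sample tensors are made up for illustration.

```python
def gather_nd(data, indices):
    """Reference GatherND for batch_dims=0 on nested lists.

    Each innermost list in `indices` is an index tuple into `data`.
    The result has shape indices.shape[:-1] + data.shape[indices.shape[-1]:].
    """
    # An innermost list of ints is one index tuple: walk into `data` with it.
    if all(isinstance(v, int) for v in indices):
        out = data
        for i in indices:
            out = out[i]
        return out
    # Otherwise recurse over the outer dimensions of `indices`.
    return [gather_nd(data, sub) for sub in indices]


# Assumed sample: data has shape (1, 2), indices has shape (3, 2).
# The leading dimensions (1 and 3) differ, which is valid at batch_dims=0.
data = [[10, 20]]
indices = [[0, 0], [0, 1], [0, 0]]
result = gather_nd(data, indices)  # shape (3,)
```

Here every index tuple addresses the full input tensor, so the number of index tuples is independent of the input's leading dimension.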
Fixes #25053.