Fix Int4x2/UInt4x2 to MLFloat16 casting compilation errors on ARM64 Linux
Problem
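The build was failing on ARM64 Linux with compilation errors when casting between Int4x2/UInt4x2 and Eigen::half. The error below, for instance, comes from Eigen trying to construct an Int4x2Base from an Eigen::half: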
/onnxruntime_src/build/Debug/Debug/vcpkg_installed/arm64-linux/include/Eigen/src/Core/MathFunctions.h:369:74: error: no matching function for call to 'onnxruntime::Int4x2Base<false>::Int4x2Base(const Eigen::half&)'
369 | EIGEN_DEVICE_FUNC static inline NewType run(const OldType& x) { return static_cast<NewType>(x); }
| ^~~~~~~~~~~~~~~~~~~~~~~
This error occurred specifically on ARM64 Linux builds but not on Windows x64.
Root Cause
The issue occurred because:
- The generic TensorCaster template uses Eigen's casting: out_vector = in_vector.template cast<DstEigenCastType>();
- EigenCastType<MLFloat16>::type maps to Eigen::half
- Eigen has no conversion between Int4x2/UInt4x2 and Eigen::half, so the static_cast<NewType>(x) in MathFunctions.h fails to compile
- The existing specialized casting paths cover these conversions on Windows x64 due to platform-specific optimizations, but on ARM64 Linux the generic Eigen-based template is instantiated instead
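The dispatch problem can be illustrated with a minimal sketch. Half and Int4Pair below are hypothetical stand-ins for MLFloat16/Eigen::half and Int4x2, and Caster is a simplified, hypothetical analogue of TensorCaster; the primary template's static_cast plays the role of Eigen's cast<>(). Because a full specialization exists for <Int4Pair, Half>, the compiler never instantiates the invalid generic conversion for that pair.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for illustration only (not the real ORT types):
// Half stands in for MLFloat16 / Eigen::half, Int4Pair for Int4x2.
struct Half {
  float value;
};
struct Int4Pair {
  std::uint8_t bits;  // element 0 in the low nibble, element 1 in the high nibble
};

// Primary template: generic element-wise static_cast, playing the role of the
// Eigen cast<DstEigenCastType>() path. Instantiating it for <Int4Pair, Half>
// would not compile, because no conversion exists between the two types.
template <typename Src, typename Dst>
struct Caster {
  static std::vector<Dst> Cast(const std::vector<Src>& in) {
    std::vector<Dst> out;
    out.reserve(in.size());
    for (const Src& s : in) out.push_back(static_cast<Dst>(s));
    return out;
  }
};

// Full specialization: chosen instead of the primary template, so the invalid
// static_cast above is never instantiated for this type pair.
// Both nibbles of every byte are unpacked (odd counts are ignored for brevity).
template <>
struct Caster<Int4Pair, Half> {
  static std::vector<Half> Cast(const std::vector<Int4Pair>& in) {
    std::vector<Half> out;
    out.reserve(2 * in.size());
    for (const Int4Pair& p : in) {
      for (int shift : {0, 4}) {
        int v = (p.bits >> shift) & 0x0F;  // extract one nibble, low first
        if (v > 7) v -= 16;                 // sign-extend 4-bit two's complement
        out.push_back(Half{static_cast<float>(v)});
      }
    }
    assert(out.size() == 2 * in.size());
    return out;
  }
};
```

With this sketch, Caster<int, float> still uses the generic path, while Caster<Int4Pair, Half>::Cast({Int4Pair{0xF3}}) takes the specialization and yields 3 and -1 (low nibble first).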
Solution
Added explicit TensorCaster specializations that handle these conversions directly without going through Eigen:
- TensorCaster<Int4x2, MLFloat16>: converts Int4x2 to MLFloat16 by unpacking the 4-bit values and converting them to float
- TensorCaster<UInt4x2, MLFloat16>: converts UInt4x2 to MLFloat16 by unpacking the 4-bit values and converting them to float
- TensorCaster<MLFloat16, Int4x2>: converts MLFloat16 to Int4x2, clamping to the signed int4 range [-8, 7]
- TensorCaster<MLFloat16, UInt4x2>: converts MLFloat16 to UInt4x2, clamping to the unsigned int4 range [0, 15]
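The per-element conversions listed above reduce to a few arithmetic steps. The following sketch assumes the MLFloat16 value has already been widened to float. FloatToInt4, FloatToUInt4, Int4NibbleToFloat and UInt4NibbleToFloat are hypothetical names, not ORT's ToInt4ElementConverter, and the round-to-nearest-even and NaN-to-zero choices are assumptions of the sketch rather than documented behavior of this change.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helpers sketching the clamping described above (not ORT code).
// Assumptions: round to nearest-even (std::nearbyint under the default
// rounding mode), then saturate; NaN maps to 0.
inline std::int8_t FloatToInt4(float f) {
  if (std::isnan(f)) return 0;
  float r = std::nearbyint(f);
  return static_cast<std::int8_t>(std::min(std::max(r, -8.0f), 7.0f));
}

inline std::uint8_t FloatToUInt4(float f) {
  if (std::isnan(f)) return 0;
  float r = std::nearbyint(f);
  return static_cast<std::uint8_t>(std::min(std::max(r, 0.0f), 15.0f));
}

// The widening direction is exact: every int4/uint4 value is representable
// in float (and in IEEE half precision), so no clamping is needed.
inline float Int4NibbleToFloat(std::uint8_t nibble) {
  assert(nibble <= 0x0F);
  int v = nibble > 7 ? static_cast<int>(nibble) - 16 : static_cast<int>(nibble);
  return static_cast<float>(v);
}

inline float UInt4NibbleToFloat(std::uint8_t nibble) {
  assert(nibble <= 0x0F);
  return static_cast<float>(nibble);
}
```

Under these assumptions, FloatToInt4(100.0f) saturates to 7, FloatToUInt4(-1.0f) saturates to 0, and FloatToInt4(2.5f) rounds to 2.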
The specializations include:
- Value clamping via the ToInt4ElementConverter helper functions
- Element packing/unpacking that respects Int4x2's nibble layout (element 0 = low nibble, element 1 = high nibble)
- Handling of odd element counts, where the high nibble of the last packed byte is padding
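The nibble layout and odd-count padding can be sketched with two hypothetical helpers, PackInt4 and UnpackInt4, operating on plain int8_t values (signed case only). Zero-filling the padding nibble is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helpers illustrating the layout: element 2*i lives in the low
// nibble of byte i, element 2*i+1 in the high nibble. For an odd count, the
// high nibble of the last byte is padding (zero-filled here, an assumption).
std::vector<std::uint8_t> PackInt4(const std::vector<std::int8_t>& values) {
  std::vector<std::uint8_t> bytes((values.size() + 1) / 2, 0);
  for (std::size_t i = 0; i < values.size(); ++i) {
    assert(values[i] >= -8 && values[i] <= 7);  // caller has already clamped
    std::uint8_t nibble = static_cast<std::uint8_t>(values[i] & 0x0F);
    bytes[i / 2] |= (i % 2 == 0) ? nibble : static_cast<std::uint8_t>(nibble << 4);
  }
  return bytes;
}

// The element count is passed separately: the packed buffer alone cannot
// distinguish a trailing padding nibble from a real element.
std::vector<std::int8_t> UnpackInt4(const std::vector<std::uint8_t>& bytes,
                                    std::size_t count) {
  assert((count + 1) / 2 == bytes.size());
  std::vector<std::int8_t> values(count);
  for (std::size_t i = 0; i < count; ++i) {
    int nibble = (i % 2 == 0) ? (bytes[i / 2] & 0x0F) : (bytes[i / 2] >> 4);
    values[i] = static_cast<std::int8_t>(nibble > 7 ? nibble - 16 : nibble);  // sign-extend
  }
  return values;
}
```

For example, packing {1, -2, 7} gives the two bytes 0xE1 and 0x07: -2 lands in the high nibble of byte 0, and the high nibble of byte 1 is padding. An unsigned variant would differ only in skipping the sign extension.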
Testing
- ✅ Manual compilation test confirms the fix resolves compilation issues
- ✅ Unit tests verify correct clamping logic and template specialization resolution
- ✅ No existing code is modified (97 lines added, 0 deleted), so existing functionality is unaffected
This fix prevents the generic Eigen-based casting from being attempted for these specific type combinations while maintaining compatibility with existing optimized paths on all platforms.