Fix Int4x2/UInt4x2 to MLFloat16 casting compilation errors on ARM64 Linux

Open Copilot opened this issue 8 months ago • 0 comments

Problem

The build was failing on ARM64 Linux with compilation errors when casting between Int4x2/UInt4x2 and Eigen::half:

/onnxruntime_src/build/Debug/Debug/vcpkg_installed/arm64-linux/include/Eigen/src/Core/MathFunctions.h:369:74: error: no matching function for call to 'onnxruntime::Int4x2Base<false>::Int4x2Base(const Eigen::half&)'
   369 |   EIGEN_DEVICE_FUNC static inline NewType run(const OldType& x) { return static_cast<NewType>(x); }
       |                                                                          ^~~~~~~~~~~~~~~~~~~~~~~

This error occurred specifically on ARM64 Linux builds but not on Windows x64.

Root Cause

The issue occurred because:

  1. The generic TensorCaster template uses Eigen's casting: out_vector = in_vector.template cast<DstEigenCastType>();
  2. EigenCastType<MLFloat16>::type maps to Eigen::half
  3. Eigen's cast falls back to static_cast<NewType>(x), and no constructor or conversion exists between Int4x2/UInt4x2 and Eigen::half
  4. On Windows x64 the existing specialized casting paths cover these conversions because of platform-specific optimizations; on ARM64 Linux they do not apply, so the generic Eigen cast is instantiated and fails to compile
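
The failure mode in steps 1 to 3 can be reproduced in miniature. The sketch below uses hypothetical stand-in types (`HalfLike`, `PackedInt4x2`, `cast_like_eigen`), not the real Eigen or onnxruntime definitions, and only assumes what the error message shows: the cast helper boils down to a plain `static_cast`, which is well-formed roughly when the destination type is constructible from the source.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical stand-ins for illustration only; these are NOT the real types.
struct HalfLike {  // plays the role of Eigen::half
  std::uint16_t bits;
};

struct PackedInt4x2 {  // plays the role of Int4x2 / UInt4x2
  std::uint8_t bits;
  PackedInt4x2() = default;
  explicit PackedInt4x2(std::uint8_t raw) : bits(raw) {}
};

// Same shape as the helper quoted in the compiler error: a bare static_cast.
template <typename OldType, typename NewType>
NewType cast_like_eigen(const OldType& x) {
  return static_cast<NewType>(x);
}

// Approximates "static_cast<NewType>(x) compiles" with a constructibility check.
template <typename OldType, typename NewType>
inline constexpr bool kCanStaticCast =
    std::is_constructible_v<NewType, const OldType&>;

// A raw byte can be cast to the packed type...
static_assert(kCanStaticCast<std::uint8_t, PackedInt4x2>);
// ...but the half-like type cannot, so instantiating
// cast_like_eigen<HalfLike, PackedInt4x2> would be a hard compile error.
static_assert(!kCanStaticCast<HalfLike, PackedInt4x2>);
```

Because the error only appears when the template is instantiated, a platform that never reaches the generic path for this type pair would compile cleanly, which is consistent with the difference described above.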

Solution

Added explicit TensorCaster specializations that handle these conversions directly without going through Eigen:

  • TensorCaster<Int4x2, MLFloat16> - Converts Int4x2 to MLFloat16 by unpacking 4-bit values and converting to float
  • TensorCaster<UInt4x2, MLFloat16> - Converts UInt4x2 to MLFloat16 by unpacking 4-bit values and converting to float
  • TensorCaster<MLFloat16, Int4x2> - Converts MLFloat16 to Int4x2 with proper clamping to signed int4 range [-8, 7]
  • TensorCaster<MLFloat16, UInt4x2> - Converts MLFloat16 to UInt4x2 with proper clamping to unsigned int4 range [0, 15]
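
As a rough illustration of the specialization approach, the standalone sketch below defines a hypothetical `SimpleCaster` template whose generic version uses `static_cast`, plus an explicit specialization that unpacks signed 4-bit pairs by hand. `SimpleCaster`, `PackedInt4x2`, and `GetElem` are made-up names, `float` stands in for MLFloat16, and the low-nibble-first layout is an assumption; the actual TensorCaster signatures may differ.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a signed packed int4 pair (not the real Int4x2).
struct PackedInt4x2 {
  std::uint8_t bits = 0;
};

// Returns element i (0 = low nibble, 1 = high nibble), sign-extended to int8.
inline std::int8_t GetElem(PackedInt4x2 p, std::size_t i) {
  const int nibble = (i == 0) ? (p.bits & 0x0F) : (p.bits >> 4);
  return static_cast<std::int8_t>((nibble ^ 0x08) - 0x08);
}

// Generic path: element-wise static_cast, analogous to the Eigen-based cast.
template <typename Src, typename Dst>
struct SimpleCaster {
  static void Cast(const Src* in, Dst* out, std::size_t count) {
    for (std::size_t i = 0; i < count; ++i) out[i] = static_cast<Dst>(in[i]);
  }
};

// Explicit specialization: chosen instead of the generic path for this pair,
// so the unsupported static_cast is never instantiated.
// `count` is the number of logical 4-bit elements; `in` holds (count + 1) / 2 bytes.
template <>
struct SimpleCaster<PackedInt4x2, float> {
  static void Cast(const PackedInt4x2* in, float* out, std::size_t count) {
    for (std::size_t i = 0; i < count; ++i) {
      out[i] = static_cast<float>(GetElem(in[i / 2], i % 2));
    }
  }
};
```

Calling `SimpleCaster<PackedInt4x2, float>::Cast(in, out, 3)` on two packed bytes reads three logical elements and ignores the unused high nibble of the last byte.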

The specializations include:

  • Proper value clamping using ToInt4ElementConverter helper functions
  • Correct element packing/unpacking logic respecting Int4x2's nibble layout (elem 0 = low nibble, elem 1 = high nibble)
  • Handling of odd-sized tensors with appropriate padding
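
The clamping, nibble packing, and odd-size handling listed above can be sketched as follows. This is not the ToInt4ElementConverter implementation: `ClampToInt4`, `ClampToUInt4`, and `PackInt4` are hypothetical helpers, `float` stands in for MLFloat16, and truncation toward zero, mapping NaN to 0, and zero-filling the unused nibble are all assumptions made for the example.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Clamp to the signed int4 range [-8, 7], then truncate toward zero.
inline std::int8_t ClampToInt4(float v) {
  if (std::isnan(v)) return 0;  // assumption: NaN maps to 0
  return static_cast<std::int8_t>(std::clamp(v, -8.0f, 7.0f));
}

// Clamp to the unsigned int4 range [0, 15], then truncate toward zero.
inline std::uint8_t ClampToUInt4(float v) {
  if (std::isnan(v)) return 0;  // assumption: NaN maps to 0
  return static_cast<std::uint8_t>(std::clamp(v, 0.0f, 15.0f));
}

// Pack signed values two per byte: element 0 in the low nibble,
// element 1 in the high nibble. For an odd count, the final high
// nibble is left as zero padding.
inline std::vector<std::uint8_t> PackInt4(const std::vector<float>& values) {
  std::vector<std::uint8_t> packed((values.size() + 1) / 2, 0);
  for (std::size_t i = 0; i < values.size(); ++i) {
    const auto nibble = static_cast<std::uint8_t>(
        static_cast<std::uint8_t>(ClampToInt4(values[i])) & 0x0F);
    const auto shifted = static_cast<std::uint8_t>(
        (i % 2 == 0) ? nibble : (nibble << 4));
    packed[i / 2] = static_cast<std::uint8_t>(packed[i / 2] | shifted);
  }
  return packed;
}
```

For instance, packing the three values 7.9, -1.0, and -100.0 clamps them to 7, -1, and -8 and yields two bytes, with the last byte carrying only a low nibble.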

Testing

  • ✅ Manual compilation test confirms the fix resolves compilation issues
  • ✅ Unit tests verify correct clamping logic and template specialization resolution
  • ✅ The change is purely additive (0 lines deleted, 97 lines added), so existing code paths are untouched

This fix prevents the generic Eigen-based casting from being attempted for these specific type combinations while maintaining compatibility with existing optimized paths on all platforms.


Copilot, Jun 12 '25 16:06