XNNPACK icon indicating copy to clipboard operation
XNNPACK copied to clipboard

vrnd kernels for rvv are incorrect

Open dsharlet opened this issue 9 months ago • 3 comments

The vrnd kernels for rvv work by converting a float to an int and back. This doesn't work in a few cases:

  • For floats that are outside the range of an int32
  • It loses infinity/NaN

I tried to partially fix the latter in issue in #7971.

@KaustubhIMG @ken-unger

dsharlet avatar Mar 20 '25 17:03 dsharlet

Thanks for the note.

re: // TODO: We should also preserve NaN

FWIW this is something that Eigen also needs to do, and there is an ongoing PR where min/max with NaN propagation could perhaps be used for reference.
https://gitlab.com/libeigen/eigen/-/blob/8ac76af2902bedd9a075c32a67ec9f55d2e6b325/Eigen/src/Core/arch/RVV10/PacketMath.h#L1223

ken-unger avatar Mar 21 '25 05:03 ken-unger

I see this same behavior (cast to int) in sse2, 0armv7, wasmsimd, and hexagon. Also hexagon qfloats do not have a formal NaN or inf. You can find implementations for arm and intel in wasmsimd. The basic idea is compare if the exp is large enough that the value is already an integer. (exp >= 23?) This includes exp = 255 for NaN and inf. Meanwhile proceed with casting to int (with rounding modes for ties). The basic one is away from zero for ties. 2.5 rounds to 3.0 -2.5 rounds to -3.0 3.5 rounds to 4.0 -3.5 rounds to -4.0 And convert back to float Then use the comparison of exponent to select the original float or the float that was convert to/from int. Most cpus have a way to select between 2 vectors using a mask. On ARM Neon bitselect On SSE4 vblend_ps

See Also |https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/main/doc/rvv-intrinsic-spec.adoc#explicit-fp-rounding-mode-intrinsics

fbarchard avatar Mar 24 '25 22:03 fbarchard

For Hexagon I implemented f32-vrndne using the Neon method, which is roundf() style. Its apparently not the intended rounding.. it should be nearbyintf(), but the method would be similar for RVV.

  1. Compare to 8 million and generate a predicate, representing values that need rounding.
  2. Cast to int and back to float with applicable rounding.
  3. Use predicate to keep the original value if it is more than 8 million (already an integer)

The simplier, more efficient way, depends on IEEE math implementing 'even' rounding. Simply add 8 million and subtract it. vrndne could use that. vrndu, vrndd, vrndz could use the neon/hexagon method.

fbarchard avatar May 12 '25 21:05 fbarchard