Recent changes to NaN canonicalization?

Open juntyr opened this issue 2 months ago • 3 comments

I am using WASM components in research to produce reproducible floating-point evaluation results (with a pre-runtime NaN canonicalization pass over the WASM bytecode). In particular, I am compiling the SZ3 (https://github.com/szcompressor/SZ3) scientific compressor, via its Rust bindings, to WASM, exposed in a component (https://github.com/juntyr/numcodecs-rs/tree/main/crates). In June 2025, the compressor, when running on lots of NaN values, was producing a low compression ratio. Now, in October, the compression ratio is higher. The version of SZ3 hasn't changed, the version of Zstandard (which SZ3 uses for lossless compression afterwards) hasn't changed. My hunch is that something regarding NaN canonicalization changed somewhere in my WASM pipeline, which includes

  1. my unchanged NaN canonicalization pass over the core WASM module bytecode (https://github.com/juntyr/numcodecs-rs/blob/main/crates/numcodecs-wasm-host-reproducible/src/transform/nan.rs)
  2. wasmtime as the runtime
  3. wasmtime_component_layer to polyfill components with just core WASM modules
  4. wit-bindgen to build the components

@alexcrichton I have tried to go through the changelogs of recent releases but cannot find anything recent (only https://github.com/bytecodealliance/wasmtime/issues/9826 which seems a bit too early). Can you remember anything happening with NaNs in the meantime?

juntyr avatar Oct 23 '25 10:10 juntyr

I would find it very surprising if anything at the component model / WIT bindings level were to change the results here.

Are you compiling the core wasm module with the exact same toolchain, or have there been changes to that? Similarly, are you using the same locked versions of your dependencies, or could there have been changes to any of those?

tschneidereit avatar Oct 23 '25 11:10 tschneidereit

Zstandard and SZ3 are implemented in C(++) and haven't changed in the meantime, so Rust compilation wouldn't affect them. The Rust wrappers just forward values, so only the WASM boundary is left - and NaN canonicalization could affect things here, since different NaN bit patterns might be encoded as separate values.
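To illustrate the point about NaN bit patterns, here is a minimal sketch (not the actual pass in nan.rs, and not how SZ3 or Zstandard treat the data). It assumes the positive quiet NaN `0x7ff8_0000_0000_0000` as the canonical pattern; `canonicalize` and `distinct_bit_patterns` are hypothetical helper names. Several NaNs that all compare as "NaN" carry different bits, so a byte-level encoder could see them as distinct values unless they are canonicalized first.

```rust
/// Assumed canonical f64 NaN: sign bit clear, exponent all ones,
/// only the top mantissa bit set.
const CANONICAL_NAN_BITS: u64 = 0x7ff8_0000_0000_0000;

/// Replace any NaN with the assumed canonical NaN; other values pass through.
fn canonicalize(x: f64) -> f64 {
    if x.is_nan() {
        f64::from_bits(CANONICAL_NAN_BITS)
    } else {
        x
    }
}

/// Count how many distinct bit patterns occur in a slice of floats.
fn distinct_bit_patterns(values: &[f64]) -> usize {
    let mut bits: Vec<u64> = values.iter().map(|v| v.to_bits()).collect();
    bits.sort_unstable();
    bits.dedup();
    bits.len()
}

fn main() {
    // Four quiet NaNs that differ only in sign bit or payload bits.
    let raw = [
        f64::from_bits(0x7ff8_0000_0000_0000),
        f64::from_bits(0xfff8_0000_0000_0000),
        f64::from_bits(0x7ff8_0000_0000_0001),
        f64::from_bits(0x7ff8_0000_dead_beef),
    ];
    let canonical: Vec<f64> = raw.iter().copied().map(canonicalize).collect();

    println!("distinct patterns before: {}", distinct_bit_patterns(&raw));
    println!("distinct patterns after:  {}", distinct_bit_patterns(&canonical));
}
```

If canonicalization were applied at a different point in the pipeline than before, the number of distinct patterns reaching the lossless stage could change, which is the kind of effect that might shift a compression ratio.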

juntyr avatar Oct 23 '25 11:10 juntyr

I'm not aware of any changes either, so my best guess would be a different C/C++ toolchain, perhaps with different optimizations. If you've got something reproducible, you could also try bisecting across wasmtime versions to pinpoint any change in behavior.
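As a rough sketch of what such a bisection could look like: the helper below binary-searches an ordered list of versions for the first one where the behavior differs. `first_changed` is a hypothetical helper, the version labels are placeholders rather than real wasmtime releases, and the predicate stands in for "build against this version, run the reproducer, compare the compression ratio". It assumes the change is monotone, i.e. once a version shows the new behavior, all later ones do too.

```rust
/// Return the index of the first version for which `is_changed` is true,
/// assuming all unchanged versions come before all changed ones.
fn first_changed<T>(versions: &[T], is_changed: impl Fn(&T) -> bool) -> Option<usize> {
    // `partition_point` binary-searches for the first element where the
    // closure returns false, i.e. the first changed version.
    let idx = versions.partition_point(|v| !is_changed(v));
    if idx < versions.len() {
        Some(idx)
    } else {
        None
    }
}

fn main() {
    // Placeholder labels, oldest first; substitute the versions under test.
    let versions = ["ver-a", "ver-b", "ver-c", "ver-d", "ver-e"];

    // Stand-in predicate for illustration only: pretend the behavior
    // changed at "ver-d". A real check would run the reproducer.
    let is_changed = |v: &&str| *v >= "ver-d";

    match first_changed(&versions, is_changed) {
        Some(i) => println!("first version with changed behavior: {}", versions[i]),
        None => println!("no version in the list shows the change"),
    }
}
```

The same search applies equally to C/C++ toolchain versions or dependency lockfile revisions, as long as only one component is varied at a time.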

alexcrichton avatar Oct 23 '25 14:10 alexcrichton