crc32c icon indicating copy to clipboard operation
crc32c copied to clipboard

fix incorrect results on big endian targets

Open nissaofthesea opened this issue 2 years ago • 2 comments

fixes #34.

https://github.com/zowens/crc32c/blob/47c2999907a770daec5444cf9fcb522926d8cfe4/src/util.rs#L33

this line in util::split transmutes a &[u8] to a &[u64] without considering endianness, causing incorrect results on BE targets. to fix this, i have it return a slice of the newtype U64Le, whose elements can be converted to u64 correctly, accounting for platforms. on little endian targets this is a no-op, and the conversion method U64Le::get is always inlined, so there shouldn't be any performance hit on those targets.

TODO

  • [X] ~benchmark on LE and BE. i expect to see no change.~
  • [X] ~add more comments so its clear why this struct exists without digging through commit logs~

nissaofthesea avatar Oct 23 '22 00:10 nissaofthesea

benchmarking on my machine (x86_64 / LE) there's no significant changes, which is expected.

i used a noise threshold of 0.025 because differences between benchmarks on the same commit were at most a little under 2.5%.

$ cargo bench --bench rand -- --baseline master --noise-threshold 0.025
Benchmarking crc32_update_megabytes/crc32_update_megabytes: Collecting 100 samples in estimated 5.1196 s crc32_update_megabytes/crc32_update_megabytes
                        time:   [144.00 µs 144.16 µs 144.33 µs]
                        thrpt:  [6.4527 GiB/s 6.4604 GiB/s 6.4674 GiB/s]
                 change:
                        time:   [-1.3616% -1.1414% -0.9058%] (p = 0.00 < 0.05)
                        thrpt:  [+0.9141% +1.1546% +1.3804%]
                        Change within noise threshold.
Found 6 outliers among 100 measurements (6.00%)
  2 (2.00%) high mild
  4 (4.00%) high severe

crc32c_8kb/crc32c_8kb   time:   [1.2739 µs 1.2753 µs 1.2768 µs]
                        thrpt:  [5.9753 GiB/s 5.9825 GiB/s 5.9892 GiB/s]
                 change:
                        time:   [-1.0430% -0.8813% -0.7246%] (p = 0.00 < 0.05)
                        thrpt:  [+0.7299% +0.8891% +1.0540%]
                        Change within noise threshold.

Benchmarking crc32c_append_4kb/crc32c_append_4kb: Collecting 100 samples in estimated 5.0017 s (7.6M itercrc32c_append_4kb/crc32c_append_4kb
                        time:   [655.65 ns 656.51 ns 657.31 ns]
                        change: [+2.0735% +2.2570% +2.4860%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high severe

Benchmarking crc32c_combine_4kb/crc32c_combine_4kb: Collecting 100 samples in estimated 5.0200 s (348k itcrc32c_combine_4kb/crc32c_combine_4kb
                        time:   [13.967 µs 14.046 µs 14.130 µs]
                        change: [+1.0771% +1.6551% +2.2093%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 8 outliers among 100 measurements (8.00%)
  5 (5.00%) high mild
  3 (3.00%) high severe

Benchmarking crc32c_combine_megabyte/crc32c_combine_megabyte: Collecting 100 samples in estimated 5.1351 crc32c_combine_megabyte/crc32c_combine_megabyte
                        time:   [39.069 µs 39.110 µs 39.155 µs]
                        change: [-1.5957% -1.4196% -1.2390%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 11 outliers among 100 measurements (11.00%)
  2 (2.00%) low mild
  7 (7.00%) high mild
  2 (2.00%) high severe

Benchmarking crc32c_append_megabyte/crc32c_append_megabyte: Collecting 100 samples in estimated 5.1265 s crc32c_append_megabyte/crc32c_append_megabyte
                        time:   [144.83 µs 145.04 µs 145.27 µs]
                        change: [-1.3535% -0.9236% -0.5653%] (p = 0.00 < 0.05)
                        Change within noise threshold.

nissaofthesea avatar Nov 05 '22 18:11 nissaofthesea

benchmarking on powerpc BE with cross produces odd results: performance improves with the fix. i don't know enough to understand why this is the case (probably is related to however podman, in my case, simulates or emulates other architectures). i'll try this in a vm.

$ cross bench --target powerpc-unknown-linux-gnu --bench rand -- --baseline master --noise-threshold 0.025
Benchmarking crc32_update_megabytes/crc32_update_megabytes: Collecting 100 samples in estimated 9.5312 s crc32_update_megabytes/crc32_update_megabytes
                        time:   [1.0317 ms 1.0390 ms 1.0450 ms]
                        thrpt:  [912.65 MiB/s 917.85 MiB/s 924.38 MiB/s]
                 change:
                        time:   [-29.906% -28.977% -28.128%] (p = 0.00 < 0.05)
                        thrpt:  [+39.137% +40.799% +42.665%]
                        Performance has improved.

crc32c_8kb/crc32c_8kb   time:   [7.3929 µs 7.4035 µs 7.4150 µs]
                        thrpt:  [1.0289 GiB/s 1.0305 GiB/s 1.0320 GiB/s]
                 change:
                        time:   [-35.376% -35.269% -35.155%] (p = 0.00 < 0.05)
                        thrpt:  [+54.213% +54.485% +54.742%]
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

Benchmarking crc32c_append_4kb/crc32c_append_4kb: Collecting 100 samples in estimated 5.0021 s (1.3M itercrc32c_append_4kb/crc32c_append_4kb
                        time:   [3.7204 µs 3.7256 µs 3.7309 µs]
                        change: [-34.453% -34.356% -34.255%] (p = 0.00 < 0.05)
                        Performance has improved.

Benchmarking crc32c_combine_4kb/crc32c_combine_4kb: Collecting 100 samples in estimated 5.2278 s (91k itecrc32c_combine_4kb/crc32c_combine_4kb
                        time:   [57.219 µs 57.307 µs 57.397 µs]
                        change: [-0.9196% -0.7205% -0.5254%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

Benchmarking crc32c_combine_megabyte/crc32c_combine_megabyte: Collecting 100 samples in estimated 5.1209 crc32c_combine_megabyte/crc32c_combine_megabyte
                        time:   [99.908 µs 100.18 µs 100.44 µs]
                        change: [-0.8661% +0.1591% +2.1066%] (p = 0.86 > 0.05)
                        No change in performance detected.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe

Benchmarking crc32c_append_megabyte/crc32c_append_megabyte: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.9s, enable flat sampling, or reduce sample count to 60.
Benchmarking crc32c_append_megabyte/crc32c_append_megabyte: Collecting 100 samples in estimated 5.8660 s crc32c_append_megabyte/crc32c_append_megabyte
                        time:   [1.0277 ms 1.0314 ms 1.0343 ms]
                        change: [-29.787% -28.993% -28.179%] (p = 0.00 < 0.05)
                        Performance has improved.

EDIT: there's no point in benchmarking BE changes because there's no point in using this library on BE before the fix, since it produces incorrect results

nissaofthesea avatar Nov 05 '22 19:11 nissaofthesea

Anything blocking this from being merged?

crisidev avatar Mar 27 '23 11:03 crisidev

@crisidev Apologies, this didn't make it onto my radar. I can give this a review.

zowens avatar Apr 19 '23 18:04 zowens

Absolutely no worries and thanks for looking into this!

crisidev avatar Apr 19 '23 19:04 crisidev