crc32c
crc32c copied to clipboard
fix incorrect results on big endian targets
fixes #34.
https://github.com/zowens/crc32c/blob/47c2999907a770daec5444cf9fcb522926d8cfe4/src/util.rs#L33
this line in util::split
transmutes a &[u8]
to a &[u64]
without considering endianness, causing incorrect results on BE targets.
to fix this, i have it return a slice of the newtype U64Le
, whose elements can be converted to u64
correctly, accounting for platforms.
on little endian targets this is a no-op, and the conversion method U64Le::get
is always inlined, so there shouldn't be any performance hit on those targets.
TODO
- [X] ~benchmark on LE and BE. i expect to see no change.~
- [X] ~add more comments so its clear why this struct exists without digging through commit logs~
benchmarking on my machine (x86_64 / LE) there's no significant changes, which is expected.
i used a noise threshold of 0.025 because differences between benchmarks on the same commit were at most a little under 2.5%.
$ cargo bench --bench rand -- --baseline master --noise-threshold 0.025
Benchmarking crc32_update_megabytes/crc32_update_megabytes: Collecting 100 samples in estimated 5.1196 s crc32_update_megabytes/crc32_update_megabytes
time: [144.00 µs 144.16 µs 144.33 µs]
thrpt: [6.4527 GiB/s 6.4604 GiB/s 6.4674 GiB/s]
change:
time: [-1.3616% -1.1414% -0.9058%] (p = 0.00 < 0.05)
thrpt: [+0.9141% +1.1546% +1.3804%]
Change within noise threshold.
Found 6 outliers among 100 measurements (6.00%)
2 (2.00%) high mild
4 (4.00%) high severe
crc32c_8kb/crc32c_8kb time: [1.2739 µs 1.2753 µs 1.2768 µs]
thrpt: [5.9753 GiB/s 5.9825 GiB/s 5.9892 GiB/s]
change:
time: [-1.0430% -0.8813% -0.7246%] (p = 0.00 < 0.05)
thrpt: [+0.7299% +0.8891% +1.0540%]
Change within noise threshold.
Benchmarking crc32c_append_4kb/crc32c_append_4kb: Collecting 100 samples in estimated 5.0017 s (7.6M itercrc32c_append_4kb/crc32c_append_4kb
time: [655.65 ns 656.51 ns 657.31 ns]
change: [+2.0735% +2.2570% +2.4860%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high severe
Benchmarking crc32c_combine_4kb/crc32c_combine_4kb: Collecting 100 samples in estimated 5.0200 s (348k itcrc32c_combine_4kb/crc32c_combine_4kb
time: [13.967 µs 14.046 µs 14.130 µs]
change: [+1.0771% +1.6551% +2.2093%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 8 outliers among 100 measurements (8.00%)
5 (5.00%) high mild
3 (3.00%) high severe
Benchmarking crc32c_combine_megabyte/crc32c_combine_megabyte: Collecting 100 samples in estimated 5.1351 crc32c_combine_megabyte/crc32c_combine_megabyte
time: [39.069 µs 39.110 µs 39.155 µs]
change: [-1.5957% -1.4196% -1.2390%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 11 outliers among 100 measurements (11.00%)
2 (2.00%) low mild
7 (7.00%) high mild
2 (2.00%) high severe
Benchmarking crc32c_append_megabyte/crc32c_append_megabyte: Collecting 100 samples in estimated 5.1265 s crc32c_append_megabyte/crc32c_append_megabyte
time: [144.83 µs 145.04 µs 145.27 µs]
change: [-1.3535% -0.9236% -0.5653%] (p = 0.00 < 0.05)
Change within noise threshold.
benchmarking on powerpc BE with cross
produces odd results: performance improves with the fix. i don't know enough to understand why this is the case (probably is related to however podman
, in my case, simulates or emulates other architectures). i'll try this in a vm.
$ cross bench --target powerpc-unknown-linux-gnu --bench rand -- --baseline master --noise-threshold 0.025
Benchmarking crc32_update_megabytes/crc32_update_megabytes: Collecting 100 samples in estimated 9.5312 s crc32_update_megabytes/crc32_update_megabytes
time: [1.0317 ms 1.0390 ms 1.0450 ms]
thrpt: [912.65 MiB/s 917.85 MiB/s 924.38 MiB/s]
change:
time: [-29.906% -28.977% -28.128%] (p = 0.00 < 0.05)
thrpt: [+39.137% +40.799% +42.665%]
Performance has improved.
crc32c_8kb/crc32c_8kb time: [7.3929 µs 7.4035 µs 7.4150 µs]
thrpt: [1.0289 GiB/s 1.0305 GiB/s 1.0320 GiB/s]
change:
time: [-35.376% -35.269% -35.155%] (p = 0.00 < 0.05)
thrpt: [+54.213% +54.485% +54.742%]
Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild
Benchmarking crc32c_append_4kb/crc32c_append_4kb: Collecting 100 samples in estimated 5.0021 s (1.3M itercrc32c_append_4kb/crc32c_append_4kb
time: [3.7204 µs 3.7256 µs 3.7309 µs]
change: [-34.453% -34.356% -34.255%] (p = 0.00 < 0.05)
Performance has improved.
Benchmarking crc32c_combine_4kb/crc32c_combine_4kb: Collecting 100 samples in estimated 5.2278 s (91k itecrc32c_combine_4kb/crc32c_combine_4kb
time: [57.219 µs 57.307 µs 57.397 µs]
change: [-0.9196% -0.7205% -0.5254%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild
Benchmarking crc32c_combine_megabyte/crc32c_combine_megabyte: Collecting 100 samples in estimated 5.1209 crc32c_combine_megabyte/crc32c_combine_megabyte
time: [99.908 µs 100.18 µs 100.44 µs]
change: [-0.8661% +0.1591% +2.1066%] (p = 0.86 > 0.05)
No change in performance detected.
Found 3 outliers among 100 measurements (3.00%)
2 (2.00%) high mild
1 (1.00%) high severe
Benchmarking crc32c_append_megabyte/crc32c_append_megabyte: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.9s, enable flat sampling, or reduce sample count to 60.
Benchmarking crc32c_append_megabyte/crc32c_append_megabyte: Collecting 100 samples in estimated 5.8660 s crc32c_append_megabyte/crc32c_append_megabyte
time: [1.0277 ms 1.0314 ms 1.0343 ms]
change: [-29.787% -28.993% -28.179%] (p = 0.00 < 0.05)
Performance has improved.
EDIT: there's no point in benchmarking BE changes because there's no point in using this library on BE before the fix, since it produces incorrect results
Anything blocking this from being merged?
@crisidev Apologies, this didn't make it onto my radar. I can give this a review.
Absolutely no worries and thanks for looking into this!