s2n-tls
s2n-tls copied to clipboard
Replace memcmp to s2n_constant_time_equals
Resolved issues:
Solving issue #3062
Description of changes:
Change most of memcmp usages to s2n_constant_time_equals.
There are two parts that weren't change, which are in s2n_cipher_suits.c (line 1110) and s2n_config.c (line 323). Both of those parts use the return integer from memcmp which s2n_constant_time_equals can't do.
Regression Tests Results
We recently added new regression tests that lets us know the instruction count differences over some common codepaths.
--------------------------------------------------------------------------------
-- Summary
--------------------------------------------------------------------------------
Ir____________
1,707 (100.0%) PROGRAM TOTALS
--------------------------------------------------------------------------------
-- File:function summary
--------------------------------------------------------------------------------
Ir____________________ file:function
< 2,190 (128.3%, 128.3%) /home/ubuntu/workspace/s2n-tls/bindings/rust/s2n-tls-sys/lib/utils/s2n_safety.c:s2n_constant_time_equals
< -384 (-22.5%, 105.8%) ./string/../sysdeps/x86_64/multiarch/memcmp-avx2-movbe.S:__memcmp_avx2_movbe
< -62 (-3.6%, 102.2%) /rust/deps/hashbrown-0.14.5/src/raw/mod.rs:
-58 (-3.4%) hashbrown::map::HashMap<K,V,S,A>::insert
-4 (-0.2%) core::iter::adapters::try_process
< -34 (-2.0%, 100.2%) /rustc/3f5fd8dd41153bc5fdca9427e9e05be2c767ba23/library/core/src/../../stdarch/crates/core_arch/src/x86/sse2.rs:
-30 (-1.8%) hashbrown::map::HashMap<K,V,S,A>::insert
-2 (-0.1%) hashbrown::raw::RawTable<T,A>::reserve_rehash
-2 (-0.1%) core::iter::adapters::try_process
This shows us that there is about a 1,707 instruction count penalty for shifting to s2n_constant_time_equals. The main differences are the additional 2,190 instruction from s2n_constant_time_equals and the 384 instruction count reduction from memcmp. Relative to the 79,292,794 instructions for a complete RSA handshake, this is a 0.00215 % regression in handshake performance.
Note: The hashbrown instruction count differences are due to randomness between runs.
Handshake Benchmarks
We also have benchmarks covering handshake and throughput performance. Comparing these benchmarks between this PR and mainline we see the following differences.
handshake-rsa2048/s2n-tls
time: [1.0697 ms 1.0702 ms 1.0707 ms]
change: [-0.8251% +0.1691% +1.2108%] (p = 0.78 > 0.05)
No change in performance detected.
handshake-rsa3072/s2n-tls
time: [2.9331 ms 2.9364 ms 2.9396 ms]
change: [-0.1189% +0.0360% +0.1889%] (p = 0.65 > 0.05)
No change in performance detected.
handshake-rsa4096/s2n-tls
time: [6.2232 ms 6.2324 ms 6.2420 ms]
change: [-0.1699% +0.0467% +0.2659%] (p = 0.68 > 0.05)
No change in performance detected.
handshake-ecdsa384/s2n-tls
time: [1.1160 ms 1.1162 ms 1.1164 ms]
change: [-0.1847% +0.0291% +0.2434%] (p = 0.81 > 0.05)
No change in performance detected.
handshake-ecdsa256/s2n-tls
time: [398.52 µs 398.62 µs 398.74 µs]
change: [+0.3657% +0.7706% +1.1642%] (p = 0.00 < 0.05)
Change within noise threshold.
throughput-AES_128_GCM_SHA256/s2n-tls
time: [108.28 µs 108.70 µs 109.14 µs]
thrpt: [873.83 MiB/s 877.37 MiB/s 880.71 MiB/s]
change:
time: [+0.8096% +1.3883% +1.9335%] (p = 0.00 < 0.05)
thrpt: [-1.8968% -1.3693% -0.8031%]
Change within noise threshold.
throughput-AES_256_GCM_SHA384/s2n-tls
time: [118.10 µs 118.55 µs 119.01 µs]
thrpt: [801.35 MiB/s 804.47 MiB/s 807.49 MiB/s]
change:
time: [-1.3449% -0.9600% -0.5646%] (p = 0.00 < 0.05)
thrpt: [+0.5678% +0.9693% +1.3632%]
As expected, we do not see any significant changes. While there are some slight differences reported, these are below the accuracy threshold of the benchmarks.
s2n_constant_time_equals vs memcmp Benchmarks
We ran criterion benchmarks comparing the performance of memcmp
against s2n_constant_time_equals
.
- small data, not equal: how long does it take to compare an unequal blob of 255 bytes?
- small data, equal: how long does it take to compare an equal blob of 255 bytes?
These cases give numbers for the common case of comparing small pieces of data.
- many small blobs, not equal: how long does it take to compare 16 kB of 255 byte blobs.
This is a representation of a pathological case possible in places like s2n_protocol_preferences.c. Generally there would be only a small number of protocols, but it is possible for a client to send up to 16 kB of preferences, where each protocol has a maximum size of 255 bytes. This case gives us an understanding of the performance impact in this worst case scenario.
# 386 ps vs 130,280 ps
low-level-comparison - small data, not equal/memcmp
time: [386.08 ps 386.10 ps 386.12 ps]
low-level-comparison - small data, not equal/s2n_constant_time_equals
time: [130.28 ns 130.28 ns 130.29 ns]
# 386 ps vs 130,300 ps
low-level-comparison - small data, equal/memcmp
time: [386.07 ps 386.09 ps 386.11 ps]
low-level-comparison - small data, equal/s2n_constant_time_equals_c
time: [130.29 ns 130.30 ns 130.31 ns]
# 519 ns vs 30,593 ns
low-level-comparison - many small blobs, not equal/memcmp
time: [519.23 ns 519.37 ns 519.50 ns]
low-level-comparison - many small blobs, not equal/s2n_constant_time_equals
time: [30.592 µs 30.593 µs 30.595 µs]
We see that while s2n_constant_time_equals is significantly slower than memcmp, it remains a very fast function, and its cost is small relative to the cost of an entire handshake (120ns / 1_000_000ns => .0000012% ).
For pathological cases (comparing an entire 16kb extension) the s2n_const_time_equals would represent a noticeable portion of the handshake (30us / 1_000us => 3%). Therefore we avoid the usage of s2n_constant_time_equals in scenarios where largeish data (> 1kB) would be compared.
Call-outs:
Needs to make a note to those two parts that aren't changed.
Testing:
This change is a refactor change. The code ran and passed all test cases via cmake.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.