
Various changes (hopefully improvements)

Open ggriffiniii opened this issue 5 years ago • 9 comments

Here is a rather large change. It touches both the public-facing API and the internal implementation. The public API changes remove the hex_ prefix from function names and reduce what's exported by default. The only public functions exposed by default are now

encode, encode_to_slice, decode, decode_to_slice, decode_to_slice_unchecked

Each of these now also takes a generic input of anything that is AsRef<[u8]>, making it more convenient for users to encode/decode strings as well as byte buffers. The various fallback and specialized implementations are only exposed when compiled with the "bench" feature, and the benchmarks now require that feature to run.
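To illustrate what the AsRef<[u8]> bound buys at the call site, here is a minimal sketch. The function name matches the PR, but the body is a scalar stand-in for the crate's SIMD implementation, for illustration only:

```rust
// Hypothetical sketch of the proposed generic API; the real crate
// dispatches to SIMD implementations, this is a scalar stand-in.
fn encode<I: AsRef<[u8]> + ?Sized>(src: &I) -> String {
    src.as_ref()
        .iter()
        .map(|b| format!("{:02x}", b))
        .collect()
}

fn main() {
    // AsRef<[u8]> lets the same function accept &str, byte slices,
    // arrays, and Vec<u8> without explicit conversion by the caller.
    assert_eq!(encode("hi"), "6869");
    assert_eq!(encode(&[0xdeu8, 0xad][..]), "dead");
    assert_eq!(encode(b"ok"), "6f6b");
}
```

The same bound applies to the decode functions, so a hex &str can be passed to decode directly.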

When tested on my machine, an Intel(R) Xeon(R) W-2135 (Skylake), these changes result in the following benchmark results:

encode/faster_hex/2     time:   [21.387 ns 21.457 ns 21.609 ns]
                        thrpt:  [88.265 MiB/s 88.891 MiB/s 89.184 MiB/s]
                 change:
                        time:   [-22.640% -22.215% -21.653%] (p = 0.00 < 0.05)
                        thrpt:  [+27.638% +28.559% +29.265%]
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) high mild
  3 (3.00%) high severe

encode/faster_hex/16    time:   [40.341 ns 40.623 ns 41.236 ns]
                        thrpt:  [370.03 MiB/s 375.62 MiB/s 378.25 MiB/s]
                 change:
                        time:   [-17.997% -16.616% -15.447%] (p = 0.00 < 0.05)
                        thrpt:  [+18.268% +19.928% +21.947%]
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  2 (2.00%) low mild
  2 (2.00%) high mild
  3 (3.00%) high severe

encode/faster_hex/32    time:   [40.945 ns 41.027 ns 41.178 ns]
                        thrpt:  [741.12 MiB/s 743.84 MiB/s 745.32 MiB/s]
                 change:
                        time:   [-16.032% -15.482% -14.929%] (p = 0.00 < 0.05)
                        thrpt:  [+17.549% +18.318% +19.093%]
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  1 (1.00%) low mild
  6 (6.00%) high mild
  5 (5.00%) high severe

encode/faster_hex/128   time:   [52.797 ns 52.941 ns 53.245 ns]
                        thrpt:  [2.2389 GiB/s 2.2517 GiB/s 2.2579 GiB/s]
                 change:
                        time:   [-13.436% -12.674% -12.073%] (p = 0.00 < 0.05)
                        thrpt:  [+13.730% +14.514% +15.521%]
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  4 (4.00%) low mild
  5 (5.00%) high mild
  3 (3.00%) high severe

encode/faster_hex/4096  time:   [270.41 ns 270.99 ns 271.79 ns]
                        thrpt:  [14.036 GiB/s 14.077 GiB/s 14.107 GiB/s]
                 change:
                        time:   [-16.646% -16.256% -15.877%] (p = 0.00 < 0.05)
                        thrpt:  [+18.873% +19.412% +19.971%]
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild
  8 (8.00%) high severe

decode/faster_hex/2     time:   [4.5752 ns 4.6051 ns 4.6524 ns]
                        thrpt:  [409.97 MiB/s 414.18 MiB/s 416.89 MiB/s]
                 change:
                        time:   [-39.299% -38.642% -37.981%] (p = 0.00 < 0.05)
                        thrpt:  [+61.241% +62.977% +64.741%]
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) high mild
  3 (3.00%) high severe

decode/faster_hex/16    time:   [12.561 ns 12.574 ns 12.595 ns]
                        thrpt:  [1.1831 GiB/s 1.1851 GiB/s 1.1863 GiB/s]
                 change:
                        time:   [-11.740% -11.124% -10.172%] (p = 0.00 < 0.05)
                        thrpt:  [+11.324% +12.516% +13.301%]
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  4 (4.00%) high mild
  3 (3.00%) high severe

decode/faster_hex/32    time:   [7.5830 ns 7.6135 ns 7.6552 ns]
                        thrpt:  [3.8931 GiB/s 3.9144 GiB/s 3.9302 GiB/s]
                 change:
                        time:   [-66.986% -66.902% -66.800%] (p = 0.00 < 0.05)
                        thrpt:  [+201.21% +202.13% +202.90%]
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild
  4 (4.00%) high severe

decode/faster_hex/128   time:   [9.6518 ns 9.6774 ns 9.7188 ns]
                        thrpt:  [12.266 GiB/s 12.318 GiB/s 12.351 GiB/s]
                 change:
                        time:   [-52.470% -51.723% -51.184%] (p = 0.00 < 0.05)
                        thrpt:  [+104.85% +107.14% +110.39%]
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  2 (2.00%) low mild
  2 (2.00%) high mild
  2 (2.00%) high severe

decode/faster_hex/4096  time:   [168.69 ns 169.12 ns 169.65 ns]
                        thrpt:  [22.485 GiB/s 22.556 GiB/s 22.613 GiB/s]
                 change:
                        time:   [-65.925% -65.422% -65.072%] (p = 0.00 < 0.05)
                        thrpt:  [+186.30% +189.20% +193.47%]
                        Performance has improved.
Found 17 outliers among 100 measurements (17.00%)
  6 (6.00%) low severe
  2 (2.00%) low mild
  1 (1.00%) high mild
  8 (8.00%) high severe

decode/faster_hex_unchecked/2
                        time:   [3.8828 ns 3.8839 ns 3.8851 ns]
                        thrpt:  [490.94 MiB/s 491.09 MiB/s 491.23 MiB/s]
                 change:
                        time:   [-12.488% -11.426% -10.756%] (p = 0.00 < 0.05)
                        thrpt:  [+12.052% +12.900% +14.271%]
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

decode/faster_hex_unchecked/16
                        time:   [10.755 ns 10.785 ns 10.845 ns]
                        thrpt:  [1.3740 GiB/s 1.3817 GiB/s 1.3856 GiB/s]
                 change:
                        time:   [-0.2600% +0.0841% +0.4912%] (p = 0.74 > 0.05)
                        thrpt:  [-0.4888% -0.0840% +0.2607%]
                        No change in performance detected.
Found 8 outliers among 100 measurements (8.00%)
  2 (2.00%) low mild
  2 (2.00%) high mild
  4 (4.00%) high severe

decode/faster_hex_unchecked/32
                        time:   [5.3663 ns 5.3761 ns 5.3883 ns]
                        thrpt:  [5.5310 GiB/s 5.5435 GiB/s 5.5536 GiB/s]
                 change:
                        time:   [-70.582% -70.458% -70.349%] (p = 0.00 < 0.05)
                        thrpt:  [+237.25% +238.50% +239.92%]
                        Performance has improved.
Found 16 outliers among 100 measurements (16.00%)
  4 (4.00%) high mild
  12 (12.00%) high severe

decode/faster_hex_unchecked/128
                        time:   [6.4211 ns 6.4313 ns 6.4486 ns]
                        thrpt:  [18.486 GiB/s 18.536 GiB/s 18.565 GiB/s]
                 change:
                        time:   [-17.437% -16.443% -15.654%] (p = 0.00 < 0.05)
                        thrpt:  [+18.559% +19.679% +21.119%]
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe

decode/faster_hex_unchecked/4096
                        time:   [101.75 ns 102.60 ns 103.91 ns]
                        thrpt:  [36.712 GiB/s 37.179 GiB/s 37.492 GiB/s]
                 change:
                        time:   [-33.690% -33.365% -32.935%] (p = 0.00 < 0.05)
                        thrpt:  [+49.110% +50.072% +50.806%]
                        Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  3 (3.00%) low severe
  2 (2.00%) high mild
  6 (6.00%) high severe

ggriffiniii avatar Feb 18 '20 17:02 ggriffiniii

Great work @ggriffiniii! I took a quick pass over it, but it's a big PR, so I need more time to do a proper review.

zhangsoledad avatar Feb 19 '20 03:02 zhangsoledad

I'd propose using AsMut to make the public API more generic, like this:

pub fn encode<S, D>(src: &S, dst: &mut D) -> Result<(), Error>
where
    S: AsRef<[u8]> + ?Sized,
    D: AsMut<[u8]> + ?Sized;

pub fn encode_to_string<I>(src: &I) -> String
where
    I: AsRef<[u8]> + ?Sized;

zhangsoledad avatar Feb 19 '20 03:02 zhangsoledad

Yeah, this is a rather large change. By all means take as much time as you need to do a thorough review.

I'd propose using AsMut to make the public API more generic, like this:

pub fn encode<S, D>(src: &S, dst: &mut D) -> Result<(), Error>
where
    S: AsRef<[u8]> + ?Sized,
    D: AsMut<[u8]> + ?Sized;

pub fn encode_to_string<I>(src: &I) -> String
where
    I: AsRef<[u8]> + ?Sized;

It could be changed to AsMut easily enough, but it doesn't seem worth it to me. The big benefit of AsRef<[u8]> is being able to accept &str and &String as well as byte slices and Vecs. The only standard types that implement AsMut<[u8]> are byte slices, Vec<u8>, and Box<[u8]>, all of which already coerce to &mut [u8] without the explicit AsMut bound.
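The coercion argument can be demonstrated with a small sketch. The function below is a hypothetical scalar stand-in (not the crate's implementation); the point is that a &mut Vec<u8> is accepted by a plain &mut [u8] parameter with no AsMut bound:

```rust
// Hypothetical stand-in showing why an explicit AsMut bound adds little:
// &mut Vec<u8> and &mut Box<[u8]> both coerce to &mut [u8] at the call site.
fn encode_to_slice(src: &[u8], dst: &mut [u8]) -> Result<(), &'static str> {
    if dst.len() != src.len() * 2 {
        return Err("output length must be exactly 2 * input length");
    }
    const HEX: &[u8; 16] = b"0123456789abcdef";
    for (i, b) in src.iter().enumerate() {
        dst[i * 2] = HEX[(b >> 4) as usize];
        dst[i * 2 + 1] = HEX[(b & 0x0f) as usize];
    }
    Ok(())
}

fn main() {
    let mut buf = vec![0u8; 4];                // Vec<u8>, not [u8]
    encode_to_slice(b"hi", &mut buf).unwrap(); // coerces to &mut [u8]
    assert_eq!(&buf[..], &b"6869"[..]);
}
```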

I also favored having encode and decode allocate, with encode_to_slice and decode_to_slice as the more verbose options for those seeking to reuse allocations. The latter is far less common, so making it slightly more verbose makes sense and also follows a similar convention to other encoding crates like hex and base64.

ggriffiniii avatar Feb 19 '20 05:02 ggriffiniii

While on the topic of public function signatures, I thought it would be useful to add a couple of additional ideas I had. I opted not to include them in this change just to keep an already large change from growing even further, but here they are.

I'll outline the proposals for encoding to keep the focus on the ideas, but in each case there is a matching analog for decoding that should probably be included if you want to pursue it.

Provide a convenient return value for encode_to_slice

pub fn encode_to_slice<'i, 'o, I>(src: &'i I, dst: &'o mut [u8]) -> Result<&'o str, Error>
where
    I: AsRef<[u8]> + ?Sized;

The goal here would just be to provide a convenient &str view of the encoded data. Users would be free to ignore the return value and the function would behave identically to how it does now, but for many users getting back a &str (that's already been validated as UTF-8) could be a nice convenience.
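A sketch of how the returned &str could be produced (with the generic src parameter dropped for brevity, and a scalar body standing in for the crate's implementation):

```rust
// Hypothetical sketch of the proposed return value. Hex output is pure
// ASCII, so a &str borrowed from the output buffer is always valid UTF-8.
fn encode_to_slice<'o>(src: &[u8], dst: &'o mut [u8]) -> Result<&'o str, &'static str> {
    let n = src.len() * 2;
    if dst.len() < n {
        return Err("output buffer too small");
    }
    const HEX: &[u8; 16] = b"0123456789abcdef";
    for (i, b) in src.iter().enumerate() {
        dst[i * 2] = HEX[(b >> 4) as usize];
        dst[i * 2 + 1] = HEX[(b & 0x0f) as usize];
    }
    // Cannot fail: only ASCII hex digits were written to dst[..n].
    Ok(std::str::from_utf8(&dst[..n]).unwrap())
}

fn main() {
    let mut buf = [0u8; 8];
    let s = encode_to_slice(b"hi", &mut buf).unwrap();
    assert_eq!(s, "6869"); // caller gets a ready-to-use &str for free
}
```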

Provide a simpler api for buffer reuse

pub fn encode_with_buffer<'i, 'o, I>(src: &'i I, dst: &'o mut Vec<u8>) -> &'o str
where
    I: AsRef<[u8]> + ?Sized;

or

pub fn encode_with_buffer<I, B>(src: &I, dst: B) -> String
where
    I: AsRef<[u8]> + ?Sized,
    B: Into<Vec<u8>>;

The goal with both of these options is to allow users to reuse buffers (allocations) when encoding multiple chunks of data. That's the biggest performance benefit offered by the to_slice functions, but these would allow for the same performance characteristics without the potential errors, since the size of the output buffer would be managed internally. Both of these options would be simple wrappers around encode_to_slice.
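The first variant could be sketched roughly as follows (a hypothetical scalar stand-in; the real wrapper would call the crate's encode_to_slice):

```rust
// Hypothetical sketch of encode_with_buffer: the output Vec is resized
// internally, so the caller can never pass a wrongly sized buffer.
fn encode_with_buffer<'o>(src: &[u8], dst: &'o mut Vec<u8>) -> &'o str {
    dst.clear();
    dst.resize(src.len() * 2, 0); // size managed here, no length errors
    const HEX: &[u8; 16] = b"0123456789abcdef";
    for (i, b) in src.iter().enumerate() {
        dst[i * 2] = HEX[(b >> 4) as usize];
        dst[i * 2 + 1] = HEX[(b & 0x0f) as usize];
    }
    std::str::from_utf8(dst).unwrap() // hex output is always ASCII
}

fn main() {
    // One allocation is reused across many encode calls.
    let mut buf = Vec::new();
    assert_eq!(encode_with_buffer(b"hi", &mut buf), "6869");
    assert_eq!(encode_with_buffer(b"ok", &mut buf), "6f6b");
}
```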

These could all be pursued as subsequent PRs, but we could have the discussion here while the topic of the public API is already being discussed. Buffer reuse would be a semver-compatible change that could be added at any time; the return-value change to encode_to_slice would not be semver compatible, so a decision on that should be made prior to making another release on crates.io.

ggriffiniii avatar Feb 19 '20 18:02 ggriffiniii

I like the direction of the changes in this PR. The crate itself is great; I found it when I noticed how slow hex is. But documentation and a serde module are still missing. Newcomers may prefer hex because it's clear how to use it and it's ready for use with serde. Any plans on this? I'd like to work on some of that code once this PR is merged.

fanatid avatar Apr 15 '20 18:04 fanatid

Looks like proptest should be bumped (rustc 1.44.0-nightly (94d346360 2020-04-09)):

   Compiling proptest v0.8.7
error[E0495]: cannot infer an appropriate lifetime for lifetime parameter `'a` due to conflicting requirements
  --> /home/kirill/.cargo/registry/src/github.com-1ecc6299db9ec823/proptest-0.8.7/src/arbitrary/_core/iter.rs:44:23
   |
44 |         base.prop_map(Iterator::cloned).boxed()
   |                       ^^^^^^^^^^^^^^^^
   |
note: first, the lifetime cannot outlive the lifetime `'a` as defined on the impl at 35:6...
  --> /home/kirill/.cargo/registry/src/github.com-1ecc6299db9ec823/proptest-0.8.7/src/arbitrary/_core/iter.rs:35:6
   |
35 | impl<'a, T: 'static + Clone, A: fmt::Debug + 'static + Iterator<Item = &'a T>>
   |      ^^
note: ...so that the types are compatible
  --> /home/kirill/.cargo/registry/src/github.com-1ecc6299db9ec823/proptest-0.8.7/src/arbitrary/_core/iter.rs:44:23
   |
44 |         base.prop_map(Iterator::cloned).boxed()
   |                       ^^^^^^^^^^^^^^^^
   = note: expected  `&'a T`
              found  `&T`
   = note: but, the lifetime must be valid for the static lifetime...
note: ...so that the type `strategy::map::Map<S, fn(A) -> core::iter::Cloned<A> {<A as core::iter::Iterator>::cloned::<'_, T>}>` will meet its required lifetime bounds
  --> /home/kirill/.cargo/registry/src/github.com-1ecc6299db9ec823/proptest-0.8.7/src/arbitrary/_core/iter.rs:44:41
   |
44 |         base.prop_map(Iterator::cloned).boxed()
   |                                         ^^^^^

After updating proptest to 0.9.6, the errors are gone.

fanatid avatar Apr 15 '20 18:04 fanatid

Hey y'all, any updates on this PR?

doitian avatar Oct 16 '22 10:10 doitian

The sticking point with this PR now is not a matter of timely processing: we can accept the algorithm changes, but we don't want to break the original API outright. Adding new APIs rather than directly overwriting the original ones would be more acceptable.

zhangsoledad avatar Oct 17 '22 03:10 zhangsoledad

Please break the API. You could have the de facto hex encoding crate, as hex seems unmaintained and the API will need to be improved eventually.

dsheets avatar Dec 01 '23 17:12 dsheets