bc-csharp icon indicating copy to clipboard operation
bc-csharp copied to clipboard

Add Blake2b intrinsics POC

Open TimothyMakkison opened this issue 2 years ago • 7 comments

Adapted @saucecontrol's Blake2Fast intrinsics code to work with the Konscious.Security.Cryptography library and thought it would work here.

The benchmark measured a 6x-13x speed up without additional memory usage. Blake2s or Blake2xs would probably experience similar benefits.

Non intrinsics

Method Size Mean Error StdDev Median Gen0 Allocated
GetHash 3 2.270 μs 0.0804 μs 0.2372 μs 2.337 μs 0.1106 176 B
GetHash 100 2.082 μs 0.0241 μs 0.0214 μs 2.083 μs 0.1106 176 B
GetHash 1000 15.085 μs 0.8234 μs 2.4279 μs 15.903 μs 0.0916 176 B
GetHash 10000 155.035 μs 2.8955 μs 4.1527 μs 155.360 μs - 176 B

Intrinsics

Method Size Mean Error StdDev Median Gen0 Allocated
GetHash 3 356.7 ns 11.23 ns 33.12 ns 367.6 ns 0.1121 176 B
GetHash 100 341.0 ns 6.96 ns 20.09 ns 343.6 ns 0.1121 176 B
GetHash 1000 1,439.0 ns 27.08 ns 22.61 ns 1,434.5 ns 0.1106 176 B
GetHash 10000 12,796.6 ns 259.71 ns 765.76 ns 13,144.9 ns 0.1068 176 B

Note that I'm new to intrinsics and CompilerServices.Unsafe, this shouldn't be considered complete or safe. Do you have a CI pipeline for testing intrinsic code?

TimothyMakkison avatar Nov 18 '22 23:11 TimothyMakkison

Great numbers, but suggest the following changes:

  • Move all intrinsics code to a separate file (Blake2b_X86).
  • Put intrinsics code under "#if NETCOREAPP3_0_OR_GREATER", which AFAIK is the correct minimal version.
  • Add a licensing comment if the code is a substantial copy of either of the mentioned projects.
  • Add IsSupported method to the new class and use that from Blake2bDigest
  • Extract round function into a method (AggressiveInlining is still fine)
  • Probably can replace most of the Unsafe methods with MemoryMarshal methods

At least add the license comment, the rest I can take care of if necessary.

peterdettman avatar Nov 19 '22 06:11 peterdettman

  • [x] Move all intrinsics code to a separate file (Blake2b_X86).
  • [x] Put intrinsics code under "#if NETCOREAPP3_0_OR_GREATER", which AFAIK is the correct minimal version.
  • [x] Add a licensing comment if the code is a substantial copy of either of the mentioned projects.
  • [ ] Add IsSupported method to the new class and use that from Blake2bDigest
  • [ ] Extract round function into a method (AggressiveInlining is still fine)
  • [x] Probably can replace most of the Unsafe methods with MemoryMarshal methods

Added Blake2s_X86.

Blake2s

Non intrinsics

Method Size Mean Error StdDev Median Gen0 Allocated
GetHash 3 1.872 μs 0.0935 μs 0.2758 μs 1.987 μs 0.1450 232 B
GetHash 100 3.686 μs 0.0737 μs 0.1523 μs 3.720 μs 0.1450 232 B
GetHash 1000 28.101 μs 0.5315 μs 1.0857 μs 28.234 μs 0.1221 232 B
GetHash 10000 204.365 μs 9.7001 μs 28.6010 μs 200.846 μs - 232 B

Intrinsics

Method Size Mean Error StdDev Gen0 Allocated
GetHash 3 347.4 ns 13.51 ns 39.85 ns 0.1478 232 B
GetHash 100 501.3 ns 12.88 ns 37.99 ns 0.1478 232 B
GetHash 1000 2,545.4 ns 38.75 ns 34.35 ns 0.1450 232 B
GetHash 10000 23,501.2 ns 465.85 ns 653.06 ns 0.1221 232 B

Blake2xs

Non intrinsics

Method Size Mean Error StdDev Gen0 Allocated
GetHash 3 2.306 μs 0.0315 μs 0.0295 μs 0.5684 896 B
GetHash 100 3.201 μs 0.0413 μs 0.0366 μs 0.5684 896 B
GetHash 1000 15.635 μs 0.1960 μs 0.1834 μs 0.5493 896 B
GetHash 10000 135.593 μs 1.6090 μs 1.5051 μs 0.4883 896 B

Intrinsics

Method Size Mean Error StdDev Median Gen0 Allocated
GetHash 3 1.072 μs 0.0193 μs 0.0198 μs 1.070 μs 0.5703 896 B
GetHash 100 1.091 μs 0.0307 μs 0.0906 μs 1.109 μs 0.5703 896 B
GetHash 1000 2.858 μs 0.1685 μs 0.4969 μs 3.035 μs 0.5684 896 B
GetHash 10000 16.398 μs 0.1364 μs 0.1675 μs 16.444 μs 0.5493 896 B

There was no overlap between Blake2b and Blake2s so I deleted VectorExtensions and inlined the methods.

Suggestions

Add a length check to DoFinal to ensure that output spans and arrays are long enough.

There are several repeat implementations of Store128, Load128, Store128_UInt32, perhaps a shared helper method would help.

TimothyMakkison avatar Nov 20 '22 18:11 TimothyMakkison

Note that intrinsics aren't available for NETSTANDARD2_1_OR_GREATER, so those directives should be removed (just NETCOREAPP3_0_OR_GREATER).

I'd still like methods on the _X86 classes, e.g.: public static bool IsSupported => Avx2.IsSupported && BitConverter.IsLittleEndian;

although ideally BitConverter.IsLittleEndian isn't a requirement and only affects Load/Store (this could fall out later when I factor out all the load/store methods and support bigendian).

Also if all the rounds have the same structure (or a small set thereof), please factor out a method for 1 round (using ref parameters where each round applies to different parts of the state).

I've added length checks that you suggested also (this mirror can take a few days to be updated though).

peterdettman avatar Nov 21 '22 06:11 peterdettman

  • [x] Note that intrinsics aren't available for NETSTANDARD2_1_OR_GREATER, so those directives should be removed (just NETCOREAPP3_0_OR_GREATER).
  • [x] I'd still like methods on the _X86 classes, e.g.:
public static bool IsSupported => Avx2.IsSupported && BitConverter.IsLittleEndian;

Also if all the rounds have the same structure (or a small set thereof), please factor out a method for 1 round (using ref parameters where each round applies to different parts of the state).

I've created a Round function containing the diagonalization and G calls, The remaining logic is loading the b0-b4 values, as far as I can tell this can't be fruther simplified as each round is subtly different.

I've added length checks that you suggested also (this mirror can take a few days to be updated though).

Thanks, it was driving me nuts when debugging the benchmarks

Should I update the xml documentation? It looks like the docs haven't been updated since being converted from java. Are you okay with using <inheritdoc />?

/// <inheritdoc />
public void Update(byte b)
{

TimothyMakkison avatar Nov 21 '22 15:11 TimothyMakkison

I've converted the javadocs to xml form. I've used <inheritdoc /> where possible and added summaries when the existing javadoc differed from the IDigest description.

Nitpick: I've noticed that a couple of functions such as Update(byte b) have different parameter names to the interface IDigest Update(byte input). I haven't altered the names as this would be a breaking change, but it might be worth updating them to match the interface.

TimothyMakkison avatar Nov 27 '22 22:11 TimothyMakkison

Is this ready for review/merge?

peterdettman avatar Dec 21 '22 05:12 peterdettman

I'll hold off for and convert this to a draft. Want to double check I haven't added a regression for .Net 3

TimothyMakkison avatar Dec 22 '22 10:12 TimothyMakkison