speckle-sharp icon indicating copy to clipboard operation
speckle-sharp copied to clipboard

CNX-9158 - Replace `BinaryFormatter` with UTF-8 encoded bytes for SHA256 hash function

Open JR-Morgan opened this issue 9 months ago • 3 comments

I have this as an alternative to #3282

Unlike #3282, this PR does introduce a breaking change, in that the same input will yield a different output.

This is because (as described by @secretlyagoblin the BinaryFormatter gives us more than just the char[] bytes, but also several bytes of type information) With this implementation, I've aligned the hashing behaviour with our Python SDK, which simply gets the UTF-8 encoded bytes.

I think we should consider accepting this breaking change, the implementation here is much simpler than having to patch in prefix as specified here https://learn.microsoft.com/en-us/openspecs/windows_protocols/ms-nrbf/eb503ca5-e1f6-4271-a7ee-c4ca38d07996, is already exactly what we're doing in Python SDK, and means there is less code for us to maintain.


I have benchmarked it against our current implementation, and it is a tad faster with both "small" and "long" data sets.

Method input Mean Error StdDev Gen0 Gen1 Allocated
BinaryFormatter long(...)8o7y [471] 1,463.3 ns 18.65 ns 16.54 ns 0.2975 0.0019 4992 B
Utf8 long(...)8o7y [471] 956.4 ns 10.23 ns 9.57 ns 0.1259 - 2112 B
BinaryFormatter small(...)0sdfa [50] 1,199.7 ns 20.45 ns 18.13 ns 0.2403 - 4032 B
Utf8 small(...)0sdfa [50] 703.1 ns 8.39 ns 7.01 ns 0.1011 - 1696 B

The remaining thing for us to decide, are we happy to make this breaking change (maybe in a 2.19 release) The biggest effect of this is diffing in the viewer will not be useful between prior and post change versions. Or, We can make this change in a later release where we can communicate a breaking change clearer (e.g. in the DUI3 connectors release). We can always publish a nuget in the interim to unblock the Net8 users who can't

JR-Morgan avatar May 01 '24 19:05 JR-Morgan

@secretlyagoblin I'd be curious about what you think about this option, your PR has the advantage that its non-breaking. But, If we decide a breaking change is acceptable, I think keeping things simple (and aligned with specklepy) would be a nice long term solution.

JR-Morgan avatar May 01 '24 19:05 JR-Morgan

For what it's worth, the "breaking" change part I discussed with @JR-Morgan today and we agreed to "take it on" affects only potential "fast" diffing based on hash + applicationId checks, like we do in our viewer. In other words, "the same input will yield a different output" will only apply on this swap - future ~~commits~~ versions will not suffer from this, and hashing will be consistent on a forward looking basis (even more so, considering specklepy).

(e.g., wall A has been sent previously, if it's going to be sent again it will get a new hash, hence confounding potential workflows.)

I am highly inclined to make this breaking change, as otherwise we increase surface area.

didimitrie avatar May 01 '24 20:05 didimitrie

@JR-Morgan, @didimitrie, it's your call, but if you haven't already, I'd suggest writing a policy on breaking changes to that you can refer to in these situations.

To be clear, I'm happy with either approach as I can't think of anything that this would break that couldn't be easily communicated (and our Rhino users will be moving to R8 with 2.19 anyway so there's some other stuff going on) and I'm in no way precious about the code that I've written.

Also if you're clear that you're trying it in the new pre-releases, you'd hopefully catch any edge-case power-users.

I will note that:

  • my PR increases the complexity of the codebase, but would be invisible to users of speckle,
  • the solution here simplifies the codebase, but does affect the end users of speckle,

So, it really comes down to your philosophy! Good luck :D

secretlyagoblin avatar May 01 '24 22:05 secretlyagoblin