buzhash64e: with rolling hash encryption
Rather simple and very slow encryption code; it needs to be tuned later IF we find this worth doing.
Note:
- the encryption of a buzhash64 current sum value must depend only on the sum value and the encryption key, NOT on the position in the current stream. This rules out AES-CTR and any other cipher mode that does not encrypt blocks independently, which is why ECB is used here. Otherwise the chunks would be cut based not only on the content but also on the position in the stream, so identical content would be cut differently and deduplication would be destroyed.
- instead of encrypting just 64 bits at a time using ECB, we could collect e.g. 1024 sums, encrypt them all in one `.encrypt_many()` call and then do another loop over the esums to decide where to cut. This would have much less overhead than now, but requires maintaining yet another buffer efficiently.
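Both notes can be illustrated with a minimal, stdlib-only sketch. Everything here is an assumption made for illustration: the toy 4-round Feistel cipher stands in for the real block cipher in ECB mode, `TABLE` is a made-up buzhash table, and `encrypt_sum`, `encrypt_many`, `rolling_sums` and `cut_candidates` are hypothetical names, not the code in this PR. It only lists candidate cut positions and ignores minimum/maximum chunk sizes.

```python
import hashlib
import random
import struct

MASK64 = (1 << 64) - 1
MASK32 = (1 << 32) - 1

# Hypothetical buzhash table: 256 fixed pseudo-random 64-bit values.
TABLE = [int.from_bytes(hashlib.blake2b(bytes([i]), digest_size=8).digest(), "little")
         for i in range(256)]


def rotl64(x, n):
    """Rotate a 64-bit value left by n bits."""
    n %= 64
    return ((x << n) | (x >> (64 - n))) & MASK64


def rolling_sums(data, window):
    """Yield the buzhash-style sum of every full window in data."""
    h = 0
    for i in range(window):
        h = rotl64(h, 1) ^ TABLE[data[i]]
    yield h
    for i in range(window, len(data)):
        # Rotate, remove the byte leaving the window, add the byte entering it.
        h = rotl64(h, 1) ^ rotl64(TABLE[data[i - window]], window) ^ TABLE[data[i]]
        yield h


def _round(key, rnd, half):
    """Keyed 32-bit round function for the toy Feistel cipher."""
    msg = struct.pack("<BI", rnd, half)
    return int.from_bytes(hashlib.blake2b(msg, key=key, digest_size=4).digest(), "little")


def encrypt_sum(key, value):
    """Encrypt one 64-bit sum. The result depends only on (key, value),
    never on a stream position, like one independently encrypted ECB block."""
    left, right = value >> 32, value & MASK32
    for rnd in range(4):
        left, right = right, left ^ _round(key, rnd, right)
    return (left << 32) | right


def encrypt_many(key, values):
    """Encrypt a whole batch of sums in one call (stand-in for a batched API)."""
    return [encrypt_sum(key, v) for v in values]


def cut_candidates(key, data, window=16, mask=0x3F, batch=1024):
    """Collect sums in batches, encrypt each batch at once, then scan the esums."""
    sums = list(rolling_sums(data, window))
    cuts = []
    for start in range(0, len(sums), batch):
        esums = encrypt_many(key, sums[start:start + batch])
        for offset, esum in enumerate(esums):
            if esum & mask == 0:
                cuts.append(start + offset + window)  # end offset of the window
    return cuts


key = b"hypothetical chunker key"
data = bytes(random.Random(0).randrange(256) for _ in range(4096))
prefix = b"some unrelated bytes inserted in front"
cuts = cut_candidates(key, data)
shifted_cuts = cut_candidates(key, prefix + data)
# Cut candidates whose window lies fully inside the original data, re-based to it.
rebased = [c - len(prefix) for c in shifted_cuts if c - len(prefix) >= 16]
```

Because each esum depends only on the key and the sum, `rebased` equals `cuts`: shifting the content in the stream moves the cut candidates along with it. A position-dependent mode such as CTR would mix the offset into every esum and lose this property.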
Codecov Report
:white_check_mark: All modified and coverable lines are covered by tests.
:white_check_mark: Project coverage is 81.43%. Comparing base (5aa536d) to head (e7ad355).
:warning: Report is 1 commits behind head on master.
Additional details and impacted files
| Coverage Diff | master | #8920 | +/- |
|---|---|---|---|
| Coverage | 81.43% | 81.43% | |
| Files | 77 | 77 | |
| Lines | 13515 | 13516 | +1 |
| Branches | 2004 | 2004 | |
| Hits | 11006 | 11007 | +1 |
| Misses | 1853 | 1853 | |
| Partials | 656 | 656 | |
(As a side note on speed.) I found it quite interesting that the fast PCG RNG (NumPy's default PRNG) is predictable, but reconstructing the seed from 64 consecutive values is a non-trivial operation: about 2¼ CPU-years of computation. However, that is still only €108 if we assume a CPU core costs 4 €/month.
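For reference, the cost figure follows directly from the two numbers quoted above; a quick check, taking both the 2¼ CPU-years and the 4 €/month core price as given:

```python
cpu_years = 2.25            # quoted effort for seed reconstruction
eur_per_core_month = 4      # assumed price of one CPU core per month

core_months = cpu_years * 12
cost_eur = core_months * eur_per_core_month
print(f"{core_months:g} core-months cost about {cost_eur:g} EUR")
```

The work also parallelizes across cores in this simple cost model, so the price stays the same whether it runs on one core for 27 months or on 27 cores for one month.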