Krux encryption

Open jdlcdl opened this issue 8 months ago • 5 comments

note: While this PR is in draft form, I will continue to rebase atop develop and force-push, so please excuse me for doing so, with the assumption that this is not yet public until it is marked ready for review.

What is this PR for?

To solidify krux encryption:

  • in preparation for more general encryption of content like user-defined-strings, xpubs, descriptors, maybe even psbts.
  • towards a better defined API to encourage use by others. note: some forks (earthdiver, 3rditeration) already do.

Changes made to:

  • [x] Code
  • [x] Tests
  • [ ] Docs
  • [ ] CHANGELOG

What is the purpose of this pull request?

  • [ ] Bug fix
  • [x] New feature
  • [ ] Docs update
  • [x] Other

jdlcdl avatar Mar 19 '25 20:03 jdlcdl

Codecov Report

Attention: Patch coverage is 97.96512% with 7 lines in your changes missing coverage. Please review.

Project coverage is 95.62%. Comparing base (0a9aa35) to head (50969c4).

Files with missing lines    Patch %   Lines
src/krux/encryption.py      95.23%    3 Missing :warning:
src/krux/kef.py             98.84%    3 Missing :warning:
src/krux/pages/login.py     88.88%    1 Missing :warning:

Additional details and impacted files
@@             Coverage Diff             @@
##           develop     #546      +/-   ##
===========================================
+ Coverage    95.56%   95.62%   +0.05%     
===========================================
  Files           76       77       +1     
  Lines         8729     8952     +223     
===========================================
+ Hits          8342     8560     +218     
- Misses         387      392       +5     

codecov[bot] avatar Mar 19 '25 20:03 codecov[bot]

Following (To include into SS fork)

3rdIteration avatar Mar 30 '25 23:03 3rdIteration

Just a note that I'll likely end up backing out the changes in the previous commit (b761d26 until rebase), because NUL padding combined with authentication doesn't seem solvable in a way that ensures decryption; for sure it's not "simple". I'm leaving it for now and exploring tests that show an exact failure rate (at least 1/256 of existing encrypted mnemonics would require special handling), as well as ways to do special handling where false-positive authenticated decryption remains a possibility ~~(but a very rare possibility)~~ as rare as the checksum implies.
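
To illustrate why NUL padding is awkward here, a minimal sketch follows. It assumes padding appends 0x00 up to the 16-byte AES block size and that unpadding strips every trailing 0x00; `nul_pad`, `nul_unpad` and `round_trips` are hypothetical names for illustration, not krux's actual functions.

```python
BLOCK = 16  # AES block size in bytes


def nul_pad(data: bytes) -> bytes:
    """Append 0x00 up to the next multiple of BLOCK (no-op when aligned)."""
    return data + b"\x00" * (-len(data) % BLOCK)


def nul_unpad(padded: bytes) -> bytes:
    """Strip all trailing 0x00; padding and payload cannot be told apart."""
    return padded.rstrip(b"\x00")


def round_trips(data: bytes) -> bool:
    """True when padding followed by unpadding returns the original bytes."""
    return nul_unpad(nul_pad(data)) == data


ok = bytes(range(1, 17))               # 16 bytes, final byte is 0x10
bad = bytes(range(1, 16)) + b"\x00"    # 16 bytes, final byte is 0x00

# Try every possible final byte after a fixed 15-byte prefix.
failures = [b for b in range(256) if not round_trips(b"\x01" * 15 + bytes([b]))]
```

In this sketch only a final byte of 0x00 breaks the round trip, which is where a rate of 1 in 256 comes from when the final byte is uniformly random.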

jdlcdl avatar Apr 01 '25 11:04 jdlcdl

...and exploring tests that show an exact failure rate

~~Latest commit~~ The latest commits have non-"simple" handling of authenticated decryption for unsafe padding (ECB/CBC versions 0, 1, 3, 4; ~~I'm unable to find a solution for similar with GCM~~), as well as a test that reports "failures" (defined as: failed to decrypt AND we didn't avoid encryption, OR KEF-encoding failed).

The report for a total of 1.4M+ samples per version (7 types of aligned plaintext + 7 of un-aligned plaintext, with the test's samples variable set to 100K) is below.

1st attempt at this report (commit: "adjust to support legacy v0/v1 to be handled like new versions...")
KEF  Version      Timid   Avoid    Fail   Samples
  0  AES-ECB     203129       0       0   1000000
  1  AES-CBC       3907       0       0   1000000
  2  AES-GCM          0    3868       0   1000000
  3  AES-ECB v2  203114       0       0   1000000
  4  AES-CBC v2    3868       0       0   1000000
  5  AES-GCM +p       0       0       0   1000000
  6  AES-ECB +p  200006       0       0   1000000
  7  AES-CBC +p       0       0       0   1000000
  8  AES-GCM +c       0       0       0   1000000
  9  AES-ECB +c       0       0       0   1000000
 10  AES-CBC +c       0       0       0   1000000

2nd attempt at this report (commit: "TIL: mode GCM doesn't ever require padding...")
KEF  Version      Timid   Avoid    Fail   Samples
  0  AES-ECB     224594       0       0   1400000
  1  AES-CBC       5464       0       0   1400000
  2  AES-GCM          0       0       0   1400000
  3  AES-ECB v2  223093       0       0   1400000
  4  AES-CBC v2    3980       0       0   1400000
  5  AES-ECB +p  219979       0       0   1400000
  6  AES-CBC +p       0       0       0   1400000
  7  AES-GCM +c       0       0       0   1400000
  8  AES-ECB +c       0       0       0   1400000
  9  AES-CBC +c       0       0       0   1400000

Failure Summary:
Ver  Ver Name     Timid   Avoid    Fail  KEFerr   Samples
  0  AES-ECB     237996       0       0       0   1400000
  1  AES-CBC       5459       0       0       0   1400000
  2  AES-GCM          0       0       0       0   1400000
  3  AES-ECB v2  236591       0       0       0   1400000
  4  AES-CBC v2    4166       0       0       0   1400000
  5  AES-ECB +p  233408       0       0       0   1400000
  6  AES-CBC +p       0       0       0       0   1400000
  7  AES-GCM +c       0       0       0       0   1400000
  8  AES-ECB +c       0       0       0       0   1400000
  9  AES-CBC +c       0       0       0       0   1400000

Per-Version Failure Details:
Ver  Function    Count  Description
  0  encrypt    232537  ValueError('Duplicate blocks in ECB mode')
  0  encrypt      5459  ValueError('Cannot validate decryption for this plaintext')
  1  encrypt      5459  ValueError('Cannot validate decryption for this plaintext')
  3  encrypt    232425  ValueError('Duplicate blocks in ECB mode')
  3  encrypt      4166  ValueError('Cannot validate decryption for this plaintext')
  4  encrypt      4166  ValueError('Cannot validate decryption for this plaintext')
  5  encrypt    233408  ValueError('Duplicate blocks in ECB mode')
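
The 'Duplicate blocks in ECB mode' rows above come from refusing to encrypt plaintext whose repeated blocks ECB would expose. A minimal sketch of such a check, assuming 16-byte AES blocks; `has_duplicate_blocks` is a hypothetical name, not necessarily how krux implements it:

```python
def has_duplicate_blocks(data: bytes, block_size: int = 16) -> bool:
    """Return True if any two block_size-byte blocks of data are identical."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    # A set drops repeats, so a shorter set means at least one block repeated.
    return len(set(blocks)) != len(blocks)


repeated = b"A" * 32          # two identical 16-byte blocks
distinct = bytes(range(32))   # two different 16-byte blocks
```

Under ECB, identical plaintext blocks encrypt to identical ciphertext blocks, so `repeated` would be rejected and `distinct` accepted.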

Types of plaintext messages, all via deterministic hashes, are:

  • 16 byte as-if 12w entropy, + re-encoded utf8
  • 32 byte as-if 24w entropy, + re-encoded utf8
  • 12w mnemonic, + re-encoded utf8 -- a repeat since already utf8
  • 24w mnemonic, + re-encoded utf8 -- a repeat since already utf8
  • 32 bytes where 1st and 2nd block are same, + re-encoded utf8
  • 64 bytes, + re-encoded utf8
  • all of above concatenated, + re-encoded utf8

Columns

  • Timid: avoided encryption but were actually capable of authenticated decryption
  • Avoid: avoided encryption, fortunately, because also failed authenticated decryption
  • Fail: didn't avoid encryption and FAILed authenticated decryption
  • KEFerr: FAILed successful round-trip of KEF encoding

In the report:

  • Timid values for ECB come from .encrypt() raising an error for two reasons: 1) the plaintext or auth bytes end in 0x00 under unsafe padding, and 2) the plaintext has repeated AES blocks, which ECB would reveal because its ciphertext blocks repeat too. In all cases we were still able to .decrypt() (w/ internal retries).
  • Timid values for CBC come from .encrypt() raising an error because the plaintext or auth bytes end in 0x00 under unsafe padding, but in all cases we were still able to .decrypt() (w/ internal retries).
  • ~~Avoid values for GCM are raising-error on encrypt because plaintext ends in 0x00, which is fortunate because "authenticated" decryption failed also.~~ There are no Avoid values for GCM because TIL: GCM doesn't require padding and is now defined to use none, which means that "AES-GCM +p" is no longer a proposed version.

Note: versions 5-9 (with +p or +c) use safe padding; +c also uses compression, which likely disguises block repeats in ECB.

jdlcdl avatar Apr 02 '25 12:04 jdlcdl

A new function suggest_versions() decides on the KEF version to use when encrypting. This includes whether or not to use compression of plaintext.

It's currently using a threshold of 160 bytes with some exceptions, but I've tried some different bytestring samples to search for a better threshold. Results are below.

thresh: 192 for 192b content: b'\x1f@\xfc\x92\xda$\x16\x94u\ty\xeel\xf5\x82\xf2\xd5\xd7\xd2\x8e\x183]\xe0Z\xbcT\xd0V\x0e\x0fS\x02\x86\x0ce+\xf0\x8dV\x02R\xaa^t!\x05F\xf3i'
thresh: 79 for 287b content: b'b43:6X5:TWF835+69CT:6E9*N3Z5-TKLCKEBV.62XRO+5NSD$S'
thresh: 112 for 266b content: b'b58:HNRPiyKyPzubRZTXtxEXigjzQiPfzNx9Ceoa9xAUzw6jgf'
thresh: 97 for 260b content: b'b64:H0D8ktokFpR1CXnubPWC8tXX0o4YM13gWrxU0FYOD1MChg'
thresh: 126 for 424b content: b'wsh(sortedmulti(2,[d63dc4a7/48h/1h/0h/2h]tpubDEXCv'
thresh: 87 for 629b content: b'b43:U3VW:FB$5J208:+ALM6IYU43X6UUPI:WTL-YK19*70H8:.'
thresh: 111 for 583b content: b'b58:YigQL3LYRA8K123rmiXLNBE2M7HCyyyD1pUZha3j9q4BPo'
thresh: 58 for 572b content: b'b64:d3NoKHNvcnRlZG11bHRpKDIsW2Q2M2RjNGE3LzQ4aC8xaC'
thresh: 25 for 13115b content: b'abandon ability able about above absent absorb abs'
thresh: 84 for 19340b content: b'b43:3ID.D53$41H9GFXH/YLS-I.ED.WQB7197A9L5EL*TM-N-1'
thresh: 110 for 17915b content: b'b58:5VHpEs12tM3J4nVMeJfEyWEKeAVSS713bQTXPU65BwdJgA'
thresh: 42 for 17492b content: b'b64:YWJhbmRvbiBhYmlsaXR5IGFibGUgYWJvdXQgYWJvdmUgYW'

In the above analysis, a threshold was considered "better" if compressing a sample of that size resulted in fewer bytes than not compressing it. It did not take into account the processing overhead of compression, only whether or not there was a "size" win.

For random bytes (like mnemonic entropy), compression is never smaller. For English words, there is a benefit even for really short strings, i.e. 25 bytes in the case of the bip39 word list. For the others, 80 to 120 seems like a much better "compress" threshold than 160.
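
For anyone wanting to reproduce this kind of "size win" comparison, a rough sketch follows. It assumes raw deflate via Python's zlib with `wbits=-15`, which may differ from the settings krux actually uses, and the helper names are hypothetical. The exact search method behind the thresholds listed above isn't shown in this thread, so this only finds the shortest prefix where compression wins.

```python
import hashlib
import zlib


def deflate(data: bytes) -> bytes:
    """Raw-deflate data (no zlib header); wbits=-15 is an assumption."""
    comp = zlib.compressobj(wbits=-15)
    return comp.compress(data) + comp.flush()


def compression_wins(data: bytes) -> bool:
    """True when the compressed form is strictly smaller than the input."""
    return len(deflate(data)) < len(data)


def smallest_winning_prefix(data: bytes):
    """Shortest prefix length where compression wins, or None if it never does."""
    for n in range(1, len(data) + 1):
        if compression_wins(data[:n]):
            return n
    return None


random_like = hashlib.sha256(b"as-if entropy").digest()  # 32 high-entropy bytes
wordy = b"abandon " * 40                                 # highly repetitive text
```

With these inputs, `compression_wins(random_like)` is False while `compression_wins(wordy)` is True, matching the observation that entropy-like bytes don't benefit from compression.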

suggest_versions() is called with the "plaintext" bytes and the user's AES "mode" preference, so it is possible to make an intelligent per-plaintext decision. On the other hand, assuming that this is only used for plaintext that the user considers a sensitive secret, I want to remain careful about how much analysis is performed on their secrets prior to encryption (other than what is absolutely necessary, like checking for repeated blocks in AES-ECB mode).

No responses necessary, just sharing these thoughts.

jdlcdl avatar Apr 04 '25 16:04 jdlcdl

Simulator-created controls (Crypto.Cipher.AES with MODE_CTR), now that MODE_CTR has arrived in MaixPy.ucryptolib:

These can be read from Tools/Datum Tool in "kef_ui_prototype" branch


"im sixteen bytes", version: "AES-CTR", key: "k", encoding: "base43" im-sixteen-bytes_k


"2of3 multisig descriptor", version: "AES-CTR +c", key: "k", encoding: "base43" 2of3-descr_k


12-layer Matryoshka, a text message wrapped in consecutive envelopes for every version supported. All keys are "abc", encoding: "binary" matryoshka_abc

jdlcdl avatar May 15 '25 02:05 jdlcdl