heed Expose the LMDB Encrypt/Decrypt feature

trafficstars

Call for Advice

As a reminder, heed is a wrapper on top of LMDB. LMDB is a memory-mapped key-value store battle-tested for a long time. This branch introduces a wrapper on top of the "recently" released support for page-level encryption. It has been released in the latest version of LMDB, and heed has been updated in #128. This PR will be merged in it after the reviews.

I would like help from people who know how to use encryption algorithms. Maximum feedback on the new encryption and checksum API would be appreciated. As I am not very proficient in cryptography, would it be possible to help me with these subjects:

How can I document every parameter of the Encrypt::encrypt_decrypt trait method?
Would it be possible to help me create a complex example that uses these parameters? ☝️
Would it be possible to help me create an example that uses the auth_size parameter of the EnvOpenOptions::encrypt_with method?
I added a piece of advice to the documentation of the encrypt_with method to set a checksum too, and to interpret a checksum error as a decryption error. Is it correct to do so?
Is there a more idiomatic way not to declare the Checksum and Encrypt traits when we don't need them in the EnvOpenOptions instead of having dummy types? (Would be so good when the never type is released).

Note that I already have successfully used this new API and exposed a simple example that uses the ChaCha20 crate. I would like to know if this example is valid, please.

You'll probably need the documentation of the API of LMDB. Unfortunately, the only one that is available only is outdated. I advise you to install and run Doxygen and generate the doc from lmdb-master3-sys/lmdb/libraries/liblmdb (don't forget to git submodule init --update when you clone this repo), or you can directly read the raw documentation from there too.

Where Should I Start?

To help me, you can simply create a new Rust crate, add this line to your dependencies and inspire yourself by looking at the simple example I linked above ☝️:

heed = { git = "https://github.com/meilisearch/heed", branch = "encdec-func" }

Once you've found improvements, please comment directly in this PR ❤️ 👨‍💻

What should I do next?

[x] Make the EnvOpenOptions::encrypt_with method accept a Key of the right length and let the user provide it by using its own KDF and salt. Heed must not know about the KDF used.
[x] Remove the Checksumming feature from this crate for now, we can introduce in another PR.
[x] Move the DummyEncrypt traits impls elsewhere e.g., end of the file, another module, to avoid polluting this part of the code.
[x] Check the features of the newly introduced dependencies to disable the useless ones.
[x] Review and rewrite the documentation of the new methods and modules added in this PR.
[x] Document the size of the nonce provided by LMDB. XChaCha20 needs a bigger nonce than the one provided but there is no need in using it with LMDB anyway.
[ ] We should probably remove the DummyEncrypt type and rename it NoopEncrypt to miror the #212 changes.

Sep 16 '22 19:09 Kerollmops

I see multiple issues with the current PR, mostly coming from the LMDB API.

TLDR: LMDB should probably expose an AEAD API instead of the current "handmade" API.

Unsoundness

Unwinding across FFI is UB, so this is unsound. The callback should be wrapped in a catch_unwind.

AEAD

Instead of rolling your own traits I'd recommend leveraging the existing RustCrypto ecosystem, especially the aead crate. What you'd probably be looking for are the AeadMutInPlace (using their detached methods), and the KeyInit trait to initialize with the key. GenericArray is not obvious to use but they implement TryFrom<&[u8]> for &GenericArray<…> and have an as_slice method. Take a generic parameter <A: KeyInit + AeadMutInPlace>.

AEAD stands for Authenticated Encryption with Associated Data, which is the standard API for authenticated data. It is much better to not separate the encryption from the authentication (checksum), as it leaves less room for error and also can improve performance. For crates implementing AEADs look at (X)ChaCh20Poly1305, AES-GCM and AES-GCM-SIV.

The example

The example is somewhat broken. CRC codes are not cryptographically secure. Even with the encryption, it may be possible for an attacker to modify the data so that the checksum passes and modify it subtly without it being noticeable. AEADs prevent that kind of issue :)

Limits of LMDB

The API proposed by LMDB separates the encryption from the authentication(checksum), which is bad as it prevents efficient usage of standard, misuse-resistant AEAD constructs. LMDB also performs the checksum before the actual encryption, which is very dangerous. Also, just looking at the code I wasn't able to find how the nonce (iv) is generated. If the same iv is used twice with the same key, it leads to the plaintext being recoverable and sometimes even the key itself. I'm worried that if the nonce is generated randomly, it could lead to nonce reuse, since the nonces of standard AEADs are rather small (96bits), and you reach 50% chances of a collision after $2^{48}$ encryption operations, which can happen for a long-lived database. In that cases either longer nonces should be used (the X variants of chacha20poly1305), or a nonce-reuse resistant AEAD should be used (aes-gcm-siv). Either way the source of the nonces should probably be documented.

It appears that the functionality is not yet stabilized (still a release candidate) so I'd recommend contacting them to change the encryption API to be that of an AEAD:

Two fallible functions, one for encryption and one for decryption.
- Encryption
  - In: Key, Plaintext, Nonce and Associated data
  - Out: Ciphertext and Authentication tag
- Decryption:
  - In: Key, Nonce, Ciphertext, Associated data, Tag
  - Out: Plaintext

The associated data could be whatever metadata LMDB needs but cannot encrypt. The AEAD ensures that is hasn't been tampered.

If needed a checksum could also be used, but it should be separate from the encryption (even maybe disallowed to be used at the same time) and used only to detect accidental data corruption.

Oct 31 '22 21:10 sgued

Thank you very much @sgued for this detailed explanation,

I just pushed a commit to catch the possible panics of the checksum and encryption functions. Unfortunately, there is no way to tell LMDB that the checksum function failed. Should I abort the program instead of doing nothing?

About the example that is wrong, as I am using a non-cryptographic valid checksum method, what do you recommend? Is there anything that can be done without implementing an AEAD interface that checksums by itself?

I will try to ping @hyc here, and if he doesn't respond I will probably contact him via Twitter. So Howard, if you read me, I am trying to expose a safe LMDB wrapper for Rust users. As I am not a cryptographer, I asked the community for help with how I have designed this wrapper so far. @sgued answered just above. Could you take a look at this, if you have time, please? What do you think about exposing an API closer to an AEAD interface?

Nov 04 '22 16:11 Kerollmops

With respect, @sgued's review is incorrect. LMDB's API is fully compliant with AEAD APIs and the code already provides an example implementation thereof.

https://github.com/LMDB/lmdb/blob/mdb.master3/libraries/liblmdb/crypto.c https://github.com/LMDB/lmdb/blob/mdb.master3/libraries/liblmdb/mtest_enc2.c

The LMDB checksum API is only for use when you only want a checksum and no encryption.

Nov 04 '22 17:11 hyc

Oh, sorry, I was mislead by the documentation:

The result must be the same number of bytes as the input.
@param[in] src The input data to be transformed.
@param[out] dst Storage for the result.
@param[in] key An array of three values: key[0] is the encryption key,
key[1] is the initialization vector, and key[2] is the authentication
data, if any.

That doesn't leave any place for outputting the authentication tag, and given that MDB_sum_func also takes a key as an argument I jumped to conclusions.

Looking at your example it appears that key[2] serves as output and input, which is not clear from the docs. It makes more sense to me now. The API looked a bit weird I should have asked for clarifications rather than changing it 😅 sorry.

Nov 04 '22 18:11 sgued

Thank you very much Howard, for your intervention. It is much appreciated 🔥

Thank you @sgued, for helping me. Could you help me understand and design a good interface on top of LMDB? Can I ask the user to give me an AeadMutInPlace object? It looks complex because this trait asks for a &mut self, and LMDB doesn't seem to let the user pass any user data.

According to Howard's analysis, I conclude that I should remove the comment I added about preferring encryption with checksum and interpret the checksum error as a wrong password key. Instead, we should be able to use a valid AEAD algorithm that also checksums cryptographically correctly and returns an appropriate error. Am I right?

Nov 05 '22 11:11 Kerollmops

By the way, I would encourage you to copy the interface used in liblmdb/crypto.c and encapsulate it as a dynamic object, the way that sample does. That way the LMDB commandline tools can load your crypto mechanism and operate on your encrypted databases still.

Nov 05 '22 12:11 hyc

Thank you @sgued, for helping me. Could you help me understand and design a good interface on top of LMDB? Can I ask the user to give me an AeadMutInPlace object? It looks complex because this trait asks for a &mut self, and LMDB doesn't seem to let the user pass any user data.

You don't need to pass user data. The KeyInnit trait gives you a new method that allows you to create it directly from the key. Something like that:

use aead::{AeadMutInPlace, Key, KeyInit, Nonce, Tag};

fn encrypt<A: AeadMutInPlace + KeyInit>(
    key: &[u8],
    nonce: &[u8],
    aad: &[u8],
    plaintext: &[u8],
    chipertext_out: &mut [u8],
    auth_out: &mut [u8],
) {
    chipertext_out.copy_from_slice(plaintext);
    let key: &Key<A> = key.try_into().unwrap();
    let nonce: &Nonce<A> = nonce.try_into().unwrap();
    let mut aead = A::new(key);
    let tag = aead
        .encrypt_in_place_detached(nonce, aad, chipertext_out)
        .unwrap();
    auth_out.copy_from_slice(&tag);
}

fn decrypt<A: AeadMutInPlace + KeyInit>(
    key: &[u8],
    nonce: &[u8],
    aad: &[u8],
    chipher_text: &[u8],
    output: &mut [u8],
    auth_in: &[u8],
) -> bool {
    output.copy_from_slice(chipher_text);
    let key: &Key<A> = key.try_into().unwrap();
    let nonce: &Nonce<A> = nonce.try_into().unwrap();
    let tag: &Tag<A> = auth_in.try_into().unwrap();
    let mut aead = A::new(key);
    aead.decrypt_in_place_detached(nonce, aad, output, tag)
        .is_ok()
}

fn with_chacha() {
    use aead::{generic_array::typenum::Unsigned, AeadCore, KeySizeUser};
    use chacha20poly1305::ChaCha20Poly1305;
    let mut pl = [1; 1024];
    let aad = [];
    let mut ciphertext = [0; 1024];
    // AeadCore and KeySizeUser are required for all T: AeadMutInPlace + KeyInnit, so you can use them to get the size of the key/tag/nonce.
    let nonce = [0; <ChaCha20Poly1305 as AeadCore>::NonceSize::USIZE];
    let mut auth = [0; <ChaCha20Poly1305 as AeadCore>::TagSize::USIZE];
    let key = [2; <ChaCha20Poly1305 as KeySizeUser>::KeySize::USIZE];
    encrypt::<ChaCha20Poly1305>(&key, &nonce, &aad, &pl, &mut ciphertext, &mut auth);
    pl = [0; 1024];
    assert!(decrypt::<ChaCha20Poly1305>(&key, &nonce, &aad, &ciphertext, &mut pl, &auth));
}

Nov 06 '22 11:11 sgued

I tried working with the AeadMutInPlace and KeyInit traits with LMDB, but I got a runtime error with the size of the initialization vector LMDB returns. It gives a 16 bytes array when ChaCha20 asks for a 12 bytes length array as a nonce. I am probably doing something wrong here and should derive a nonce from the IV.

Could someone help me with that subject?

You can reproduce the issue by cargo run --example encrypt, and if you want a better backtrace, you can RUST_BACKTRACE=1.

Nov 12 '22 13:11 Kerollmops

Ok, so I was able to encrypt/decrypt an LMDB database with ChaCha20Poly1305. The thing is that I cropped the 16 bytes initialization vector coming from LMDB to be 12 bytes and be used by this algorithm. I don't know if it is cryptographically correct. What should I do if an algorithm needs a bigger nonce (iv) like XChaCha20Poly1305 that requires a 20 bytes nonce?

@hyc and @sgued, Can I ask you a technical question about how I should design an app on top of LMDB? I am designing a small app that stores files in an encrypted LMDB; this app will likely be encrypted with ChaCha20Poly1305, so the key will be 20 bytes long. As I can't ask a user to use a 20-byte long password, I thought about hashing the user password to output a 20 bytes long key and use that for the encryption. I also read about the difference between an encryption key and a password (on quora 😬), but they advise creating a very long key and then using the user password to encrypt it instead. Unfortunately, the key must be 20-byte long again and I am not sure we can do that in LMDB.

What is your advice about that?

Nov 12 '22 15:11 Kerollmops

I tried working with the AeadMutInPlace and KeyInit traits with LMDB, but I got a runtime error with the size of the initialization vector LMDB returns. It gives a 16 bytes array when ChaCha20 asks for a 12 bytes length array as a nonce. I am probably doing something wrong here and should derive a nonce from the IV.

Could someone help me with that subject?

Again, just study the example in liblmdb/crypto.c. Just truncate.

Nov 12 '22 17:11 hyc

What should I do if an algorithm needs a bigger nonce (iv) like XChaCha20Poly1305 that requires a 20 bytes nonce?

XChaCha20 is used to protect against nonce misuse in systems that use randomly generated nonces (i.e., to protect against weak RNGs). There is no need to use that in LMDB since LMDB nonces aren't random and are guaranteed to be unique.

As I can't ask a user to use a 20-byte long password, I thought about hashing the user password to output a 20 bytes long key and use that for the encryption.

What you're asking about here is known as a Key Derivation Function, KDF. https://en.wikipedia.org/wiki/Key_derivation_function Popular choices are pbkdf or argon2. Again, for LMDB, you should expose whatever you use in your module as the string2key function.

Nov 12 '22 17:11 hyc

Call for Testing

Hey Rustaceans, I did a lot of work to provide the "recently" released support for page-level encryption of LMDB in heed. It is exposed as an encrypt_with function on the EnvOpenOptions builder. You can find a very basic example in the examples folder. I am asking you to help me test this feature to improve it, document it better, or fix it in the worst case:

Try it with a set of entries and custom LMDB flags/options, but please report LMDB bugs on the dedicated website.
Try it with different AEAD crates.
Be creative and use it in different projects or contexts.

I would be glad if you help me improve this crate. Note that this PR is part of a massive rewrite of the crate to use the latest version of LMDB. You'll notice many undocumented breaking changes compared to the other versions of heed.

Dec 16 '22 17:12 Kerollmops

I must cherry-pick these 18 commits on top of #128 instead of rebasing...

Jan 11 '23 10:01 Kerollmops

This branch has just been rebased on the latest main commit 3a20c96.

Nov 26 '23 14:11 Kerollmops

There is an open report claiming that AEAD is not working. I've not seen a reproducer for it. https://bugs.openldap.org/show_bug.cgi?id=9920 If your testing turns up any problems in LMDB please followup there.

Nov 26 '23 19:11 hyc

heed heed copied to clipboard

Expose the LMDB Encrypt/Decrypt feature

Call for Advice

Where Should I Start?

What should I do next?

Unsoundness

AEAD

The example

Limits of LMDB

Call for Testing

heed
heed copied to clipboard