app
app copied to clipboard
Add support for multiple pw hashes, use secret key and user-binding
Hi,
I'm opening this as a draft PR, as there are aspects that make it not ready IMO (like ~~pyca/pynacl#676 not being merged yet,~~ not having a lossless DB migration script, etc.) but I feel this might require extensive discussion prior to merging.
Rationale
This PR implements two main changes:
- an abstraction layer over keyed and unkeyed password oracles, and the capability to define arbitrary oracles for storing user passwords;
- an
XChacha20
wrapper that turns any unkeyed oracle into a keyed one, using AEAD.
Those are submitted as a single PR as they depend on one-another: XChacha20
needs the KeyedOracle
abstraction to get its private key, and the PasswordKind
enum to construct the AAD value, while PasswordKind
needs at least one KeyedOracle
implementation to be populated (otherwise, password login is not possible). Allowing UnkeyedOracle
s in PasswordKind
would break this dependency loop, and was considered, but ultimately rejected as strictly-worse design: unkeyed constructions are less secure, so only keyed constructions should be used anyhow, and the long-term benefit outweight the short-term cost of a larger PR to review as a single block.
This also motivates the KeyedOracle
vs. UnkeyedOracle
distinction, since this makes the cryptographic distinction readily apparent in the code. KeyedOracle
uses the Blake2b keyed hash function as a KDF, to automatically derive a unique private key for each derived class; the use of a KDF is to prevent using the same (or related) keys in different constructions, which can be insecure, and Blake2b was selected as it is fast, secure, supports personalisation, and readily available in the standard library since Python 3.6.
The design of the XChacha20
wrapper was motivated by multiple consideration:
- encrypting users' password hashes under a site-specific key, protects user passwords from offline attacks by an adversary who got access to the contents of the databases (through SQL injection, compromising the database server itself, or gaining access to a backup copy of the database) ;
- the
XChacha20-Poly1305
construction is fast regardless of whether specialized hardware support is available, suffers from no restrictions on the number of messages that can be encrypted under a given key or their size (unlike other AEAD constructions such asAES-GCM
), and a high-quality implementation is available as part oflibsodium
(I am exposing it inPyNaCl
, the binding maintained by the Python Cryptographic Authority) ; - adding
PyNaCl
as a dependency makes other password hash functions available (such as Argon2 and scrypt) that are more modern than bcrypt ; - using AEAD instead of plain AE (such as
libsodium
'sSecretBox
AE construction) allows binding ciphertexts to a givenpw_kind
andid
, to protect user accounts from an adversary who gained write access to theusers
table of the database: this prevents attacks such as replacing thepw_kind
field of a victim user's record (substituting a weak oracle for a strong one, under the AEAD wrapper) or replacing thepw_blob
from that of a “donor” account whose password is known to the attacker.
Assumptions
- The
user.id
primary key is immutable. If theid
of a given user record is changed, decryption of theirpw_blob
will fail, effectively removing their ability to log in using a password. - The user set a
PW_SITE_KEY
environment variable, containing a uniformly-random, hex-encoded, 256b private key; if it isn't set, they are told to usepython3 -c 'import secrets; print(secrets.token_hex(32))'
to correctly generate one.
Changes
- [x] Add the
PW_SITE_KEY
configuration variable. - [x] Add the
KeyedOracle
andUnkeyedOracle
abstract base classes, and related tests. - [x] Expose
bcrypt
as anUnkeyedOracle
- [x] Implement
XChacha20
, using encryption to convert anyUnkeyedOracle
into aKeyedOracle
. - [ ] Implement a migration that encrypts/decrypts existing password hashes, so the change is transparent to users and can be rolled back.
- [ ] Implement a mechanism to rollover the site's secret key.
- [ ] Automatically upgrade user records to the (current) preferred password oracle.
For instance, a user who last set their password when
DEFAULT_KIND
wasAEAD_XCHACHA20_BCRYPT
, should have it upgraded to another oracle construction when a newer one is implemented.
Discussion needed
- ~~Should the
PW_SITE_KEY
be renamed to something more generic, and potentially be used in a similar way as here to encrypt other secrets?~~ - ~~Should there be support for multiple site keys, so key rollovers can be performed without downtime, and in the presence of non-upgradable
KeyedOracle
s (such as Argon2, using its support for keying and AAD) ?~~
Rebased on a current master
The PR looks good though I must admit it's a bit advanced given my knowledge about password oracles :).
About the questions
Should the PW_SITE_KEY be renamed to something more generic, and potentially be used in a similar way as here to encrypt other secrets?
I think the naming is good. Do you know if it requires a database migration or any value can be chosen? If data migration is needed, can we document it in https://github.com/simple-login/app/blob/95c8f14ea50240c1b59d008ea0dc911dd94da3cd/docs/upgrade.md#L1-L0 so people who self-host can run the migration too?
Should there be support for multiple site keys, so key rollovers can be performed without downtime, and in the presence of non-upgradable KeyedOracles (such as Argon2, using its support for keying and AAD) ?
Yes without downtime is quite important in my opinion.
The PR looks good though I must admit it's a bit advanced given my knowledge about password oracles :).
Thanks; I would have sent it in smaller, easier-to-review chunks, but I couldn't find a clean way to split it up.
Do you know if it requires a database migration or any value can be chosen? If data migration is needed, can we document it in https://github.com/simple-login/app/blob/95c8f14ea50240c1b59d008ea0dc911dd94da3cd/docs/upgrade.md#L1-L0 so people who self-host can run the migration too?
Yes, the PR will require a data migration, which I haven't implemented yet. (I'm no Alembic expert, I'm afraid)
Should there be support for multiple site keys, so key rollovers can be performed without downtime, and in the presence of non-upgradable KeyedOracles (such as Argon2, using its support for keying and AAD) ?
Yes without downtime is quite important in my opinion.
OK; I will come back to you on that later, as there are 2 main strategies I'm aware of for managing key rollover in password storage. and I'd prefer to discuss them with a few cryptographers before committing to either approach, to check that my analysis is correct:
- A “password onion” à la Facebook, where a key rollover or algorithm change is achieved by wrapping the existing hash in an additional layer. This seems to me like a brittle construction, and would require special handling for the encryption layers. It's also a lot of additional complexity, and requires holding on to all site keys “forever”.
- Only use password oracles that support key “upgrades”; for instance, w/
AEAD_XCHACHA20_BCRYPT
, one can simply decrypt the existing data (with the “old” key) and re-encrypt it (under the “new” key) This seems much simpler to me, both to implement and to operate, as at most 2 keys need to be handled at any given time. Moreover, the security argument is straightforward, and older keys can be discarded (which hedges against scenario where password data protected under an older key was leaked, but they key itself hasn't yet)
In both cases, the rollover process is a bit involved, if it has to be done without downtime, as one needs to ensure that all running instances can decrypt data produced by any other instance:
- replace the running instances of SimpleLogin with ones that have the new key in the configuration, but do not produce data encrypted under it (they can only decrypt such data);
- replace the running instances (again!) with ones configured to produce data encrypted under the new key;
- decrypt/re-encrypt all password data under the new key, as a batch process.
@nguyenkims Could you help me out with the ORM? Under PostgreSQL, it rejects both db.BINARY
and db.BLOB
for the pw_blob
column, even though PostgreSQL has a binary type.
Had to update the dependency on PyNaCl (my branch for PyCA/PyNaCl#676 doesn't exist anymore, and was merged upstream) and rebase to ensure that the deps can be resolved and installed at any point in the git history
@nguyenkims Could you help me out with the ORM? Under PostgreSQL, it rejects both
db.BINARY
anddb.BLOB
for thepw_blob
column, even though PostgreSQL has a binary type.
@nbraud I tried to run sh scripts/new-migration.sh
and it seems to work fine and LargeBinary is indeed the right type to use according to https://docs.sqlalchemy.org/en/14/core/type_basics.html#sqlalchemy.types.LargeBinary
What error do you have when running the code?