pykeepass
pykeepass copied to clipboard
[Security vulnerability] Seeds are not changed on database save
Hi,
Thank you for pykeepass!
I noticed what I believe is a security issue with the way that pykeepass saves a database (I've tested only KDBX). None of the four seeds are regenerated. These are:
-
kdbx.header.value.dynamic_header.master_seed.data
-
kdbx.header.value.dynamic_header.encryption_iv.data
-
kdbx.header.value.dynamic_header.transform_seed.data
-
kdbx.header.value.dynamic_header.protected_stream_key.data
I believe this results in a number of issues. Note that I'm not a cryptographic expert. However, the official keepass client does in my testing regenerate these parameters on every save, so I think that should be a good enough reason for pykeepass not to vary in behaviour from that.
Some issues I believe exist with not regenerating the seeds:
- Everyone who uses
create_database()
will get the same seed for password derivation, which makes them vulnerable to precomputation attacks. - I'm looking to use pykeepass to derive a "subset" password database for lower security devices. In this case, although I might generate the derived database from a master database, I would not expect the derived database to leak any information about the master database I specifically chose not to include. However, without regeneration of the seeds, an attacker with access to the derived database would be able to precompute tables that make the master password faster to crack if it were later exposed.
- Not regenerating
protected_stream_key
is not as severe, but it would still allow an attacker who manages to access only the decrypted XML to be able to compare passwords from different users for equality, when regenerating this header field would have prevented it.
Note that this affects both create_database()
and PyKeePass.save()
. In both cases, all seeds should be regenerated, just like the "official" keepass client does it.
Here are test cases that should pass when this issues is fixed:
import pykeepass
import pytest
EXPECTED_SALT_ATTRS = [
"kdbx.header.value.dynamic_header.master_seed.data",
"kdbx.header.value.dynamic_header.encryption_iv.data",
"kdbx.header.value.dynamic_header.transform_seed.data",
"kdbx.header.value.dynamic_header.protected_stream_key.data",
]
def get_recursive_attr(obj, name):
try:
idx = name.index(".")
except ValueError:
return getattr(obj, name)
else:
return get_recursive_attr(getattr(obj, name[:idx]), name[idx + 1 :])
@pytest.mark.parametrize(["attr"], [(attr,) for attr in EXPECTED_SALT_ATTRS])
def test_creation_salts_differ(tmpdir, attr):
a = pykeepass.create_database(str(tmpdir.join("a.kdbx")))
b = pykeepass.create_database(str(tmpdir.join("b.kdbx")))
assert get_recursive_attr(a, attr) != get_recursive_attr(b, attr)
@pytest.mark.parametrize(["attr"], [(attr,) for attr in EXPECTED_SALT_ATTRS])
def test_saved_salts_differ(tmpdir, attr):
a = pykeepass.create_database(str(tmpdir.join("a.kdbx")))
old_salt = get_recursive_attr(a, attr)
a.save()
b = pykeepass.PyKeePass("a.kdbx")
new_salt = get_recursive_attr(b, attr)
assert old_salt != new_salt
Thanks for reporting this.
After looking through the keepassxc source, it seems like master_seed
, encryption_iv
, and protected_stream_key
are indeed regenerated on save.
Yes, and additionally transform_seed
is also regenerated, just a little more indirectly:
- https://github.com/keepassxreboot/keepassxc/blob/3b459813ed19f9f7a4e79b237134a749151f13b2/src/format/Kdbx4Writer.cpp#L55 sets
updateTransformSalt
to true - https://github.com/keepassxreboot/keepassxc/blob/3b459813ed19f9f7a4e79b237134a749151f13b2/src/core/Database.cpp#L730 then regenerates it
I used keepass' behaviour to verify, as opposed to keepassxc, but the result is the same.
I implemented a workaround for now by poking into kdbx.header...
and kdbx.body...
and resetting the necessary values with secrets.token_bytes(len(previous_value))
. There were a few gotchas which will also probably need to be tackled for a proper fix. Being as it is in a pykeepass-using app rather than in pykeepass itself, my workaround is too far diverged to use against pykeepass, but hopefully the following will help:
- Probably obvious to pykeepass authors, but it wasn't to me: the correct attributes inside the
kdbx
structure to change differ depending onmajor_version
; and for version 4, they differ again depending onkdbx.header.value.dynamic_header.kdf_parameters.data.dict["$UUID"]
. - After modifying
kdbx.header.value
, I needed todel kdbx.header.data
because aRawCopy
will rebuild from itsdata
attribute if that still exists. I filed construct/construct#890 to improve the documentation on this. - Due to construct/construct#888, calling
PyKeePass.save()
then crashes. I filed construct/construct#889 to fix this; in the meantime, I worked around by creating the stream myself (withw+b
) and callingbuild_stream()
instead. You'll either need construct/construct#888 fixed upstream before you can fix this issue, or you'll need a similar workaround inPyKeePass.save()
.
Phew!
Thanks for the good work.
Probably obvious to pykeepass authors, but it wasn't to me: the correct attributes inside the kdbx structure to change differ depending on major_version; and for version 4, they differ again depending on kdbx.header.value.dynamic_header.kdf_parameters.data.dict["$UUID"].
Yes, the location of the KDF parameters within the database bytestream changes depending on the database version, which is reflected in the object that kdbx_parsing
returns. Ideally the kdbx_parsing
would abstract this away from the main pykeepass code, but this is a limitation of construct that I couldn't figure out how to get around when I wrote the parsing code. I'm starting to consider moving away from construct and just doing the parsing with the struct module (see here, although this is very old code and can't build databases). This would give us full control over where the parsed values are stored in the database object and might make things like converting between database versions easier.
After modifying kdbx.header.value, I needed to del kdbx.header.data because a RawCopy will rebuild from its data attribute if that still exists. I filed construct/construct#890 to improve the documentation on this.
You actually figured out a problem I was having. I wanted to add functionality for changing the encryption and key derivation methods, but my changed values kept being overwritten by the previously copied bytes.
Ideally the kdbx_parsing would abstract this away from the main pykeepass code, but this is a limitation of construct that I couldn't figure out how to get around when I wrote the parsing code. I'm starting to consider moving away from construct...
FWIW, I didn't know about construct before I started looking at pykeepass a couple of weeks ago, and I'm very impressed both by construct and how you have used it. It's really nice how pykeepass deals with the different format versions "all at once" in a single structure. I'm looking forward to using construct to solve a similar problem I'm busy solving in a different project.
The way construct works, I think it's a necessary requirement that the data structure that comes out of it looks like the file format itself, rather than being able to abstract across fields that appear differently in the file format in different file format versions.
To abstract over this, I'd do it outside of construct, but still use construct as you have done. For example, you could create a "proxy" type Python object by capturing getattr()
and setattr()
that maps to different properties inside the current kdbx
structure depending on major_version
. This could be stateless except that it would be instantiated with a kdbx
reference. It could then present a more unified view into kdbx
, which would then be supplied by kdbx_parsing
for the main PyKeePass
class to use.
Disclosure: I am the maintainer of Construct.
Thanks for the high praise Basak. I was impressed by Construct too and that is why I became its maintainer. Evidlo, I encourage you to post more Issues at Construct's forum. Its clear you had some issues with Construct but maybe we can resolve them. I dont think using struct module is a good idea in a sophisticated project like this.
I can confirm issue # 888 will be resolved upstream before end of the year.
Another disclosure: I had a Cryptography class at Stanford. Which I have to admit was probably the best course at Coursera I ever had and I have like 20 certificates from there so... yeah, back to the topic:
Unfortunately I dont know pykeepass enough to advise anything but I can say one thing: using same seeds opens a way for using rainbow tables which I think was what basak meant by precomputation attacks. I believe Basak is right on this.
A couple more loose ends for version 3:
- There's a minor bug with a fix in #222, without which the official KeePass client won't accept version 3 files.
-
HeaderHash
in the XML needs recalculating, without which the official KeePass client rejects a changed header as corrupt. Deleting the field also worked, though recalculating seems better. I think you can probably do this inUnprotectedStream
as the recomputed header data field should be available at that stage, though I haven't tested this. I guess that would want to be conditional onmajor_version
. As a workaround, I modified thetree
attribute after recomputing the header an additional time by hand, and that worked.
One more. It's probably also wise to regenerate kdbx.header.value.dynamic_header.stream_start_bytes.data
in the case of version 3, to stop an attacker building up sets of known plaintexts. I don't know if there's a cryptographic attack possible otherwise, but again I believe the official client does this, so I think it makes sense to avoid verifying that it's safe not to do it, and just do it.
To summarise, here are the fields to regenerate for version 3:
"kdbx.header.value.dynamic_header.master_seed.data",
"kdbx.header.value.dynamic_header.encryption_iv.data",
"kdbx.header.value.dynamic_header.protected_stream_key.data",
"kdbx.header.value.dynamic_header.stream_start_bytes.data",
For version 4:
"kdbx.header.value.dynamic_header.master_seed.data",
"kdbx.header.value.dynamic_header.encryption_iv.data",
"kdbx.body.payload.inner_header.protected_stream_key.data",
And additionally kdbx.header.value.dynamic_header.transform_seed.data
for version 3, and KDF-specific for version 4 (eg. kdbx.header.value.dynamic_header.kdf_parameters.data.dict.S.value
if argon2 is used).
Then HeaderHash
(if version 3) and the individual protected XML password elements need recalculating.
I've not looked into version < 3 (doesn't affect me).
Here's my solution to generate new seeds in a simplified pseudo-KeePass database.
from construct import Struct, RawCopy, Bytes, Checksum, this, Subconstruct
from construct import stream_seek, stream_tell, stream_read, stream_write
from Cryptodome.Random import get_random_bytes
from io import BytesIO
# simple placeholder checksum function
def check_func(header_data, master_seed):
return header_data + master_seed
class RandomBytes(Bytes):
"""Same as Bytes, but generate random bytes when building"""
def _build(self, obj, stream, context, path):
length = self.length(context) if callable(self.length) else self.length
data = get_random_bytes(length)
stream_write(stream, data, length, path)
return data
class Copy(Subconstruct):
"""Same as RawCopy, but don't create parent container when parsing.
Instead store data in ._data attribute of subconstruct, and never rebuild from data
"""
def _parse(self, stream, context, path):
offset1 = stream_tell(stream, path)
obj = self.subcon._parsereport(stream, context, path)
offset2 = stream_tell(stream, path)
stream_seek(stream, offset1, 0, path)
obj._data = stream_read(stream, offset2-offset1, path)
return obj
def _build(self, obj, stream, context, path):
offset1 = stream_tell(stream, path)
obj = self.subcon._build(obj, stream, context, path)
offset2 = stream_tell(stream, path)
stream_seek(stream, offset1, 0, path)
obj._data = stream_read(stream, offset2-offset1, path)
return obj
# simple database format with header and checksum
s = Struct(
"header" / Copy(Struct(
"master_seed" / RandomBytes(8),
"more_data" / Bytes(8)
)),
"hash" / Checksum(
Bytes(24),
lambda ctx: check_func(ctx.header._data, ctx.header.master_seed),
this
)
)
# synthesize a 'database'
master_seed = b'00000000'
more_data = b'12345678'
header = master_seed + more_data
data = header + check_func(header, master_seed)
# parse the database
parsed = s.parse(data)
# rebuild and try to reparse final result
data2 = s.build(parsed)
s.parse(data2)
assert data != data2, "Database unchanged"
I'd appreciate some feeback on the above.
As far as Construct is concerned, I see no problems with your code.
I really like this approach. Nice work! I wonder if RandomBytes
and Copy
are sufficiently general to be appropriate to include in Construct itself? If not they work equally well here.
One downside of this approach is that each build triggers a rekey. This is what we want to fix this issue, of course, but I wonder if there are times when we might want to build without rekeying. For example, for unit testing.
I don't have a strong feeling as to whether explicit or implicit rekeying would be better. I think it's difficult to say without attempting one approach or the other.
I think this mechanism should be sufficient to do most of the necessary work. HeaderHash
(version 3) will need recalculating from my notes above; I'm not sure if this mechanism will do this, or it needs implementing separately?
Those 2 classes surely are a creative work. But they wont get into Construct mainstream. Copy already does what RawCopy does, and in a way I dont like, too obscure. And RandomBytes is not general enough to be included. But if this works for you, I dont see why not.
Is there any update to this? It seems like a rather problematic issue.
Can someone suggest the necessary custom code to init the seeds (KDBX4, default KDF/encryption) before calling .save()
?:
kp = create_database("test.kdbx", password=test, keyfile=None, transformed_key=None)
kp.kdbx.body.payload.xml = etree.fromstring(xml_str)
kp.kdbx.header.value.dynamic_header.master_seed.data = get_random_bytes(8)
kp.kdbx.header.value.dynamic_header.encryption_iv.data = ??
kp.kdbx.body.payload.inner_header.protected_stream_key.data = ??
#kp.kdbx.header.value.dynamic_header.kdf_parameters.data.dict.S.value = # if argon2 is used
kp.save()
Without this, the output is reproducible, which is a showstopper for encrypted data.