
[Security vulnerability] Seeds are not changed on database save

basak opened this issue on Dec 15 '20 · 15 comments

Hi,

Thank you for pykeepass!

I noticed what I believe is a security issue with the way that pykeepass saves a database (I've tested only KDBX). None of the four seeds are regenerated. These are:

  • kdbx.header.value.dynamic_header.master_seed.data
  • kdbx.header.value.dynamic_header.encryption_iv.data
  • kdbx.header.value.dynamic_header.transform_seed.data
  • kdbx.header.value.dynamic_header.protected_stream_key.data

I believe this results in a number of issues. Note that I'm not a cryptography expert. However, in my testing the official KeePass client regenerates these parameters on every save, so I think that alone is a good enough reason for pykeepass not to vary in behaviour from it.

Some issues I believe exist with not regenerating the seeds:

  • Everyone who uses create_database() will get the same seed for password derivation, which makes them vulnerable to precomputation attacks.
  • I'm looking to use pykeepass to derive a "subset" password database for lower security devices. In this case, although I might generate the derived database from a master database, I would not expect the derived database to leak any information about the master database I specifically chose not to include. However, without regeneration of the seeds, an attacker with access to the derived database would be able to precompute tables that make the master password faster to crack if it were later exposed.
  • Not regenerating protected_stream_key is not as severe, but it would still allow an attacker who manages to access only the decrypted XML to be able to compare passwords from different users for equality, when regenerating this header field would have prevented it.
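To illustrate the precomputation concern with a toy example (PBKDF2 stands in for the real AES-KDF/Argon2 here; this is not pykeepass code, just a sketch of the attack premise):

```python
import hashlib
import secrets

# If every database created with create_database() shares one transform
# seed, an attacker can build a single password -> key table once and
# apply it to all of them.
SHARED_SEED = b"\x00" * 32  # the same seed in every created database

def derive(password, seed=SHARED_SEED):
    return hashlib.pbkdf2_hmac("sha256", password, seed, 10_000)

# the "precomputed table" works against any database using the shared seed
table = {derive(pw): pw for pw in [b"hunter2", b"letmein", b"passw0rd"]}

victim_key = derive(b"hunter2")         # key from some other database
assert table[victim_key] == b"hunter2"  # instantly recovered

# a fresh per-database random seed defeats the precomputed table
fresh_key = hashlib.pbkdf2_hmac("sha256", b"hunter2",
                                secrets.token_bytes(32), 10_000)
assert fresh_key not in table
```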

Note that this affects both create_database() and PyKeePass.save(). In both cases, all seeds should be regenerated, just like the "official" keepass client does it.

Here are test cases that should pass once this issue is fixed:

import pykeepass
import pytest


EXPECTED_SALT_ATTRS = [
    "kdbx.header.value.dynamic_header.master_seed.data",
    "kdbx.header.value.dynamic_header.encryption_iv.data",
    "kdbx.header.value.dynamic_header.transform_seed.data",
    "kdbx.header.value.dynamic_header.protected_stream_key.data",
]


def get_recursive_attr(obj, name):
    try:
        idx = name.index(".")
    except ValueError:
        return getattr(obj, name)
    else:
        return get_recursive_attr(getattr(obj, name[:idx]), name[idx + 1 :])


@pytest.mark.parametrize("attr", EXPECTED_SALT_ATTRS)
def test_creation_salts_differ(tmpdir, attr):
    a = pykeepass.create_database(str(tmpdir.join("a.kdbx")))
    b = pykeepass.create_database(str(tmpdir.join("b.kdbx")))

    assert get_recursive_attr(a, attr) != get_recursive_attr(b, attr)


@pytest.mark.parametrize("attr", EXPECTED_SALT_ATTRS)
def test_saved_salts_differ(tmpdir, attr):
    a = pykeepass.create_database(str(tmpdir.join("a.kdbx")))
    old_salt = get_recursive_attr(a, attr)
    a.save()
    b = pykeepass.PyKeePass(str(tmpdir.join("a.kdbx")))
    new_salt = get_recursive_attr(b, attr)
    assert old_salt != new_salt

basak avatar Dec 15 '20 04:12 basak

Thanks for reporting this.

After looking through the keepassxc source, it seems like master_seed, encryption_iv, and protected_stream_key are indeed regenerated on save.

kdbx4 kdbx3

Evidlo avatar Dec 15 '20 05:12 Evidlo

Yes, and additionally transform_seed is also regenerated, just a little more indirectly:

  • https://github.com/keepassxreboot/keepassxc/blob/3b459813ed19f9f7a4e79b237134a749151f13b2/src/format/Kdbx4Writer.cpp#L55 sets updateTransformSalt to true
  • https://github.com/keepassxreboot/keepassxc/blob/3b459813ed19f9f7a4e79b237134a749151f13b2/src/core/Database.cpp#L730 then regenerates it

I used keepass' behaviour to verify, as opposed to keepassxc, but the result is the same.

basak avatar Dec 16 '20 19:12 basak

I implemented a workaround for now by poking into kdbx.header... and kdbx.body... and resetting the necessary values with secrets.token_bytes(len(previous_value)). There were a few gotchas which will probably also need to be tackled for a proper fix. Since my workaround lives in a pykeepass-using app rather than in pykeepass itself, it has diverged too far to apply to pykeepass directly, but hopefully the following will help:

  1. Probably obvious to pykeepass authors, but it wasn't to me: the correct attributes inside the kdbx structure to change differ depending on major_version; and for version 4, they differ again depending on kdbx.header.value.dynamic_header.kdf_parameters.data.dict["$UUID"].
  2. After modifying kdbx.header.value, I needed to del kdbx.header.data because a RawCopy will rebuild from its data attribute if that still exists. I filed construct/construct#890 to improve the documentation on this.
  3. Due to construct/construct#888, calling PyKeePass.save() then crashes. I filed construct/construct#889 to fix this; in the meantime, I worked around by creating the stream myself (with w+b) and calling build_stream() instead. You'll either need construct/construct#888 fixed upstream before you can fix this issue, or you'll need a similar workaround in PyKeePass.save().
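A rough sketch of the seed-reset part of that workaround (attribute paths as listed at the top of the issue; the construct-specific steps — dropping kdbx.header.data and using build_stream() — are shown only as comments, since they depend on construct):

```python
import secrets
from types import SimpleNamespace

def reset_seed(kdbx, path):
    """Overwrite the attribute at dotted `path` with fresh random bytes
    of the same length, as in the workaround described above."""
    *parents, leaf = path.split(".")
    obj = kdbx
    for name in parents:
        obj = getattr(obj, name)
    setattr(obj, leaf, secrets.token_bytes(len(getattr(obj, leaf))))

# demo on a stand-in object (a real kdbx is a construct Container)
kdbx = SimpleNamespace(header=SimpleNamespace(
    value=SimpleNamespace(dynamic_header=SimpleNamespace(
        master_seed=SimpleNamespace(data=b"\x00" * 32)))))

old = kdbx.header.value.dynamic_header.master_seed.data
reset_seed(kdbx, "header.value.dynamic_header.master_seed.data")
new = kdbx.header.value.dynamic_header.master_seed.data
assert len(new) == len(old) and new != old

# remaining gotchas from the list above (construct-specific, not run here):
#   del kdbx.header.data            # force RawCopy to rebuild from .value
#   with open(path, "w+b") as f:    # workaround for construct/construct#888
#       KDBX.build_stream(kdbx, f)
```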

Phew!

basak avatar Dec 21 '20 00:12 basak

Thanks for the good work.

Probably obvious to pykeepass authors, but it wasn't to me: the correct attributes inside the kdbx structure to change differ depending on major_version; and for version 4, they differ again depending on kdbx.header.value.dynamic_header.kdf_parameters.data.dict["$UUID"].

Yes, the location of the KDF parameters within the database bytestream changes depending on the database version, which is reflected in the object that kdbx_parsing returns. Ideally the kdbx_parsing would abstract this away from the main pykeepass code, but this is a limitation of construct that I couldn't figure out how to get around when I wrote the parsing code. I'm starting to consider moving away from construct and just doing the parsing with the struct module (see here, although this is very old code and can't build databases). This would give us full control over where the parsed values are stored in the database object and might make things like converting between database versions easier.
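As a taste of what struct-based parsing might look like, here is a minimal sketch that reads only the fixed prefix of a KDBX file (the two signature words and the version, per the published KDBX format); everything beyond that is left out:

```python
import struct

KDBX_SIG1 = 0x9AA2D903
KDBX_SIG2 = 0xB54BFB67  # KeePass 2.x databases

def parse_outer_header_start(data):
    """Parse the fixed 12-byte prefix of a KDBX file: two little-endian
    uint32 signatures followed by uint16 minor and major versions."""
    sig1, sig2, minor, major = struct.unpack_from("<IIHH", data, 0)
    if sig1 != KDBX_SIG1 or sig2 != KDBX_SIG2:
        raise ValueError("not a KeePass 2.x database")
    return major, minor

# synthesize the prefix of a KDBX 4.0 file and parse it back
blob = struct.pack("<IIHH", KDBX_SIG1, KDBX_SIG2, 0, 4)
assert parse_outer_header_start(blob) == (4, 0)
```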

After modifying kdbx.header.value, I needed to del kdbx.header.data because a RawCopy will rebuild from its data attribute if that still exists. I filed construct/construct#890 to improve the documentation on this.

You actually figured out a problem I was having. I wanted to add functionality for changing the encryption and key derivation methods, but my changed values kept being overwritten by the previously copied bytes.

Evidlo avatar Dec 22 '20 00:12 Evidlo

Ideally the kdbx_parsing would abstract this away from the main pykeepass code, but this is a limitation of construct that I couldn't figure out how to get around when I wrote the parsing code. I'm starting to consider moving away from construct...

FWIW, I didn't know about construct before I started looking at pykeepass a couple of weeks ago, and I'm very impressed both by construct and how you have used it. It's really nice how pykeepass deals with the different format versions "all at once" in a single structure. I'm looking forward to using construct to solve a similar problem I'm busy solving in a different project.

The way construct works, I think it's a necessary requirement that the data structure that comes out of it looks like the file format itself, rather than being able to abstract across fields that appear differently in the file format in different file format versions.

To abstract over this, I'd do it outside of construct, but still use construct as you have done. For example, you could create a "proxy" type Python object by capturing getattr() and setattr() that maps to different properties inside the current kdbx structure depending on major_version. This could be stateless except that it would be instantiated with a kdbx reference. It could then present a more unified view into kdbx, which would then be supplied by kdbx_parsing for the main PyKeePass class to use.
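A minimal sketch of such a proxy (the attribute paths are the version-dependent protected_stream_key locations from this thread; the class and names are illustrative, not pykeepass API):

```python
class HeaderProxy:
    """Stateless view mapping a logical field name to the right place
    inside the parsed kdbx structure, depending on major_version."""

    _MAPS = {
        3: {"protected_stream_key":
                "header.value.dynamic_header.protected_stream_key.data"},
        4: {"protected_stream_key":
                "body.payload.inner_header.protected_stream_key.data"},
    }

    def __init__(self, kdbx, major_version):
        # bypass our own __setattr__ for internal state
        object.__setattr__(self, "_kdbx", kdbx)
        object.__setattr__(self, "_map", self._MAPS[major_version])

    def _resolve(self, name):
        *parents, leaf = self._map[name].split(".")
        obj = self._kdbx
        for part in parents:
            obj = getattr(obj, part)
        return obj, leaf

    def __getattr__(self, name):
        obj, leaf = self._resolve(name)
        return getattr(obj, leaf)

    def __setattr__(self, name, value):
        obj, leaf = self._resolve(name)
        setattr(obj, leaf, value)

# demo on a stand-in version-4 structure
from types import SimpleNamespace
kdbx4 = SimpleNamespace(body=SimpleNamespace(payload=SimpleNamespace(
    inner_header=SimpleNamespace(
        protected_stream_key=SimpleNamespace(data=b"old")))))
view = HeaderProxy(kdbx4, major_version=4)
view.protected_stream_key = b"new"
assert kdbx4.body.payload.inner_header.protected_stream_key.data == b"new"
```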

basak avatar Dec 23 '20 03:12 basak

Disclosure: I am the maintainer of Construct.

Thanks for the high praise, Basak. I was impressed by Construct too, and that is why I became its maintainer. Evidlo, I encourage you to post more issues at Construct's forum. It's clear you had some issues with Construct, but maybe we can resolve them. I don't think using the struct module is a good idea in a sophisticated project like this.

I can confirm issue construct/construct#888 will be resolved upstream before the end of the year.

Another disclosure: I took a cryptography class from Stanford, which I have to admit was probably the best course I ever had at Coursera, and I have something like 20 certificates from there... so, yeah, back to the topic:

Unfortunately I don't know pykeepass well enough to advise anything, but I can say one thing: reusing the same seeds opens the door to rainbow tables, which I think is what basak meant by precomputation attacks. I believe basak is right on this.

arekbulski avatar Dec 23 '20 16:12 arekbulski

A couple more loose ends for version 3:

  1. There's a minor bug with a fix in #222, without which the official KeePass client won't accept version 3 files.
  2. HeaderHash in the XML needs recalculating, without which the official KeePass client rejects a changed header as corrupt. Deleting the field also worked, though recalculating seems better. I think you can probably do this in UnprotectedStream as the recomputed header data field should be available at that stage, though I haven't tested this. I guess that would want to be conditional on major_version. As a workaround, I modified the tree attribute after recomputing the header an additional time by hand, and that worked.
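For reference, the KDBX 3.x HeaderHash is the base64-encoded SHA-256 of the raw outer-header bytes, stored in the XML under Meta. A sketch of the recalculation (the tree.find() call is illustrative, not the exact pykeepass code path):

```python
import base64
import hashlib

def header_hash(raw_header_bytes):
    """KDBX 3.x HeaderHash: base64 of SHA-256 over the raw outer header."""
    digest = hashlib.sha256(raw_header_bytes).digest()
    return base64.b64encode(digest).decode("ascii")

# after rebuilding the header, recompute and patch it into the XML, e.g.:
#   tree.find("Meta/HeaderHash").text = header_hash(new_header_bytes)
demo = header_hash(b"\x00" * 4)
assert len(base64.b64decode(demo)) == 32  # SHA-256 digest length
```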

basak avatar Dec 27 '20 06:12 basak

One more. It's probably also wise to regenerate kdbx.header.value.dynamic_header.stream_start_bytes.data in the case of version 3, to stop an attacker from building up sets of known plaintexts. I don't know if there's a cryptographic attack possible otherwise, but again I believe the official client does this, so I think it makes sense to just do it rather than trying to verify that it's safe not to.

To summarise, here are the fields to regenerate for version 3:

    "kdbx.header.value.dynamic_header.master_seed.data",
    "kdbx.header.value.dynamic_header.encryption_iv.data",
    "kdbx.header.value.dynamic_header.protected_stream_key.data",
    "kdbx.header.value.dynamic_header.stream_start_bytes.data",

For version 4:

    "kdbx.header.value.dynamic_header.master_seed.data",
    "kdbx.header.value.dynamic_header.encryption_iv.data",
    "kdbx.body.payload.inner_header.protected_stream_key.data",

And additionally kdbx.header.value.dynamic_header.transform_seed.data for version 3, and the KDF-specific salt for version 4 (e.g. kdbx.header.value.dynamic_header.kdf_parameters.data.dict.S.value if Argon2 is used).

Then HeaderHash (if version 3) and the individual protected XML password elements need recalculating.

I've not looked into version < 3 (doesn't affect me).
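The summary above could be captured as a simple dispatch table, keyed on major version (paths exactly as listed; the commented Argon2 entry applies only when that KDF is in use):

```python
# Fields to regenerate on every save, per database major version
# (paths as enumerated earlier in this thread).
SEED_PATHS = {
    3: [
        "kdbx.header.value.dynamic_header.master_seed.data",
        "kdbx.header.value.dynamic_header.encryption_iv.data",
        "kdbx.header.value.dynamic_header.protected_stream_key.data",
        "kdbx.header.value.dynamic_header.stream_start_bytes.data",
        "kdbx.header.value.dynamic_header.transform_seed.data",
    ],
    4: [
        "kdbx.header.value.dynamic_header.master_seed.data",
        "kdbx.header.value.dynamic_header.encryption_iv.data",
        "kdbx.body.payload.inner_header.protected_stream_key.data",
        # plus the KDF-specific salt, e.g. for Argon2:
        # "kdbx.header.value.dynamic_header.kdf_parameters.data.dict.S.value",
    ],
}

assert set(SEED_PATHS) == {3, 4}
```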

basak avatar Dec 27 '20 07:12 basak

Here's my solution to generate new seeds in a simplified pseudo-KeePass database.

from construct import Struct, RawCopy, Bytes, Checksum, this, Subconstruct
from construct import stream_seek, stream_tell, stream_read, stream_write
from Cryptodome.Random import get_random_bytes
from io import BytesIO

# simple placeholder checksum function
def check_func(header_data, master_seed):
    return header_data + master_seed


class RandomBytes(Bytes):
    """Same as Bytes, but generate random bytes when building"""

    def _build(self, obj, stream, context, path):
        length = self.length(context) if callable(self.length) else self.length
        data = get_random_bytes(length)
        stream_write(stream, data, length, path)
        return data

class Copy(Subconstruct):
    """Same as RawCopy, but don't create parent container when parsing.
    Instead store data in ._data attribute of subconstruct, and never rebuild from data
    """

    def _parse(self, stream, context, path):
        offset1 = stream_tell(stream, path)
        obj = self.subcon._parsereport(stream, context, path)
        offset2 = stream_tell(stream, path)
        stream_seek(stream, offset1, 0, path)
        obj._data = stream_read(stream, offset2-offset1, path)
        return obj

    def _build(self, obj, stream, context, path):
        offset1 = stream_tell(stream, path)
        obj = self.subcon._build(obj, stream, context, path)
        offset2 = stream_tell(stream, path)
        stream_seek(stream, offset1, 0, path)
        obj._data = stream_read(stream, offset2-offset1, path)
        return obj

# simple database format with header and checksum
s = Struct(
    "header" / Copy(Struct(
        "master_seed" / RandomBytes(8),
        "more_data" / Bytes(8)
    )),
    "hash" / Checksum(
        Bytes(24),
        lambda ctx: check_func(ctx.header._data, ctx.header.master_seed),
        this
    )
)

# synthesize a 'database'
master_seed = b'00000000'
more_data = b'12345678'
header = master_seed + more_data
data = header + check_func(header, master_seed)

# parse the database
parsed = s.parse(data)

# rebuild and try to reparse final result
data2 = s.build(parsed)
s.parse(data2)
assert data != data2, "Database unchanged"

Evidlo avatar Feb 06 '21 23:02 Evidlo

I'd appreciate some feedback on the above.

Evidlo avatar Feb 17 '21 23:02 Evidlo

As far as Construct is concerned, I see no problems with your code.

arekbulski avatar Feb 17 '21 23:02 arekbulski

I really like this approach. Nice work! I wonder if RandomBytes and Copy are sufficiently general to be worth including in Construct itself? If not, they work equally well here.

One downside of this approach is that each build triggers a rekey. This is what we want to fix this issue, of course, but I wonder if there are times when we might want to build without rekeying. For example, for unit testing.

I don't have a strong feeling as to whether explicit or implicit rekeying would be better. I think it's difficult to say without attempting one approach or the other.

I think this mechanism should be sufficient to do most of the necessary work. HeaderHash (version 3) will need recalculating per my notes above; I'm not sure whether this mechanism will handle that, or whether it needs implementing separately?

basak avatar Feb 28 '21 11:02 basak

Those two classes surely are creative work, but they won't get into Construct mainstream. Copy already does what RawCopy does, and in a way I don't like: too obscure. And RandomBytes is not general enough to be included. But if this works for you, I don't see why not.

arekbulski avatar Feb 28 '21 15:02 arekbulski

Is there any update to this? It seems like a rather problematic issue.

A6GibKm avatar Jun 23 '21 16:06 A6GibKm

Can someone suggest the necessary custom code to init the seeds (KDBX4, default KDF/encryption) before calling .save()?:

kp = create_database("test.kdbx", password="test", keyfile=None, transformed_key=None)
kp.kdbx.body.payload.xml = etree.fromstring(xml_str)
kp.kdbx.header.value.dynamic_header.master_seed.data = get_random_bytes(8)
kp.kdbx.header.value.dynamic_header.encryption_iv.data = ??
kp.kdbx.body.payload.inner_header.protected_stream_key.data = ??
#kp.kdbx.header.value.dynamic_header.kdf_parameters.data.dict.S.value =   # if argon2 is used
kp.save()

Without this, the output is reproducible, which is a showstopper for encrypted data.
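Not an authoritative answer, but one way to fill those in, using the field lengths the official client writes for the default KDBX4 setup (these lengths are assumptions; the safer pattern from earlier in the thread is secrets.token_bytes(len(previous_value)), which sidesteps guessing):

```python
import secrets

# typical KDBX4 field lengths (assumptions; verify against a database
# written by the official client before relying on them):
new_master_seed = secrets.token_bytes(32)  # master_seed
new_iv = secrets.token_bytes(16)           # 16 for AES-256-CBC, 12 for ChaCha20
new_stream_key = secrets.token_bytes(64)   # inner protected_stream_key
new_kdf_salt = secrets.token_bytes(32)     # Argon2 "S" entry, if Argon2 is used

assert [len(x) for x in
        (new_master_seed, new_iv, new_stream_key, new_kdf_salt)] == [32, 16, 64, 32]
```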

vszakats avatar Jan 07 '23 16:01 vszakats