
Test data for Evidence Records implementation (RFC 4998)

Open raubv0gel opened this issue 2 years ago • 28 comments

I have recently finished my own implementation of the algorithms defined in RFC 4998, using the ASN.1 data structures from bc. Some ER test data output by my implementation is in er-test.zip. It contains 5 binary data objects with associated evidence records. Each ER contains an initial archive time stamp (SHA-256), a restamped archive time stamp (SHA-256), and a rehashed archive time stamp (SHA-512).

@dghgit, the test data are especially for you. Could you provide some test data for me in return? I want to test my EvidenceRecordValidator.

raubv0gel avatar Aug 23 '22 13:08 raubv0gel

Thanks. I can read and validate all of those. Here are some I generated - in this case I did the SHA-256 initial stamp and then a rehash with SHA-512. The input data are the same as yours, but I have also included an additional record with the input data as a single group (the group-data-0.ers file). Let me know how they go. ers-test-data.zip

PS. While I was looking at the sorting of the leaves I also decided that returning the ERs in semi-random order kind of sucked... it'll now return the records in the order given.

dghgit avatar Aug 24 '22 07:08 dghgit

Thanks! I will try your test data …

Just a few moments ago I put the output of testHashRenewal() (ERSTest.java) into my ER validator. The validator says „the hash value to check is not contained in the archive time stamp, because the reduced hash tree is null". That is indeed what I see [screenshot omitted]. If I’m not wrong, the first archive time stamp of an archive time stamp chain cannot have a null reduced hash tree (only the following archive time stamps can).

raubv0gel avatar Aug 24 '22 07:08 raubv0gel

Might be time to get the RFC out again... Doesn't it say somewhere that if there's a single document, just the time stamp is enough? If it's a single node in the reduced hash tree, the time stamp would just end up being a hash of a hash (at best).

I'll have a look at the document again. I've also pushed the latest code to here and there is a new beta up as well on https://www.bouncycastle.org/betas.

dghgit avatar Aug 24 '22 08:08 dghgit

Section 4.2 says

Note there are no restrictions on the quantity or length of hash-value lists. Also note that it is profitable but not required to build hash trees and reduce them. An Archive Timestamp may consist only of one list of hash-values and a timestamp or only a timestamp with no hash value lists.

Do you mean this part? Does this imply that we can have a null reduced hash tree? And does this mean that root hash value == data object hash value?

raubv0gel avatar Aug 24 '22 08:08 raubv0gel

Yes, as far as I can tell that's correct. I think the idea was also to allow "traditional" timestamps to be migrated into Archive ones and provide a mechanism to ensure that it worked.
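
If that's right, the degenerate case in a validator is just a direct comparison - a plain-Java sketch (not the BC API; the method name is made up):

import java.util.Arrays;

// Section 4.2 degenerate case: with no reduced hash tree at all, the time
// stamp covers the data object hash directly, so the check collapses to
// root hash value == data object hash value.
static boolean validateDegenerate(byte[] documentHash, byte[] timeStampedHash)
{
    return Arrays.equals(documentHash, timeStampedHash);
}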

dghgit avatar Aug 24 '22 09:08 dghgit

Thank you. Okay, I will adjust my implementation …

Regarding ers-test-data.zip: I „randomly" picked data-2 and data-2.ers. My validator says „timestamped hash value does not match calculated root hash value". Here is what I see [screenshot omitted]: the 1st partial hash tree contains only a single hash value. This should not be possible in the case of 2 or more time stamped data objects; the 1st partial hash tree must contain at least 2 hash values. See Generation in Section 4.2:

Select all hash values, which have the same father node as h. Generate the first list of hash values by arranging these hashes, in binary ascending order. This will be stored in the structure of the PartialHashtree.

Furthermore, one can see in the examples that the 1st partial hash tree contains at least 2 hash values.
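
By the way, „binary ascending order" here is just an unsigned lexicographic comparison of the raw hash bytes. A minimal plain-Java sketch of that generation step (not the bc API; names made up):

import java.util.Arrays;

// Build the first list of a PartialHashtree for leaf hash h and its sibling
// hashes sharing the same father node, arranged in binary ascending order
// (RFC 4998, Section 4.2).
static byte[][] firstPartialHashtree(byte[] h, byte[]... siblings)
{
    byte[][] nodes = Arrays.copyOf(siblings, siblings.length + 1);
    nodes[siblings.length] = h;
    Arrays.sort(nodes, Arrays::compareUnsigned);   // binary ascending order
    return nodes;
}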

In my opinion, this was poorly designed in the RFC (it’s unnecessary complexity). Actually, my ER generation implementation once behaved like yours! I only noticed this „error" after writing the ER validator.

raubv0gel avatar Aug 24 '22 10:08 raubv0gel

Hmmm... Um so there's:

"Section 4.3 - 2. Search for hash value h in the first list (partialHashtree) of reducedHashtree. If not present, terminate verification process with negative result."

Under an interpretation where there can be more than 1 hash in a partial hash tree like this, 4.3 - 2 would actually verify data that was not supposed to be in the evidence record. I'm not exactly sure that is a good idea. I agree the diagram doesn't help clear this up though...

In the dump above though, 084fed08b978af4d7d196a7446a86b58009e636b611db16211b65a9aadff29c5 is the hash of data-2, so that value should be in there at least. 2b4c342f5433ebe591a1da77e013d1b72475562d48578dca8b84bac6651c3cb9 is the hash of data-3. Evaluation, at least the way I'm doing it at the moment, means that the next partial hash tree is the hash of (data-0, data-1), so hash(hash(data-2, data-3), hash(data-0, data-1)) is the parent node of the 4 of them, and dbc1b4c900ffe48d575b5da5c638040125f65db0fe3e24494b76ea986457d986 is the hash of data-4. The root of it all is then what goes in the timestamp. From the first partial hash tree to the last it is basically just a path up through the tree, from the leaf node of interest to the root, with the node values at each level being the ones you cannot calculate.
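
To make that walk-up concrete, here's a minimal plain-Java sketch of the evaluation (not the actual BC code; the reduced hash tree is modelled as a list of hash value lists, and all names are made up):

import java.io.ByteArrayOutputStream;
import java.security.MessageDigest;
import java.util.Arrays;
import java.util.List;

final class ReducedHashTreeSketch
{
    // Recompute the root hash from a leaf hash and a reduced hash tree,
    // modelled as a list of partial hash trees (each a list of hash values).
    // At each level: ensure the running hash is in the list (per Section 4.3
    // step 2 it MUST already be in the first one - a real validator would
    // fail rather than add it there), sort in binary ascending order,
    // concatenate, hash, and move up a level.
    static byte[] computeRoot(byte[] leafHash, List<byte[][]> reducedHashTree, MessageDigest digest)
    {
        byte[] current = leafHash;
        for (byte[][] level : reducedHashTree)
        {
            byte[][] nodes = contains(level, current)
                ? level.clone()
                : append(level, current);
            Arrays.sort(nodes, Arrays::compareUnsigned);   // binary ascending order

            ByteArrayOutputStream concat = new ByteArrayOutputStream();
            for (byte[] node : nodes)
            {
                concat.write(node, 0, node.length);
            }
            current = digest.digest(concat.toByteArray());
        }
        return current;   // must equal the hash value covered by the time stamp
    }

    private static boolean contains(byte[][] nodes, byte[] value)
    {
        for (byte[] node : nodes)
        {
            if (Arrays.equals(node, value))
            {
                return true;
            }
        }
        return false;
    }

    private static byte[][] append(byte[][] nodes, byte[] value)
    {
        byte[][] copy = Arrays.copyOf(nodes, nodes.length + 1);
        copy[nodes.length] = value;
        return copy;
    }
}

Verification then just compares the returned value with the hash value in the time stamp - for data-2 above you'd start with 084fed08... as the leaf.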

Now, assuming the first partial hash tree is actually meant to represent a branch, this one would start with a node containing (data-0, data-1) and there would be 2 other partial hash tree nodes, rather than 3. But it would also mean the same evidence record would match data-0 or data-1 according to 4.3-2. Worse, if the case in Figure 2 isn't a typo, then 4.3-2 would match h1 and also a document hashing to h2abc (I'll admit that's difficult, but on the other hand not impossible, and it doesn't seem sensible to encourage such a thing).

The funny thing is, the BC API will be able to validate either, but on creation it won't produce an ambiguous evidence record. I don't think I would claim hand on heart that the interpretation you are looking at is incorrect, but if it is I'm surprised no-one pointed out the obvious problem - the first partial hash tree will always look like a group of 2 even if the evidence record is for only one item and not group data. This is a change I really don't want to make - it's clearly not safe.

Actually, there may be an answer - have a look at https://datatracker.ietf.org/doc/rfc6283/, Section 3.2.2, item 2; the language appears to have been tightened up in there. That said, I haven't found any errata for RFC 4998. In the XML representation it is quite clear that it's either a hash by itself (when it's one document) or, in the case of more than one hash, it must be a group of documents.

dghgit avatar Aug 24 '22 12:08 dghgit

Thank you, I will look at it in detail …

Meanwhile, I want to note Section 4.2 again:

Note there are no restrictions on the quantity or length of hash-value lists. Also note that it is profitable but not required to build hash trees and reduce them. An Archive Timestamp may consist only of one list of hash-values and a timestamp or only a timestamp with no hash value lists.

„An Archive Timestamp may consist only of one list of hash-values" - therefore it must be possible for the first partial hash tree to contain multiple hash values. Right? Note that the first partial hash tree must contain only leaf hash values (= data object hash values) from the hash tree. Also note that the evidence record covers all hash values in the first partial hash tree. If we have, for example, 2 hash values h1 and h2 in the first partial hash tree, the ER is valid for both(!) hash values – up to the ordering we even get ER(h1) == ER(h2). Therefore there should be no security issue with validating that h2 is contained in ER(h1) (and vice versa).
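
Borrowing the computeRoot sketch from above (hypothetical names again), this is easy to see - both validations are literally the same computation:

// With a first partial hash tree containing both h1 and h2, each level
// hashes the same sorted concatenation, so the roots are identical:
byte[] root1 = ReducedHashTreeSketch.computeRoot(h1, reducedHashTree, digest);
byte[] root2 = ReducedHashTreeSketch.computeRoot(h2, reducedHashTree, digest);
assert Arrays.equals(root1, root2);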

I got an ER from another application: another-ers-test-data.zip. You may want to check it out …

raubv0gel avatar Aug 24 '22 12:08 raubv0gel

Yes, it really is a bit of a free-for-all, isn't it. This is starting to feel more like an episode of Survivor than implementing a standard.

The only catch in the case you mention is what if I've elected to let h2 expire, or even revoked the h2 evidence record somehow? I'd be a bit stuck as there would still be a record out there that makes h2 valid. RFC 6283 appears to remove that possibility by saying it's either a group or a single item. Put another way, you could say BC now implements RFC 4998 with the restriction imposed in RFC 6283. @veebee was quite adamant about the whole "one data object, one evidence record" thing as well, so assuming his recollection was correct the restriction in RFC 6283 is also correct. An erratum wouldn't hurt though - it's clear that whoever he corresponded with believed they'd described something in RFC 4998 which really isn't described in RFC 4998.

So, with the new data - no problem verifying that one either. As long as everything else is done correctly we'll read this, but the initial node will be regarded as a group, so you can validate for the individual items or for the collection overall. Likewise, if another parser doesn't insist on only one item in the first partial hash tree, the BC timestamps will also work (whether it's a group or 2 single items the calculations are the same: you still walk up the tree one ASN.1 structure at a time, swapping the hash values for the hash of their concatenation as you go; it's only the group/non-group presentation that changes).

One other thing I did notice about RFC 6283: they include data leaves in the tree - so the immediate parent of a single data object is its hash, just as the immediate parent of a data group is the collective hash of the group. This also diverges from what RFC 4998 says, unless "parent node" in RFC 4998 doesn't quite mean what it sounds like.

I'm going to try and write to the authors myself as well. We're probably starting to approach the first run of old timestamp certificates going sour, so I think this particular bit of technology is going to become more important. A bit of extra clarity in the standard would really help.

dghgit avatar Aug 25 '22 04:08 dghgit

Are you guys planning to report the inconsistencies and lack of clarity to the mailing list of the IETF WG that produced that RFC?

mouse07410 avatar Aug 25 '22 11:08 mouse07410

@mouse07410, if someone can tell me how, I would like to try.

raubv0gel avatar Aug 25 '22 11:08 raubv0gel

@dghgit

Yes, it really is a bit of a free-for-all, isn't it. This is starting to feel more like an episode of Survivor than implementing a standard.

Something I keep asking myself while trying to understand this RFC: why not at least provide pseudo code? Describing algorithms in natural language is so error-prone.

@dghgit

… the first partial hash tree will always look like a group of 2 even if the evidence record is for only one item and not group data. This is a change I really don't want to make - it's clearly not safe.

Right, if we have more than 1 hash value in the first partial hash tree, we do not know whether it was a data group or not. Actually, I guess it’s not important to know, because we only want to check whether the data object hash value, or one of the hash values from a grouped data object, is contained. (I am not saying this is well designed – it is not. I mean, the whole concept of grouped data objects looks ugly. There is no need to provide data object groups – one can group externally, e.g. just make a zip or tar if one has to group data objects … ERS would be much cleaner with (simple) data objects only.) From a security point of view I would say it is not insecure: the first partial hash tree contains data object hash values only. Therefore one can only validate data object hash values – and this is secure up to the strength of the hash function used.

@dghgit

The only catch in the case you mention is what if I've elected to let h2 expire, or even revoked the h2 evidence record somehow? I'd be a bit stuck as there would still be a record out there that makes h2 valid.

I would argue the hash value h2 can only expire if the hash function used becomes weak – but then all other hash values in the ER are weak as well. Of course we have to rehash in such a case. Further, as far as I know an ER cannot be revoked – it can only be valid or invalid (at an instant of time and for a given hash value). It does not matter what happens with the data object (deleted, duplicated, whatever); the ER covers the data object hash value. For sure an ER becomes invalid if the time stamp expires, or the signing cert is revoked, or the cryptographic algorithms used are broken. And for sure you cannot rehash if the data object is somehow lost – but that does not matter, because the ER is then „disconnected" from the data object.

raubv0gel avatar Aug 25 '22 13:08 raubv0gel

@dghgit, I’m not sure: does the bc API take hash algorithm expiry into account during ER validation? My implementation checks, for example, archiveTimeStamp.getHashAlgorithmInfo().isSecureAt(timeStampToken.getGenerationInstant()). HashAlgorithmInfo#isSecureAt(Instant) returns true if and only if the hash algorithm was secure at the specified instant of time.

raubv0gel avatar Aug 25 '22 13:08 raubv0gel

I don't think we have an equivalent to "archiveTimeStamp.getHashAlgorithmInfo().isSecureAt(timeStampToken.getGenerationInstant())". I'd be interested to know what it's based on - usually the problem with hash algorithms going off isn't about when the hash was created, it's about whether the document you're looking at is actually the one that was hashed. Once it becomes "easy" to generate collisions, it's really hard to be sure anymore.

With the (h1, h2) example, my point wasn't really about the hash algorithm expiring - if I renew the timestamp on a record containing (h1, h2) thinking it's only got h1 in it, and don't renew the time stamp on the h2 record as I regard it as no longer valid (so not a rehash), I've renewed the timestamp on h2 anyway. I don't think that works.

dghgit avatar Aug 25 '22 20:08 dghgit

I've emailed the three authors at the most current addresses I could find.

dghgit avatar Aug 26 '22 01:08 dghgit

@dghgit:

At section 5.3:

All Archive Timestamps within a chain MUST use the same hash algorithm and this algorithm MUST be secure at the time of the first Archive Timestamp of the following ArchiveTimeStampChain

That’s why I have implemented HashAlgorithmInfo#isSecureAt(Instant). I think it’s needed for the evidence record validator.

I’m doing some preparation to publish my ERS code here on GitHub. We could then take a closer look …

raubv0gel avatar Aug 29 '22 09:08 raubv0gel

Re. the relevant IETF WG to send the report to: the WG was named "ltans", and it has since concluded. You can Google for it or search on the IETF.org site, but I don't know whether its mailing list is still alive or not.

Reporting to the authors is probably the best idea, though it might be good to also copy the Area Directors of the IETF area that housed that WG.

mouse07410 avatar Aug 29 '22 11:08 mouse07410

@dghgit:

At section 5.3:

All Archive Timestamps within a chain MUST use the same hash algorithm and this algorithm MUST be secure at the time of the first Archive Timestamp of the following ArchiveTimeStampChain

That’s why I have implemented HashAlgorithmInfo#isSecureAt(Instant). I think it’s needed for the evidence record validator.

I’m doing some preparation to publish my ERS code here on GitHub. We could then take a closer look …

Yes, a change in hash algorithm triggers an additional chain being added to the ArchiveTimeStampSequence. With the BC API the trigger is using a new hash algorithm.

dghgit avatar Aug 29 '22 11:08 dghgit

@dghgit

Yes, a change in hash algorithm triggers an additional chain being added to the ArchiveTimeStampSequence. With the BC API the trigger is using a new hash algorithm.

I was referring to ER validation, not ER generation: the ER validator MUST check whether the hash algorithm used in a chain was still secure at the time the new chain was generated.

raubv0gel avatar Aug 29 '22 11:08 raubv0gel

Fair enough. When I read things like this I tend to think of a person rather than an API though. How are you doing it? Is the idea to pass in a map or config file of digests and possible not-after dates?

dghgit avatar Aug 29 '22 16:08 dghgit

@dghgit, sorry for being late!

I perform validation, among other places, at:

// the current archive time stamp chain must be generated while the hash algorithm of the previous archive time stamp chain was secure
if (!previousLastArchiveTimeStamp.getHashAlgorithmInfo().isSecureAt(currentFirstArchiveTimeStamp.getTimeStampToken().getGenerationInstant()))
    throw new EvidenceRecordValidationException("the archive time stamp chain was generated while the hash algorithm of the previous archive time stamp chain was not secure anymore");

and I have:

public record HashAlgorithmInfo(
        @NotNull String id,
        @NotNull String oid,
        @NotNull String name,
        int hashValueLength,
        @NotNull Instant insecureSinceInstant
) {
    public static final HashAlgorithmInfo Sha512 = new HashAlgorithmInfo(
            "SHA512",
            NISTObjectIdentifiers.id_sha512.getId(),
            nameFinder.getAlgorithmName(NISTObjectIdentifiers.id_sha512),
            64,
            Instant.MAX);

    // …
    public boolean isSecureAt(@NotNull Instant instant) {
        return instant.isBefore(insecureSinceInstant());
    }
    // …
}

raubv0gel avatar Sep 01 '22 07:09 raubv0gel

Okay, these things are often really local policy decisions. I'd suggest for something like this we come up with some properties that can go in java.security. Something along the lines of: no properties set, anything goes; if properties are set, the digest must be listed and used before some expiry date and/or within some date range.

Maybe something like (dates made up):

ers.digests.accepted: sha1, sha256, sha512
ers.digest.sha1.range: 1990-01-01,2020-12-31
ers.digest.sha256.expires: 2000-01-01,*
ers.digest.sha512.expires: 2000-01-01,*
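
Hypothetically, consuming those properties might look something like this (a sketch only - the property names are the made-up ones above, only the ".range" form is handled, and nothing like this exists in the API yet):

import java.security.Security;
import java.time.LocalDate;
import java.util.Arrays;

// Check a digest name against the proposed java.security properties:
// no properties set means anything goes; otherwise the digest must be
// listed and the date must fall inside its configured range.
static boolean digestAcceptedAt(String digest, LocalDate date)
{
    String accepted = Security.getProperty("ers.digests.accepted");
    if (accepted == null)
    {
        return true;    // no policy configured: anything goes
    }
    if (!Arrays.asList(accepted.split(",\\s*")).contains(digest))
    {
        return false;   // policy in force and digest not listed
    }
    String range = Security.getProperty("ers.digest." + digest + ".range");
    if (range == null)
    {
        return true;    // listed with no date restriction
    }
    String[] limits = range.split(",");
    LocalDate notBefore = "*".equals(limits[0]) ? LocalDate.MIN : LocalDate.parse(limits[0]);
    LocalDate notAfter = "*".equals(limits[1]) ? LocalDate.MAX : LocalDate.parse(limits[1]);
    return !date.isBefore(notBefore) && !date.isAfter(notAfter);
}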

I can't actually think of an obvious reason for a validity start date, but on the other hand, there will be situations where people might only accept a particular algorithm from a particular date so having a start date would avoid any accidents.

What do you think?

dghgit avatar Sep 04 '22 07:09 dghgit

@dghgit, making these things configurable is a good idea! Likewise, I do not see an obvious reason for a validity start date either.

raubv0gel avatar Sep 04 '22 10:09 raubv0gel

Hmmm...

Maybe following certificate terminology would be better:

ers.accepted.digests: sha1, sha256, sha512
ers.digest.sha1.not_after: 2020-12-31

if it turns out a start date is needed we can just add a "not_before" later, as in:

ers.digest.sha256.not_before: 2000-01-01

How's that look?

dghgit avatar Sep 04 '22 14:09 dghgit

Finally dug up some test data from BSI, in case you're interested. Apologies for the time it took... 20170316_Testdaten 2.zip

veebee avatar Sep 15 '22 09:09 veebee

I've added the test data. I'm relieved to say it appears we can parse and search all of it correctly.

dghgit avatar Sep 15 '22 18:09 dghgit

Has anyone got any feedback on the config outlined above?

dghgit avatar Sep 15 '22 18:09 dghgit

One other thing we're missing at the moment. There should be an easy way to convert a timestamp into an archive time stamp.

This is now done - see ERSArchiveTimeStamp.fromTimeStampToken().

dghgit avatar Sep 29 '22 04:09 dghgit

Hi.

I'd like to point you to https://github.com/de-bund-bsi-tr-esor/ERVerifyTool . It's a tool from the BSI for validating ERS (but without timestamp validation).

I also want to point you to this reported issue (https://github.com/de-bund-bsi-tr-esor/ERVerifyTool/issues/2), which says that RFC 4998 is unclear on one point (it has already been reported to the BSI and the authors ... but still no errata). RFC 6283, however, clarifies this.

The solution for staying compatible is to allow both sortings, but only for this single problem.

btw. The properties solution for weak hash algorithms is good. Keep in mind that it's also important for the timestamp verification!

fischerf avatar Oct 27 '22 08:10 fischerf

Has anyone got any feedback on the config outlined above?

An alternative is to use RFC 5698 - Data Structure for the Security Suitability of Cryptographic Algorithms (DSSC). It defines an XML schema that describes algorithm validity.

tom-kuca avatar Nov 30 '22 13:11 tom-kuca