specification icon indicating copy to clipboard operation
specification copied to clipboard

Easier validation of JSON signatures

Open trishankkarthik opened this issue 7 years ago • 24 comments

@jawi reports:

From my experience implementing TUF in a different language (Java), it always caused me several headaches when trying to get the signing right. Mostly this is due to the canonicalisation that needs to be performed that only few of the more popular libraries support. Alternatively, you could take the same approach as is done in the JWS: calculate the signature over the base64-encoded JSON. It would allow you to verify the signature before you unpack the payload and parse it further.

Would that something worth to consider?

@tarcieri comments:

For what it's worth, I've been moving the opposite direction lately, and have been experimenting with Ben Laurie's ObjectHash format:

https://github.com/benlaurie/objecthash

I think it still needs some work, and my use cases are a bit different (one signature that can be validated against different encodings e.g. protos and JSON), but I really like the approach. It requires supporting a visitor that can walk the structure, but eliminates problems with canonicalizing the format of the JSON prior to computing a signature.

trishankkarthik avatar Aug 20 '16 16:08 trishankkarthik

I don't like the ObjectHash option because it appears to require parsing untrusted data before signature verification.

titanous avatar Aug 20 '16 17:08 titanous

Something like ObjectHash is more useful when you wish to serialize data in multiple formats. I would suggest using a more off-the-shelf scheme if you intend to deal only with JSON.

tarcieri avatar Aug 20 '16 17:08 tarcieri

Maybe I'm overlooking something, but I fail to see how ObjectHash authenticates the data?

jawi avatar Aug 23 '16 10:08 jawi

ObjectHash computes a nested hash tree somewhat analogous to a Merkle tree. Included in the calculation of the "Merkle root" for any particular nested object graph are the digests of all of the data in the object graph (sans the structural elements of e.g. JSON or proto serialization)

tarcieri avatar Aug 23 '16 15:08 tarcieri

I noted CBOR mentioned on the "[tuf] C Implementation of the TUF on client side" thread.

One of the great failings of CBOR, IMO, is that JOSE and COSE signatures are completely distinct as they include the encoding as part of the data to be signed, and therefore do not permit transcoding of the data between formats while preserving the same signature.

This is, of course, the primary advantage of using something like ObjectHash.

tarcieri avatar Aug 23 '16 18:08 tarcieri

IMO, the only freedom you gain from ObjectHash is the format to store the retrieved data in, as long as you are able to convert it to the same ObjectHash (that can be verified using the original signature). But, I agree, this can be a compelling argument for using ObjectHash.

Though, in practise, I'm mostly inclined to "just" store the original files as-is since I already need to ability to verify those signatures anyway. So I'm wondering how valuable it is to have the same signature for different representations of the same data.

jawi avatar Aug 24 '16 10:08 jawi

Hey guys. I'm just jumping in on this to point out that we are experiencing headaches of sorts with canonical JSON too.

From the TUF spec (emphasis mine):

4.1. Metaformat All documents use a subset of the JSON object format, with floating-point numbers omitted. When calculating the digest of an object, we use the "canonical JSON" subdialect as described at http://wiki.laptop.org/go/Canonical_JSON

Canonical JSON is absolutely not a subdialect in that it allows \n (a literal newline) in strings among other non-escaped characters or the fact that "strings" are just byte-arrays and don't have to be valid UTF-8. This has so far only been annoying when we are dealing with RSA keys. For example, an RSA key in JSON format would be:

{"keyvalue":"-----BEGIN RSA PUBLIC KEY-----\n.........\n-----END RSA PUBLIC KEY-----"}

The above is technically valid CJSON as well, but the correct way to express the above payload as CJSON would be:

{"keyvalue":"-----BEGIN RSA PUBLIC KEY-----
.........
-----END RSA PUBLIC KEY-----"}

This has lead to the problems of inconsistent encodings/decodings across language/library boundaries, especially in the case where TUF defines the keyid = sha256(cjson(pub_key_string)). This means that in JSON land, the hash is applied over a \n at each newline, but in CJSON land, it is applied over \ and then n giving different key IDs.

Another thing is that it seems developers are abusing Content-Type: application/json for CJSON encoded data which is incorrect. If CJSON is kept around, I'd like to see the spec mention something like "developers SHOULD NOT use application/json and SHOULD use application/canonical-json for HTTP responses" to remind everyone how they should actually encode/decode data.

I know you guys are working on ASN.1 as the encoding format, but I would like to cast my vote for fully ripping out CJSON as this has led to a number of pain points and still requires parsing of untrusted data.

heartsucker avatar Apr 19 '17 14:04 heartsucker

Haven't looked into it closely, but what does everyone on this thread think about JWS?

Cc: @JustinCappos

trishankkarthik avatar Apr 19 '17 14:04 trishankkarthik

Another very important consideration is backwards-compatibility, which is a big deal for clients who are already using TUF in production...

trishankkarthik avatar Apr 19 '17 14:04 trishankkarthik

From the looks of it, a TUF usable implementation wouldn't work with JWS since you just concatenate (simplification) b64(payload) || '.' || b64(sig_value). The sig isn't structured, and whatever the solution would be would need to be more structured.

I imagine the ASN.1 would be:

Metadata ::={
    signatures = SEQUENCE Signature,
    signed = SEQUENCE byte
}

And then after the validation of the signed portion, the application parses the bytes into whatever structure it's supposed to be. This gets around the problem of parsing untrusted data since we're only treating the signed body as a byte stream until we know it's what we want.

For example:

bytes = requests.get('localhost:8080/targets.json').body
parsed_meta = asn1_parse(bytes)
if not verify(parsed_meta, root=root, snapshot=snapshot):
    raise VerificationError('oh dear')
targets_meta = targets_parse(parsed_meta.signed)
return targets_meta

heartsucker avatar Apr 19 '17 14:04 heartsucker

Yeah, unfortunately this doesn't address backwards compatibility, but that's what major version changes are for. :)

heartsucker avatar Apr 19 '17 14:04 heartsucker

I'd just like to point out that technically TUF doesn't follow it's own spec in that when JSON is parsed it uses this

https://github.com/secure-systems-lab/securesystemslib/blob/master/securesystemslib/util.py#L844

And that just does json.load which doesn't respect CJSON encodings.

heartsucker avatar Apr 19 '17 15:04 heartsucker

Are you referring to this part of the specification?

When calculating the digest of an object, we use the "canonical JSON" subdialect as described at http://wiki.laptop.org/go/Canonical_JSON.

I think we do indeed canonicalize JSON metadata prior to generating digests. create_signature() canonicalizes the metadata here.

vladimir-v-diaz avatar Apr 19 '17 15:04 vladimir-v-diaz

It seems I've missed the discussion about ASN.1, I only see some excerpts of it being mentioned a few years back. Is this something that is seriously being considered for TUF?

Anyways, the use of canonical JSON is still troublesome as mentioned again by @heartsucker. I think the biggest problem is the fact that the RSA (or whatever) keys are embedded in a format that does not play nice when used in combination with JSON. We could either replace the current public key format with something like JWK, base64 encode it once more, or define yet another custom encoding.

jawi avatar Apr 19 '17 15:04 jawi

needless to say, I think the cleanest approach would be to use JWKs :)

jawi avatar Apr 19 '17 15:04 jawi

What I mean is that function is used in the tuf.client.Updater._get_metadata_file when it loads metadata. It is treating the loaded metadata as JSON and not as CJSON which means that if you parse RSA keys, you're going to run into the \n v. \\ & n problem I mentioned above.

For example, the file tests/repository_data/repository/metadata/1.root.json is JSON and not CJSON, and so far as I can tell, there isn't anywhere in the code that decodes CJSON. So the tests are passing because you're turning the \n into a literal newline, but if you parsed the file as CJSON, all the \n's would be treated as \\ and n which would break your hashes.

heartsucker avatar Apr 19 '17 15:04 heartsucker

Loading JSON metadata happens separately from verifying signatures. We verify signatures in the expected CJSON format here, which should match what occurs when the signature is generated (linked in my previous comment).

I'm not against JWK, only explaining the approach used in the reference implementation.

vladimir-v-diaz avatar Apr 19 '17 15:04 vladimir-v-diaz

I'm a big fan of objecthash for this purpose, especially preserving the structure of data and avoiding the whole can of worms that is canonicalization entirely:

https://github.com/benlaurie/objecthash

However, it's a bit ill-specified at the moment.

tarcieri avatar Apr 19 '17 15:04 tarcieri

@tarcieri : I'm still pondering about how ObjectHash would solve the line ending issue of the public key format?

jawi avatar Apr 19 '17 15:04 jawi

To solve that particular problem, I would suggest getting rid of the PEM wrapper and doing a simple Base64 encoding of the raw DER contained within the PEM.

tarcieri avatar Apr 19 '17 15:04 tarcieri

On 19 April 2017 at 11:12, Jan Willem Janssen [email protected] wrote:

It seems I've missed the discussion about ASN.1, I only see some excerpts of it being mentioned a few years back. Is this something that is seriously being considered for TUF?

Yes, we're planning to support multiple data formats (e.g., JSON, ASN.1/DER, etc.). ASN.1/DER was primarily developed for @uptane. @awwad can say more about it...

trishankkarthik avatar Apr 19 '17 15:04 trishankkarthik

@vladimir-v-diaz Ok, yes I see what you mean now. My mistake. I guess I was going down the wrong path because I couldn't calculate the same key IDs that are in the test cases:

$ jq -r '.signed.keys."5602f4df0cd26b2112f0833b1ce8d5fcbb595754961d3a04f37b9815e2ced503".keyval.public' tests/repository_data/repository/metadata/1.root.json | sha256sum
f592cc46408c6c29124203998fee89553889c02e50c94065d5f05f1dbf594871  -
$ jq '.signed.keys."5602f4df0cd26b2112f0833b1ce8d5fcbb595754961d3a04f37b9815e2ced503".keyval.public' tests/repository_data/repository/metadata/1.root.json | sed -e 's/"//g' | sha256sum
0e07775f044772a860647f827942df85370b0aa387a49df545a3dc0d26eebc9e  -

and also

import hashlib, canonicaljson, json
with open('tests/repository_data/repository/metadata/1.root.json') as f:
    j = json.loads(f.read())

b = canonicaljson.encode_canonical_json(j['signed']['keys']['5602f4df0cd26b2112f0833b1ce8d5fcbb595754961d3a04f37b9815e2ced503']['keyval']['public'])
h = hashlib.sha256()
h.update(b)
h.hexdigest()
# '32b01395a271d0f34ab8c7c47da3eaf0ae25e76a9ef44cd7b10257e473961529'

heartsucker avatar Apr 19 '17 15:04 heartsucker

On 19 April 2017 at 11:59, heartsucker [email protected] wrote:

@vladimir-v-diaz https://github.com/vladimir-v-diaz Ok, yes I see what you mean now. My mistake. I guess I was going down the wrong path because I couldn't calculate the same key IDs that are in the test cases:

I suppose the least we can do now is to improve our documentation (with runnable code) on how to compute / verify signatures...

trishankkarthik avatar Apr 19 '17 16:04 trishankkarthik

I'm jumping in late here, but want to mention we're certainly moving TUF so that it doesn't require a specific format (e.g., canonical-json). TAP 7 https://github.com/theupdateframework/taps/blob/tap7/tap7.md talks about some of our early thoughts in this area.

For conformance testing, you'll just have to write code that plugs into our test harness to translate into your format. The tests should all be able to run from there.

Note: folks on the Docker team and also some our collaborators in the automotive industry are encouraging us to accept TAP 7 soon. We will likely be moving forward fairly quickly, unless issues come up during discussion of the TAP.

On Wed, Apr 19, 2017 at 12:02 PM, Trishank Karthik Kuppusamy < [email protected]> wrote:

On 19 April 2017 at 11:59, heartsucker [email protected] wrote:

@vladimir-v-diaz https://github.com/vladimir-v-diaz Ok, yes I see what you mean now. My mistake. I guess I was going down the wrong path because I couldn't calculate the same key IDs that are in the test cases:

I suppose the least we can do now is to improve our documentation (with runnable code) on how to compute / verify signatures...

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/theupdateframework/tuf/issues/362#issuecomment-295323487, or mute the thread https://github.com/notifications/unsubscribe-auth/AA0XDzlGAMILl5riL5WiEv9LrYmAfpYQks5rxjAHgaJpZM4JpHYi .

JustinCappos avatar Apr 19 '17 16:04 JustinCappos

Closing since DSSE is on the horizon

trishankkarthik avatar Oct 16 '23 03:10 trishankkarthik