
Implement DSSE bridge?

Open jku opened this issue 3 years ago • 8 comments

The fact that TUF metadata contains the non-canonical form of the payload is a known issue (see https://github.com/secure-systems-lab/dsse for future plans).

While we wait for the spec to evolve, I wonder if we should implement a sort of bridge API between DSSE and current TUF Metadata? Metadata.to_dsse_bytes() / Metadata.from_dsse_bytes() or something.

This would allow e.g. a repository to require the admin/developer upload API to use DSSE (allowing the repository to never parse large amounts of unverified json) while still allowing both the admin tools and the actual published repository to work with current TUF metadata and current python-tuf API.

jku, Sep 01 '22 08:09

This is an interesting thought. Transferring existing signatures should work well for Metadata --> DSSE, because the DSSE payload (bytes) could be the canonical json representation of the Metadata payload (Signed), so the original Metadata signatures could still be verified with the DSSE envelope. For the other way around, it can't be guaranteed, which means we'd have to discard the signatures in Metadata.from_dsse_bytes().
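A minimal sketch of the two directions described above, working on plain dicts rather than the real python-tuf `Metadata` class. `to_dsse`, `from_dsse`, the `application/vnd.tuf+json` payload type and the compact `json.dumps` call (a rough stand-in for a real canonical JSON encoder) are all assumptions for illustration. The carried-over signatures only change encoding (TUF uses hex, DSSE uses base64); they still cover the payload bytes, not DSSE's pre-authentication encoding.

```python
import base64
import json

# Hypothetical payload type, chosen for illustration only.
PAYLOAD_TYPE = "application/vnd.tuf+json"


def canonical_bytes(signed: dict) -> bytes:
    # Rough stand-in for canonical JSON: sorted keys, no extra whitespace.
    return json.dumps(signed, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")


def to_dsse(metadata: dict) -> dict:
    """Move the 'signed' part of TUF metadata into a DSSE-style envelope."""
    payload = canonical_bytes(metadata["signed"])
    return {
        "payload": base64.b64encode(payload).decode("ascii"),
        "payloadType": PAYLOAD_TYPE,
        # Assuming canonical_bytes() matches the encoder used at signing time,
        # the TUF signatures cover exactly these payload bytes; only their
        # encoding changes from hex (TUF) to base64 (DSSE).
        "signatures": [
            {"keyid": s["keyid"],
             "sig": base64.b64encode(bytes.fromhex(s["sig"])).decode("ascii")}
            for s in metadata["signatures"]
        ],
    }


def from_dsse(envelope: dict) -> dict:
    """Rebuild TUF-style metadata from an envelope, discarding signatures."""
    payload = base64.b64decode(envelope["payload"])
    # Nothing guarantees these bytes are the canonical form of the parsed
    # object, so the signatures cannot be carried back.
    return {"signed": json.loads(payload), "signatures": []}
```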

lukpueh, Sep 01 '22 08:09

As an alternative, if sign() or another method exposed the canonical payload, this could maybe be implemented outside of the Metadata API itself.

jku, Sep 01 '22 08:09

FYI: @PradyumnaKrishna has implemented DSSE in securesystemslib and is integrating it with in-toto, to make it work alongside the traditional metadata wrapper as described in ITE-5. The current plan is to have an API that looks like this:

```python
any_metadata = AnyMetadata.from_file(path)  # ITE-5 defines algorithm to distinguish traditional metadata and DSSE
any_metadata.verify(verifier)
payload = any_metadata.get_payload()

# use payload ...

any_metadata = create_envelope(payload, traditional_or_dsse)
any_metadata.sign(signer)
any_metadata.to_file(path)
```

We are still trying to figure out a simple architecture that works for both tuf and in-toto. The deserialization/canonicalization aspect is especially challenging for a common interface, because it needs to be configured, to account for different payload types, and it happens at different points in time.

  • traditional metadata including the payload is fully deserialized on read (from_file) and serialized on write (to_file)
  • DSSE only de/serializes the envelope on read/write but not the payload
  • traditional metadata needs to serialize (canonicalize) the payload on signature creation/verification
  • DSSE needs to de/serialize the payload, on get_payload/create_envelope

EDIT: s/metadata/any_metadata in pseudo-code so Jussi maybe stays sane

lukpueh, Sep 01 '22 09:09

> As an alternative, if sign() or another method exposed the canonical payload, this could maybe be implemented outside of the Metadata API itself.

Can you elaborate?

lukpueh, Sep 01 '22 09:09

(I'm going to call the python-tuf top-level concept Metadata, and the DSSE top-level concept Envelope, to maybe stay sane.)

> The deserialization/canonicalization aspect is especially challenging for a common interface, because it needs to be configured, to account for different payload types, and it happens at different points in time

Yes. I imagined that python-tuf could implement e.g. a Metadata.from_dsse(data: bytes, delegator: Metadata, role: str) which could

  • parse data as a DSSE Envelope
  • use the keys/threshold in delegator to verify the signatures in the envelope
  • Create metadata from the payload and signatures, return the result

so pseudocode:

```python
def from_dsse(data: bytes, delegator: Metadata, role: str) -> Metadata:
    env = DSSE.Envelope.from_bytes(data)

    # TODO: refactor/reimplement verify_delegate() to verify that threshold of keys defined
    # in delegator have signed the payload

    # assume we have implemented Signed.from_bytes() and that signatures are compatible or can be converted
    return Metadata(Signed.from_bytes(env.payload), env.signatures)
```

EDIT: I think the signatures aren't compatible as is so that code is missing at least a step to convert them

EDIT2: of course the envelope payload is also base64, so maybe the Signed constructor should be Signed.from_dsse_payload() or something
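A sketch of the three steps listed above, assuming the envelope arrives as JSON bytes and the delegating role is a plain dict with `keyids` and `threshold`. It also shows why the signatures are not compatible as is: DSSE signs the pre-authentication encoding `PAE(payloadType, payload)` rather than the payload itself. HMAC stands in for the delegator's real keys, and `from_dsse`, `pae` and the dict shapes are hypothetical, not python-tuf API.

```python
import base64
import hashlib
import hmac
import json


def pae(payload_type: str, payload: bytes) -> bytes:
    """DSSE pre-authentication encoding (what DSSE signatures actually cover)."""
    t = payload_type.encode("utf-8")
    return b"DSSEv1 %d %b %d %b" % (len(t), t, len(payload), payload)


def from_dsse(data: bytes, keys: dict, role: dict) -> dict:
    """Verify an envelope against a delegating role, then build metadata.

    keys: keyid -> HMAC secret (stand-in for the delegator's public keys)
    role: {"keyids": [...], "threshold": int}, as defined by the delegator
    """
    env = json.loads(data)  # parse the envelope only, not the payload
    payload = base64.b64decode(env["payload"])
    message = pae(env["payloadType"], payload)

    verified = set()
    for sig in env["signatures"]:
        keyid = sig.get("keyid")
        if keyid not in role["keyids"] or keyid not in keys:
            continue  # key not authorized by the delegator
        expected = hmac.new(keys[keyid], message, hashlib.sha256).digest()
        if hmac.compare_digest(expected, base64.b64decode(sig["sig"])):
            verified.add(keyid)  # each key counts at most once

    if len(verified) < role["threshold"]:
        raise ValueError(f"only {len(verified)} of {role['threshold']} required signatures")

    # Only now is the authenticated payload parsed. The DSSE signatures cover
    # PAE(...), not canonical JSON, so they are dropped instead of converted.
    return {"signed": json.loads(payload), "signatures": []}
```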

jku, Sep 01 '22 11:09

> As an alternative, if sign() or another method exposed the canonical payload, this could maybe be implemented outside of the Metadata API itself.

> Can you elaborate?

Actually, scratch that about sign(); I wasn't thinking it all the way through. What I mean is: if the canonical payload is available through the API, then I think someone could generate a DSSE Envelope for a Metadata object outside of the Metadata API.

jku, Sep 01 '22 12:09

I like that in your idea we can keep using old Metadata as we used to and don't have to worry about finding a common abstraction for Metadata and DSSE. And given that we verify signatures before deserialization (and conversion), which is the whole purpose of DSSE, we could even discard them before we convert to Metadata, so that DSSE is not required to create the signature of the canonical representation of the payload.

It does feel a bit awkward to create a Metadata object when all we really need anymore is the payload. But I guess that's the price for maintaining backwards compatibility with the current Metadata API?

Also, your use case assumes that we know that the bytes are DSSE. But at least for in-toto, we need an interface that takes any bytes and deserializes them either as Metadata or DSSE depending on the deserialization results.

lukpueh, Sep 01 '22 12:09

> ... we could even discard them [signatures] before we convert to Metadata, so that DSSE is not required to create the signature of the canonical representation of the payload.

Turns out we should not use canonical json for DSSE payloads. As @PradyumnaKrishna and I just (re)discovered, canonical json is not json. See https://github.com/theupdateframework/python-tuf/issues/457 for details.

This hasn't been a problem so far, because we always only encoded canonical json to generate sigs, but put standard json on the wire. So, if we want to put canonical json on the wire and also decode later, we'd have to write a custom canonical json decoder.
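A small demonstration of the problem, using a simplified canonical encoder written for this example (not securesystemslib's encoder) that follows the OLPC canonical JSON rule of escaping only backslash and double quote inside strings. A string containing a newline is then emitted verbatim, and Python's standard `json.loads` rejects the result.

```python
import json


def encode_canonical(obj) -> str:
    """Simplified OLPC-style canonical JSON encoder (illustrative sketch).

    Strings only escape backslash and double quote; everything else,
    including control characters such as newline, is emitted verbatim.
    """
    if isinstance(obj, bool):  # check bool before int (bool is an int subclass)
        return "true" if obj else "false"
    if obj is None:
        return "null"
    if isinstance(obj, int):
        return str(obj)
    if isinstance(obj, str):
        return '"' + obj.replace("\\", "\\\\").replace('"', '\\"') + '"'
    if isinstance(obj, (list, tuple)):
        return "[" + ",".join(encode_canonical(o) for o in obj) + "]"
    if isinstance(obj, dict):
        items = sorted(obj.items())  # keys in sorted order, no whitespace
        return "{" + ",".join(encode_canonical(k) + ":" + encode_canonical(v)
                              for k, v in items) + "}"
    raise TypeError(f"cannot canonicalize {type(obj).__name__}")


doc = {"custom": {"note": "line one\nline two"}}
canonical = encode_canonical(doc)

try:
    json.loads(canonical)  # strict JSON forbids raw control characters in strings
except json.JSONDecodeError as e:
    print("not valid JSON:", e.msg)
```

Standard `json.dumps(doc)` would escape the newline as `\n`, which is why putting regular JSON on the wire has worked so far.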

But I'd rather choose a non-bridge approach as in https://github.com/in-toto/in-toto/pull/503, where the formats can be used interchangeably but don't need to be compatible.

lukpueh, Sep 08 '22 09:09

I read the whole discussion, but I didn't understand where it was said that we would need to have canonical JSON on the wire in order to use DSSE.

> But I'd rather choose a non-bridge approach as in https://github.com/in-toto/in-toto/pull/503, where the formats can be used interchangeably but don't need to be compatible.

From what I understand, the in-toto DSSE PR suggests an additional abstraction that decides, based on the bytes, which format it is. If we want to achieve the same, we would need to write a similar abstraction for python-tuf that decides whether to call a separate DSSE.Envelope.from_bytes(data) method or the Metadata.from_bytes(data) method. Am I right?
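One way such a dispatching abstraction could tell the formats apart is by their top-level keys. This is a hypothetical heuristic for illustration, not the algorithm ITE-5 specifies; for DSSE input only the envelope is parsed and the base64 `payload` stays an opaque string.

```python
import json


def detect_format(data: bytes) -> str:
    """Guess whether bytes hold a DSSE envelope or traditional TUF metadata.

    Hypothetical heuristic based on top-level keys only.
    """
    doc = json.loads(data)  # raises json.JSONDecodeError (a ValueError) on bad input
    if not isinstance(doc, dict):
        raise ValueError("expected a JSON object")
    keys = set(doc)
    if {"payload", "payloadType", "signatures"} <= keys:
        return "dsse"
    if {"signed", "signatures"} <= keys:
        return "metadata"
    raise ValueError("neither a DSSE envelope nor TUF metadata")
```

A caller would then hand the bytes to the matching loader, e.g. python-tuf's `Metadata.from_bytes()` for traditional metadata or a DSSE envelope parser otherwise.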

Last question: do we want to start supporting DSSE after merging a TAP 14 implementation, or does it make sense to have a separate effort, similar to what was done in https://github.com/in-toto/in-toto/pull/503, without waiting for TAP 14?

MVrachev, Sep 27 '22 15:09

The advantage of using canonical JSON as the signed content in the DSSE would have been that you could go from DSSE to current TUF metadata without re-signing anything (which would be useful to avoid that parsing-unsigned-unsafe-data issue)... but Lukas is likely right that this is just not possible, as "canonical JSON" is not actually JSON and we can't parse it as JSON :facepalm:

So we likely just want to close this issue as wishful thinking... but if you have any new ideas, feel free to expand on them.

jku, Sep 28 '22 07:09

Now, my question is: what are we doing about DSSE after we close this issue? As I mentioned above:

> Last question: do we want to start supporting DSSE after merging a TAP 14 implementation, or does it make sense to have a separate effort, similar to what was done in https://github.com/in-toto/in-toto/pull/503, without waiting for TAP 14?

What do you think, @jku @lukpueh?

MVrachev, Sep 28 '22 12:09

@MVrachev, we don't need to wait for TAP 14 to add the possibility of writing and reading DSSE metadata to python-tuf.

lukpueh, Oct 03 '22 11:10

My plan is to finalize https://github.com/in-toto/in-toto/pull/503, and see if the same approach can be used for python-tuf.

lukpueh, Oct 03 '22 11:10

Closing because we can't automatically bridge from DSSE to the current TUF wrapper because canonical JSON is not valid JSON. The most promising route forward is the in-toto approach which we can adopt once it lands in securesystemslib.

joshuagl, Oct 19 '22 09:10