specs icon indicating copy to clipboard operation
specs copied to clipboard

DagCBOR: be more precise about foreign tags

Open vmx opened this issue 5 years ago • 10 comments

In the DagCBOR spec, be more precise on what happens if you parse CBOR and it contains tags.

  • Should they be ignored?
  • Should they be preserved?
  • Should there be an error?

I suggest that we just ignore those tags and when you write DagCBOR, the only tag you ever write will be the one for CID (tag 42).

vmx avatar Nov 28 '19 18:11 vmx

+1 on your suggested approach

rvagg avatar Nov 28 '19 21:11 rvagg

I'd say there should be an error by default, and a "parseLoosely" method as an option.

Preserving the extra data without a way to read or to mutate it doesn't make for a pleasant user experience (nor an efficient library -- btw, cbor tags are technically allowed to contain other tags, and if you try to implement this you will be very very sad at how many allocs this can force you to make if you actually support it).

Ignoring the extra data means we're not really validating that it's DagCBOR. This is potentially dangerous. (Think: how the comment field in PDFs made it much, much easier to create hash collisions in (already weak) hashes. Even if it's not immediately broken, it's wading out towards the sharks in a very unnecessary way.)

Erroring explicitly is the choice that remains standing.


We can still also do the "parseLoosely" features in libraries in practice. This would buy us is being able to load CBOR documents that aren't exactly DagCBOR and manipulate them. But I don't think the specs themselves should hedge on this. And users should need to very much opt into such a usage: the number of cases where a user will want this, versus the number of cases where a user would be surprised how much bending-over-backwards the library is doing to support features they didn't need/want, is very leaning towards the latter, I'd bet.

warpfork avatar Nov 30 '19 08:11 warpfork

To connect to other discussions happening elsewhere (no link, I thought it was in a team meeting but perhaps in a private discussion), there's also other parts of CBOR which can be "loose", e.g. the number 255 can be one of 18FF, 1900FF, 1A000000FF, 1B00000000000000FF. So while we're at considering what to do with tags outside of 42 we should also be considering whether we want there to be only one way of encoding data with CBOR for valid DagCBOR and perhaps our parsers all need to have a "strict" mode so that we can enforce that.

I think the remaining discussion on this is regarding how much it's worth the effort compared to everything else we're trying to do? It obviously has dedupe benefits but I'm still a little sceptical that the other reasons for doing this raise above the level of "it's just cleaner!".

/cc @ribasushi

rvagg avatar Jan 01 '20 03:01 rvagg

@rvagg actually deduplication is not the primary reason for wanting this. It is a part of maintaining the symmetry of "this cryptographic id represents this thing, and this thing is only represented by this cryptographic id" ( theoretical limits notwithstanding )

Yes, it is trivial to add a single padding byte somewhere and change every id. But it should not be possible to express bit-for-bit identical content under multiple cryptographic IDs. The simplest use-case I can think of are takedown requests.

Again it comes back to what is actually being built: a product or a platform. If the latter - I strongly believe any low-hanging fruit allowing disambiguation should be baked into the lowest levels of the fabric of said platform.

ribasushi avatar Jan 06 '20 15:01 ribasushi

deterministic map ordering is in this rough category too, discussed in https://github.com/ipld/team-mgmt/blob/8d612e3ce1222de9dd531ca14bdeeb662bd2a3a1/meeting-notes/2020/2020-01-20--ipld-sync.md#notes, we need to make sure our codecs document rules for these while we're documenting these other "strict" encoding/decoding details.

rvagg avatar Jan 20 '20 22:01 rvagg

Sounds like we should prepare to have a Data Model Spec Sprint Day or something...? I think we're starting to accumulate a decent number of these things and we should get em all on an agenda and knock em out.

warpfork avatar Jan 22 '20 09:01 warpfork

ya, we need another close-a-thon in the specs repo in general, but it would probably be most productive to have at least one session focused on Data Model only.

mikeal avatar Jan 22 '20 17:01 mikeal

@mikeal it's more a "getting-a-lot-done" rather then closing. Most open things are valid things that need some work (in the previous round we were able to close old/out of scope things).

vmx avatar Jan 22 '20 18:01 vmx

so, just to give my thoughts on this subject.

In my use-case for the IPID DID spec that uses IPLD, it would be very nice to support tag 98 (COSE signing) of CID as tag 42. This provides me with all of the semantics and tooling necessary to accomplish my goal natively in CBOR and IPLD. I'm sure this could also be a method to help with data integrity for DAG sync. Also, @hsanjuan might find this approach interesting for shared clustering.

98(
  [
    / protected / h'a10300' / {
        \ content type \ 3:0
      } / , 
    / unprotected / {}, 
    / payload /  42: 'bafyreibtkfbzqiwdhd5bv53wrag43jbz4ytyixigde6wa5zuddxciq7s2m', 
    / signatures / [
      [
        / protected / h'a10126' / {
            \ alg \ 1:-7                    \\  alg = ECDSA 256 \\
          } / , 
        / unprotected / {
          / kid / 4:'did:ipid:12D3KooWDXDEZtLW5k16DjeHWkFZEeTKUF7PnzzY7m2UocmduhZV;/publicKey/0'
        }, 
        / signature / h'e2aeafd40d69d19dfe6e52077c5d7ff4e408282cbefb5d06cbf414af2e19d982ac45ac98b8544c908b4507de1e90b717c3d34816fe926a2b98f53afd2fa0f30a'
      ]
    ]
  ]
)

jonnycrunch avatar Jan 22 '20 23:01 jonnycrunch

https://github.com/ipld/specs/pull/236

rvagg avatar Jan 23 '20 06:01 rvagg