Table: Application specific ranges

Open jbenet opened this issue 7 years ago • 10 comments

We were going to keep some application specific ranges. I'm not finding them in the table. We should add them back.

Ideally we should have a function that keeps some numbers within every varint size. Meaning that:

  • a 1-byte varint should have a few values reserved for app-specific codes.
  • a 2-byte varint should have some too, many more than 1-byte.
  • etc.

Effectively: come up with a function appSpecific(code int) bool that:

  • assigns some values at every varint range
  • is simple to understand

For example, one such function could be:

func appSpecific(code int) bool {
    // keeps 8 codes every 128 (residues 120 through 127).
    return code%128 >= 128-8
}
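As a rough sketch of what such a rule yields per varint size: assuming the rule is "the last 8 codes of every block of 128" and that codes are minimally encoded unsigned varints (7 payload bits per byte), the snippet below tallies the reserved codes in the 1-, 2- and 3-byte ranges. The helper countReserved is hypothetical and exists only for this illustration.

```go
package main

import "fmt"

// appSpecific reports whether code falls in the reserved tail of its
// 128-code block (assumption: modulus 128, last 8 residues reserved).
func appSpecific(code uint64) bool {
	return code%128 >= 128-8
}

// countReserved counts the reserved codes in the half-open range [lo, hi).
// Hypothetical helper, used only to illustrate the distribution.
func countReserved(lo, hi uint64) int {
	n := 0
	for c := lo; c < hi; c++ {
		if appSpecific(c) {
			n++
		}
	}
	return n
}

func main() {
	// A minimally encoded n-byte unsigned varint covers the codes in
	// [2^(7(n-1)), 2^(7n)), with the 1-byte range starting at 0.
	fmt.Println("1-byte:", countReserved(0, 1<<7))
	fmt.Println("2-byte:", countReserved(1<<7, 1<<14))
	fmt.Println("3-byte:", countReserved(1<<14, 1<<21))
}
```

Under these assumptions the counts come out as 8, 1016 and 130048, so every varint size gets some reserved codes and longer varints get many more.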

jbenet avatar Feb 26 '17 10:02 jbenet

There should be a simple way to come up with an appSpecific function that distributes the codes in such a manner that the longer the code, the more app-specific values it can have. So for example 1 value in a one-byte code, 2 values in two bytes, ..., 512 values in a 9-byte code. We might want to shift this around, scale it, or cap it.

I was thinking about placing the app-specific range at the exact value of the highest byte in the varint. So one code in the 1-byte range, 127 codes in the two-byte range, 16384 in three bytes, but IMO it gets too big too fast.
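One possible reading of the doubling idea, as a sketch only: reserve the top 2^(n-1) codes of each n-byte varint range, capped at 9 bytes. Both the placement (top of the range) and the scale factor are assumptions; with this particular scaling a 9-byte varint gets 256 codes, so the constants would need shifting to match other targets. varintLen and the cap are illustrative choices, not part of any agreed design.

```go
package main

import "fmt"

// varintLen returns the number of bytes in the minimal unsigned-varint
// encoding of code (7 payload bits per byte).
func varintLen(code uint64) int {
	n := 1
	for code >= 0x80 {
		code >>= 7
		n++
	}
	return n
}

// appSpecific reserves the top 2^(n-1) codes of every n-byte varint range.
// Assumption: codes needing more than 9 bytes are never app-specific.
func appSpecific(code uint64) bool {
	n := varintLen(code)
	if n > 9 {
		return false
	}
	upper := uint64(1) << (7 * uint(n)) // exclusive upper bound of the n-byte range
	reserved := uint64(1) << uint(n-1)  // 1, 2, 4, ... codes per varint size
	return code >= upper-reserved
}

func main() {
	for _, c := range []uint64{126, 127, 16381, 16382, 16383} {
		fmt.Printf("%d (%d-byte varint): %v\n", c, varintLen(c), appSpecific(c))
	}
}
```

Here 127 is the only reserved 1-byte code, and 16382 and 16383 are the two reserved 2-byte codes.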

Kubuxu avatar Feb 26 '17 10:02 Kubuxu

Any progress on this? We are considering using CIDs to link between data structures at http://www.uprtcl.io/, but we need to be able to define application-specific codecs.

Also libraries concerning multicodecs (at least https://github.com/multiformats/js-cid, https://github.com/multiformats/js-multicodec or https://github.com/multiformats/rust-cid) don't support adding custom codecs. They seem to throw errors when we try to define CIDs with custom codecs. Is this the behaviour we should expect?

guillemcordoba avatar May 06 '19 10:05 guillemcordoba

First of all, that project looks awesome.


Application-specific codes are useful for data that never passes application boundaries, but you'll lose quite a few of the interoperability benefits of IPLD if you use one in CIDs. It'll also be painful to migrate away from these app-specific codecs later, as everything in IPLD is content addressed. If need be, I'd be happy to just define a new codec.

Note: there are valid use-cases for app-specific IPLD codecs, I just want to make sure this case is one of them.

To make sure we're on the same page, how are you planning on serializing your objects? For context, a CID's "codec" isn't really supposed to be used as a "type". Instead it's there to indicate the serialization format for the object.

So, the question is, have you considered just using CBOR or some other standard serialization format? We have codecs defined for git, eth, btc, etc. because these systems all existed before IPLD.

> They seem to throw errors when we try to define CIDs with custom codecs. Is this the behaviour we should expect?

Ideally, no. We should consider allowing arbitrary unknown CID codecs, failing only when we try to decode a given block. Mind opening issues in js-cid, rust-cid, and go-cid?
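To make the "fail only at decode time" idea concrete, here is a minimal sketch. It is not the go-cid API: Block, Decoder, NewBlock, Decode and the decoders registry are all hypothetical names, and 0x300001 is just a made-up application-specific code. The only real table entry used is raw (0x55).

```go
package main

import (
	"errors"
	"fmt"
)

// Block is a hypothetical stand-in for a content-addressed block: the codec
// code carried by its CID plus the raw bytes.
type Block struct {
	Codec uint64
	Data  []byte
}

// Decoder turns raw block bytes into a decoded value.
type Decoder func(data []byte) (interface{}, error)

// ErrUnknownCodec is returned at decode time, never at construction time.
var ErrUnknownCodec = errors.New("no decoder registered for codec")

// decoders maps codec codes to decoders; only raw (0x55) is registered here.
var decoders = map[uint64]Decoder{
	0x55: func(data []byte) (interface{}, error) { return data, nil },
}

// NewBlock accepts any codec code, known or not.
func NewBlock(codec uint64, data []byte) Block {
	return Block{Codec: codec, Data: data}
}

// Decode is the only place where an unknown codec becomes an error.
func Decode(b Block) (interface{}, error) {
	dec, ok := decoders[b.Codec]
	if !ok {
		return nil, fmt.Errorf("%w: 0x%x", ErrUnknownCodec, b.Codec)
	}
	return dec(b.Data)
}

func main() {
	known := NewBlock(0x55, []byte("hello"))
	custom := NewBlock(0x300001, []byte("app data")) // hypothetical app-specific code

	_, err := Decode(known)
	fmt.Println("raw decode error:", err)

	_, err = Decode(custom)
	fmt.Println("custom decode error:", err)
}
```

With this shape, blocks with unknown codecs can still be created, stored and passed around; only code that actually needs the decoded content has to know the codec.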

Stebalien avatar May 06 '19 18:05 Stebalien

> Ideally, no. We should consider allowing arbitrary unknown CID codecs, failing only when we try to decode a given block. Mind opening issues in js-cid, rust-cid, and go-cid?

You can’t encode the CID if you can’t map the codec string key to a multicodec hex value; that’s why they throw.

mikeal avatar May 06 '19 18:05 mikeal

> You can’t encode the CID if you can’t map the codec string key to a multicodec hex value; that’s why they throw.

https://github.com/multiformats/js-cid/pull/72 is related, as it would allow passing in a number instead of a string.

vmx avatar May 07 '19 17:05 vmx

> multiformats/js-cid#72 is related, as it would allow passing in a number instead of a string.

Sure, but it’ll still throw when you decode the CID. The way the CID interfaces are built assumes they have a map to the string value and back. It’s pretty involved even to add codecs to these maps for the purpose of developing new codecs; it’s an area where we should consider improving the experience in a broader way than just this one issue.

mikeal avatar May 07 '19 17:05 mikeal

> it’s an area where we should consider improving the experience in a broader way than just this one issue.

@mikeal js-ipld is already using numbers instead of strings for codecs for that reason.

vmx avatar May 07 '19 17:05 vmx

@vmx CID instances expose the string key for the codec property though, so as soon as you get to that stage you need a string value, unless we want to push out a breaking change to that interface. A fair amount of code assumes it is a string value today. While I think the new stack could easily support numeric values internally, it still produces CID instances, and the consumers probably assume the codec property is a string.

mikeal avatar May 07 '19 17:05 mikeal

@mikeal Good point. Though you could probably just create a string (e.g. unknown-<the-codec-number>) if it isn't known, and things should still work. What I'm trying to say is that we should keep that case in mind and keep moving in that direction. But I agree that currently it's not easily possible.
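A minimal sketch of that fallback, assuming the number is rendered in decimal (the exact format is not specified here). codecName and codecCode are hypothetical helpers, not functions from js-cid or go-cid; the three table entries are the real codes for raw, dag-pb and dag-cbor, and 0x300001 is a made-up custom code.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// names is a small excerpt of the multicodec table.
var names = map[uint64]string{
	0x55: "raw",
	0x70: "dag-pb",
	0x71: "dag-cbor",
}

// unknownPrefix marks synthesized names; the decimal suffix is an assumption.
const unknownPrefix = "unknown-"

// codecName returns the table name, or a synthesized "unknown-<n>" string.
func codecName(code uint64) string {
	if name, ok := names[code]; ok {
		return name
	}
	return unknownPrefix + strconv.FormatUint(code, 10)
}

// codecCode maps a name back to its code, accepting synthesized names too.
func codecCode(name string) (uint64, error) {
	for code, n := range names {
		if n == name {
			return code, nil
		}
	}
	if strings.HasPrefix(name, unknownPrefix) {
		return strconv.ParseUint(strings.TrimPrefix(name, unknownPrefix), 10, 64)
	}
	return 0, fmt.Errorf("unknown codec name %q", name)
}

func main() {
	fmt.Println(codecName(0x71))     // a code from the table
	fmt.Println(codecName(0x300001)) // a code missing from the table

	code, err := codecCode(codecName(0x300001))
	fmt.Println(code == 0x300001, err)
}
```

Consumers that expect a string codec property keep working, and the number survives a round trip through the string form.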

vmx avatar May 07 '19 18:05 vmx

> What I'm trying to say is that we should keep that case in mind and keep moving in that direction. But I agree that currently it's not easily possible.

I totally agree. I think we should zoom out a little and ask “what would make using/developing new codecs better?” If we spend a little more time considering the full use case, I think we’ll find a better overall direction and changes to multiple libraries that can improve this.

mikeal avatar May 07 '19 18:05 mikeal