ucanto

Integrate IPLD schema toolchain

Open Gozala opened this issue 3 years ago • 5 comments

We have an IPLD schema language and JS tooling around it. We should find a way to integrate it into this library. Here are some ideas:

  1. A tool that can generate capability defs from the schema definition.
    • It could generate derive functions that just fail right away.
  2. Ability to generate IPLD schema from the capability definitions into a serial format.
  3. Ability to generate IPLD schema in its JSON representation.

Gozala, Oct 05 '22 16:10

I did some exploration by adding a toIPLDSchema method to the Schema interface:

https://github.com/web3-storage/ucanto/blob/582c4c504d2e75feee2c5d298ea08b2f01ff1c5e/packages/validator/src/schema/type.ts#L15-L31

However, I ran into a problem: IPLD schemas do not support inline structs or unions, which means a schema like this cannot be emitted as a single definition:

Schema.struct({
   root: Schema.string(),
   shards: Schema.dict({ value: Schema.string() })
})

In fact, it needs to generate two types, one for the outer struct and one for the shards field, and give each of them a name.
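To make the hoisting step concrete, here is a minimal sketch of how nested definitions could be lifted into separately named IPLD schema types. It does not use ucanto's actual Schema classes: the `SchemaNode` type, the `hoist` and `toIPLDSchema` functions, and the path-based names (`Root`, `RootShards`) are all hypothetical stand-ins, and path-based naming is only a placeholder for whatever naming scheme is eventually chosen.

```typescript
// Hypothetical, simplified stand-in for a schema tree (not ucanto's real types).
type SchemaNode =
  | { kind: "string" }
  | { kind: "dict"; value: SchemaNode }
  | { kind: "struct"; fields: Record<string, SchemaNode> };

const capitalize = (s: string): string => s.charAt(0).toUpperCase() + s.slice(1);

// Emits a named definition for every struct/dict into `defs` and returns
// the type name that the parent should reference.
function hoist(node: SchemaNode, name: string, defs: string[]): string {
  switch (node.kind) {
    case "string":
      return "String";
    case "dict": {
      const value = hoist(node.value, `${name}Value`, defs);
      defs.push(`type ${name} {String:${value}}`);
      return name;
    }
    case "struct": {
      const lines = Object.entries(node.fields).map(
        ([field, child]) =>
          `  ${field} ${hoist(child, name + capitalize(field), defs)}`
      );
      defs.push(`type ${name} struct {\n${lines.join("\n")}\n}`);
      return name;
    }
  }
}

function toIPLDSchema(root: SchemaNode, rootName: string): string {
  const defs: string[] = [];
  hoist(root, rootName, defs);
  return defs.join("\n\n");
}

// Mirrors the struct from the comment above.
const example: SchemaNode = {
  kind: "struct",
  fields: {
    root: { kind: "string" },
    shards: { kind: "dict", value: { kind: "string" } },
  },
};

const dsl = toIPLDSchema(example, "Root");
console.log(dsl);
```

Under these assumptions the output contains two definitions, one for the map behind `shards` and one for the outer struct that refers to it by name.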

I do not want to use UUIDs or other random identifiers. Instead, I would like to use the CID of each definition, so that the same schema always ends up with the same name. However, generating a sha256 is an async operation, and introducing asynchrony here is not a good idea.

Instead, I have been considering a murmur3 hash (in this context we don't need cryptographic hashes); however, I'm not sure what the hash collision rate would be.
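As a rough illustration of the murmur3 idea, the sketch below implements the 32-bit x86 variant of MurmurHash3 synchronously and derives a type name from a serialized definition. The function names `murmur3_32` and `nameFor`, the `T_` prefix, the seed of 0, and the assumption that a definition is first serialized to a canonical string are all hypothetical choices made for this example. On the collision question: assuming the output is uniformly distributed over 32 bits, the birthday bound puts the chance of any collision among n distinct definitions at roughly n²/2³³.

```typescript
// MurmurHash3, x86 32-bit variant, over raw bytes. Synchronous and non-cryptographic.
function murmur3_32(bytes: Uint8Array, seed = 0): number {
  const c1 = 0xcc9e2d51;
  const c2 = 0x1b873593;
  const rotl = (x: number, r: number): number => (x << r) | (x >>> (32 - r));

  let h = seed | 0;
  const blocks = bytes.length >>> 2;

  // Body: mix in each little-endian 4-byte block.
  for (let i = 0; i < blocks; i++) {
    const o = i * 4;
    let k =
      bytes[o] | (bytes[o + 1] << 8) | (bytes[o + 2] << 16) | (bytes[o + 3] << 24);
    k = Math.imul(k, c1);
    k = rotl(k, 15);
    k = Math.imul(k, c2);
    h ^= k;
    h = rotl(h, 13);
    h = (Math.imul(h, 5) + 0xe6546b64) | 0;
  }

  // Tail: mix in the remaining 1 to 3 bytes, if any.
  const tail = blocks * 4;
  const rem = bytes.length & 3;
  if (rem > 0) {
    let k = 0;
    if (rem === 3) k ^= bytes[tail + 2] << 16;
    if (rem >= 2) k ^= bytes[tail + 1] << 8;
    k ^= bytes[tail];
    k = Math.imul(k, c1);
    k = rotl(k, 15);
    k = Math.imul(k, c2);
    h ^= k;
  }

  // Finalization: fold in the length and avalanche the bits.
  h ^= bytes.length;
  h ^= h >>> 16;
  h = Math.imul(h, 0x85ebca6b);
  h ^= h >>> 13;
  h = Math.imul(h, 0xc2b2ae35);
  h ^= h >>> 16;
  return h >>> 0; // as an unsigned 32-bit integer
}

// Hypothetical naming helper: same serialized definition, same name.
function nameFor(definition: string): string {
  const hash = murmur3_32(new TextEncoder().encode(definition));
  return `T_${hash.toString(16).padStart(8, "0")}`;
}

console.log(nameFor("{String:String}"));
```

The name is only as stable as the serialization fed into it, so key order and whitespace in the serialized definition would need to be canonical for equal schemas to produce equal names.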

An alternative approach might be to emit non-standard IPLD schemas and just inline the structs. In theory, we could still map to a standard IPLD schema asynchronously by generating sha256 CIDs.

Gozala, Jan 05 '23 18:01

Created an issue to lift the inline types restriction: https://github.com/ipld/ipld/issues/262

Gozala, Jan 05 '23 19:01

> However, generating a sha256 is an async operation, and introducing asynchrony here is not a good idea.

Is this in general, or do you just mean in the WebCrypto API? E.g. I think Node's crypto module or js-sha256 can hash synchronously.
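For reference, a minimal sketch of synchronous sha256 hashing with Node's built-in crypto module. The helper name `sha256Hex` is hypothetical, and the sketch only produces a hex digest: turning it into a CID would still require multihash and CID encoding, which is not shown. It also assumes a Node runtime, so it would not apply as-is in browsers, where WebCrypto's digest is async.

```typescript
import { createHash } from "node:crypto";

// Synchronously hashes a UTF-8 string and returns the sha256 digest as hex.
function sha256Hex(input: string): string {
  return createHash("sha256").update(input, "utf8").digest("hex");
}

console.log(sha256Hex("{String:String}"));
```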

gobengo, Jan 05 '23 20:01

Wrt murmur: if there are no other options, I'd say if it's good enough for UnixFS, it's good enough for this?

gobengo, Jan 06 '23 02:01

> Wrt murmur: if there are no other options, I'd say if it's good enough for UnixFS, it's good enough for this?

UnixFS only hashes directory entry names. I suspect smaller payloads are less prone to collisions, but I'm not confident this assumption is accurate. More importantly, there's logic in place to handle hash collisions when they occur, and I can think of a way to deal with them in this context.

Gozala, Jan 06 '23 03:01