Update UnixFS specification

Open Jorropo opened this issue 3 years ago • 8 comments

We need:

  • A proper unixfs spec
    • how to calculate offsets
      • document the difference between Tsize (total sub-DAG size, raw data + envelopes) and raw file data size (without IPFS metadata), and how to read/interpret each.
    • how to read & create HAMT directories
      • incorporate https://github.com/multiformats/go-multihash/issues/135#issuecomment-791178958
    • protobufs
      • incorporate/reference https://ipld.io/specs/codecs/dag-pb/spec/
  • Some testing fixtures.
  • incorporate https://github.com/ipfs/specs/issues/162
  • incorporate this explainer
  • incorporate real world case where someone created their own file by hand

Jorropo avatar Sep 01 '22 15:09 Jorropo

Cc @rvagg @dignifiedquire: Kubo maintainers are going to take the first stab at getting this written in September. Feel free to watch or leave any notes.

BigLep avatar Sep 01 '22 18:09 BigLep

Our latest set of trials & tribulations from Iroh: https://github.com/n0-computer/iroh/pull/198 and our running doc of papercuts: https://number-zero.notion.site/UnixFs-742339892d9c47d5b79f4f942e661bbf

b5 avatar Sep 01 '22 18:09 b5

@b5 about https://github.com/n0-computer/iroh/pull/198: I think the balanced tree is not in the spec. Or at least, if someone really cares about it, it's a non-authoritative part of the spec.

As long as you get your file sizes right and the merkle DAG is correct (meaning that a correctly built decoder successfully rebuilds the original content), you can use whatever scheme you like.

Jorropo avatar Sep 01 '22 21:09 Jorropo

Sure, maybe not an authoritative part of the spec, but as Lidel pointed out in the implementers call yesterday, there are many things that would be good to suggest within spec documents, giving implementers hints so they don't footgun themselves.

No one says the dag needs to be balanced. Everyone ends up implementing a balanced tree at some point.
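
A minimal sketch of the "size-based chunker, balanced tree" default discussed in this thread. Everything here is hypothetical and in-memory only: `Node`, `chunk`, `build_balanced` and `read_all` are illustration names, not any IPFS library API, and the default parameters (256 KiB chunks, 174 links per node) are assumptions modeled on what Kubo is commonly described as using, not values taken from this thread. No hashing or dag-pb encoding is performed.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical stand-in for a UnixFS file node (not a real API)."""
    data: bytes = b""                                  # payload, only set on leaves
    children: List["Node"] = field(default_factory=list)

    @property
    def file_size(self) -> int:
        # Raw file bytes below this node, with no envelope overhead.
        if not self.children:
            return len(self.data)
        return sum(child.file_size for child in self.children)


def chunk(data: bytes, chunk_size: int = 256 * 1024) -> List[Node]:
    """Split data into fixed-size leaves; the last leaf may be shorter."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [Node(data=data[i:i + chunk_size])
            for i in range(0, len(data), chunk_size)]


def build_balanced(leaves: List[Node], max_links: int = 174) -> Node:
    """Group nodes max_links at a time, level by level, until one root remains."""
    if max_links < 2:
        raise ValueError("max_links must be at least 2")
    if not leaves:
        return Node()                                  # empty file
    level = leaves
    while len(level) > 1:
        level = [Node(children=level[i:i + max_links])
                 for i in range(0, len(level), max_links)]
    return level[0]


def read_all(node: Node) -> bytes:
    """Rebuild the original content by walking the tree left to right."""
    if not node.children:
        return node.data
    return b"".join(read_all(child) for child in node.children)
```

Because every level wraps all nodes of the level below, all leaves end up at the same depth. Any other layout is equally valid as long as a decoder walking the DAG rebuilds the original bytes and the recorded file sizes are right.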

b5 avatar Sep 02 '22 15:09 b5

Some additional asks, based on real world problems I've seen:

  • make it clear that the chunking strategy and the way the DAG is constructed / balanced are up to the implementation, but:
    • a "notes for implementers" section should give an example of a basic implementation (size-based chunker, balanced tree) so people who are in a rush and don't care about performance end up with a sane default and don't reinvent a square wheel
  • make it clear what Tsize means in the context of UnixFS and non-UnixFS sub-DAGs (total size, including all IPFS/IPLD envelopes)
    • add a "note for implementers" on how reading a byte range of a big file should be done
      • this is ridiculously important: some of our own Go libraries did not do it correctly and used Tsize instead of the raw file size
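
A minimal sketch of the byte-range point above, under stated assumptions: each non-leaf file node carries a per-child list of raw file sizes (modeled on the `blocksizes` field of the UnixFS `Data` message), while each link carries a `Tsize` that also counts envelope bytes. `Link`, `FileNode` and `read_range` are hypothetical names for illustration, not a real library API, and the sketch does not validate that `blocksizes` matches the children. Offsets are resolved with the raw sizes only; `Tsize` is never consulted.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Link:
    tsize: int            # size of the whole linked sub-DAG, envelopes included
    node: "FileNode"


@dataclass
class FileNode:
    data: bytes = b""                                     # payload, leaves only
    links: List[Link] = field(default_factory=list)
    blocksizes: List[int] = field(default_factory=list)   # raw file bytes per child


def read_range(node: FileNode, offset: int, length: int) -> bytes:
    """Return file bytes [offset, offset + length), seeking by raw file sizes."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    if length <= 0:
        return b""
    if not node.links:
        return node.data[offset:offset + length]
    end = offset + length
    out = []
    start = 0                                  # file offset where the child begins
    for link, size in zip(node.links, node.blocksizes):
        child_end = start + size
        if child_end > offset and start < end:
            lo = max(offset, start) - start    # range start relative to the child
            hi = min(end, child_end) - start   # range end relative to the child
            out.append(read_range(link.node, lo, hi - lo))
        start = child_end
        if start >= end:
            break                              # later children are past the range
    return b"".join(out)
```

If the loop advanced `start` by `link.tsize` instead of the raw size, every child after the first would be located at the wrong file offset, which is the mistake described above.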

lidel avatar Sep 05 '22 11:09 lidel

Another ask is a recommendation for how to add non-canonical extensions to the spec, e.g. a systematization for extra metadata.

willscott avatar Sep 22 '22 15:09 willscott

I want to do this, but as a followup IPIP.

First I will describe what Kubo does today, without any new innovation; then we can see what to improve. It will be impossible to review if we mix and match old and new stuff.

Jorropo avatar Sep 22 '22 16:09 Jorropo

Here is the in-progress PR: https://github.com/ipfs/specs/pull/331

BigLep avatar Nov 11 '22 00:11 BigLep