Update UnixFS specification
We need:
- a proper UnixFS spec
- how to calculate offsets
- document the difference between `Tsize` (total sub-DAG size: raw data + envelopes) and the raw file data size (without IPFS metadata), and how to read/interpret each
- document difference between
- how to read & create HAMT directories
- incorporate https://github.com/multiformats/go-multihash/issues/135#issuecomment-791178958
- protobufs
- incorporate/reference https://ipld.io/specs/codecs/dag-pb/spec/
- some test fixtures
- incorporate https://github.com/ipfs/specs/issues/162
- incorporate this explainer
- incorporate real world case where someone created their own file by hand
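For the HAMT item, a minimal sketch of how a reader might pick the shard slot for a directory entry. It assumes (as I understand current Kubo behavior, to be confirmed by the spec) that a `HAMTShard` node uses hash type murmur3-x64-64 (multicodec `0x22`) with a default fanout of 256, that the digest of the entry name is consumed `log2(fanout)` bits per level, most significant bit first, and that link names are the zero-padded uppercase hex slot index followed by the entry name. Computing murmur3 itself is out of scope here; the digest is taken as input, and `slotPath`/`linkPrefix` are hypothetical helper names.

```go
package main

import (
	"fmt"
	"math/bits"
)

// bitReader consumes a hash digest most-significant bit first.
type bitReader struct {
	digest   []byte
	consumed int // number of bits already read
}

// next returns the next n bits of the digest as an integer.
func (r *bitReader) next(n int) (int, error) {
	if r.consumed+n > len(r.digest)*8 {
		return 0, fmt.Errorf("digest exhausted after %d bits", r.consumed)
	}
	v := 0
	for i := 0; i < n; i++ {
		b := r.digest[r.consumed/8]
		bit := (b >> uint(7-r.consumed%8)) & 1
		v = v<<1 | int(bit)
		r.consumed++
	}
	return v, nil
}

// slotPath returns the slot index chosen at each of the first `levels` shard levels.
func slotPath(digest []byte, fanout, levels int) ([]int, error) {
	if fanout < 2 || fanout&(fanout-1) != 0 {
		return nil, fmt.Errorf("fanout %d is not a power of two", fanout)
	}
	width := bits.TrailingZeros(uint(fanout)) // log2(fanout) bits per level
	r := &bitReader{digest: digest}
	path := make([]int, 0, levels)
	for l := 0; l < levels; l++ {
		idx, err := r.next(width)
		if err != nil {
			return nil, err
		}
		path = append(path, idx)
	}
	return path, nil
}

// linkPrefix renders a slot index as uppercase hex, zero-padded to the width of fanout-1.
func linkPrefix(idx, fanout int) string {
	width := len(fmt.Sprintf("%X", fanout-1))
	return fmt.Sprintf("%0*X", width, idx)
}

func main() {
	// Placeholder digest; a real one would be the murmur3-x64-64 digest of the entry name.
	digest := []byte{0xF3, 0x0A, 0, 0, 0, 0, 0, 0}
	path, _ := slotPath(digest, 256, 2)
	fmt.Println(path, linkPrefix(path[0], 256)+"example.txt")
}
```

With fanout 256 each level consumes exactly one digest byte, which is why the slot prefix is always two hex characters; a sub-shard link would carry only the prefix, while a leaf entry link carries prefix plus name.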
Cc @rvagg @dignifiedquire: Kubo maintainers are going to take the first stab at writing this in September. Feel free to watch or leave any notes.
Our latest set of trials & tribulations from Iroh: https://github.com/n0-computer/iroh/pull/198 and our running doc of papercuts: https://number-zero.notion.site/UnixFs-742339892d9c47d5b79f4f942e661bbf
@b5, regarding https://github.com/n0-computer/iroh/pull/198: I don't think the balanced tree is in the spec. Or at least, if someone really cares about it, it's a non-authoritative part of the spec.
As long as you get your file sizes right and the Merkle DAG is correct (meaning that a correctly built decoder successfully rebuilds the original content), you can use whatever scheme you like.
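A minimal sketch of what "getting your file sizes right" could mean when checking a DAG, assuming the go-unixfs convention (as I understand it) that a File node's `filesize` equals the length of its inline `Data` plus the sum of its `blocksizes`, and that each `blocksizes` entry equals the child's raw file size. `fileNode` and `checkSizes` are hypothetical; leaves are modeled as UnixFS nodes with inline data, whereas with raw leaves a child's size would simply be its block length.

```go
package main

import (
	"errors"
	"fmt"
)

// fileNode is a hypothetical in-memory stand-in for a dag-pb node carrying UnixFS File metadata.
type fileNode struct {
	Data       []byte      // inline file bytes stored in this node
	Blocksizes []uint64    // raw file bytes under each child, in link order
	Filesize   uint64      // raw file bytes under this node (no envelopes)
	Children   []*fileNode // resolved children, same order as Blocksizes
}

// checkSizes verifies filesize == len(Data) + sum(Blocksizes) at every node,
// and that each blocksize matches the child's own filesize.
func checkSizes(n *fileNode) error {
	if len(n.Blocksizes) != len(n.Children) {
		return errors.New("blocksizes and children differ in length")
	}
	sum := uint64(len(n.Data))
	for i, child := range n.Children {
		if err := checkSizes(child); err != nil {
			return err
		}
		if child.Filesize != n.Blocksizes[i] {
			return fmt.Errorf("child %d: blocksize %d != child filesize %d", i, n.Blocksizes[i], child.Filesize)
		}
		sum += n.Blocksizes[i]
	}
	if sum != n.Filesize {
		return fmt.Errorf("filesize %d != data + blocksizes %d", n.Filesize, sum)
	}
	return nil
}

func main() {
	a := &fileNode{Data: []byte("hello"), Filesize: 5}
	b := &fileNode{Data: []byte(" world"), Filesize: 6}
	root := &fileNode{Blocksizes: []uint64{5, 6}, Filesize: 11, Children: []*fileNode{a, b}}
	fmt.Println(checkSizes(root))
}
```

A DAG that copied link `Tsize` values (which include protobuf envelopes) into `blocksizes` would fail this check at the first child.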
Sure, maybe it's not an authoritative part of the spec, but as Lidel pointed out in the implementers call yesterday, there are many things that would be good to suggest within spec documents to give implementers hints so they don't footgun themselves.
No one says the dag needs to be balanced. Everyone ends up implementing a balanced tree at some point.
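A minimal sketch of the kind of "sane default" being discussed: a fixed-size chunker plus a balanced layout in which every leaf ends up at the same depth. This is not Kubo's code; it groups nodes bottom-up, up to `maxLinks` children per parent, which I believe yields the same shape as a balanced builder for a given chunk count. Kubo's defaults are, to my knowledge, 256 KiB chunks and at most 174 links per node, but treat those numbers as assumptions. `node`, `chunkFixed`, `buildBalanced`, and `depth` are hypothetical names.

```go
package main

import "fmt"

// node is a hypothetical in-memory stand-in for a dag-pb + UnixFS File node.
type node struct {
	Data       []byte   // leaf bytes (empty for internal nodes in this sketch)
	Blocksizes []uint64 // raw file bytes under each child
	Filesize   uint64   // raw file bytes under this node
	Children   []*node
}

// chunkFixed splits data into pieces of at most size bytes (one empty chunk for empty input).
func chunkFixed(data []byte, size int) [][]byte {
	if size < 1 {
		panic("chunk size must be positive")
	}
	var chunks [][]byte
	for len(data) > size {
		chunks = append(chunks, data[:size])
		data = data[size:]
	}
	if len(data) > 0 || len(chunks) == 0 {
		chunks = append(chunks, data)
	}
	return chunks
}

// buildBalanced groups nodes bottom-up into parents of at most maxLinks children
// until a single root remains, so all leaves sit at the same depth.
func buildBalanced(chunks [][]byte, maxLinks int) *node {
	if maxLinks < 2 {
		panic("maxLinks must be at least 2")
	}
	level := make([]*node, 0, len(chunks))
	for _, c := range chunks {
		level = append(level, &node{Data: c, Filesize: uint64(len(c))})
	}
	for len(level) > 1 {
		var next []*node
		for i := 0; i < len(level); i += maxLinks {
			end := i + maxLinks
			if end > len(level) {
				end = len(level)
			}
			parent := &node{}
			for _, child := range level[i:end] {
				parent.Children = append(parent.Children, child)
				parent.Blocksizes = append(parent.Blocksizes, child.Filesize)
				parent.Filesize += child.Filesize
			}
			next = append(next, parent)
		}
		level = next
	}
	return level[0]
}

// depth follows first children down to a leaf.
func depth(n *node) int {
	d := 0
	for len(n.Children) > 0 {
		n = n.Children[0]
		d++
	}
	return d
}

func main() {
	root := buildBalanced(chunkFixed(make([]byte, 1000), 256), 2)
	fmt.Println(depth(root), root.Filesize, root.Blocksizes)
}
```

Note that the builder records raw chunk lengths in `Blocksizes`; the link `Tsize` would be computed separately once each child is serialized.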
Some additional asks, based on real world problems I've seen:
- make it clear that the chunking strategy and the way the DAG is constructed / balanced are up to the implementation, but...
- "notes for implementers" section should give an example of basic implementation (size-based chunker, balanced tree) so people who are in a rush and don't care about performance end up with a sane default and don't reinvent a square wheel
- make it clear what `Tsize` means in the context of UnixFS and non-UnixFS sub-DAGs (total size, including all IPFS/IPLD envelopes)
  - this is ridiculously important: some of our own Go libraries did not use it correctly and used `Tsize` instead of the raw file size
- add a "note for implementers" on how reading a byte range of a big file should be done
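A minimal sketch of reading a byte range without fetching the whole file, assuming UnixFS File nodes whose children are laid out in link order and whose inline `Data` (if any) precedes them. Child offsets are prefix sums of `blocksizes` (raw file bytes), never of link `Tsize`, which also counts protobuf envelopes and would make offsets drift. `fileNode`, `readRange`, and the `fetches` counter are hypothetical; in a real reader each child would be fetched by CID only when its range overlaps the request.

```go
package main

import "fmt"

// fileNode is a hypothetical in-memory stand-in for a UnixFS File node.
type fileNode struct {
	Data       []byte      // inline file bytes, preceding children in file order
	Blocksizes []uint64    // raw file bytes under each child
	Children   []*fileNode // same order as Blocksizes
}

func minU(a, b uint64) uint64 {
	if a < b {
		return a
	}
	return b
}

func maxU(a, b uint64) uint64 {
	if a > b {
		return a
	}
	return b
}

// readRange returns file bytes [off, off+length) under n and counts visited nodes in *fetches.
func readRange(n *fileNode, off, length uint64, fetches *int) []byte {
	if length == 0 {
		return nil
	}
	*fetches++
	end := off + length
	var out []byte
	pos := uint64(0)

	// Inline data, if any, comes first.
	if dl := uint64(len(n.Data)); dl > 0 {
		lo, hi := maxU(off, pos), minU(end, pos+dl)
		if lo < hi {
			out = append(out, n.Data[lo-pos:hi-pos]...)
		}
		pos += dl
	}

	for i, child := range n.Children {
		start, stop := pos, pos+n.Blocksizes[i]
		pos = stop
		if stop <= off || start >= end {
			continue // no overlap: this subtree is never visited
		}
		// Translate the requested range into the child's local coordinates.
		lo, hi := maxU(off, start), minU(end, stop)
		out = append(out, readRange(child, lo-start, hi-lo, fetches)...)
	}
	return out
}

func main() {
	leaves := []*fileNode{{Data: []byte("hello")}, {Data: []byte(" big ")}, {Data: []byte("world")}}
	root := &fileNode{Blocksizes: []uint64{5, 5, 5}, Children: leaves}
	fetches := 0
	fmt.Printf("%q after %d node visits\n", readRange(root, 3, 6, &fetches), fetches)
}
```

Here the third leaf is skipped entirely because its offset range starts past the requested end, which is the whole point of using `blocksizes` for seeking.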
Another ask: a recommendation for how to add non-canonical extensions to the spec, e.g. a systematic way to carry extra metadata.
I want to do this, but as a followup IPIP.
First, I'll start by describing what Kubo does without any new innovation; then we'll see what we can improve. It would be impossible to review if we mixed old and new stuff.
Here is the in-progress PR: https://github.com/ipfs/specs/pull/331