Define less ambiguous UnixFS spec in TS types
I would like to include less ambiguous UnixFS specification coded as TS interface definition so that:
- It can capture more invariant than possible via provided protobuf schema.
- It is clear which fields are required in which node types.
- It is more clear how file shards, chunks & roots are represented (in terms of necessary fields and link types)
You may notice several @TODO comments as despite my best efforts I still have gaps in my understanding. I would love if you could help me fill those by providing inline comments and/or suggestions.
Despite my best effort, I may still be misunderstanding and incorrectly encoding / describing things, if so I would really appreciate comments and/or suggestions to correct those errors.
I would like to iterate on this document and eventually land this for everyone who may try to get a better understanding of the UnixFS format.
Got no permissions to request review so instead I'm tagging people by names @aschmahmann @achingbrain @rvagg @alanshaw @ribasushi
I'm not sure introducing typescript into the spec is a good idea? We already have prose and protobuf as definition languages, adding a third just increases the probability of contradictions between them all.
It seems like the problem this is aiming to solve is that lots of fields can be set, most of them are optional and it's not clear which ones should be set and when, and what to do when they are or are not - can we not just make the clarifications by updating the body of the text & adding comments to the protobuf parts?
I'm not sure introducing typescript into the spec is a good idea? We already have prose and protobuf as definition languages, adding a third just increases the probability of contradictions between them all.
I think it's a lot easier to resolve these contradictions canonically here than hope that those will not arise in the implementations. If there was a language neutral way to specify this I would definitely prefer that over TS, but I don't believe one exists and I would much rather have something that:
- I can run program against to verify.
- Translate to my language.
My motivation for writing this is precisely protobufs inability to disambiguate, even after going through the both JS and Go implementations I'm not certain I've captured it all accurately. I don't think it's lack of my abilities at fault, I think things are just too vaguely specified.
It seems like the problem this is aiming to solve is that lots of fields can be set, most of them are optional and it's not clear which ones should be set and when, and what to do when they are or are not - can we not just make the clarifications by updating the body of the text & adding comments to the protobuf parts?
I don't think it's one or the other, I would like to do both. It's just in type notation you have a well defined structure and you can be a lot more precise. If it is decided this file should not live here, that's ok with me too, but I would still love to make sure it is (and therefor my interpretation of spec is) accurate. I have a full intention to incorporate this as prose as well.
As an FYI, @Jorropo identified an interesting quirk in go-ipfs's (more likely go-unixfs's) current leaf building which will leave an empty Name where it's not necessary: https://github.com/ipfs/go-ipfs/issues/8691
It's more of a hash consistency concern than anything, we have plenty of data in the wild with blank Name and we actively test all the dag-pb variations identified in the dag-pb spec (see the fixtures starting with "dagpb" @ https://github.com/ipld/codec-fixtures/tree/master/fixtures). It would be nice to bias toward not doing this where possible, so not locking in such an expectation in any of our specs.
Few things I'd discovered in the realm of unspecified dags
ipfs ls /ipfs/Qme1Cyu7ujqn3dRkRGmeTLpHJgbGHFjKmud48XK5W8qA6h
Error: only murmur3 supported as hash function
ipfs dag get /ipfs/Qme1Cyu7ujqn3dRkRGmeTLpHJgbGHFjKmud48XK5W8qA6h
{"Data":{"/":{"bytes":"CAU"}},"Links":[]}
# this is what went into that { Type: Data.DataType.HAMTShard }
ipfs ls /ipfs/QmS73FCYEH2NaSypVDE8EzQsjXoGTa8Wxq1bC22LfYnbiR ✱
Error: hamt size should be a power of two
# This is what went into that { Type: Data.DataType.HAMTShard, hashType: 0x22 }
ipfs ls /ipfs/Qma5kEnM5fEKTXrFC5zXYRy5QG3hcMWopoFS7ijhxx19qc
{"Data":{"/":{"bytes":"CAUoIjCAAg"}},"Links":[]}
# This is what went into that { Type: Data.DataType.HAMTShard, hashType: 0x22, fanout: 256 }
# looks like leaving out bitfield is fine but not other two fields
For what it's worth js-unixfs will encode all of the above just fine.
Here is another fun example, illustrating that filesize can't be trusted and probably should not be even encoded as it could be derived from Data + blocksizes.
echo '{ "Data": "CAISC2hlbGxvIHdvcmxkGAU=", "Links": [] }' | ipfs object put --datafieldenc base6
ipfs files stat /ipfs/QmPivGwShRuUC9gB17RakWKhixsTU5PMvxBZjXfXxo4KcU ✱
QmPivGwShRuUC9gB17RakWKhixsTU5PMvxBZjXfXxo4KcU
Size: 5
CumulativeSize: 19
ChildBlocks: 0
Type: file
ipfs cat /ipfs/QmPivGwShRuUC9gB17RakWKhixsTU5PMvxBZjXfXxo4KcU | wc -c
11
ipfs cat /ipfs/QmPivGwShRuUC9gB17RakWKhixsTU5PMvxBZjXfXxo4KcU ✱
hello world
Maybe we should make this a TODO in GitHub rather than hiding it in code: https://github.com/ipfs/go-unixfs/blob/5e0e095d575b5b5fbf2fea386ea11ab737e4395e/unixfs.go#L260-L265