specs icon indicating copy to clipboard operation
specs copied to clipboard

Define less ambiguous UnixFS spec in TS types

Open Gozala opened this issue 4 years ago • 7 comments

I would like to include less ambiguous UnixFS specification coded as TS interface definition so that:

  1. It can capture more invariant than possible via provided protobuf schema.
  2. It is clear which fields are required in which node types.
  3. It is more clear how file shards, chunks & roots are represented (in terms of necessary fields and link types)

You may notice several @TODO comments as despite my best efforts I still have gaps in my understanding. I would love if you could help me fill those by providing inline comments and/or suggestions.

Despite my best effort, I may still be misunderstanding and incorrectly encoding / describing things, if so I would really appreciate comments and/or suggestions to correct those errors.

I would like to iterate on this document and eventually land this for everyone who may try to get a better understanding of the UnixFS format.

Gozala avatar Jan 31 '22 23:01 Gozala

Got no permissions to request review so instead I'm tagging people by names @aschmahmann @achingbrain @rvagg @alanshaw @ribasushi

Gozala avatar Jan 31 '22 23:01 Gozala

I'm not sure introducing typescript into the spec is a good idea? We already have prose and protobuf as definition languages, adding a third just increases the probability of contradictions between them all.

It seems like the problem this is aiming to solve is that lots of fields can be set, most of them are optional and it's not clear which ones should be set and when, and what to do when they are or are not - can we not just make the clarifications by updating the body of the text & adding comments to the protobuf parts?

achingbrain avatar Feb 01 '22 06:02 achingbrain

I'm not sure introducing typescript into the spec is a good idea? We already have prose and protobuf as definition languages, adding a third just increases the probability of contradictions between them all.

I think it's a lot easier to resolve these contradictions canonically here than hope that those will not arise in the implementations. If there was a language neutral way to specify this I would definitely prefer that over TS, but I don't believe one exists and I would much rather have something that:

  1. I can run program against to verify.
  2. Translate to my language.

My motivation for writing this is precisely protobufs inability to disambiguate, even after going through the both JS and Go implementations I'm not certain I've captured it all accurately. I don't think it's lack of my abilities at fault, I think things are just too vaguely specified.

It seems like the problem this is aiming to solve is that lots of fields can be set, most of them are optional and it's not clear which ones should be set and when, and what to do when they are or are not - can we not just make the clarifications by updating the body of the text & adding comments to the protobuf parts?

I don't think it's one or the other, I would like to do both. It's just in type notation you have a well defined structure and you can be a lot more precise. If it is decided this file should not live here, that's ok with me too, but I would still love to make sure it is (and therefor my interpretation of spec is) accurate. I have a full intention to incorporate this as prose as well.

Gozala avatar Feb 01 '22 19:02 Gozala

As an FYI, @Jorropo identified an interesting quirk in go-ipfs's (more likely go-unixfs's) current leaf building which will leave an empty Name where it's not necessary: https://github.com/ipfs/go-ipfs/issues/8691

It's more of a hash consistency concern than anything, we have plenty of data in the wild with blank Name and we actively test all the dag-pb variations identified in the dag-pb spec (see the fixtures starting with "dagpb" @ https://github.com/ipld/codec-fixtures/tree/master/fixtures). It would be nice to bias toward not doing this where possible, so not locking in such an expectation in any of our specs.

rvagg avatar Feb 02 '22 01:02 rvagg

Few things I'd discovered in the realm of unspecified dags

ipfs ls /ipfs/Qme1Cyu7ujqn3dRkRGmeTLpHJgbGHFjKmud48XK5W8qA6h
Error: only murmur3 supported as hash function


ipfs dag get /ipfs/Qme1Cyu7ujqn3dRkRGmeTLpHJgbGHFjKmud48XK5W8qA6h 
{"Data":{"/":{"bytes":"CAU"}},"Links":[]}

# this is what went into that { Type: Data.DataType.HAMTShard }

ipfs ls /ipfs/QmS73FCYEH2NaSypVDE8EzQsjXoGTa8Wxq1bC22LfYnbiR                                                                ✱
Error: hamt size should be a power of two

# This is what went into that { Type: Data.DataType.HAMTShard, hashType: 0x22 }

ipfs ls /ipfs/Qma5kEnM5fEKTXrFC5zXYRy5QG3hcMWopoFS7ijhxx19qc
{"Data":{"/":{"bytes":"CAUoIjCAAg"}},"Links":[]}

# This is what went into that { Type: Data.DataType.HAMTShard, hashType: 0x22, fanout: 256 }
# looks like leaving out bitfield is fine but not other two fields

For what it's worth js-unixfs will encode all of the above just fine.

Gozala avatar Feb 04 '22 04:02 Gozala

Here is another fun example, illustrating that filesize can't be trusted and probably should not be even encoded as it could be derived from Data + blocksizes.

echo '{ "Data": "CAISC2hlbGxvIHdvcmxkGAU=", "Links": [] }' | ipfs object put --datafieldenc base6
ipfs files stat /ipfs/QmPivGwShRuUC9gB17RakWKhixsTU5PMvxBZjXfXxo4KcU                                                        ✱
QmPivGwShRuUC9gB17RakWKhixsTU5PMvxBZjXfXxo4KcU
Size: 5
CumulativeSize: 19
ChildBlocks: 0
Type: file
ipfs cat /ipfs/QmPivGwShRuUC9gB17RakWKhixsTU5PMvxBZjXfXxo4KcU | wc -c
11
ipfs cat /ipfs/QmPivGwShRuUC9gB17RakWKhixsTU5PMvxBZjXfXxo4KcU                                                               ✱
hello world

Gozala avatar Feb 04 '22 07:02 Gozala

Maybe we should make this a TODO in GitHub rather than hiding it in code: https://github.com/ipfs/go-unixfs/blob/5e0e095d575b5b5fbf2fea386ea11ab737e4395e/unixfs.go#L260-L265

rvagg avatar Feb 07 '22 02:02 rvagg