notes on canonicalization, dag traversal order
@whyrusleeping said (in the filecoin community slack)
@jbenet i’ll do a pass on canonicalization for the car format, but one thing i’m thinking through right now is that you can’t just naively CAR things up and store them in a sector.
For example if you tried to store the filecoin blockchain this way, it would include all the state trees, and even (depending on a few things) all the data in the network
my question about traversal order is because we need to be able to have a specific traversal order to ensure CAR files are deterministically generated
so the thing being referenced needs to be a selector
@hannahhoward Does the ipld selectors work strictly define graph traversal order?
my question about traversal order is because we need to be able to have a specific traversal order to ensure CAR files are deterministically generated
@hannahhoward said (in the filecoin community slack)
The IPLD Selectors spec do not strictly define a traversal order though right now it’s depth first in the implementation. (if I’m understanding your question correctly) It’s certainly possible to to define a blockchain selector that doesn’t traverse the statetree. That’s being done in go-filecoin right now.
@jbenet said (in the filecoin community slack)
For example if you tried to store the filecoin blockchain this way, it would include all the state trees, and even (depending on a few things) all the data in the network
yeah agree. (that would be a pretty big car
so the thing being referenced needs to be a selector
meaning: the car's pointer (in the header) should be a selector? yeah we probably should say what selector this car is for. (not sure if we can get rid of the root -- i think we can, but maybe check) (it's fine for a car to be "incomplete" -- meaning to have cid links to things it does not contain. -- but may be good to flag in the header whether it is supposed to be complete or not. if we get the selector path to work, then it can be complete in terms of the selector, but not complete in terms of all links it contains).
my question about traversal order is because we need to be able to have a specific traversal order to ensure CAR files are deterministically generated
yes this is a pretty important concern. I think we need to always have a defined order (ie assign a default order to all ipld selectors. (if selectors get enhanced with ability to define order, that'd be cool, but not sure they can do that right now)
what order? probably in-order depth-first traversal is what makes more sense from a theory standpoint, but i think breadth first may be higher perf in a ton of cases. -- whatever we pick here is probably gonna impact the perf of lots of unknown applications, pick whatever seems higher perf. -- and i think many filesystems papers will probably have solved this question many times over, could be a quick lit lookup
why breadth first seems higher perf? -- this is not a well-substantiated claim -- guess: if we start reading at a spot (beginning, or anywhere close to an index datastructure (eg a directory) the memory page pulled into the cache should also contain siblings and their first children -- a lot of the index of a large graph, therefore good for lookups? -- i think depth first may be really nice for some graphs (like left-leaning long history trees -- blockchain headers, git commit graphs).
(sounds like something one might want to be able to select in the car :confused: (maybe declare a default for all selectors, future versions of selectors could enhance to have a different order, future versions of car can enhance to respect that, and/or give the ability to override the order?)
(i'll post this in the car issue tracker for posterity)
cc @warpfork -- relevant to your interests