go-ipld-prime icon indicating copy to clipboard operation
go-ipld-prime copied to clipboard

fasttraversal package; mutable path objects

Open warpfork opened this issue 5 years ago • 6 comments

It's not that the traversal package isn't fast. But it does do some things in the name of 'safety' that sort of prevent it from being really fast.

Namely, it copies traversal.Progress by value many times (might not be a huge issue; I think they all end up on the stack, or at least I sure hope they do; a benchmark to check this is warranted); and, it uses ipld.Path.Join heavily, which most definitely involves a new allocation every single time we traverse over a node. This latter one is pretty severe.

We could amortize a ton of these allocations used for creating Path values if we had a mutable implementation of Path, and just pre-allocated an overly large slice for it, and kept changing that one value in-place rather than creating a whole new Path for every single step we take in a traversal.

This is a big "if", and comes with big caveats, though. It becomes very very easy to misuse such an API where values have limited lifetimes in which they're correct, and yet at the same time we can provide no compile-time checks to ensure references to those values are never held outside of those lifetimes.

Making a fasttraversal package might be a good compromise. Code can be written against the traversal package until the user notices this is a performance issue; then the changeover should be easy. We could keep it with the exact same user-facing interface as the traversal package -- so you could change the import statement from "go-ipld-prime/traversal" to traversal "go-ipld-prime/fasttraversal" and make no other changes! -- but with the caveats that if you hold onto any part of the fasttraversal.Progress (and most importantly, its contained ipld.Path values) outside of your VisitFn, then your application will have incorrect/undefined behavior.

There is plenty of prior art for this. For a particular example, many git implementations I've seen (in a variety of languages!) have iterators which do exactly this thing with the mutable paths and lifetime caveat; it's a known issue that this will become a performance thing when you're flying over a lot of data like this quickly. So, although I find such mutability-heavy APIs a little scary for confident correctness reasons, there's definitely established references out there suggesting it's worth it.

This could be a bit tricky to implement. The ipld.Path implementation would also have to change to accomodate some of this, which is a change in the core package and thus touchy. I'd like to have the type system indicate if a path is going to be mutable, but that isn't possible without getting different visitor function interfaces, which might be undesirable. And having a fasttraversal.Progress type is also unfortunate for the same reasons of changing the visitor function interface defn; maybe this can be handled with type aliases, but I'm not actually sure if that's allowed or not.

warpfork avatar Feb 14 '20 13:02 warpfork

pprof-allocs-traversing

Here's some evidence for the cost of Path manipulation. It's definitely the biggest thing on the chart.

This comes after and includes performance improvements from https://github.com/ipld/go-ipld-prime/pull/49 (if it didn't, you'd also see a lot of allocations in the use of iterators from boxing allocations during handling keys) -- once those fixes are implied, it's pretty much Path that's left as a major cost.

Iterators could also potentially be improved. See 7c595d6920b1182695ac0b7791c25e1d3ab5905f for that; that commit didn't land, but has some useful notes; it's definitely doable.

I haven't taken a look at the selector.NewSegmentIterator part yet. Might be possible to improve that too.

warpfork avatar Mar 05 '20 13:03 warpfork

Screenshot 2020-03-06 13 09 00

How does one read this, who owns that 10923032?

rvagg avatar Mar 06 '20 02:03 rvagg

Good question. Neither, I think -- that's the number that pass through that area. Note how there's a "0 of x" inside the boxes those arrows are attached to.

4456516+8367117=12823633. Everything's coming from the functions called below these.

And apparently 12823633-10923032=1900601 is the number of allocs owned by the last layer of recursions. (Probably not a particularly actionable detail, though.)

warpfork avatar Mar 06 '20 11:03 warpfork

There's a bunch of different lenses you can use to view this data, btw; I just posted a screenshot of one of the more visual all-in-one summaries. If someone was gonna drive at addressing this, they'd want to just take this as a broad starter cue, and then generate and inspect their own pprof data.

(I have some sparse writeups about how-to-pprof I'm slowly accumulating in a gist, in case it's helpful to anyone out there: https://gist.github.com/warpfork/b13d4e0afcfc571cbec8cb4373f61510 )

warpfork avatar Mar 06 '20 11:03 warpfork

Ahh, Path/AppendSegment takes up 8M, selector/NewSegmentIterator takes up 2M and basic/MapIterator takes up 2M, accounting for the 12M coming through the top of the funnel.

👍

rvagg avatar Mar 09 '20 10:03 rvagg

So, this is still showing up as a pretty big CPU sink in graphsync.

Stebalien avatar Oct 19 '21 23:10 Stebalien