flurry icon indicating copy to clipboard operation
flurry copied to clipboard

Support rayon parallel iterators

Open jonhoo opened this issue 5 years ago • 10 comments

We should implement rayon parallel iterators. The notes and code in iter/plumbing may be helpful, and the hashbrown implementation too.

jonhoo avatar Jan 23 '20 13:01 jonhoo

I'm happy to answer rayon questions on this.

cuviper avatar Jan 23 '20 18:01 cuviper

@cuviper Since you're offering, one thing I was actually wondering was whether we can somehow take advantage of the fact that the map supports fully concurrent access. Since non-concurrent maps like hashbrown are also able to implement the parallel traits, it wasn't immediately clear to me whether how you get a win from flurry in rayon even though it really feels like it should be possible.

jonhoo avatar Jan 23 '20 18:01 jonhoo

I suspect maybe the answer has to do with some of the traits in iter/plumbing, but not sure?

jonhoo avatar Jan 23 '20 18:01 jonhoo

Concurrent insert will make a naive ParallelExtend and FromParallelIterator trivial, just something like par_iter.for_each(|(key, value)| { map.insert(key, value); }). Maybe there's a more advanced approach that could improve performance, like folding into separate bins and then reducing into the final map, but I don't know your data structure enough to evaluate that.

Parallel iterators are a bit harder -- you need a strategy for splitting the map into separate "slices"/"views" of some sort. The hashbrown implementation should be a good reference if you look at how they use RawIterRange::split internally.

cuviper avatar Jan 23 '20 18:01 cuviper

Ah, I see, hashbrown is forced for first reduce and then use one core to build the map, whereas we don't have to do that. Neat!

jonhoo avatar Jan 23 '20 21:01 jonhoo

I would like to give this a try.

Stupremee avatar Feb 04 '20 06:02 Stupremee

All yours!

jonhoo avatar Feb 04 '20 15:02 jonhoo

I need some help. Whats the best way to split a Table into two halves?

Stupremee avatar Feb 10 '20 14:02 Stupremee

The best way is probably to just split the list of bins in two: the "high" bins and the "low" bins.

jonhoo avatar Feb 10 '20 14:02 jonhoo

I have no idea how to approach this. I think it's better if I let someone else do it.

Stupremee avatar Feb 10 '20 16:02 Stupremee