DataFrames.jl Use threading in joins

Open bkamins opened this issue 3 years ago • 1 comment

A separate issue to keep track of adding threading support to joins.

Mar 14 '21 10:03 bkamins

What we could do:

- always precompute hashes (even for a single column);
- precompute the hashes using threading;
- then split the shorter table into shards matching the number of available threads and process each shard separately;
- combining the results should then be very cheap.
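
A minimal sketch of the steps above on plain key vectors, to make the idea concrete. It is not the DataFrames.jl implementation: `threaded_hashes` and `sharded_inner_join` are hypothetical names, and assigning rows to shards by hash value (rather than by contiguous row ranges) is an assumption made here so that each shard can be built and probed independently.

```julia
using Base.Threads: @spawn, nthreads

# Hypothetical helper: precompute the hash of every key, one task per chunk of rows.
function threaded_hashes(col::AbstractVector)
    hashes = Vector{UInt}(undef, length(col))
    chunklen = cld(max(length(col), 1), max(nthreads(), 1))
    @sync for chunk in Iterators.partition(eachindex(col), chunklen)
        @spawn for i in chunk
            hashes[i] = hash(col[i])
        end
    end
    return hashes
end

# Hypothetical sketch of an inner join on a single key column.
# Returns (row in `left`, row in `right`) pairs; their order is unspecified.
function sharded_inner_join(left::AbstractVector, right::AbstractVector;
                            nshards::Int = nthreads())
    # Build lookup tables on the shorter input, probe with the longer one.
    swapped = length(left) > length(right)
    short, long = swapped ? (right, left) : (left, right)
    hs, hl = threaded_hashes(short), threaded_hashes(long)
    n = UInt(nshards)

    tasks = map(1:nshards) do s
        @spawn begin
            shard = UInt(s - 1)
            # Lookup table for the short-table rows that fall into this shard.
            table = Dict{eltype(short), Vector{Int}}()
            for i in eachindex(short)
                if mod(hs[i], n) == shard
                    push!(get!(() -> Int[], table, short[i]), i)
                end
            end
            # Probe with the long-table rows that hash into the same shard.
            out = Tuple{Int, Int}[]
            for j in eachindex(long)
                mod(hl[j], n) == shard || continue
                for i in get(table, long[j], ())
                    push!(out, swapped ? (j, i) : (i, j))
                end
            end
            out
        end
    end
    # Shards are disjoint, so combining the results is plain concatenation.
    return reduce(vcat, fetch.(tasks))
end
```

Because keys that compare equal also hash equally, matching rows always land in the same shard, so no coordination between tasks is needed. Note that in this sketch every task still scans both precomputed hash vectors; a real implementation would likely partition the row indices once up front.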

Apr 18 '21 13:04 bkamins