DataFrames.jl

Use threading in joins

Open • bkamins opened this issue 3 years ago • 1 comment

A separate issue to keep track of this specific topic: using threading in joins.

bkamins • Mar 14 '21 10:03

What we could do:

  1. Always precompute hashes (even for a single column).
  2. Precompute the hashes using threading.
  3. Split the shorter table into shards matching the number of available threads and process each shard separately.
  4. Combining the per-shard results should then be very cheap.
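
To make the four steps concrete, here is a rough sketch on plain key vectors rather than on data frames. It is not the DataFrames.jl implementation: the function name `threaded_join_indices`, the `nshards` keyword, and the choice to shard by `hash % nshards` are all assumptions made for illustration (the proposal does not say how the shards would be formed). It returns the matching `(left_row, right_row)` index pairs of an inner join.

```julia
using Base.Threads: @threads, @spawn, nthreads

# Hypothetical helper, not part of DataFrames.jl.
function threaded_join_indices(left::Vector, right::Vector;
                               nshards::Int = nthreads())
    # Steps 1-2: always precompute hashes, one threaded pass per side.
    lh = Vector{UInt}(undef, length(left))
    rh = Vector{UInt}(undef, length(right))
    @threads for i in eachindex(lh)
        lh[i] = hash(left[i])
    end
    @threads for j in eachindex(rh)
        rh[j] = hash(right[j])
    end

    # Build lookup tables on the shorter side, probe with the longer one.
    swap = length(left) > length(right)
    bkeys, bh, pkeys, ph = swap ? (right, rh, left, lh) : (left, lh, right, rh)

    # Step 3: one task per shard. Assumption: a row belongs to shard
    # `hash % nshards`, so equal keys always land in the same shard.
    n = UInt(nshards)
    tasks = map(0:nshards-1) do s
        @spawn begin
            table = Dict{UInt, Vector{Int}}()
            for i in eachindex(bh)
                bh[i] % n == s && push!(get!(() -> Int[], table, bh[i]), i)
            end
            out = Tuple{Int, Int}[]
            for j in eachindex(ph)
                ph[j] % n == s || continue
                for i in get(table, ph[j], ())
                    # Re-check the keys themselves to guard against hash collisions.
                    isequal(bkeys[i], pkeys[j]) && push!(out, swap ? (j, i) : (i, j))
                end
            end
            out
        end
    end

    # Step 4: shards are disjoint, so combining is plain concatenation.
    return reduce(vcat, fetch.(tasks))
end

# Example call; the order of the pairs depends on the number of shards.
matches = sort(threaded_join_indices([1, 2, 2, 3], [2, 3, 3, 4, 5]))
```

In this sketch every task scans all precomputed hashes and skips the rows outside its shard, which keeps the code short; a real implementation would more likely partition the row indices once up front.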

bkamins • Apr 18 '21 13:04