Daft icon indicating copy to clipboard operation
Daft copied to clipboard

enable setting default number of partitions to use for joins in `DaftContext`

Open gmweaver opened this issue 5 months ago • 3 comments

Is your feature request related to a problem? Please describe. shuffle_aggregation_default_partitions already exists to set the number of partitions to some sane default for an entire job. It would be helpful to have a similar config for join operations. Those migrating from Spark may be surprised that a migrated job starts OOMing when daft uses one partition for the join.

Describe the solution you'd like Add shuffle_join_default_partitions configuration to DaftContext.

Describe alternatives you've considered Manually set partitioning via into_partitions or repartition before join.

Additional context N/A

gmweaver avatar Sep 09 '24 18:09 gmweaver