Daft
Daft copied to clipboard
enable setting default number of partitions to use for joins in `DaftContext`
Is your feature request related to a problem? Please describe.
shuffle_aggregation_default_partitions
already exists to set the number of partitions to some sane default for an entire job. It would be helpful to have a similar config for join operations. Those migrating from Spark may be surprised that a migrated job starts OOMing when daft uses one partition for the join.
Describe the solution you'd like
Add shuffle_join_default_partitions
configuration to DaftContext
.
Describe alternatives you've considered
Manually set partitioning via into_partitions
or repartition
before join
.
Additional context N/A