
fix: Add overflow protection to memory estimation

Open • yudduy opened this issue 4 months ago • 1 comment

Fixes integer overflow crashes in memory estimation and partition operations during large dataset processing.

Changes:

  • Memory estimation: Add overflow guards, cap at usize::MAX/2
  • FixedSizeList: Limit to 1M elements, check for infinity
  • Shuffle: Use checked_mul(), fall back on overflow
  • Partitioning: Enforce 0 < n ≤ 100K
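As a rough illustration of the overflow guards listed above, here is a minimal sketch, not the code from this PR. The names `MAX_ESTIMATE`, `estimate_bytes`, and `float_estimate_to_usize` are hypothetical; only `checked_mul()` and the `usize::MAX/2` cap come from the description.

```rust
/// Hypothetical cap mirroring the `usize::MAX/2` ceiling from the description.
const MAX_ESTIMATE: usize = usize::MAX / 2;

/// Multiply a row count by a per-row size without wrapping.
/// On overflow, fall back to the cap instead of panicking or wrapping around.
fn estimate_bytes(num_rows: usize, bytes_per_row: usize) -> usize {
    match num_rows.checked_mul(bytes_per_row) {
        Some(total) => total.min(MAX_ESTIMATE),
        None => MAX_ESTIMATE,
    }
}

/// Convert a floating-point estimate to `usize`.
/// NaN, infinite, and negative inputs are rejected with `None`.
fn float_estimate_to_usize(estimate: f64) -> Option<usize> {
    if !estimate.is_finite() || estimate < 0.0 {
        return None;
    }
    // Float-to-integer `as` casts saturate in Rust, so huge finite values
    // clamp to usize::MAX here and are then reduced to the cap.
    Some((estimate as usize).min(MAX_ESTIMATE))
}
```

Capping rather than returning an error keeps estimation infallible for callers, at the cost of reporting a conservative upper bound; whether the PR makes the same trade-off in every code path is not stated here.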

Breaking Change: IntoPartitionsConfig now requires a validated constructor:

  • Before: IntoPartitionsConfig { num_partitions: 100 }
  • After: IntoPartitionsConfig::new(100)?
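For readers unfamiliar with the pattern, a validated constructor along these lines might look like the sketch below. This is an assumption-laden illustration, not the PR's actual code: the `String` error type, the `MAX_PARTITIONS` constant, and the `num_partitions()` accessor are hypothetical, and the bound is taken from the "0 < n ≤ 100K" rule in the change list.

```rust
/// Hypothetical upper bound taken from the "0 < n ≤ 100K" rule.
const MAX_PARTITIONS: usize = 100_000;

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct IntoPartitionsConfig {
    // Kept private so code outside this module must go through `new`.
    num_partitions: usize,
}

impl IntoPartitionsConfig {
    /// Build a config, rejecting zero and anything above the cap.
    /// The `String` error is a stand-in; the real crate likely uses its own error type.
    pub fn new(num_partitions: usize) -> Result<Self, String> {
        if num_partitions == 0 || num_partitions > MAX_PARTITIONS {
            return Err(format!(
                "num_partitions must be in 1..={MAX_PARTITIONS}, got {num_partitions}"
            ));
        }
        Ok(Self { num_partitions })
    }

    /// Read-only access to the validated value.
    pub fn num_partitions(&self) -> usize {
        self.num_partitions
    }
}
```

Because the field is private, call sites that previously used the struct literal have to switch to `IntoPartitionsConfig::new(100)?` and handle the error, which is what makes the change breaking.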

Testing: 25 new overflow tests; all existing tests pass

Relates to: #4724

yudduy • Oct 18 '25 06:10

Greptile encountered an error while reviewing this PR. Please reach out to [email protected] for assistance.

greptile-apps[bot] • Oct 18 '25 06:10

The 100K and 1M limits are heuristics. Happy to add doc comments or tighten the FixedSizeList cap if you'd prefer.

yudduy • Nov 26 '25 22:11