sdset icon indicating copy to clipboard operation
sdset copied to clipboard

[Feature] Streaming set operations via .from_iters() and .to_iter() functions

Open tzcnt opened this issue 2 years ago • 1 comments

Needs

I'd like to create from_iters() and to_iter() versions of each operation. This would lazily evaluate/produce each member of the sequences. This would not be as performant as the current exponential-search version when it comes to working slice-to-slice, but would allow for a few additional use cases:

  • Reading elements from an external source (perhaps a database query known to be sorted via UNIQUE + ORDER BY) using an asynchronous row iterator instead of needing to construct a Vec first.
  • Checking if the result of an operation (or series of operations) has any members (via the Iterator::any() method), without needing to materialize the entire result. For this operation, this would be a big performance improvement.
  • Perform a map() or filter() operation on each element of a produced result, and then insert the result into a HashMap or other data structure, without needing an intermediate Vec.

Proposed implementation details:

They should be compatible with the existing slices -> slice API. This means introducing 3 new dataflow paths: slices -> iter, iters -> slice, and iter -> iter.

Proposed Syntax:

let a: SetBuf<i32> = SetBuf::new_unchecked((0..1_000_000).collect());
let b: SetBuf<i32> = SetBuf::new_unchecked((0..1_000_000).collect());

// slices -> slice (current syntax)
let inter: SetBuf<i32> = sdset::duo::OpBuilder::new(&a, &b).intersection().into_set_buf();

// slices -> iter
let inter: SetBuf<i32> = sdset::duo::OpBuilder::new(&a, &b).intersection().into_iter();

// iters -> slice
let inter: SetBuf<i32> = sdset::duo::OpBuilder::from_iters(&a.iter(), &b.iter()).intersection().into_set_buf();

// iters -> iter
let inter: SetBuf<i32> = sdset::duo::OpBuilder::from_iters(&a.iter(), &b.iter()).intersection().into_iter();

To implement this, the existing Union/Intersection/Difference/SymmetricDifference types would need to be extended with an into_iter() function that would lazily consume the input slice. This would then cover the slices -> slice and slices -> iter case.

New types UnionOfIters/IntersectionOfIters/DifferenceOfIters/SymmetricDifferenceOfIters would need to be created whose a and b input fields are &'a mut dyn Iterator<T>. These would then have the into_set_buf() and into_iter() functions implemented, covering the iters -> slice and iters -> iter case.

tzcnt avatar Jun 07 '22 22:06 tzcnt