itertools icon indicating copy to clipboard operation
itertools copied to clipboard

Extend k_smallest with largest and key options

Open orlp opened this issue 3 years ago • 7 comments

Currently this crate only has k_smallest. Compare this to the offering for min/max (which are the same function, just limited to k = 1):

position_max
position_max_by
position_max_by_key
position_min
position_min_by
position_min_by_key

I think it would be nice if we extended k_smallest to:

k_smallest
k_smallest_by
k_smallest_by_key
k_largest
k_largest_by
k_largest_by_key

orlp avatar Dec 09 '21 20:12 orlp

If this has a good chance of being accepted I'm willing to make a pull request for this feature.

orlp avatar Dec 09 '21 20:12 orlp

Sounds great to me!

If you decide to do a PR, could you try to implement k_smallest and k_smallest_by_key in terms of k_smallest_by? (So that the implementations are tied together and do not diverge.)

phimuemue avatar Dec 10 '21 15:12 phimuemue

This has a great chance of getting merged! 🙂

jswrenn avatar Dec 10 '21 16:12 jswrenn

Am I understanding it correctly that there's a bug in the current implementation?

https://github.com/rust-itertools/itertools/blob/master/src/k_smallest.rs#L10

This assumes the iterator is a FusedIterator, does it not? I think we need to call .fuse() first?

EDIT: yes, this is wrong: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=cc076206949046426a9d76dc005368b1

orlp avatar Dec 14 '21 16:12 orlp

Looking at it, I don't see a way we could implement this using std::collections::BinaryHeap, if we don't wish to duplicate the comparison function k times (and probably destroy inlining). I'm thinking of manually implementing the sifting logic - mostly copying from the stdlib, any thoughts on this?

orlp avatar Dec 14 '21 17:12 orlp

This assumes the iterator is a FusedIterator, does it not? I think we need to call .fuse() first?

Or skip the for_each if heap.len() < k.

scottmcm avatar Dec 14 '21 18:12 scottmcm

Looking at it, I don't see a way we could implement this using std::collections::BinaryHeap, if we don't wish to duplicate the comparison function k times (and probably destroy inlining). I'm thinking of manually implementing the sifting logic - mostly copying from the stdlib, any thoughts on this?

I guess your problem is that BinaryHeap does not support custom comparison functions by itself (please correct me if I'm wrong) - and I've come to the same conclusion when I once ran into this problem.

Copying the sifting logic might be a viable solution (unless it's a huge amount of code). I think we already did something similar (see https://github.com/rust-itertools/itertools/blob/master/src/unziptuple.rs#L42-L46). If you decide to go that route, please add documentation in a similar fashion.

phimuemue avatar Dec 17 '21 07:12 phimuemue

An alternative solution to #654 would be to use the binary_heap_plus crate which supports custom comparison functions.

One problem is their MSRV is 1.56 compared to Itertool's 1.32. I haven't looked into it but it might be possible to make their crate more backwards compatible.

It would keep the implementations within this crate much smaller. For example (not exactly the same API): https://github.com/Alextopher/advent-of-code-2022/blob/main/aoc/src/iterstuff.rs#L29

Alextopher avatar Dec 06 '22 18:12 Alextopher