itertools
itertools copied to clipboard
Extend k_smallest with largest and key options
Currently this crate only has k_smallest. Compare this to the offering for min/max (which are the same function, just limited to k = 1):
position_max
position_max_by
position_max_by_key
position_min
position_min_by
position_min_by_key
I think it would be nice if we extended k_smallest to:
k_smallest
k_smallest_by
k_smallest_by_key
k_largest
k_largest_by
k_largest_by_key
If this has a good chance of being accepted I'm willing to make a pull request for this feature.
Sounds great to me!
If you decide to do a PR, could you try to implement k_smallest and k_smallest_by_key in terms of k_smallest_by? (So that the implementations are tied together and do not diverge.)
This has a great chance of getting merged! 🙂
Am I understanding it correctly that there's a bug in the current implementation?
https://github.com/rust-itertools/itertools/blob/master/src/k_smallest.rs#L10
This assumes the iterator is a FusedIterator, does it not? I think we need to call .fuse() first?
EDIT: yes, this is wrong: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=cc076206949046426a9d76dc005368b1
Looking at it, I don't see a way we could implement this using std::collections::BinaryHeap, if we don't wish to duplicate the comparison function k times (and probably destroy inlining). I'm thinking of manually implementing the sifting logic - mostly copying from the stdlib, any thoughts on this?
This assumes the iterator is a FusedIterator, does it not? I think we need to call .fuse() first?
Or skip the for_each if heap.len() < k.
Looking at it, I don't see a way we could implement this using
std::collections::BinaryHeap, if we don't wish to duplicate the comparison functionktimes (and probably destroy inlining). I'm thinking of manually implementing the sifting logic - mostly copying from the stdlib, any thoughts on this?
I guess your problem is that BinaryHeap does not support custom comparison functions by itself (please correct me if I'm wrong) - and I've come to the same conclusion when I once ran into this problem.
Copying the sifting logic might be a viable solution (unless it's a huge amount of code). I think we already did something similar (see https://github.com/rust-itertools/itertools/blob/master/src/unziptuple.rs#L42-L46). If you decide to go that route, please add documentation in a similar fashion.
An alternative solution to #654 would be to use the binary_heap_plus crate which supports custom comparison functions.
One problem is their MSRV is 1.56 compared to Itertool's 1.32. I haven't looked into it but it might be possible to make their crate more backwards compatible.
It would keep the implementations within this crate much smaller. For example (not exactly the same API): https://github.com/Alextopher/advent-of-code-2022/blob/main/aoc/src/iterstuff.rs#L29