itertools icon indicating copy to clipboard operation
itertools copied to clipboard

Not sure on the difference between std::chunk_by and itertools::chunk_by

Open arifd opened this issue 11 months ago • 4 comments

Hello!

std has chunk_by and so does itertools

It seems to me, that the only difference is that std does not return the value of the functor, But I'm not sure, and don't want to introduce a subtle error by switching over if i don't need the key.

Could we include a small comment in the documentation to help users choose between itertools' and std's implementation?

arifd avatar Dec 17 '24 12:12 arifd

Thanks for bringing this to our attention.

I think your intuition that they yield the same chunks is correct.

slice::chunk_by works on slices and when you iterate over it, you'll get slices. Its predicate has the form FnMut(&T, &T)->bool, i.e. you tell it if two consecutive elements belong to the same chunk.

Itertools::chunk_by works on iterators. This is a bit more general, but needs more work compared to the pure slice version. iirc, it's a bit more cumbersome to use, too. Its functor has the form FnMut(Item)->K and it requires K: PartialEq, so that this functor extracts a "key" from the item, and uses K's equality to determine if they belong to the same chunk.

I think we should adopt what std does on slices here: In particular, our closure should be a predicate (seems strictly more general that what we currently have) and the documentation should be in line with std's. @jswrenn Thoughts?

phimuemue avatar Dec 17 '24 13:12 phimuemue

Ah right. Yes, thanks for the explanation!

Yes, bool seems to me, more intuitive and perhaps more useful too.

I'm happy to close this issue since you've covered my concern, but I'll leave it open for someone else to close, since an interesting question has been posed.

arifd avatar Dec 20 '24 11:12 arifd

I think we should adopt what std does on slices here: In particular, our closure should be a predicate (seems strictly more general that what we currently have) and the documentation should be in line with std's. @jswrenn Thoughts?

Yes, agreed.

jswrenn avatar Dec 26 '24 14:12 jswrenn

Found one way this would be not strictly more general: to apply the predicate on (a_n, a_n_plus_1), we need to consume a_n_plus_1 from the inner iterator before yielding a_n, whereas with a key function it's enough to hold onto key(a_n). So in particular, if we were to implement chunk_by in terms of a chunk_by_eq, it would change the behavior of iter.by_ref().chunk_by(key).

Another question while I'm considering writing a PR: is it better to add a chunk_by_eq that works like slice::chunk_by or to rename the current chunk_by to chunk_by_key and then add a chunk_by that takes a FnMut(&T, &T) -> bool?

ronnodas avatar Mar 10 '25 14:03 ronnodas