itertools
itertools copied to clipboard
Not sure on the difference between std::chunk_by and itertools::chunk_by
Hello!
std has chunk_by and so does itertools
It seems to me, that the only difference is that std does not return the value of the functor, But I'm not sure, and don't want to introduce a subtle error by switching over if i don't need the key.
Could we include a small comment in the documentation to help users choose between itertools' and std's implementation?
Thanks for bringing this to our attention.
I think your intuition that they yield the same chunks is correct.
slice::chunk_by works on slices and when you iterate over it, you'll get slices. Its predicate has the form FnMut(&T, &T)->bool, i.e. you tell it if two consecutive elements belong to the same chunk.
Itertools::chunk_by works on iterators. This is a bit more general, but needs more work compared to the pure slice version. iirc, it's a bit more cumbersome to use, too. Its functor has the form FnMut(Item)->K and it requires K: PartialEq, so that this functor extracts a "key" from the item, and uses K's equality to determine if they belong to the same chunk.
I think we should adopt what std does on slices here: In particular, our closure should be a predicate (seems strictly more general that what we currently have) and the documentation should be in line with std's. @jswrenn Thoughts?
Ah right. Yes, thanks for the explanation!
Yes, bool seems to me, more intuitive and perhaps more useful too.
I'm happy to close this issue since you've covered my concern, but I'll leave it open for someone else to close, since an interesting question has been posed.
I think we should adopt what std does on slices here: In particular, our closure should be a predicate (seems strictly more general that what we currently have) and the documentation should be in line with std's. @jswrenn Thoughts?
Yes, agreed.
Found one way this would be not strictly more general: to apply the predicate on (a_n, a_n_plus_1), we need to consume a_n_plus_1 from the inner iterator before yielding a_n, whereas with a key function it's enough to hold onto key(a_n). So in particular, if we were to implement chunk_by in terms of a chunk_by_eq, it would change the behavior of iter.by_ref().chunk_by(key).
Another question while I'm considering writing a PR: is it better to add a chunk_by_eq that works like slice::chunk_by or to rename the current chunk_by to chunk_by_key and then add a chunk_by that takes a FnMut(&T, &T) -> bool?