IterTools.jl icon indicating copy to clipboard operation
IterTools.jl copied to clipboard

Differences in `Base.Iterators.partition` and `IterTools.partition`

Open ssfrr opened this issue 4 years ago • 7 comments

There are some differences between Base.Iterators.partition and IterTools.partition. I brought it up on Slack and @oxinabox asked me to file an issue.

  1. Base.Iterators.partition returns each chunk as an Array, whereas IterTools.partition returns a Tuple
  2. In the case that the last chunk is smaller than the requested partition size, Base.Iterators.partition returns a shorter chunk, whereas IterTools.partition drops the last values.
  3. IterTools.partition accepts a step argument that determines the hop size for each chunk.

ssfrr avatar Nov 25 '19 02:11 ssfrr

update - I think on master Base.Iterators.partition now returns views into the original array, not Arrays.

This might (?) be the last inconsistency that would need to be addressed before something like #30 could be resolved. How do folks want to resolve the differences listed above? I'd propose using the Base.Iterators behavior for (1) and (2), but keeping the step argument.

One path forward would be to add the step argument as a PR to Base, then just reexport Base.Iterators.partition. This would also take advantage of the recent improvements that @mbauman made.

ssfrr avatar Mar 05 '20 19:03 ssfrr

Change to make Base.Iterators.partition return views was done in: https://github.com/JuliaLang/julia/pull/33533

omus avatar Mar 05 '20 21:03 omus

I'm in favor of keeping the current behavior of IterTools.partition, perhaps under a different name, to avoid confusion with Base.

For example, implementing Base.diff, but for an iterable it instead of just AbstractVectors is very natural in terms of IterTools.partition(it, 2, 1), and the fact that it produces a stream of tuples, which have a length known at compile time is (I'm guessing) beneficial for efficient code generation.

goretkin avatar May 19 '20 02:05 goretkin

Yeah, it's sort of a shame that the implementation needs to be different depending on whether it's designed for long or short partitions (which would use Arrays or Tuples, respectively). For instance, in my use case (streaming audio) I often want to chop the stream into chunks of 512 or 1024 elements, so Tuples aren't good for that. Maybe the partition type should be provided as an optional argument to partition?

ssfrr avatar May 24 '20 01:05 ssfrr

Oh, good. That's an important use too. Base.Iterators.partition returns views into the array, instead of copies, now. Does that fit the audio chunking need?

goretkin avatar May 24 '20 15:05 goretkin

Yeah, array views are good for me

ssfrr avatar May 24 '20 17:05 ssfrr

Difference 2 is handled in Clojure by having partition-all and partition, where base partition maps to Clojure's partition-all behavior.

Difference 1 & 2 are grouped together in Transducers under Consecutive vs Partition. So you either get fixed length tuples or variable length arrays.

I think this roughly aligns with common usage patterns where you either want to have small fixed length groups, or process large chunks of an array.

In my ideal universe both consecutive (IterTools partition) and partition would be in base, both supporting a step argument.

pepijndevos avatar Jun 14 '23 15:06 pepijndevos