Why limit the chunk size?

Open massahud opened this issue 3 years ago • 3 comments

Hello,

I have a question: why does MoreStreams.dice limit the chunk size only if the stream is infinite? What is the reasoning behind limiting it for infinite streams and not for all streams?

https://github.com/google/mug/blob/e310555966b6589155750e250fe4fcb4aeff9ecb/mug/src/main/java/com/google/mu/util/stream/MoreStreams.java#L539

massahud avatar Jun 03 '22 14:06 massahud

If the stream is finite, then we can usually hold the stream in memory even if the user code calls dice(stream, Integer.MAX_VALUE).

If the stream is infinite, then it's crucial that we don't blow up in the face of dice(stream, MAX_VALUE), where the user may not have expected an infinite stream.

fluentfuture avatar Jun 15 '22 04:06 fluentfuture

Thank you for the explanation. I was thinking about the unnecessary memory allocations of the chunk ArrayList until it reaches maxSize, not about the user passing Long.MAX_VALUE as the parameter.

massahud avatar Jun 15 '22 16:06 massahud

I believe the logic is this:

  • If the stream is shorter than maxSize, then just initialize the list's capacity to the stream's size.
  • If the stream is longer than maxSize, we should just use maxSize.
  • Except when estimate == Long.MAX_VALUE: that doesn't indicate that the stream is really longer than maxSize. Rather, it's just the stream implementation not knowing anything about its size. In that case, allocating the full maxSize up front could be wasteful.

fluentfuture avatar Sep 18 '22 04:09 fluentfuture
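
For readers following along, the three cases above can be sketched with only the JDK. This is a rough illustration, not Mug's actual code: the class name, the `initialCapacity` helper, and the `UNKNOWN_SIZE_CAP` value are all hypothetical. The one thing it relies on from the JDK is that `Spliterator.estimateSize()` returns `Long.MAX_VALUE` when the size is infinite, unknown, or too expensive to compute.

```java
import java.util.List;
import java.util.Spliterator;
import java.util.stream.Stream;

public class ChunkCapacitySketch {
  // Hypothetical cap used when the size is unknown; not a constant taken from Mug.
  static final int UNKNOWN_SIZE_CAP = 1024;

  // Picks an initial capacity for a chunk list from the size estimate and the requested maxSize.
  static int initialCapacity(long estimate, int maxSize) {
    if (estimate == Long.MAX_VALUE) {
      // Unknown or infinite: don't trust maxSize, start small and let the list grow.
      return Math.min(maxSize, UNKNOWN_SIZE_CAP);
    }
    // Known size: never allocate more than the stream holds or more than maxSize.
    return (int) Math.min(estimate, maxSize);
  }

  public static void main(String[] args) {
    Spliterator<Integer> finite = List.of(1, 2, 3).spliterator();
    Spliterator<Integer> infinite = Stream.iterate(1, i -> i + 1).spliterator();

    System.out.println(initialCapacity(finite.estimateSize(), Integer.MAX_VALUE));   // prints 3
    System.out.println(initialCapacity(infinite.estimateSize(), Integer.MAX_VALUE)); // prints 1024
  }
}
```

With a cap like this, a call such as dice(stream, Integer.MAX_VALUE) on a stream of unknown size would start with a small list instead of trying to reserve two billion slots up front.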