futures-rs icon indicating copy to clipboard operation
futures-rs copied to clipboard

Generalize the various chunk methods to produce any collection that implements Default + Extend

Open khuey opened this issue 2 years ago • 5 comments

Fixes #2492

khuey avatar Sep 15 '21 18:09 khuey

If a _Set or _Map is used as the collection, duplicate keys overwrite each other, and the len-like of the actual data structure doesn't increase. Now that not only Vec can be used, we should examine what we mean when we want to set a cap/capacity for the adapter. Both "how many items to consume" and "how many items to yield" are valid options. The current implementation does the former. The latter is missing a trait afaik, but may be less surprising. ~~Or more surprising, such as causing a soft deadlock with chunks(N) by not doing something necessary for more elements to be produced until a chunk is yielded because there were duplicate keys. (Where each "poke" of the source produces exactly N items.) (Could be fixed by the user with .take(N).chunks(N), but at that point they might as well collect because it won't be reusable.)~~ 🐴

qm3ster avatar Sep 20 '21 09:09 qm3ster

Also, since there is no with_capacity trait, should this be earmarked for when https://doc.rust-lang.org/std/iter/trait.Extend.html#method.extend_reserve is stabilized?

qm3ster avatar Sep 20 '21 09:09 qm3ster

Only the former is implementable today. The latter would require a (stablized) trait for collection lengths. Since the caller already has to cope with undersized collections (at least for the final item with chunks, and potentially much more often for ready_chunks or try_chunks) I don't think that's a big problem.

I don't think there's any need to gate this on extend_reserve being stabilized because the implementation can be changed to use that some day with no externally visible changes.

khuey avatar Sep 20 '21 19:09 khuey

Regarding extend_reserve, that's what I meant.

Perhaps something like "Depending on collection type, collections of less than capacity items may be yielded, as long as capacity elements were consumed." (but better worded) with or without an example in the doc comment. Probably only chunks needs this.

qm3ster avatar Sep 21 '21 14:09 qm3ster

Yes I think we would document that N items are consumed and if your collection's Extend impl can result in replacement rather than addition (like *Set/Map can) then the yielded collection will have fewer than N items.

If you do want collections of N items no matter what you could implement chunks yourself with scan. It's a bit yucky but it's doable. e.g.

my_stream.map(Some).chain(None).scan(HashMap::new(), |map, item| {
    match item {
        // End of the stream, yield whatever partial collection is left.
        None => Some(Some(mem::take(map))),
        Some(item) => {
            map.extend(item);
            if map.len() >= DESIRED_N {
                return Some(Some(mem::take(map)));
            }
            Some(None)
        }
     }
}).filter_map(|x| x)

khuey avatar Sep 21 '21 16:09 khuey