streamly icon indicating copy to clipboard operation
streamly copied to clipboard

Evaluate implementation of array backed streams

Open harendra-kumar opened this issue 5 months ago • 0 comments

Currently the Stream type does not support scalable cons or append, we need to use StreamK for that. It may be possible to do that if the stream representation has an option to be backed by arrays or lists of opaque step functions. Along with the static step and state a Stream can have lists of arrays for storage of already generated items. The head and tail of the stream can have a cons list and snoc list of arrays for cons and snoc. The middle part of the stream could consist of a static step and state for fusion.

We can support faster indexing into stream because we have an array representation now. If the index falls after the head or before the tail we will have to evaluate the stream up to that part and store it in an array. When the cons list at head and the step function is over we can transfer the snoc list at the tail to the cons list at the head and continue appending. To implement O(1) indexing we will need fixed size array chunks and the count of arrays.

For appending of streams we will need lists of Streams in addition to arrays. When consing or snocing a stream - the head/tail list of arrays can be wrapped into a Stream and consed or snoced to the list of streams and the new Stream can then be consed/snoced to it. This may potentially block fusion though?

Instead of lists we can also use an effect to generate a stream e.g. m (Stream m a), that will keep it generic.

If this does not have fusion and performance issues then it can potentially replace StreamK and Arrays (including stream of arrays), simplifying the API significantly. For representing mutable arrays we can have a MutStream variant. Also we may need boxed and unboxed versions to represent unboxed streams more efficiently. We can also choose pinned/unpinned memory for the arrays backing the stream. This representation can also lend itself to SIMD optimizations. Though this will make the current implementation fairly complex.

harendra-kumar avatar Jul 31 '25 08:07 harendra-kumar