
Allow splitting a node into multiple nodes during Pipeline transformation steps

Open jespersm opened this issue 1 year ago • 1 comments

Is your feature request related to a problem? Please describe. When transforming a node in a pipeline, it is sometimes useful to turn a single node into multiple nodes, e.g. when implementing sliding-window chunking, or when generating multiple embeddings for a single, larger chunk of text.

Describe the solution you'd like: There could be a function under Pipeline called then_flatten, flat_map, or similar, taking a MultiTransformer which produces an iterator of Nodes per Node.

Describe alternatives you've considered: Alternatively, I would have to write the nodes to a temporary store and flat-map them there.

Additional context: The existing Transformer could provide a default implementation of MultiTransformer, using iter::once, and we'd only need one Pipeline::then operation taking a MultiTransformer.
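
To make the proposal concrete, here is a minimal sketch of how it could look. Everything in it is hypothetical and simplified: Node, Transformer, MultiTransformer, SlidingWindow, and then_flatten are stand-ins defined only for this example, they are synchronous, and they are not Swiftide's actual types or API. The blanket impl shows the iter::once idea, and SlidingWindow shows a one-to-many transformer that splits on whitespace with overlap.

```rust
// Hypothetical, simplified stand-in for a pipeline node (not Swiftide's Node).
#[derive(Debug, Clone, PartialEq)]
pub struct Node {
    pub chunk: String,
}

/// One-to-one transformation (assumed shape, synchronous for brevity).
pub trait Transformer {
    fn transform(&self, node: Node) -> Node;
}

/// One-to-many transformation, as proposed in this issue.
pub trait MultiTransformer {
    fn transform_multi(&self, node: Node) -> Box<dyn Iterator<Item = Node>>;
}

/// Every Transformer is also a MultiTransformer that yields exactly one node.
impl<T: Transformer> MultiTransformer for T {
    fn transform_multi(&self, node: Node) -> Box<dyn Iterator<Item = Node>> {
        Box::new(std::iter::once(self.transform(node)))
    }
}

/// Example one-to-many transformer: overlapping windows of `size` words,
/// advancing `step` words at a time.
pub struct SlidingWindow {
    pub size: usize,
    pub step: usize,
}

impl MultiTransformer for SlidingWindow {
    fn transform_multi(&self, node: Node) -> Box<dyn Iterator<Item = Node>> {
        let words: Vec<&str> = node.chunk.split_whitespace().collect();
        let mut out = Vec::new();
        let mut start = 0;
        while start < words.len() {
            let end = (start + self.size).min(words.len());
            out.push(Node { chunk: words[start..end].join(" ") });
            if end == words.len() {
                break; // the last window reached the end of the text
            }
            start += self.step.max(1); // guard against a zero step
        }
        Box::new(out.into_iter())
    }
}

/// Example one-to-one transformer, usable through the blanket impl above.
pub struct Uppercase;

impl Transformer for Uppercase {
    fn transform(&self, node: Node) -> Node {
        Node { chunk: node.chunk.to_uppercase() }
    }
}

/// Stand-in for the proposed pipeline step: flat-map every node through `t`.
pub fn then_flatten(nodes: Vec<Node>, t: &impl MultiTransformer) -> Vec<Node> {
    nodes.into_iter().flat_map(|n| t.transform_multi(n)).collect()
}
```

With this shape, a single then_flatten step accepts both Uppercase (one node in, one node out) and SlidingWindow (one node in, several out), which is the point of having only one operation that takes a MultiTransformer.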

jespersm, Oct 07 '24 14:10

Hi @jespersm, thanks for opening an issue!

Do you just want to go from one node to many nodes? then_chunk does exactly that. Sliding-window chunking (chunking with overlap) hasn't been implemented, but I think that's certainly possible; it would be up to the chunker.

If you mean to merge nodes after chunking, the Swiftide pipeline is unordered, so a merge after splitting a node would need some consideration.

timonv, Oct 08 '24 13:10

@jespersm Closing this issue. Feel free to respond and re-open it; if I misunderstood, I'd love to think along and provide a solution!

timonv, Nov 08 '24 10:11

Yes, then_chunk did the trick, even if slightly clumsy. Thanks for pointing me in the right direction.

jespersm, Nov 08 '24 21:11