swiftide
Allow splitting a node into multiple nodes during Pipeline transformation steps
Is your feature request related to a problem? Please describe.
When transforming a node in a pipeline, it is sometimes useful to turn a single node into multiple nodes, e.g. when implementing sliding-window chunking or generating multiple embeddings for a single, larger chunk of text.
Describe the solution you'd like
There could be a function under Pipeline called then_flatten, flat_map, or similar, taking a MultiTransformer which produces an iterator of Nodes per Node.
Describe alternatives you've considered
Otherwise, I have to write the nodes to a temporary store and flat-map them there.
Additional context
The existing Transformer could provide a default implementation of MultiTransformer, using iter::once, and we'd only need one Pipeline::then-operation taking a MultiTransformer.
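To make the idea concrete, here is a minimal sketch of how that default could look as a blanket implementation. Everything in it is an assumption for illustration: `Node`, both traits, their method names, the `then` helper, and the `Uppercase` and `SplitLines` examples are simplified, synchronous stand-ins and not Swiftide's actual API.

```rust
// Hypothetical, simplified stand-ins; not Swiftide's real types or traits.
#[derive(Debug, Clone, PartialEq)]
pub struct Node {
    pub chunk: String,
}

/// One node in, one node out.
pub trait Transformer {
    fn transform_node(&self, node: Node) -> Node;
}

/// One node in, any number of nodes out (the proposed trait).
pub trait MultiTransformer {
    fn transform_node_multi(&self, node: Node) -> Box<dyn Iterator<Item = Node> + '_>;
}

/// Blanket impl: every Transformer is a MultiTransformer that yields exactly one node.
impl<T: Transformer> MultiTransformer for T {
    fn transform_node_multi(&self, node: Node) -> Box<dyn Iterator<Item = Node> + '_> {
        Box::new(std::iter::once(self.transform_node(node)))
    }
}

/// A single pipeline step (hypothetical): flat-maps every node through the transformer.
pub fn then<M: MultiTransformer>(nodes: Vec<Node>, step: &M) -> Vec<Node> {
    nodes
        .into_iter()
        .flat_map(|node| step.transform_node_multi(node))
        .collect()
}

/// Example one-to-one transformer.
pub struct Uppercase;

impl Transformer for Uppercase {
    fn transform_node(&self, node: Node) -> Node {
        Node { chunk: node.chunk.to_uppercase() }
    }
}

/// Example one-to-many transformer: emits one node per line of the input chunk.
pub struct SplitLines;

impl MultiTransformer for SplitLines {
    fn transform_node_multi(&self, node: Node) -> Box<dyn Iterator<Item = Node> + '_> {
        // Collect first so the returned iterator does not borrow the consumed node.
        let parts: Vec<Node> = node
            .chunk
            .lines()
            .map(|line| Node { chunk: line.to_string() })
            .collect();
        Box::new(parts.into_iter())
    }
}
```

With this shape, `then(nodes, &Uppercase)` and `then(nodes, &SplitLines)` go through the same step, which is the point of the proposal: one `then`-style operation covers both the one-to-one and the one-to-many case.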
Hi @jespersm, thanks for opening an issue!
Do you just want to go from one node to many nodes? then_chunk does exactly that. Sliding window / chunking with overlap hasn't been implemented yet, but I think that's certainly possible if it is left up to the chunker.
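For reference, the overlap logic itself is small. The following is a rough sketch of word-based sliding-window splitting as a plain function; `sliding_window_chunks` and its parameters are hypothetical names, and wiring it into a chunker that then_chunk accepts is left out because that depends on Swiftide's chunker interface.

```rust
/// Splits `text` into windows of at most `size` whitespace-separated words.
/// Consecutive windows share `overlap` words. Hypothetical helper, not part of Swiftide.
pub fn sliding_window_chunks(text: &str, size: usize, overlap: usize) -> Vec<String> {
    assert!(size > 0 && overlap < size, "overlap must be smaller than size");

    let words: Vec<&str> = text.split_whitespace().collect();
    let step = size - overlap; // how far the window advances each time
    let mut chunks = Vec::new();
    let mut start = 0;

    while start < words.len() {
        let end = (start + size).min(words.len());
        chunks.push(words[start..end].join(" "));
        if end == words.len() {
            break; // the last window already reaches the end of the text
        }
        start += step;
    }
    chunks
}
```

Splitting on words keeps the sketch short; a real chunker would more likely window over tokens or characters, depending on the embedding model's limits.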
If you mean to merge nodes after chunking, the Swiftide pipeline is unordered, so a merge after splitting a node would need some consideration.
@jespersm Closing this issue. Feel free to respond and re-open it, I'd love to think along if I misunderstood and provide a solution!
Yes, then_chunk did the trick, even if slightly clumsy. Thanks for pointing me in the right direction.