
feat: loader improvements (chunking, pipeline integration)

Open 0xMochan opened this issue 2 months ago • 0 comments

  • [x] I have looked for existing issues (including closed) about this

Feature Request

Loaders could do with some more features to create a more complete experience.

  • [ ] Add a more straightforward chunking experience (including multiple strategies, isolated + overlapping)
  • [ ] Add pipeline integration by implementing a new method or the Op and TryOp traits

Motivation

The loaders' first implementations are straightforward. We should keep improving their interface so that loaders become more useful for incorporating knowledge into RAG workflows.

Proposal

  • [ ] Add .chunk with various options to the loaders
  • [ ] Add Op and/or TryOp trait implementations
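To make the two strategies mentioned above concrete, here is a minimal sketch of what isolated and overlapping chunking could look like. The `ChunkStrategy` enum and the free function `chunk` are hypothetical names chosen for illustration; they are not part of rig's current API, and a real `.chunk` method on loaders would likely look different.

```rust
/// Hypothetical chunking strategies (illustrative names, not rig's API).
#[derive(Debug, Clone, Copy)]
pub enum ChunkStrategy {
    /// Consecutive, non-overlapping windows of `size` characters.
    Isolated { size: usize },
    /// Windows of `size` characters where each window repeats the last
    /// `overlap` characters of the previous one.
    Overlapping { size: usize, overlap: usize },
}

/// Splits `text` into chunks measured in `char`s rather than bytes, so
/// multi-byte UTF-8 text is never cut inside a code point.
pub fn chunk(text: &str, strategy: ChunkStrategy) -> Vec<String> {
    let (size, overlap) = match strategy {
        ChunkStrategy::Isolated { size } => (size, 0),
        ChunkStrategy::Overlapping { size, overlap } => (size, overlap),
    };
    assert!(size > 0, "chunk size must be positive");
    assert!(overlap < size, "overlap must be smaller than chunk size");

    let chars: Vec<char> = text.chars().collect();
    // Each new window starts `step` characters after the previous one.
    let step = size - overlap;
    let mut chunks: Vec<String> = Vec::new();
    let mut start = 0;
    while start < chars.len() {
        let end = usize::min(start + size, chars.len());
        chunks.push(chars[start..end].iter().collect());
        // Stop once a window reaches the end of the text, so the tail is
        // not emitted again as a shorter, fully overlapping chunk.
        if end == chars.len() {
            break;
        }
        start += step;
    }
    chunks
}
```

With this sketch, `chunk("abcdefghij", ChunkStrategy::Isolated { size: 4 })` yields `abcd`, `efgh`, `ij`, while `ChunkStrategy::Overlapping { size: 4, overlap: 2 }` yields `abcd`, `cdef`, `efgh`, `ghij`. Counting characters is only one possible unit; token- or sentence-based strategies would fit the same shape.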

Alternatives

  • Add chunking as a part of the pipeline system instead
    • This is a fairly interesting thought, as chunking is a generic way of breaking up text. It could also apply to lengthy LLM output streams, which is natural from a pipeline POV. However, chunking is mostly used for loaders, and constraining it to the loader context makes for seamless helpers, since anyone could still use iterators on their own to create custom chunking if needed.
  • Create iterator helper methods directly rather than tying them to the context of loaders.
    • This is another approach that would increase flexibility. The helpers would exist as loader helpers / utils that implement common chunking strategies (perhaps the loaders' implementation would just wrap the internal iterator extenders).
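As a rough sketch of the iterator-helper alternative, an extension trait could add chunking to any iterator, and a loader's `.chunk` could then wrap it. The trait `ChunkExt`, the method `chunked`, and the adapter `Chunked` are hypothetical names invented for this example; nothing here is taken from rig or the standard library.

```rust
/// Iterator adapter that yields windows of up to `size` items, where each
/// window starts with the last `overlap` items of the previous one.
/// Hypothetical type for illustration only.
pub struct Chunked<I: Iterator> {
    inner: I,
    size: usize,
    overlap: usize,
    carry: Vec<I::Item>,
    done: bool,
}

/// Hypothetical extension trait adding `.chunked(..)` to every iterator.
pub trait ChunkExt: Iterator + Sized {
    fn chunked(self, size: usize, overlap: usize) -> Chunked<Self>
    where
        Self::Item: Clone,
    {
        assert!(size > 0, "chunk size must be positive");
        assert!(overlap < size, "overlap must be smaller than chunk size");
        Chunked { inner: self, size, overlap, carry: Vec::new(), done: false }
    }
}

impl<I: Iterator> ChunkExt for I {}

impl<I> Iterator for Chunked<I>
where
    I: Iterator,
    I::Item: Clone,
{
    type Item = Vec<I::Item>;

    fn next(&mut self) -> Option<Self::Item> {
        if self.done {
            return None;
        }
        // Start from the items carried over from the previous window.
        let mut window = std::mem::take(&mut self.carry);
        let carried = window.len();
        while window.len() < self.size {
            match self.inner.next() {
                Some(item) => window.push(item),
                None => {
                    self.done = true;
                    break;
                }
            }
        }
        // Only carried-over items remain: nothing new to emit.
        if window.len() == carried {
            return None;
        }
        if !self.done {
            // A full window was produced; keep its tail for the next one.
            self.carry = window[self.size - self.overlap..].to_vec();
        }
        Some(window)
    }
}
```

For example, `(1..=6).chunked(3, 1)` yields `[1, 2, 3]`, `[3, 4, 5]`, `[5, 6]`, and `text.lines().chunked(2, 0)` groups lines into isolated pairs. Because the adapter is generic over the item type, the same helper would work for lines, paragraphs, or tokens, which is the flexibility this alternative is after.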

Notes

Loaders share a common design philosophy, but the different loaders do not share any traits or implementations, so it is very easy to duplicate code. It may be worth investigating how some of this logic could be shared.

0xMochan · Dec 19 '24 21:12