[Feature] Connector prepare for RAG

Open Hisoka-X opened this issue 4 months ago • 3 comments

Search before asking

  • [x] I have searched the feature requests and found no similar feature requirement.

Description

As a multimodal data integration tool, we hope that SeaTunnel can support parsing complex file types, converting their contents into structured file streams, and ultimately writing them into a vector store after embedding. This issue tracks the related tasks.

For chunking, please refer to https://docs.dify.ai/en/guides/knowledge-base/create-knowledge-and-upload-documents/chunking-and-cleaning-text and https://docs.llamaindex.ai/en/stable/examples/node_parsers/semantic_chunking/
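
As a rough illustration of the chunking step mentioned above, here is a minimal sketch of fixed-size chunking with overlap. It is not SeaTunnel code and not the approach in either linked guide: the function name `chunk_text` and the parameters `chunk_size` and `overlap` are hypothetical, and it splits on character counts only. Semantic chunking, as in the LlamaIndex link, would choose split points from embedding similarity instead.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> List[str]:
    """Split text into character chunks of at most `chunk_size`,
    where consecutive chunks share `overlap` characters.

    Hypothetical helper for illustration only; not part of SeaTunnel.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text,
        # so no trailing chunk made only of overlap is emitted.
        if start + chunk_size >= len(text):
            break
    return chunks
```

Each resulting chunk would then be passed to an embedding step and written to the vector store as one record. The overlap is there so that a sentence cut at a chunk boundary still appears whole in at least one chunk.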

Usage Scenario

No response

Related issues

No response

Are you willing to submit a PR?

  • [ ] Yes I am willing to submit a PR!

Code of Conduct

Hisoka-X • Aug 18 '25 08:08

@Hisoka-X

Before working on this, I think it’s important to have a discussion about abstraction first.
Is it okay for the person who originally created this to just go ahead and handle the abstraction work as well?

joonseolee • Aug 19 '25 12:08

@Hisoka-X

Is it alright if I collaborate with @joonseolee on the abstraction task and also take on tickets 1 through 3 together?

iinow • Aug 19 '25 12:08

> @Hisoka-X
>
> Is it alright if I collaborate with @joonseolee on the abstraction task and also take on tickets 1 through 3 together?

Sure! Thanks @joonseolee @iinow

Hisoka-X • Aug 19 '25 13:08