Support text splitter transform (Chunking).

Open Hisoka-X opened this issue 4 months ago • 8 comments

Hisoka-X avatar Aug 18 '25 08:08 Hisoka-X

@Hisoka-X

Can I add a TextTransform class to the seatunnel-transforms-v2/src/main/java/org/apache/seatunnel/transforms/text package for this feature?

iinow avatar Sep 06 '25 07:09 iinow

@Hisoka-X

Can I add a TextTransform class to the seatunnel-transforms-v2/src/main/java/org/apache/seatunnel/transforms/text package for this feature?

That's right!

Hisoka-X avatar Sep 06 '25 09:09 Hisoka-X

@Hisoka-X Additionally, is it acceptable to extend MultipleFieldOutputTransform, implement protected Column[] getOutputColumns() so that it returns a single "chunk" column, and receive the chunk_size option via the constructor for this text chunking transform?

iinow avatar Sep 07 '25 14:09 iinow

Is it acceptable to extend MultipleFieldOutputTransform, implement protected Column[] getOutputColumns()

This is the normal and correct way to add a new transform.

only one "chunk" column, and receive the chunk_size option via the constructor

Could you use https://docs.dify.ai/en/guides/knowledge-base/create-knowledge-and-upload-documents/chunking-and-cleaning-text#general-mode as a guide? We should support general mode in our first version.
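For discussion, here is a minimal sketch of what a general-mode style chunker could look like. It assumes general mode boils down to three options: a delimiter to split on, a maximum chunk length, and an overlap between neighbouring chunks. The class name TextChunker and the parameters delimiter, chunkSize and overlap are hypothetical and not part of SeaTunnel or Dify. Lengths are counted in characters here for simplicity, which may differ from how Dify measures them, and short segments are not merged.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not an existing SeaTunnel class.
final class TextChunker {
    private final String delimiter; // literal separator, e.g. "\n\n"
    private final int chunkSize;    // maximum characters per chunk (assumption: characters, not tokens)
    private final int overlap;      // characters shared between consecutive chunks of one segment

    TextChunker(String delimiter, int chunkSize, int overlap) {
        if (delimiter == null || delimiter.isEmpty()) {
            throw new IllegalArgumentException("delimiter must not be empty");
        }
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        if (overlap < 0 || overlap >= chunkSize) {
            throw new IllegalArgumentException("overlap must be in [0, chunkSize)");
        }
        this.delimiter = delimiter;
        this.chunkSize = chunkSize;
        this.overlap = overlap;
    }

    List<String> split(String text) {
        List<String> chunks = new ArrayList<>();
        if (text == null || text.isEmpty()) {
            return chunks;
        }
        int step = chunkSize - overlap; // always >= 1 thanks to the constructor checks
        // Split on the delimiter as a literal string, not as a regex.
        for (String segment : text.split(Pattern.quote(delimiter))) {
            String trimmed = segment.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank segments
            }
            // Slide a fixed-size window over segments longer than chunkSize.
            for (int start = 0; start < trimmed.length(); start += step) {
                int end = Math.min(start + chunkSize, trimmed.length());
                chunks.add(trimmed.substring(start, end));
                if (end == trimmed.length()) {
                    break; // the last window reached the end of the segment
                }
            }
        }
        return chunks;
    }
}
```

In a transform, each returned string would presumably become the value of the "chunk" column, with the options read from the transform config rather than hard-coded.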

Hisoka-X avatar Sep 08 '25 15:09 Hisoka-X

@Hisoka-X Okay, thanks. Can I take this issue?

iinow avatar Sep 14 '25 11:09 iinow

@iinow Is there any progress?

davidzollo avatar Dec 04 '25 03:12 davidzollo

@davidzollo I’m afraid I won’t be able to take on the task as I initially planned. I’m really sorry for the inconvenience.

iinow avatar Dec 06 '25 21:12 iinow

@davidzollo I’m afraid I won’t be able to take on the task as I initially planned. I’m really sorry for the inconvenience

Don't worry, delays do sometimes happen. Could you complete it in a few days?

davidzollo avatar Dec 11 '25 09:12 davidzollo