Support metadata column for file source connector

Open Hisoka-X opened this issue 4 months ago • 3 comments

Search before asking

  • [x] I have searched the feature requests and found no similar feature requirement.

Description

We should add metadata columns to the file source connectors, so that each SeaTunnel row read from a file also carries metadata about that file. This information can then be extracted with the Metadata transform.

The metadata should contain:

  • FilePath
  • FileCreateTime
  • FileUpdateTime
  • FileSize
  • FileType

etc.
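To illustrate the proposal, here is a minimal, hypothetical sketch in plain Java of what attaching these columns to each row could involve. It assumes a local file read through `java.nio` (the real connectors read from HDFS, S3 and other file systems), represents a row as a plain `Object[]` rather than SeaTunnel's own row type, and derives `FileType` from the file extension. The class and method names (`FileMetadataSketch`, `readMetadata`, `withMetadata`) are illustrative, not SeaTunnel APIs.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: not part of SeaTunnel.
class FileMetadataSketch {

    // Column names mirror the list in the issue; the order is an assumption.
    static final String[] METADATA_COLUMNS = {
        "FilePath", "FileCreateTime", "FileUpdateTime", "FileSize", "FileType"
    };

    // Collects the proposed metadata for one local file via java.nio.
    static Map<String, Object> readMetadata(Path file) throws IOException {
        BasicFileAttributes attrs = Files.readAttributes(file, BasicFileAttributes.class);
        String name = file.getFileName().toString();
        int dot = name.lastIndexOf('.');
        // FileType is assumed to be the lowercase extension, or "" if there is none.
        String type = dot >= 0 ? name.substring(dot + 1).toLowerCase(Locale.ROOT) : "";

        Map<String, Object> meta = new LinkedHashMap<>();
        meta.put("FilePath", file.toAbsolutePath().toString());
        // If the file system has no creation time, java.nio returns an
        // implementation-specific default (often the last-modified time).
        meta.put("FileCreateTime", attrs.creationTime().toMillis());
        meta.put("FileUpdateTime", attrs.lastModifiedTime().toMillis());
        meta.put("FileSize", attrs.size());
        meta.put("FileType", type);
        return meta;
    }

    // Appends the metadata values after the data fields parsed from the file,
    // so every row read from the same file carries the same metadata.
    static Object[] withMetadata(Object[] dataFields, Map<String, Object> meta) {
        Object[] row = Arrays.copyOf(dataFields, dataFields.length + METADATA_COLUMNS.length);
        for (int i = 0; i < METADATA_COLUMNS.length; i++) {
            row[dataFields.length + i] = meta.get(METADATA_COLUMNS[i]);
        }
        return row;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("orders", ".csv");
        Files.writeString(tmp, "1,apple\n");
        Map<String, Object> meta = readMetadata(tmp);
        // One parsed record (1, "apple") followed by its five metadata values.
        Object[] row = withMetadata(new Object[] {1, "apple"}, meta);
        System.out.println(Arrays.toString(row));
        Files.delete(tmp);
    }
}
```

Appending the metadata after the data fields keeps the original schema positions unchanged, which is one plausible way to let downstream transforms pick the extra columns up by name.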

Please refer to https://github.com/apache/seatunnel/pull/9586 and https://github.com/apache/seatunnel/blob/dev/docs/en/transform-v2/metadata.md

Usage Scenario

No response

Related issues

No response

Are you willing to submit a PR?

  • [ ] Yes I am willing to submit a PR!

Code of Conduct

Hisoka-X, Aug 21 '25 04:08

@Hisoka-X Does it support image, audio, and video files? Suppose I want to compute embeddings for all image (audio, video) files under a certain directory in S3. Can I use the S3File source to read the files from S3, then use the Metadata transform to extract the paths of the image (audio, video) files, and finally use those paths as input for multimodal embedding (https://github.com/apache/seatunnel/pull/9673)?

loupipalien, Sep 01 '25 04:09

Of course! This is what we want to do!

Hisoka-X, Sep 01 '25 14:09
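The media-embedding scenario discussed above could be sketched as a filtering step once rows carry file metadata. This is a minimal sketch under stated assumptions: each row's metadata is available as a map keyed by the column names from the issue, `FileType` is a lowercase file extension, and the extension sets, `s3a://` paths, and the `MediaPathSelector`/`mediaPaths` names are all hypothetical. The embedding step from #9673 itself is not shown.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch: not part of SeaTunnel.
class MediaPathSelector {

    // Hypothetical extension sets; the actual FileType values the connector
    // would emit are not specified in the issue.
    static final Set<String> MEDIA_TYPES = Set.of(
        "jpg", "jpeg", "png", "webp",   // images
        "mp3", "wav", "flac",           // audio
        "mp4", "mov", "mkv");           // video

    // Keeps only the FilePath of rows whose FileType is a media extension,
    // producing the list a multimodal embedding step would consume.
    static List<String> mediaPaths(List<Map<String, Object>> metadataRows) {
        return metadataRows.stream()
            .filter(r -> MEDIA_TYPES.contains(String.valueOf(r.get("FileType"))))
            .map(r -> String.valueOf(r.get("FilePath")))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Map<String, Object>> rows = List.of(
            Map.of("FilePath", "s3a://bucket/media/cat.png", "FileType", "png"),
            Map.of("FilePath", "s3a://bucket/media/notes.txt", "FileType", "txt"),
            Map.of("FilePath", "s3a://bucket/media/talk.mp4", "FileType", "mp4"));
        // prints [s3a://bucket/media/cat.png, s3a://bucket/media/talk.mp4]
        System.out.println(mediaPaths(rows));
    }
}
```

Filtering on a metadata column rather than re-parsing paths keeps the decision in one place, so changing which file types are embedded only means changing the set.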

Can I try it out? @davidzollo @Hisoka-X

LiJie20190102, Dec 06 '25 08:12

I'm glad to see that you'd like to implement this feature. By the way, there is no need to ask to be assigned. Just leave a message such as "I'll solve it" and then submit a PR within four weeks. If that is not enough time, leave a message such as "I still need a little more time." These messages let other contributors know that the feature is under development.

The community is always open and public. ^_^

davidzollo, Dec 18 '25 03:12