connectors
S3 connector: Support fetching of metadata from the S3 bucket
When we call Bucket.objects in our connector, it would be great to also explicitly request the metadata attribute, and any other attributes users require, from each object and add them to the final document to be indexed.
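To make the request concrete, here is a minimal sketch of what fetching metadata per object could look like. It assumes a boto3-style S3 client whose `head_object(Bucket=..., Key=...)` call returns system fields plus a `Metadata` dict of user-defined values; as far as I know, the summaries yielded by Bucket.objects do not carry user metadata, so an extra call per object would be needed. The function name `build_document` and the document field names are hypothetical, not the connector's actual schema.

```python
from typing import Any, Dict


def build_document(s3_client: Any, bucket: str, key: str) -> Dict[str, Any]:
    """Build an index document for one object, including its metadata.

    `s3_client` is assumed to behave like a boto3 S3 client.
    The output field names are illustrative only.
    """
    # head_object returns system fields (ContentType, LastModified, ...)
    # and a "Metadata" dict holding the user-defined values.
    head = s3_client.head_object(Bucket=bucket, Key=key)
    last_modified = head.get("LastModified")
    return {
        "_id": f"{bucket}/{key}",
        "bucket": bucket,
        "filename": key,
        "content_type": head.get("ContentType"),
        "size_in_bytes": head.get("ContentLength"),
        # LastModified is a datetime in boto3 responses; store it as a string.
        "last_modified": last_modified.isoformat() if last_modified else None,
        # Copy the whole user-defined metadata map into the document.
        "metadata": dict(head.get("Metadata", {})),
    }
```

One extra request per object has a cost on large buckets, which is one more reason to make this opt-in.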
Acceptance criteria
- [ ] add optional RCF with comma-separated values to be listed, e.g. Content-Type, Last-Modified
- [ ] the defined metadata fields are ingested for all objects that have them
- [ ] update S3 official documentation
@ppf2 Sure, we will add the metadata attribute to the S3 connector. Can you please provide the list of attributes you want indexed from S3 into Elasticsearch?
@akanshi-elastic I think it would be great if we could make it flexible and just produce an output object for the entire map of metadata that is present on the bucket, because different users may want to extract different attributes.
Making this configurable would likely provide the best user value. Adding an RCF where this can be specified would be useful.
@danajuratoni can you please elaborate on this? What do you mean by "Adding a RCF where this can be specified would be useful"?
We should define a new optional rich configurable field where comma-separated values can be listed, e.g. Content-Type, Last-Modified, and these metadata fields should be ingested for all objects that have them.
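A rough sketch of how such a field could be applied, assuming the RCF value arrives as a plain string and the object's attributes are available as a dict keyed the same way as the configured names. The function names `parse_metadata_fields` and `select_metadata` are hypothetical.

```python
from typing import Any, Dict, List


def parse_metadata_fields(raw: str) -> List[str]:
    """Split the comma-separated RCF value into clean field names."""
    # Strip whitespace and drop empty entries such as a trailing comma.
    return [name.strip() for name in raw.split(",") if name.strip()]


def select_metadata(attributes: Dict[str, Any], raw_fields: str) -> Dict[str, Any]:
    """Keep only the configured fields that the object actually has."""
    selected: Dict[str, Any] = {}
    for name in parse_metadata_fields(raw_fields):
        # Objects missing a configured field are still ingested;
        # the field is simply left out of the document.
        if name in attributes:
            selected[name] = attributes[name]
    return selected
```

For example, with the RCF set to `Content-Type, Last-Modified`, an object that only exposes `Content-Type` and `Content-Length` would contribute just its `Content-Type` value. An empty RCF selects nothing, which keeps the field optional.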
Here is a sample of metadata attributes that are returned from the get object call. I can certainly see users being interested in attributes like last modified, content length, content type, and any custom metadata that S3 allows users to set. A flexible solution that lets users indicate which attributes they want to fetch would be nice.
Let's document which Response Structure is used for defining the keys in the comma-separated RCF, with an official link to the documentation as well. cc: @leemthompo - this should be part of the 8.10 release notes
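The choice of Response Structure matters because the same attribute is spelled differently depending on where you read it: the HTTP headers use Content-Type and Last-Modified, while a boto3-style response dict uses `ContentType` and `LastModified`, with custom values nested under `Metadata`. A sketch of one possible lookup, assuming users write header-style names in the RCF and that custom metadata keys are stored lowercase; `lookup_field` is a hypothetical helper, and the hyphen-stripping rule is an assumption, not documented connector behavior.

```python
from typing import Any, Dict, Optional

USER_METADATA_PREFIX = "x-amz-meta-"


def lookup_field(response: Dict[str, Any], field: str) -> Optional[Any]:
    """Resolve a header-style field name against a boto3-style response dict."""
    name = field.strip()
    if name.lower().startswith(USER_METADATA_PREFIX):
        # Custom metadata lives in the nested "Metadata" dict, without the prefix.
        custom_key = name[len(USER_METADATA_PREFIX):].lower()
        return response.get("Metadata", {}).get(custom_key)
    # System fields: "Content-Type" -> "ContentType" (assumed mapping).
    return response.get(name.replace("-", ""))
```

Whichever spelling is chosen, documenting it next to the RCF avoids users silently getting no metadata because of a naming mismatch.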
Waiting for the update on how metadata will be structured in Elastic.
@akanshi-elastic has this update been provided?
This issue is on hold on our side while it is under internal discussion on Dana's side. Once @danajuratoni confirms, we'll start working on it.
This issue has been deprioritised for now; keeping it as an enhancement in the backlog.