S3 connector: Support fetching of metadata from the S3 bucket

Open ppf2 opened this issue 1 year ago • 11 comments

When we call Bucket.objects in our connector, it would be great to also explicitly request the metadata attribute, plus any other attributes users require from the object, and add them to the final document to be indexed.

Acceptance criteria

  • [ ] add optional RCF with comma separated values to be listed e.g. Content-Type, Last-Modified
  • [ ] the defined metadata fields are ingested for all objects that have them
  • [ ] update S3 official documentation

ppf2 avatar Jul 12 '23 18:07 ppf2

@ppf2 Sure, we will add the metadata attribute to the S3 connector. Can you please provide the list of attributes you want to index from S3 into Elasticsearch?

akanshi-elastic avatar Jul 13 '23 06:07 akanshi-elastic

@akanshi-elastic I think it would be great if we could make it flexible and just produce an output object for the entire map of metadata that is present on the bucket, because different users may want to extract different attributes.

ppf2 avatar Jul 13 '23 17:07 ppf2

Making this configurable would likely provide the best user value. Adding an RCF where this can be specified would be useful.

danajuratoni avatar Jul 14 '23 08:07 danajuratoni

@danajuratoni can you please elaborate on this? What do you mean by "Adding a RCF where this can be specified would be useful"?

akanshi-elastic avatar Jul 18 '23 12:07 akanshi-elastic

We should define a new optional rich configurable field where comma-separated values can be listed, e.g. Content-Type, Last-Modified. These metadata fields should be ingested for all objects that have them.

danajuratoni avatar Jul 20 '23 14:07 danajuratoni

[image: sample of metadata attributes returned from the get object call]

Here is a sample of the metadata attributes that are returned from the get object call. I can certainly see users being interested in attributes like last modified, content length, content type, and any custom metadata that S3 buckets allow users to set. A flexible solution that lets users indicate which attributes they want to fetch would be nice.

ppf2 avatar Jul 20 '23 23:07 ppf2

Let's document which Response Structure is used for defining the keys in the comma-separated RCF, with a link to the official documentation as well. cc: @leemthompo - this should be part of the 8.10 release notes

danajuratoni avatar Jul 24 '23 13:07 danajuratoni

Waiting for an update on how the metadata will be structured in Elastic.

akanshi-elastic avatar Aug 03 '23 05:08 akanshi-elastic

@akanshi-elastic has this update been provided?

leemthompo avatar Aug 28 '23 13:08 leemthompo

This issue is on hold from our side. It is under internal discussion on Dana's side. Once @danajuratoni provides confirmation on this, we'll start working on it.

akanshi-elastic avatar Sep 04 '23 07:09 akanshi-elastic

This issue is deprioritised for now; keeping it as an enhancement in the backlog.

danajuratoni avatar Sep 26 '23 08:09 danajuratoni
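
For readers following the thread, here is a minimal sketch of how the proposed comma-separated field might be applied to an object's metadata. It is not the connector's implementation. The helper names `parse_metadata_fields` and `select_metadata` are hypothetical, and the sample dictionary is a hand-written stand-in for a boto3 `head_object` response, whose keys look like `ContentType` and `LastModified` rather than the header-style `Content-Type` used as an example above. The thread never settles which key naming the field should accept, so that choice is an assumption here.

```python
from datetime import datetime, timezone


def parse_metadata_fields(raw: str) -> list[str]:
    """Split the comma-separated config value into clean field names."""
    return [name.strip() for name in raw.split(",") if name.strip()]


def select_metadata(response: dict, fields: list[str]) -> dict:
    """Copy only the requested keys that the response actually contains."""
    selected = {}
    for name in fields:
        if name in response:
            value = response[name]
            # Datetimes are not JSON-serializable, so emit ISO 8601 strings.
            if isinstance(value, datetime):
                value = value.isoformat()
            selected[name] = value
    return selected


# Assumption: a hand-written stand-in for a boto3 head_object response.
sample_response = {
    "ContentType": "text/plain",
    "ContentLength": 1024,
    "LastModified": datetime(2023, 7, 20, 23, 7, tzinfo=timezone.utc),
    "Metadata": {"owner": "docs-team"},  # user-defined (custom) metadata
}

# "StorageClass" is requested but absent from the response, so it is skipped.
fields = parse_metadata_fields("ContentType, LastModified, Metadata, StorageClass")
doc_metadata = select_metadata(sample_response, fields)
```

Skipping absent keys matches the acceptance criterion that the fields are ingested "for all objects that have them". Leaving the field empty would select nothing, which keeps the option backwards compatible.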