Object user metadata APIs are inconsistent between S3 gateway and lakeFS APIs

Open • arielshaqed opened this issue 6 months ago • 1 comment

The S3 protocol passes user metadata as headers with the prefix x-amz-meta-. The AWS S3 SDK adds this prefix when sending metadata and strips it when reading, so metadata foo=bar travels as the header X-Amz-Meta-foo: bar. (As an aside, because HTTP header names are case insensitive, user metadata keys end up being effectively case insensitive!)

The S3 gateway on lakeFS keeps this prefix. So for instance if you use the AWS S3 SDK to set metadata foo=bar, lakeFS ends up setting a metadata key x-amz-meta-foo to the value bar. And because the S3 gateway keeps this prefix, it responds to get-object with the header X-Amz-Meta-foo: bar, and the AWS S3 SDK correctly shows metadata foo=bar.

The lakeFS API knows nothing about this prefix. So:

  • If you use the AWS S3 SDK to set metadata foo=bar on a lakeFS object, reading it with the lakeFS API will give metadata x-amz-meta-foo=bar.
  • If you use the lakeFS API to set metadata foo=bar on an object, the S3 gateway will ignore this metadata and not return any header. So the AWS S3 SDK will never see this metadata.
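
To make the two cases above concrete, here is a minimal in-memory model of the behaviour described in this issue. It is only a sketch, not lakeFS code: the function names (`gateway_put`, `gateway_get_headers`, `api_put`, `api_get`, `sdk_view`) are hypothetical, the object store is a plain dict, and lowercasing the stored key is an assumption of the model. The prefix used is the standard S3 user-metadata header prefix, `x-amz-meta-`.

```python
# Toy model of the inconsistency; all names here are hypothetical.
USER_METADATA_PREFIX = "x-amz-meta-"  # standard S3 user-metadata header prefix


def gateway_put(store, path, headers):
    """S3 gateway write as described: persist metadata keys WITH the prefix."""
    store[path] = {
        name.lower(): value  # lowercasing is an assumption of this model
        for name, value in headers.items()
        if name.lower().startswith(USER_METADATA_PREFIX)
    }


def gateway_get_headers(store, path):
    """S3 gateway read as described: only prefixed keys come back as headers."""
    return {
        key: value
        for key, value in store.get(path, {}).items()
        if key.lower().startswith(USER_METADATA_PREFIX)
    }


def api_put(store, path, metadata):
    """lakeFS API write: keys are stored exactly as given."""
    store[path] = dict(metadata)


def api_get(store, path):
    """lakeFS API read: keys are returned exactly as stored."""
    return dict(store.get(path, {}))


def sdk_view(headers):
    """What an S3 SDK shows the user: metadata with the prefix stripped."""
    return {
        name[len(USER_METADATA_PREFIX):]: value
        for name, value in headers.items()
        if name.lower().startswith(USER_METADATA_PREFIX)
    }


store = {}

# Case 1: written through the gateway, read through the lakeFS API.
gateway_put(store, "obj-a", {"X-Amz-Meta-foo": "bar"})
print(api_get(store, "obj-a"))                        # key still carries the prefix
print(sdk_view(gateway_get_headers(store, "obj-a")))  # the S3 SDK sees foo=bar

# Case 2: written through the lakeFS API, read through the gateway.
api_put(store, "obj-b", {"foo": "bar"})
print(sdk_view(gateway_get_headers(store, "obj-b")))  # the S3 SDK sees nothing
```

In this model the same logical metadata foo=bar has two different stored forms depending on which interface wrote it, which is the root of the inconsistency.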

Impact

Users who have workloads that use both the S3 gateway and the lakeFS API will see inconsistent and surprising results.

Backwards compatibility

Whatever we do, if we do change this, we will change behaviour for users of the S3 gateway. So we might need a feature flag for this.

arielshaqed, May 21 '25 08:05

To address this, and to ensure consistency for users working with both the S3 gateway and the lakeFS API, we propose the following changes:

  • Reading metadata: When returning metadata through either the lakeFS API or the S3 gateway, we will flatten metadata keys by removing the x-amz-meta- prefix. This will ensure consistent key names across both interfaces (e.g., foo=bar instead of x-amz-meta-foo=bar).
  • Writing metadata via the S3 gateway: The S3 gateway will no longer persist metadata keys with the x-amz-meta- prefix. Instead, it will store flattened metadata keys directly (e.g., foo=bar).
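
The flattening proposed above could look roughly like the following. This is a sketch under stated assumptions, not the planned implementation: `flatten_key`, `flatten_metadata` and `to_s3_headers` are hypothetical names, and two behaviours are my own assumptions rather than part of the proposal: when both a flattened key and a legacy prefixed key exist, the flattened one wins, and the gateway re-adds the standard `x-amz-meta-` header prefix on the wire so that S3 SDKs still recognise the values as user metadata.

```python
# Sketch of the proposed flattening; all names here are hypothetical.
USER_METADATA_PREFIX = "x-amz-meta-"  # standard S3 user-metadata header prefix


def flatten_key(key):
    """Strip the prefix (matched case-insensitively) from one metadata key."""
    if key.lower().startswith(USER_METADATA_PREFIX):
        stripped = key[len(USER_METADATA_PREFIX):]
        if stripped:  # leave a bare prefix untouched rather than emit an empty key
            return stripped
    return key


def flatten_metadata(metadata):
    """Return metadata with flattened keys.

    Assumption: if both "foo" and a legacy "x-amz-meta-foo" are present,
    the already-flat key takes precedence.
    """
    # First pass: keys that are already flat.
    flat = {k: v for k, v in metadata.items() if flatten_key(k) == k}
    # Second pass: legacy prefixed keys fill in only where no flat key exists.
    for key, value in metadata.items():
        flat.setdefault(flatten_key(key), value)
    return flat


def to_s3_headers(metadata):
    """Gateway response headers: flatten, then re-add the prefix on the wire.

    Assumption: the prefix is needed in the HTTP response so that S3 SDKs
    treat the values as user metadata.
    """
    return {
        USER_METADATA_PREFIX + key: value
        for key, value in flatten_metadata(metadata).items()
    }


# Legacy object written by the old gateway: reads come back flattened.
print(flatten_metadata({"x-amz-meta-foo": "bar"}))
# Object written through the lakeFS API: the gateway can now expose it.
print(to_s3_headers({"foo": "bar"}))
```

With both read paths going through the same flattening step, metadata written through either interface would be visible under the same key from the other.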

This approach maintains backward compatibility in the short term: previously written metadata with the prefix will still be accessible, while newly written metadata will be consistent and interoperable across both the API and the gateway.

We will need to document the API change in behavior and communicate this information to users/customers, explaining how the new version may impact responses.

nopcoder, May 27 '25 14:05