
feat: Add support for setting object metadata `Content-Encoding`

Open · jclarysse opened this issue 1 year ago · 1 comment

Users who want to leverage the GCS capability to decompress gzip objects server-side when accessing them through the Storage API have requested that the fixed metadata field Content-Encoding (default: null) become configurable, so that its value can be set (e.g. to gzip) when the connector uploads a new file to the bucket. https://cloud.google.com/storage/docs/metadata#content-encoding
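
For illustration, a minimal sketch of what setting this metadata could look like with the google-cloud-storage Java client; the bucket name, object name, and payload are hypothetical, and the connector's actual upload path may differ:

```java
import com.google.cloud.storage.BlobId;
import com.google.cloud.storage.BlobInfo;
import com.google.cloud.storage.Storage;
import com.google.cloud.storage.StorageOptions;

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

public class ContentEncodingUploadExample {
    public static void main(String[] args) throws IOException {
        Storage storage = StorageOptions.getDefaultInstance().getService();

        // Gzip-compress the payload before upload.
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write("record-1\nrecord-2\n".getBytes(StandardCharsets.UTF_8));
        }

        // Setting Content-Encoding: gzip marks the object for
        // server-side decompressive transcoding on later downloads.
        BlobInfo blobInfo = BlobInfo
                .newBuilder(BlobId.of("my-bucket", "topic-0-0000000000.gz")) // hypothetical names
                .setContentEncoding("gzip")
                .build();
        storage.create(blobInfo, buffer.toByteArray());
    }
}
```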

jclarysse avatar Apr 03 '24 12:04 jclarysse

I wasn't able to run my integration test locally, and it failed here. I'll now go back to my local environment and hopefully fix it.

jclarysse avatar Apr 04 '24 07:04 jclarysse

@jjaakola-aiven shared that the integration test passed on his local machine.

jclarysse avatar May 27 '24 07:05 jclarysse

The expected behaviour is that for compressed blobs with the metadata Content-Encoding=gzip, the result of an object download should be the uncompressed content. This can easily be verified using the GCP sample code downloadFile.js.
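
The equivalent check in Java might look like the sketch below, assuming the Java client follows the service's decompressive transcoding behaviour the same way the Node.js sample does; the bucket, object, and output path are hypothetical:

```java
import com.google.cloud.storage.BlobId;
import com.google.cloud.storage.Storage;
import com.google.cloud.storage.StorageOptions;

import java.nio.file.Path;

public class ContentEncodingDownloadExample {
    public static void main(String[] args) {
        Storage storage = StorageOptions.getDefaultInstance().getService();

        // With Content-Encoding: gzip set on the object, GCS applies
        // decompressive transcoding, so the file written to disk
        // should contain the uncompressed payload.
        storage.downloadTo(
                BlobId.of("my-bucket", "topic-0-0000000000.gz"), // hypothetical names
                Path.of("downloaded-object"));
    }
}
```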

Since the GCS connector previously only had tests based on object reads, I had to add some boilerplate code to make reading from a download possible.

The new test contentEncodingAwareDownload() passes with the parameters compression=none and content-encoding=none. Unfortunately, it fails to decode required fields with the parameters compression=gzip and content-encoding=gzip, as the bytes do not appear to be decompressed:

java.lang.IllegalArgumentException: Illegal base64 character 1f

Since 0x1f is the first byte of the gzip magic number (1f 8b), the test appears to be base64-decoding bytes that are still compressed. I wonder if this is a limitation of Testcontainers' DatastoreEmulator.
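
As a hypothetical workaround, not part of the actual patch, the test could gunzip the downloaded bytes itself before decoding them when the emulator returns raw compressed content; a minimal sketch:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;

final class GzipUtil {
    // Gunzip a byte array; useful when the emulator returns raw
    // gzip bytes instead of applying decompressive transcoding.
    static byte[] gunzip(final byte[] compressed) throws IOException {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed));
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            in.transferTo(out); // Java 9+
            return out.toByteArray();
        }
    }
}
```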

jclarysse avatar Jun 03 '24 08:06 jclarysse

@jjaakola-aiven Thanks for your help with fixing the test so that both compression and encoding work as expected. I pushed again using your patch. Please review.

jclarysse avatar Jun 03 '24 13:06 jclarysse