tremor-runtime icon indicating copy to clipboard operation
tremor-runtime copied to clipboard

Support GCS resumable uploads

Open mfelsche opened this issue 4 years ago • 4 comments
trafficstars

Describe the problem you are trying to solve

When backing up events to GCS it is common practice to store files that contain data aggregated by some unit of time, be it day or hour. A common object name format is something like, which is also supported by big data tools like hive or spark:

some_prefix/year=2021/month=6/day=9/hour=11/data.json

or similar schemes.

To upload events to a GCS object we'd need to use the following offramp command:

https://docs.tremor.rs/artefacts/offramps/#upload_object

the "normal" GCS upload_object API requires the whole object to be uploaded with 1 request. To support aggregated files, this would require to buffer all events inside a tumbling window or a batch, which is dangerous as it could blow up memory depending on the volume of events for the aggregation period.

The problem we are trying to solve is to stream data to GCS objects in the aforementioned format without locally buffering the whole object.

Describe the solution you'd like

Luckily the GCS API supports resumable uploads where data is distributed across multiple requests:

https://cloud.google.com/storage/docs/resumable-uploads

We would like to support resumable uploads in the GCS offramp. For this we would need to add a marker to an upload_object command that this is the last upload for this object. For cases where the size is not known (e.g. time wise aggregation where the name of the object changes because of a timestamp) on the last payload, we could also add such a marker on the next payload.

If possible such uploads should work across tremor restarts, so that necessary state kept around in the offramp for resumable uploads needs to be persisted.

mfelsche avatar Jun 09 '21 09:06 mfelsche

cc @jigyasak05 FYI 😎

mfelsche avatar Jun 09 '21 09:06 mfelsche

Nice, I can look into this! :D

jigyasak05 avatar Jun 09 '21 11:06 jigyasak05

Though I am happy that you wanna look into this, dont feel like you have to. I just wanted to show you this issue as you're author of this offramp. This can get hairy to test and stuff. Are you sure? ;-)

mfelsche avatar Jun 09 '21 11:06 mfelsche

Haha! I can try :-D

jigyasak05 avatar Jun 09 '21 12:06 jigyasak05