
[WIP] feat: add S3 to layerdb

Open sprutton1 opened this issue 5 months ago • 27 comments

This adds S3 as a layer in the layer cache, primarily as an eventual replacement for Postgres. This implementation places it as an intermediate layer between the disk and Postgres, so we will continue to fall back to Postgres if we don't find what we want in S3. When we do fall back to Postgres, we populate S3 with the value we found.
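To make the read path above concrete, here is a minimal sketch in Python. It is not the layerdb code: `DictStore` and `layered_get` are hypothetical names, and each tier is modeled as an in-memory dict standing in for the disk cache, S3, and Postgres. It only illustrates the lookup order and the S3 backfill on a Postgres hit; whether the disk layer is also repopulated is not covered here.

```python
from typing import Optional


class DictStore:
    """Hypothetical in-memory stand-in for one storage tier (disk, S3, or Postgres)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.data = {}

    def get(self, key: str) -> Optional[bytes]:
        return self.data.get(key)

    def put(self, key: str, value: bytes) -> None:
        self.data[key] = value


def layered_get(key: str, disk: DictStore, s3: DictStore, pg: DictStore) -> Optional[bytes]:
    """Look up `key` in disk, then S3, then Postgres; backfill S3 on a Postgres hit."""
    value = disk.get(key)
    if value is not None:
        return value

    value = s3.get(key)
    if value is not None:
        return value

    # S3 miss: fall back to Postgres and copy the value into S3 so the
    # next read for this key no longer needs Postgres.
    value = pg.get(key)
    if value is not None:
        s3.put(key, value)
    return value
```

With this shape, every key that is actually read migrates to S3 on first access, which is why only the never-accessed snapshots would need a separate bulk migration later.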

Once we have enough evidence that this approach is viable, we can migrate the remaining snapshots that have not been accessed from Postgres to S3 and then disable Postgres entirely. I suspect this will slow us down a bit at first, since we will be writing even more, but we should be able to see how the S3-specific bits perform.

This implementation uses the address of the item as the prefix in S3, so the existing change_set_pointers table will remain valid. If the performance of this layout is insufficient, we can group keys with more specific prefixes to allow S3 to shard appropriately.
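As a rough illustration of the two key layouts mentioned above, the sketch below builds an S3 object key from an item address. Both function names are hypothetical, and the sharded scheme (a short leading slice of the address used as an extra prefix) is only one assumed way to "group keys with more specific prefixes"; the PR does not specify one.

```python
def flat_key(address: str) -> str:
    """Current layout as described: the item address itself is the key prefix."""
    return address


def sharded_key(address: str, width: int = 2) -> str:
    """Assumed alternative: prepend the first `width` characters of the address.

    The key is still derived purely from the address, so a lookup that
    starts from a stored address can recompute it.
    """
    return f"{address[:width]}/{address}"
```

For example, `sharded_key("abcdef")` yields `"ab/abcdef"`, spreading objects across more distinct prefixes while keeping the address recoverable from the key.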

Note: due to the size of this PR, I am putting off moving func_runs, as they'll need a higher-touch implementation.

sprutton1, Sep 27 '24 17:09