persist: blobs in S3 larger than expected

Open pH14 opened this issue 1 year ago • 0 comments

When investigating the Devex environment a few weeks ago, we had many blobs in S3 that were well beyond the expected 128MiB target. There were 10+ at 300MiB, and a few dozen 200MiB+.

I haven't dug deep into this, but my guess is this is an artifact of how we calculate columnar part sizes here: https://github.com/MaterializeInc/materialize/blob/cb9e7cf31648a5f18fd5eab77546ea3d9cce1c9f/src/persist/src/indexed/columnar.rs#L446-L454

Since the 128MiB limit is applied separately to keys and values, it seems like we could blow past 128MiB if we insert a lot of updates with similarly sized keys and values. For example, if we had updates where the keys are 32 bytes, the values are 32 bytes, and 16 bytes in total are used for T and D, each update takes 80 bytes but only 32 of them count toward either limit. We'd then arrive at 320MiB (330+MB) before the limit kicks in.
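
The arithmetic behind that estimate can be sketched as follows. This is a back-of-the-envelope model, not the actual logic in columnar.rs: it assumes a part is cut only once the key bytes or the value bytes alone reach the 128MiB limit, that T and D together take 16 bytes per update, and that there is no encoding overhead. The function name and its parameters are hypothetical.

```python
MIB = 1024 * 1024

def estimated_part_bytes(key_len, val_len, td_len=16, limit=128 * MIB):
    """Estimate the total part size when the limit is checked per column.

    Assumption: the part is flushed once either the key column or the
    value column alone reaches `limit`; T and D bytes are not counted.
    """
    # Number of updates needed before the larger column hits the limit.
    updates = limit // max(key_len, val_len)
    # Every update contributes key, value, and T/D bytes to the blob.
    return updates * (key_len + val_len + td_len)

size = estimated_part_bytes(key_len=32, val_len=32)
print(f"{size / MIB:.0f} MiB, {size / 1e6:.1f} MB")  # 320 MiB, 335.5 MB
```

Under the same assumptions, a lopsided workload (say 32-byte keys and 4-byte values) stays much closer to the target, which would fit with only some blobs being oversized.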

For better or for worse, since our last breaking change, we don't have any outliers to examine on prod right now. The top 10 largest blobs can be found via:

aws s3api list-objects-v2 --bucket <prod bucket> --query "sort_by(Contents, &Size)[-10:]" --profile <aws profile>
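
To look for outliers rather than just the ten largest objects, the listing can also be filtered against the 128MiB target. A minimal sketch, assuming the `Contents` array from `list-objects-v2` has already been parsed from JSON into a list of dicts with `Key` and `Size` fields; the helper name and the sample keys are made up for illustration.

```python
MIB = 1024 * 1024

def oversized_blobs(contents, target=128 * MIB):
    """Return (key, size) pairs above `target`, largest first."""
    over = [(obj["Key"], obj["Size"]) for obj in contents if obj["Size"] > target]
    return sorted(over, key=lambda pair: pair[1], reverse=True)

# Hypothetical listing entries, shaped like the `Contents` array.
sample = [
    {"Key": "blob-a", "Size": 300 * MIB},
    {"Key": "blob-b", "Size": 100 * MIB},
    {"Key": "blob-c", "Size": 200 * MIB},
]
for key, size in oversized_blobs(sample):
    print(f"{key}: {size / MIB:.0f} MiB")
```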

pH14, Sep 01 '22 20:09