
RDB Shredder: Consider using SUPER type in Redshift

Open · miike opened this issue 4 years ago · 1 comment

It's early days for the SUPER type (it is still in preview: https://aws.amazon.com/about-aws/whats-new/2020/12/amazon-redshift-announces-support-native-json-semi-structured-data-processing/), but I think it could be a good fit for the semi-structured data that is currently shredded into individual tables and joined back to atomic.events.

There are several potential benefits of doing so:

  • Redshift could support a single-table model (similar to BigQuery and Snowflake)
  • the shredding logic could possibly be removed from rdb-loader entirely
  • deeply nested data that currently lives in shredded tables could be queried without JSON parsing overhead
  • streaming data into Redshift would become easier in the future (a single operation instead of a single transaction with multiple loads)
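
To make the difference between the two models concrete, here is a rough sketch in Python. It is not the rdb-loader implementation: the event shape, the table and column naming, and the helpers `table_name`, `shred` and `to_single_row` are all hypothetical and only illustrate the idea.

```python
# Hypothetical enriched event: flat atomic fields plus attached contexts.
event = {
    "event_id": "e1",
    "collector_tstamp": "2021-01-20 02:01:00",
    "contexts": [
        {
            "schema": "iglu:com.acme/user/jsonschema/1-0-0",
            "data": {"id": 42, "tags": ["a", "b"]},
        }
    ],
}


def table_name(schema_uri):
    # Illustrative naming only (vendor_name_major), e.g. com_acme_user_1.
    vendor, name, _fmt, version = schema_uri[len("iglu:"):].split("/")
    return "{}_{}_{}".format(vendor.replace(".", "_"), name, version.split("-")[0])


def shred(event):
    """Multi-table model: one atomic row plus one row per context,
    joined back to the atomic row via root_id."""
    atomic = {k: v for k, v in event.items() if k != "contexts"}
    tables = {"atomic.events": [atomic]}
    for ctx in event["contexts"]:
        row = {"root_id": event["event_id"], **ctx["data"]}
        tables.setdefault("atomic." + table_name(ctx["schema"]), []).append(row)
    return tables


def to_single_row(event):
    """Single-table model: each context stays nested in its own column,
    which is the kind of value a SUPER column could hold."""
    row = {k: v for k, v in event.items() if k != "contexts"}
    for ctx in event["contexts"]:
        column = "contexts_" + table_name(ctx["schema"])
        row.setdefault(column, []).append(ctx["data"])
    return row
```

With `shred`, one event fans out into several tables that all have to be loaded in the same transaction; with `to_single_row`, the same event is a single record, so loading it is a single operation.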

miike, Jan 20 '21 02:01

Bumping with @chuwy, as SUPER is now generally available in almost all commercial regions. It looks like this is now on the public roadmap as 'later', but it is worth thinking about now.

miike, May 03 '21 01:05