snowplow-rdb-loader
RDB Shredder: Consider using SUPER type in Redshift
It's early days for the SUPER type (still in preview, https://aws.amazon.com/about-aws/whats-new/2020/12/amazon-redshift-announces-support-native-json-semi-structured-data-processing/), but I think it may be a good fit for the semi-structured data that is currently shredded into individual tables and joined back to atomic.events.
There are several potential benefits of doing so:
- Redshift could support a single-table model (similar to BQ and Snowflake)
- shredding logic could possibly be removed from rdb-loader entirely
- deeply nested data that currently lives in shredded tables could be queried without JSON parsing overhead
- streaming data into Redshift becomes easier in the future (a single operation vs. a single transaction with multiple loads)
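To make the contrast in the list above concrete, here is a minimal Python sketch. It is not rdb-loader code: the event shape, the `com_acme_user_1` table name, the `root_id` join key as used here, and both helper functions are assumptions made up for illustration. It only models the idea that a shredded layout needs one load per table plus a join at query time, while a single-table layout keeps the nested value in one column.

```python
# Toy event with one attached context; all names and values are illustrative.
event = {
    "event_id": "e1",
    "app_id": "web",
    "contexts": {"com_acme_user_1": [{"plan": "pro", "tags": ["a", "b"]}]},
}


def shred(event):
    """Split an event into a flat parent row plus child rows, one table per context."""
    parent = {k: v for k, v in event.items() if k != "contexts"}
    children = {}
    for table, entities in event["contexts"].items():
        # Each child row carries the parent's id so it can be joined back later.
        children[table] = [{"root_id": event["event_id"], **e} for e in entities]
    return parent, children


def join_back(parent, children):
    """Rebuild the nested event by joining child rows to the parent on root_id."""
    contexts = {}
    for table, rows in children.items():
        contexts[table] = [
            {k: v for k, v in row.items() if k != "root_id"}
            for row in rows
            if row["root_id"] == parent["event_id"]
        ]
    return {**parent, "contexts": contexts}


parent, children = shred(event)

# Shredded model: one load for the parent table plus one per child table.
loads_shredded = 1 + len(children)
# Single-table model: the nested value stays in one column, so a single load.
loads_single = 1

rebuilt = join_back(parent, children)
```

In this toy model the join only reconstructs what the original nested event already contained, which is the redundancy a SUPER column could in principle avoid.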
Bumping with @chuwy, as SUPER is now generally available in almost all commercial regions (it looks like this is now on the public roadmap as 'later', but it's worth thinking about now).