snowplow-rdb-loader Explore using time-series tables in Redshift

Open yalisassoon opened this issue 8 years ago • 2 comments

For example, for atomic.events.

This would make it easier to delete expired data: dropping a whole time-slice table removes the old rows without the expensive VACUUM operation that a large DELETE requires.
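
A minimal sketch of that expiry step, assuming monthly tables named like atomic.events_201603. The naming scheme, the function names and the retain_months parameter are all hypothetical and only illustrate the idea; nothing here is existing Snowplow behaviour.

```python
from datetime import date


def month_table(base: str, year: int, month: int) -> str:
    """Build a hypothetical per-month table name, e.g. atomic.events_201603."""
    return f"{base}_{year:04d}{month:02d}"


def expired_drop_statements(base, existing_months, today, retain_months):
    """Return DROP TABLE statements for month tables outside the retention window.

    existing_months: iterable of (year, month) tuples that currently exist.
    retain_months:   number of full months to keep before the current month.
    """
    # Represent each month as a single integer so months can be compared directly.
    cutoff = today.year * 12 + (today.month - 1) - retain_months
    statements = []
    for year, month in sorted(existing_months):
        if year * 12 + (month - 1) < cutoff:
            # Dropping the table frees its space without a follow-up VACUUM.
            statements.append(
                f"DROP TABLE IF EXISTS {month_table(base, year, month)};"
            )
    return statements


if __name__ == "__main__":
    months = [(2015, 11), (2015, 12), (2016, 1), (2016, 2), (2016, 3)]
    for stmt in expired_drop_statements(
        "atomic.events", months, date(2016, 3, 25), retain_months=2
    ):
        print(stmt)
```

With the sample input, the current month and the two months before it are kept, and only the November and December 2015 tables are dropped.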

Mar 25 '16 06:03 yalisassoon

This is super-interesting! Related tickets: snowplow/snowplow#2457, snowplow/snowplow#953

Some open questions:

Do we just do atomic.events, or also all shredded tables, or some shredded tables?
Day tables, week tables or month tables? This should probably be configurable.
Redshift has a limit of 9,900 tables per cluster (see Redshift limits), which we need to factor in.
Do we partition based on etl_tstamp or derived_tstamp? If the latter, then any given load could be writing into many tables, because a single load can contain derived timestamps from multiple days.
Note that Redshift doesn't have an equivalent of BigQuery's table wildcard functions, but we could add this as a pre-processor in a Snowplow SQL Analytics SDK
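
The questions above can be made concrete with a rough sketch. It assumes a hypothetical naming scheme (base table name plus a date suffix), shows how the choice of timestamp field changes the number of tables a single load touches, checks a table count against the 9,900 limit mentioned above, and expands a list of time-series tables into a UNION ALL query, which is one way a pre-processor could stand in for table wildcards. All function names and the suffix format are illustrative assumptions.

```python
from collections import defaultdict
from datetime import datetime


def partition_suffix(ts: datetime, granularity: str) -> str:
    """Map a timestamp to a table suffix for day, week or month tables."""
    if granularity == "day":
        return ts.strftime("%Y%m%d")
    if granularity == "week":
        iso_year, iso_week, _ = ts.isocalendar()
        return f"{iso_year:04d}w{iso_week:02d}"
    if granularity == "month":
        return ts.strftime("%Y%m")
    raise ValueError(f"unsupported granularity: {granularity}")


def route_events(events, tstamp_field, granularity, base="atomic.events"):
    """Group events (dicts) by the time-series table they would be loaded into."""
    routed = defaultdict(list)
    for event in events:
        suffix = partition_suffix(event[tstamp_field], granularity)
        routed[f"{base}_{suffix}"].append(event)
    return dict(routed)


def exceeds_table_limit(partitions_retained, table_families, limit=9900):
    """Check retained partitions x table families (atomic.events plus shredded
    tables) against the per-cluster table limit quoted in this thread."""
    return partitions_retained * table_families > limit


def union_all_query(tables, columns="*"):
    """Expand a set of time-series tables into a single UNION ALL query."""
    selects = [f"SELECT {columns} FROM {table}" for table in sorted(tables)]
    return "\nUNION ALL\n".join(selects)


if __name__ == "__main__":
    load = [
        {"etl_tstamp": datetime(2016, 3, 25, 6), "derived_tstamp": datetime(2016, 3, 23, 22)},
        {"etl_tstamp": datetime(2016, 3, 25, 6), "derived_tstamp": datetime(2016, 3, 24, 9)},
        {"etl_tstamp": datetime(2016, 3, 25, 6), "derived_tstamp": datetime(2016, 3, 24, 18)},
    ]
    by_etl = route_events(load, "etl_tstamp", "day")
    by_derived = route_events(load, "derived_tstamp", "day")
    print(len(by_etl), "table(s) when partitioning on etl_tstamp")
    print(len(by_derived), "table(s) when partitioning on derived_tstamp")
    print(union_all_query(by_derived))
    # Two years of day tables across 20 table families vs. month tables.
    print(exceeds_table_limit(730, 20), exceeds_table_limit(24, 20))
```

In this sample, partitioning on etl_tstamp sends the whole load to one table, while partitioning on derived_tstamp spreads it across two. The limit check also suggests why day tables for every shredded table may not be viable over long retention periods, whereas month tables leave plenty of headroom.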

Mar 25 '16 12:03 alexanderdean

Moving to RDB Loader repo

Apr 28 '21 16:04 alexanderdean