snowplow-rdb-loader

Explore using time-series tables in Redshift

Open yalisassoon opened this issue 8 years ago • 2 comments

e.g. for atomic.events

This would make it easier to delete expired data: an old table can simply be dropped, without having to run expensive VACUUM operations afterwards.
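As a rough illustration of the retention idea, here is a minimal Python sketch that works out which day tables have expired and emits a `DROP TABLE` for each. The table naming scheme (`atomic.events_YYYYMMDD`), the function name and the retention window are all assumptions made for the example; nothing here is an existing RDB Loader feature.

```python
from datetime import date, datetime, timedelta


def expired_drop_statements(tables, today, retention_days, prefix="atomic.events_"):
    """Return DROP TABLE statements for day tables older than the retention window.

    Assumes a hypothetical naming scheme of <prefix>YYYYMMDD; any table that
    does not match the scheme is left alone.
    """
    cutoff = today - timedelta(days=retention_days)
    statements = []
    for name in sorted(tables):
        if not name.startswith(prefix):
            continue
        try:
            table_day = datetime.strptime(name[len(prefix):], "%Y%m%d").date()
        except ValueError:
            # Suffix is not a date, so this is not one of our day tables
            continue
        if table_day < cutoff:
            statements.append(f"DROP TABLE {name};")
    return statements


tables = ["atomic.events_20160101", "atomic.events_20160320", "atomic.events"]
drops = expired_drop_statements(tables, today=date(2016, 3, 25), retention_days=30)
```

Dropping a table removes all of its rows at once, so no DELETE (and hence no follow-up VACUUM) is needed on the remaining tables.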

yalisassoon avatar Mar 25 '16 06:03 yalisassoon

This is super-interesting! Related tickets: snowplow/snowplow#2457, snowplow/snowplow#953

Some open questions:

  • Do we just do atomic.events, or also all shredded tables, or some shredded tables?
  • Day, week or month tables? This should probably be configurable
  • Redshift has a limit of 9,900 tables per cluster (see Redshift limits), which we need to factor in
  • Do we partition based on etl_tstamp or derived_tstamp? If the latter, then any given load could be writing into many tables (because a single load can contain derived timestamps from multiple days)
  • Note that Redshift doesn't have an equivalent of BigQuery's table wildcard functions, but we could add this as a pre-processor in a Snowplow SQL Analytics SDK
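To make the questions above more concrete, here is a small Python sketch of how a load might be routed to time-series tables and how a pre-processor could stitch those tables back together. Everything in it is hypothetical: the function names, the `atomic.events_<suffix>` naming scheme and the use of a `UNION ALL` view as a stand-in for table wildcards are assumptions for illustration, not an agreed design.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def table_for(tstamp, granularity="day", base="atomic.events"):
    """Map a timestamp to a (hypothetical) time-series table name."""
    if granularity == "day":
        suffix = tstamp.strftime("%Y%m%d")
    elif granularity == "week":
        # Week tables are named after the Monday that starts the week
        monday = tstamp.date() - timedelta(days=tstamp.weekday())
        suffix = monday.strftime("%Y%m%d")
    elif granularity == "month":
        suffix = tstamp.strftime("%Y%m")
    else:
        raise ValueError(f"unsupported granularity: {granularity}")
    return f"{base}_{suffix}"


def route(events, key="derived_tstamp", granularity="day"):
    """Group events by target table, partitioning on the given timestamp field."""
    by_table = defaultdict(list)
    for event in events:
        by_table[table_for(event[key], granularity)].append(event)
    return dict(by_table)


def union_all_view(view_name, tables):
    """Build a view over the given tables, in place of a table wildcard."""
    selects = "\nUNION ALL\n".join(f"SELECT * FROM {t}" for t in sorted(tables))
    return f"CREATE OR REPLACE VIEW {view_name} AS\n{selects};"


# One load: a single etl_tstamp, but derived timestamps from two different days
load = [
    {"etl_tstamp": datetime(2016, 3, 25, 2, 0), "derived_tstamp": datetime(2016, 3, 23, 22, 15)},
    {"etl_tstamp": datetime(2016, 3, 25, 2, 0), "derived_tstamp": datetime(2016, 3, 25, 1, 5)},
]
by_derived = route(load, key="derived_tstamp")
by_etl = route(load, key="etl_tstamp")
view_sql = union_all_view("atomic.events_all", by_derived.keys())
```

In this example, partitioning on etl_tstamp sends the whole load to one table, while partitioning on derived_tstamp spreads it across two. Coarser granularities reduce the number of tables counted against the cluster limit, at the cost of coarser expiry.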

alexanderdean avatar Mar 25 '16 12:03 alexanderdean

Moving to RDB Loader repo

alexanderdean avatar Apr 28 '21 16:04 alexanderdean