snowplow-rdb-loader
Stores Snowplow enriched events in Redshift, Snowflake and Databricks
We have a very common kind of error, caused by a schema mistake, that results in a std load error. What we usually do is either: 1. Notifying the owner asking...
If Redshift is in a faulty state, it would make sense to stop receiving messages. Otherwise we need to resend many messages that were acked but failed to load.
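A minimal sketch of the idea, not the loader's actual behaviour: a message is acked only after it has loaded, and consumption stops after a few consecutive failures so unloaded messages stay on the queue. The names `drain`, `load` and `max_failures` are hypothetical, and an in-memory `deque` stands in for the real queue.

```python
from collections import deque


def drain(queue, load, max_failures=3):
    """Load messages in order, acking each one only after it loads.

    Stops consuming after `max_failures` consecutive failures, so the
    failing message and everything behind it remain in `queue`.
    """
    acked = []
    failures = 0
    while queue and failures < max_failures:
        message = queue[0]  # peek: do not remove until the load succeeds
        try:
            load(message)
        except Exception:
            failures += 1  # target looks unhealthy; retry the same message
            continue
        queue.popleft()  # "ack" only after a successful load
        acked.append(message)
        failures = 0
    return acked
```

With this shape nothing needs to be resent after an outage, because nothing was acked before it was loaded.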
It's early days for the SUPER type (preview mode, https://aws.amazon.com/about-aws/whats-new/2020/12/amazon-redshift-announces-support-native-json-semi-structured-data-processing/), but I think it may be a good fit for semi-structured data that is currently shredded into individual tables...
Redshift/PostgreSQL has a massive number of knobs to turn for optimizing query performance, and one recommended by AWS is increasing `wlm_query_slot_count` in a session before running a query that needs extra...
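As a rough illustration, the session-level pattern is to raise `wlm_query_slot_count`, run the heavy statement, then set it back to the default of 1. The helper below only builds the statements to send over one session; the name `with_slot_count` is hypothetical and not part of the loader.

```python
from typing import List


def with_slot_count(query: str, slots: int) -> List[str]:
    """Wrap `query` in statements that raise and then reset the slot count."""
    if slots < 1:
        raise ValueError("slots must be at least 1")
    statement = query.strip().rstrip(";") + ";"
    return [
        f"SET wlm_query_slot_count TO {slots};",  # claim extra slots for this session
        statement,
        "SET wlm_query_slot_count TO 1;",  # back to the default
    ]
```

For example, `with_slot_count("VACUUM atomic.events", 3)` yields three statements that must be executed in order on the same connection, since the setting applies to the session only.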
Background: https://groups.google.com/forum/#!topic/snowplow-user/0Vi2bhfXDPQ When we have nested shredding of array types, unless we have an array_index field in shredded tables, then our shredding process will inevitably be lossy: there will be...
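To illustrate why an index matters: rows in a relational table have no inherent order, so an element's position in its source array is gone unless it is stored. The sketch below is only one reading of the issue, with hypothetical names (`shred_array`, `rebuild_array`, `parent_id`); it is not the shredder's actual output format.

```python
def shred_array(parent_id, items):
    """Turn one array into rows, recording each element's position."""
    return [
        {"parent_id": parent_id, "array_index": index, "value": value}
        for index, value in enumerate(items)
    ]


def rebuild_array(rows):
    """Recover the original array from rows returned in any order."""
    return [row["value"] for row in sorted(rows, key=lambda r: r["array_index"])]
```

Because each row carries `array_index`, the original ordering (including repeated values) can be recovered even if the rows come back in a different order.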
Currently we use the collector_tstamp for the root_tstamp value. It would be preferable to use the derived_tstamp once all our client side trackers support generating a dvce_sent_tstamp. (Because that point...
e.g. for `atomic.events`. This would make it easier to delete expired data without having to run expensive `VACUUM` operations.
As we have in Snowflake Loader (https://github.com/snowplow-incubator/snowplow-snowflake-loader/blob/master/loader/src/main/resources/sql/atomic-def.sql), but come up with a more discoverable location.
In #232 we moved entirely to SQS discovery, but left some functionality related to discovering data on S3, mostly in the `ShreddedType` module. I think that it will be helpful later...