snowplow-rdb-loader issues

RDB Loader: automatic maxError correction

3

We have a very common kind of error, caused by a schema mistake, that results in a std load error. What we usually do is either: 1. Notify the owner, asking...

chuwy

Loader: add commands for pausing SQS pulling

3

If Redshift is in a faulty state, it would make sense to stop receiving messages. Otherwise we need to resend many messages that were acked but failed to load.

chuwy
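
One way to picture the pause command requested above is a polling loop guarded by a flag. This is a minimal Python sketch, not the Loader's actual code; `PausablePoller`, `receive` and `handle` are hypothetical names introduced only for illustration, and no real SQS client is involved.

```python
import threading
from typing import Callable, List


class PausablePoller:
    """Polls a queue only while not paused (hypothetical helper, not part of RDB Loader)."""

    def __init__(self, receive: Callable[[], List[str]], handle: Callable[[str], None]):
        self._receive = receive   # assumed to return a batch of message bodies
        self._handle = handle     # assumed to load one message and then ack it
        self._running = threading.Event()
        self._running.set()       # start in the "polling" state

    def pause(self) -> None:
        self._running.clear()

    def resume(self) -> None:
        self._running.set()

    def poll_once(self) -> int:
        """Process one batch; returns how many messages were handled."""
        if not self._running.is_set():
            return 0              # paused: nothing is received, so nothing is acked
        batch = self._receive()
        for body in batch:
            self._handle(body)
        return len(batch)
```

While paused, messages stay on the queue, so nothing has to be resent once the target database recovers.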

RDB Shredder: Consider using SUPER type in Redshift

1

It's early days for the SUPER type (preview mode, https://aws.amazon.com/about-aws/whats-new/2020/12/amazon-redshift-announces-support-native-json-semi-structured-data-processing/) but I think it may be a good fit for semi-structured data that is currently shredded into individual tables...

miike

Allow for setting wlm_query_slot_count for Redshift imports

1

Redshift/PostgreSQL has a large number of knobs to turn for optimizing query performance, and one recommended by AWS is increasing `wlm_query_slot_count` in a session before running a query that needs extra...

smugryan
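
A rough sketch of what the request above could look like: wrap the load statements in a pair of `SET wlm_query_slot_count` statements so the extra slots apply only to that part of the session. The helper name `with_slot_count` and the reset-to-1 behaviour are assumptions for illustration, not an existing Loader option.

```python
from typing import List

MAX_SLOTS = 50  # Redshift's documented upper bound for wlm_query_slot_count


def with_slot_count(statements: List[str], slots: int) -> List[str]:
    """Wrap statements so they run with `slots` WLM slots, then reset to the default of 1."""
    if not 1 <= slots <= MAX_SLOTS:
        raise ValueError(f"slots must be between 1 and {MAX_SLOTS}, got {slots}")
    return [
        f"SET wlm_query_slot_count TO {slots};",
        *statements,
        "SET wlm_query_slot_count TO 1;",
    ]
```

For example, `with_slot_count(["COPY ...;"], 3)` returns the `COPY` statement surrounded by the two `SET` statements.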

Consider adding an array_index field into shredded tables

Background: https://groups.google.com/forum/#!topic/snowplow-user/0Vi2bhfXDPQ When we have nested shredding of array types, unless we have an array_index field in shredded tables, our shredding process will inevitably be lossy: there will be...

alexanderdean
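
To illustrate the idea above, here is a small sketch that turns one array into one row per element and records each element's position. The function `shred_array` and the column names `parent_id` and `value` are hypothetical; only `array_index` comes from the issue.

```python
from typing import Any, Dict, List


def shred_array(parent_id: str, items: List[Any]) -> List[Dict[str, Any]]:
    """Turn one array into one row per element, recording each element's position."""
    return [
        {"parent_id": parent_id, "array_index": i, "value": item}
        for i, item in enumerate(items)
    ]
```

In this sketch, two identical elements of the same array still produce distinguishable rows, and the original order can be recovered by sorting on `array_index`.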

Use derived_tstamp as the primary tstamp in Redshift

5

Currently we use the collector_tstamp for the root_tstamp value. It would be preferable to use the derived_tstamp once all our client-side trackers support generating a dvce_sent_tstamp. (Because that point...

yalisassoon
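
For context, the sketch below shows how a derived timestamp can be computed from the other three, assuming the usual Snowplow definition `derived_tstamp = collector_tstamp - (dvce_sent_tstamp - dvce_created_tstamp)`. The function name and the fallback to `collector_tstamp` when device timestamps are missing are illustrative assumptions.

```python
from datetime import datetime
from typing import Optional


def derived_tstamp(
    collector_tstamp: datetime,
    dvce_created_tstamp: Optional[datetime],
    dvce_sent_tstamp: Optional[datetime],
) -> datetime:
    """Correct the collector time by how long the event waited on the device before sending."""
    if dvce_created_tstamp is None or dvce_sent_tstamp is None:
        return collector_tstamp  # assumed fallback for trackers without dvce_sent_tstamp
    return collector_tstamp - (dvce_sent_tstamp - dvce_created_tstamp)
```

Because only the difference between the two device timestamps is used, a skewed device clock cancels out.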

Explore using time-series tables in Redshift

2

E.g. for `atomic.events`. This would make it easier to delete expired data without having to run expensive `VACUUM` operations.

yalisassoon

Common: add events table DDL

As we have in Snowflake Loader (https://github.com/snowplow-incubator/snowplow-snowflake-loader/blob/master/loader/src/main/resources/sql/atomic-def.sql), but in a more discoverable location.

chuwy

RDB Loader: handle TERM signal properly

Right now the Loader doesn't react to it at all.

chuwy

bug
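
As a generic illustration of handling TERM gracefully (not the Loader's implementation), the handler in this sketch only sets a flag, and the main loop checks that flag between units of work. The names `install_term_handler`, `run` and `load_next_batch` are hypothetical.

```python
import signal
import threading

stop_requested = threading.Event()


def install_term_handler() -> None:
    """Register a SIGTERM handler that only sets a flag; the main loop decides when to exit."""
    def _on_term(signum, frame):
        stop_requested.set()

    # Must be called from the main thread, as Python requires for signal.signal.
    signal.signal(signal.SIGTERM, _on_term)


def run(load_next_batch, max_iterations: int = 1000) -> int:
    """Loop until SIGTERM is seen or max_iterations is reached; returns completed iterations."""
    done = 0
    while not stop_requested.is_set() and done < max_iterations:
        load_next_batch()  # hypothetical unit of work; it always finishes before the flag is rechecked
        done += 1
    return done
```

Checking the flag only between batches means a load in progress is never cut off halfway.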

Common: consider deleting functionality related to S3-discovery

2

In #232 we moved entirely to SQS discovery, but left some functionality related to discovering data on S3, mostly in the `ShreddedType` module. I think that it will be helpful later...

chuwy
