snowplow-rdb-loader
Stores Snowplow enriched events in Redshift, Snowflake and Databricks
Background: https://discourse.snowplowanalytics.com/t/rdbloader-postgresql-error/2059/5 We download the data to the Loader's node anyway, so we can extract it on the fly.
In my experience, enriched data assumes that the raw data was not just enriched but also validated: we never add invalid contexts or unstruct events to the final enriched event. Yet validation is...
It'd be cool to add some context to the error reported when loading data, e.g.: >Data loading error [Amazon](500310) Invalid operation: Number of jsonpaths and the number of columns should...
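As a rough illustration of the idea, the raw database message could be prefixed with what was being loaded at the time. This is a minimal Python sketch, not the loader's own code; `LoadContext`, `with_context`, and the table and folder values are hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoadContext:
    """Hypothetical description of what was being loaded when the error occurred."""
    table: str
    s3_folder: str


def with_context(message: str, ctx: LoadContext) -> str:
    """Prefix a raw database error message with the table and folder being loaded."""
    return (
        f"Data loading error while loading {ctx.s3_folder} "
        f"into {ctx.table}: {message}"
    )


if __name__ == "__main__":
    # Assumed example values; the real message would come from the database driver.
    ctx = LoadContext(table="atomic.events", s3_folder="s3://example-bucket/run=1/")
    print(with_context("Invalid operation", ctx))
```

With this shape, the original driver message is kept verbatim at the end, so nothing is lost for anyone searching logs for the raw error text.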
Until R29, `--dry-run` worked as expected even without connecting to Redshift, because its execution didn't depend on `atomic.manifest` or any other tables. However, because of #14 and #70...
It would be nice to have the [ttl](https://git.io/vNKTm) configurable in order to adapt DynamoDB usage/cost.
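A configurable TTL could look roughly like the sketch below. It relies only on the fact that DynamoDB's TTL feature reads an item attribute holding an expiry time in epoch seconds; the function name, the default value, and the `now` parameter are assumptions made for illustration and do not reflect the loader's actual code or the value behind the link above.

```python
import time
from typing import Optional

# Hypothetical default; not the loader's actual value.
DEFAULT_TTL_SECONDS = 3600


def expiry_timestamp(ttl_seconds: int = DEFAULT_TTL_SECONDS,
                     now: Optional[float] = None) -> int:
    """Return the epoch-seconds expiry to store in the item's TTL attribute."""
    if ttl_seconds <= 0:
        raise ValueError("ttl_seconds must be positive")
    # Allow the current time to be injected so the function is easy to test.
    current = time.time() if now is None else now
    return int(current) + ttl_seconds
```

A shorter TTL lets DynamoDB expire items sooner, which reduces stored data; a longer one keeps them available for longer.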
Migrated from https://github.com/snowplow/snowplow/issues/2217
See https://discourse.snowplowanalytics.com/t/redshift-loading-error-null-byte-field-longer-than-1-byte/917 for details.
[S3Guard](https://hortonworks.com/blog/s3guard-amazon-s3-consistency/) is an experimental s3a-plugin that uses DynamoDB as an intermediate metadata store. Right now this is part of Hadoop Common 2.9 and should be considered unstable. Also I'm not...
Cannot promise this will be implemented, @acgray, but what would your implementation look like? Is it another setting in `aws.s3.buckets`?