snowplow-rdb-loader

Published 20 hours ago


Stores Snowplow enriched events in Redshift, Snowflake and Databricks

Readme
Issues

Results 88 snowplow-rdb-loader issues

RDB Loader: support gzip output compression for Postgres

Background: https://discourse.snowplowanalytics.com/t/rdbloader-postgresql-error/2059/5 We're downloading data to the Loader's node anyway, so we can extract it on the fly.

RDB Shredder: consider disabling validation against JSON Schema

5 comments

In my experience, enriched data assumes that raw data was not just enriched, but also validated - we never add invalid contexts/unstruct events to the final enriched event. Yet validation is...

RDB Loader: enrich error with table name

1 comment

It'd be cool to add some context to the error reported when loading data, e.g.: >Data loading error [Amazon](500310) Invalid operation: Number of jsonpaths and the number of columns should...

RDB Loader: make dry run aware of load manifest

Until R29, `--dry-run` would work as expected even without connecting to Redshift, because its execution didn't depend on `atomic.manifest` or any other table. However, because of #14 and #70...

RDB Shredder: make ttl in DuplicateStorage configurable

1 comment

It would be nice to have the [ttl](https://git.io/vNKTm) configurable in order to control DynamoDB usage and cost.

RDB Loader: use prepared statements

Migrated from https://github.com/snowplow/snowplow/issues/2217

RDB Shredder: sanitize output by removing nulls from properties

1 comment

See https://discourse.snowplowanalytics.com/t/redshift-loading-error-null-byte-field-longer-than-1-byte/917 for details.

Common: explore S3Guard to improve S3 consistency

[S3Guard](https://hortonworks.com/blog/s3guard-amazon-s3-consistency/) is an experimental s3a-plugin that uses DynamoDB as an intermediate metadata store. Right now this is part of Hadoop Common 2.9 and should be considered unstable. Also I'm not...

RDB Loader: add integration tests to Redshift

Add support for Snowplow asset buckets mirroring

2 comments

Cannot promise this will be implemented, @acgray, but what would your implementation look like? Is it another setting in `aws.s3.buckets`?

About

Topics: spark, scala, redshift, snowplow

Stars: 31
Forks: 16
