snowplow-rdb-loader issues

Move steps from CLI arguments to storage target config

Migrated from https://github.com/snowplow/snowplow/issues/3278

RDB Loader: fix failure for bucket names with periods

Migrated from https://github.com/snowplow/snowplow/issues/1838

I cannot confirm yet that the bug exists, because I think I have successfully used periods in bucket names. The internals have also changed significantly since the original issue, so it likely isn't...

chuwy

bug

RDB Shredder: make event_fingerprint mandatory

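Migrated from https://github.com/snowplow/snowplow/issues/3445#issuecomment-333064293

Right now, when event_fingerprint is missing, we generate a random UUID instead. Each copy of a duplicated event then gets a different fingerprint, so all natural duplicates are treated as synthetic. We should throw an exception and abort shredding instead.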
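
A minimal sketch contrasting the current fallback with the proposal above, assuming a simplified event model. `ShreddedEvent`, `MissingFingerprintException` and the `Fingerprint` helpers are hypothetical names for illustration, not the shredder's actual API.

```scala
import java.util.UUID

// Hypothetical, simplified event model; the real shredder works on full enriched events.
final case class ShreddedEvent(eventId: String, eventFingerprint: Option[String])

// Hypothetical exception used to abort shredding when the fingerprint is absent.
final class MissingFingerprintException(eventId: String)
    extends RuntimeException(s"event_fingerprint is missing for event $eventId")

object Fingerprint {
  // Current behaviour as described in the issue: fall back to a random UUID.
  // Two copies of the same event then get different values and look synthetic.
  def fingerprintOrRandom(event: ShreddedEvent): String =
    event.eventFingerprint.getOrElse(UUID.randomUUID().toString)

  // Proposed behaviour: treat the fingerprint as mandatory and fail fast.
  def requireFingerprint(event: ShreddedEvent): String =
    event.eventFingerprint.getOrElse(throw new MissingFingerprintException(event.eventId))
}
```

Failing fast surfaces misconfigured enrichment immediately, rather than silently misclassifying every natural duplicate in the batch.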

chuwy

RDB Loader: ensure we get earliest event when deduplicating

The current logic for natural deduplication does not guarantee that we always preserve the earliest event from a batch of duplicates: https://github.com/snowplow/snowplow-rdb-loader/blob/master/shredder/src/main/scala/com.snowplowanalytics.snowplow.storage/spark/ShredJob.scala#L415-L416 . This can lead to confusing outcomes. Natural...
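
One way to make the choice deterministic is to group natural duplicates and keep the minimum by timestamp. This sketch assumes natural duplicates share both `event_id` and `event_fingerprint`, and uses a hypothetical `tstamp` field (epoch milliseconds) as the ordering key. Which timestamp should define "earliest" is an assumption here, and this is not the code at the linked lines of `ShredJob.scala`.

```scala
// Hypothetical, simplified event model; `tstamp` stands in for whichever
// timestamp should define "earliest" (an assumption, not taken from ShredJob).
final case class Event(eventId: String, fingerprint: String, tstamp: Long)

object NaturalDedup {
  // Natural duplicates are assumed to share both event_id and fingerprint.
  // From each group, keep the event with the smallest timestamp, then sort
  // by event_id so the output order does not depend on Map iteration order.
  def keepEarliest(events: Seq[Event]): Seq[Event] =
    events
      .groupBy(e => (e.eventId, e.fingerprint))
      .values
      .map(_.minBy(_.tstamp))
      .toSeq
      .sortBy(_.eventId)
}
```

For a batch containing `Event("a", "fp1", 300L)` and `Event("a", "fp1", 100L)`, only the copy with `tstamp = 100L` is kept, regardless of which copy arrived first.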

dilyand

RDB Shredder: explore a way to reject pipeline's auxiliary payloads

One of our users mistakenly sent a `com.snowplowanalytics.snowplow/contexts` payload, and the Loader failed with the following error:

```
Data discovery error with following issues: JSONPath file [com.snowplowanalytics.snowplow/contexts_1.json] was not found
```

Although,...
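
As a starting point for the exploration, a sketch that separates pipeline wrapper payloads from real entities before data discovery, so they could be reported separately instead of failing the load. `SchemaKey`, `AuxiliaryPayloads` and the `auxiliary` set are hypothetical. The set contains `contexts` from the report above plus `unstruct_event` as an assumed second wrapper schema; the full list would need confirming.

```scala
// Hypothetical representation of an entity's schema key (vendor/name/version).
final case class SchemaKey(vendor: String, name: String, version: String)

object AuxiliaryPayloads {
  // Assumed list of pipeline wrapper schemas that should never be shredded.
  // `contexts` comes from the bug report; `unstruct_event` is an assumption.
  private val auxiliary: Set[(String, String)] = Set(
    "com.snowplowanalytics.snowplow" -> "contexts",
    "com.snowplowanalytics.snowplow" -> "unstruct_event"
  )

  def isAuxiliary(key: SchemaKey): Boolean =
    auxiliary.contains(key.vendor -> key.name)

  // Split entities into (rejected, accepted), so auxiliary payloads never
  // reach JSONPath discovery in the Loader.
  def partition(keys: Seq[SchemaKey]): (Seq[SchemaKey], Seq[SchemaKey]) =
    keys.partition(isAuxiliary)
}
```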

chuwy

RDB Shredder: widen geo_region to 3 characters

Cf. snowplow/snowplow#3822

BenFradet

low hanging fruit

RDB Shredder: make ttl of event in duplicates storage configurable

RDB Shredder: add support for shredding [ {objectA}, {objectA}, ...]

RDB Loader: add possibility to enable transit load by default

RDB Loader: add option to load only shredded data