snowplow-rdb-loader
Stores Snowplow enriched events in Redshift, Snowflake and Databricks
Migrated from https://github.com/snowplow/snowplow/issues/3278
Migrated from https://github.com/snowplow/snowplow/issues/1838 Cannot yet confirm that the bug exists, because I think I have successfully used periods in bucket names. Also, the internals have changed significantly since the original issue, so likely it isn't...
Migrated from https://github.com/snowplow/snowplow/issues/3445#issuecomment-333064293 Right now we're generating a random UUID, which makes all natural duplicates synthetic. We should throw an exception and abort shredding instead.
The current logic for natural deduplication does not guarantee that we always preserve the earliest event from a batch of duplicates: https://github.com/snowplow/snowplow-rdb-loader/blob/master/shredder/src/main/scala/com.snowplowanalytics.snowplow.storage/spark/ShredJob.scala#L415-L416 . This can lead to confusing outcomes. Natural...
One of our users mistakenly sent a `com.snowplowanalytics.snowplow/contexts` payload, and the Loader failed with the following error:
```
Data discovery error with following issues: JSONPath file [com.snowplowanalytics.snowplow/contexts_1.json] was not found
```
Although,...
Migrated from https://github.com/snowplow/snowplow/issues/3141
Migrated from snowplow/snowplow#2451
There is a chance of a race condition that breaks the load when two pipelines are involved. With the current default behavior: 1. Two pipelines, *Big* and *Small*, are loading data to same...
We encountered a case where a user had deleted all data from a single table. During the usual load process, the Loader always checks whether `atomic` data is present and aborts if it...
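The natural-deduplication issue listed above asks that the earliest event in a batch of duplicates be preserved. The following is a minimal Python sketch of that idea only, not the Scala/Spark logic in `ShredJob.scala`. It assumes that natural duplicates share both `event_id` and `event_fingerprint`, and that "earliest" is judged by `collector_tstamp`; the `Event` class and `keep_earliest` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Event:
    """Hypothetical, heavily reduced stand-in for an enriched event."""
    event_id: str
    event_fingerprint: str
    collector_tstamp: datetime  # assumption: the timestamp that defines "earliest"
    payload: str = ""


def keep_earliest(events: Iterable[Event]) -> List[Event]:
    """Collapse natural duplicates, keeping the earliest event of each group.

    Events are grouped by (event_id, event_fingerprint). Within a group the
    event with the smallest collector_tstamp wins; on an exact tie the event
    seen first is kept, so the result is deterministic for a given input order.
    """
    earliest: Dict[Tuple[str, str], Event] = {}
    for event in events:
        key = (event.event_id, event.event_fingerprint)
        current = earliest.get(key)
        if current is None or event.collector_tstamp < current.collector_tstamp:
            earliest[key] = event
    # Groups are returned in the order in which they were first encountered.
    return list(earliest.values())
```

In a distributed job the same rule would have to be expressed as a per-key reduction rather than a single in-memory pass, but the choice of an explicit ordering field is what makes the outcome predictable.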