snowplow-rdb-loader
Stores Snowplow enriched events in Redshift, Snowflake and Databricks
In R34 (and previously in R32) we [made a decision](https://github.com/snowplow/snowplow-rdb-loader/issues/238) to replace special symbols (such as newlines and tabs, which can break the TSV structure) with spaces, but @benjben [made a...
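A minimal sketch of this kind of replacement, assuming a simple regex-based approach (the object and method names are illustrative, not the actual Loader code):

```scala
object TsvSanitizer {
  // Characters that would break a TSV row: tab, newline, carriage return
  private val BreakingChars = "[\\t\\n\\r]".r

  // Replace each breaking character with a plain space
  def sanitize(field: String): String =
    BreakingChars.replaceAllIn(field, " ")
}

// TsvSanitizer.sanitize("foo\tbar\nbaz") == "foo bar baz"
```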
When running emr-etl-runner with the `-i vacuum` option I get the following error:

```
Data loading error [Amazon](500310) Invalid operation: VACUUM cannot run inside a transaction block;
```

Redshift Cluster...
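For context, Redshift only accepts `VACUUM` outside of a transaction block, so with plain JDBC the connection has to be in autocommit mode before the statement is issued. A minimal sketch (connection details and table name are placeholders, not emr-etl-runner's actual code):

```scala
import java.sql.DriverManager

object VacuumExample {
  def main(args: Array[String]): Unit = {
    val conn = DriverManager.getConnection(
      "jdbc:redshift://example-cluster:5439/snowplow", "user", "password")
    try {
      conn.setAutoCommit(true) // VACUUM cannot run inside BEGIN ... COMMIT
      val stmt = conn.createStatement()
      stmt.execute("VACUUM atomic.events")
      stmt.close()
    } finally conn.close()
  }
}
```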
Right now the delay is always `((atomicFiles.length * 0.1 * shreddedTypes.length) + 5) seconds`, and the Loader keeps invoking `Thread.sleep` until the state is consistent or the check has run 5 times. For 500 `atomic-events`...
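A sketch of the delay calculation as described (the function and parameter names are assumptions, not the Loader's actual API):

```scala
import scala.concurrent.duration._

object ConsistencyCheck {
  // Delay grows linearly with both the number of atomic files and the number of
  // shredded types; the check itself is retried at most 5 times.
  def delay(atomicFiles: List[String], shreddedTypes: List[String]): FiniteDuration =
    ((atomicFiles.length * 0.1 * shreddedTypes.length) + 5).seconds
}

// e.g. 500 atomic files and 10 shredded types => (500 * 0.1 * 10) + 5 = 505 seconds per check
```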
Source: https://github.com/snowplow/snowplow/pull/2142 Not sure it'll ever go to master, but leaving for further exploration.
Currently we mask the username and password, but the username is not private info and is usually just `snowplow`.
Given that only RDB Loader has knowledge of the targeted database, it makes sense that it enforces the database limits (e.g. 4 MB for JSONs in Redshift).
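A hypothetical sketch of what such a limit check could look like (the names and where it would run are assumptions; only the 4 MB figure comes from the issue above):

```scala
object RedshiftLimits {
  // Maximum size of a single JSON, per the limit mentioned above: 4 MB
  val MaxJsonBytes: Int = 4 * 1024 * 1024

  // True if the serialized JSON would exceed the limit
  def oversized(json: String): Boolean =
    json.getBytes("UTF-8").length > MaxJsonBytes
}
```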
Recently we had a loader job running for 9 hours (most likely `ANALYZE`). After all steps completed successfully, RDB Loader tried to dump its log to S3 and failed with the following...
Currently we're checking consistency by comparing the list of *files* between checks, but this is (probably) too strict a check, because in the end we're loading data using the pattern `s3://shredded/good/com.acme/shredded-context/jsonschema/1-0-0/part-*`, which means...
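A hypothetical sketch of a relaxed check along these lines: instead of diffing full file listings between checks, compare only the set of prefixes that the load pattern actually targets (names are illustrative, not Loader code):

```scala
object RelaxedConsistency {
  // Reduce each key to the prefix the COPY pattern cares about (everything before "part-")
  def loadPrefixes(keys: List[String]): Set[String] =
    keys.map(_.split("/part-").head).toSet

  // Consider the state consistent if the set of prefixes is unchanged between checks
  def isConsistent(previousKeys: List[String], currentKeys: List[String]): Boolean =
    loadPrefixes(previousKeys) == loadPrefixes(currentKeys)
}
```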
Migrated from https://github.com/snowplow/snowplow/issues/3279