snowplow-rdb-loader
Stores Snowplow enriched events in Redshift, Snowflake and Databricks
In R34 (and previously in R32) we [made a decision](https://github.com/snowplow/snowplow-rdb-loader/issues/238) to replace special symbols (such as newlines and tabs, which can break the TSV structure) with spaces, but @benjben [made a...
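A minimal sketch of this kind of replacement, assuming a simple regex-based approach (the object and method names are illustrative, not the actual Loader code):

```scala
object TsvSanitizer {
  // Characters that would break a TSV row: tab, newline, carriage return
  private val BreakingChars = "[\\t\\n\\r]".r

  // Replace each breaking character with a plain space
  def sanitize(field: String): String =
    BreakingChars.replaceAllIn(field, " ")
}

// TsvSanitizer.sanitize("foo\tbar\nbaz") == "foo bar baz"
```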
When running emr-etl-runner with the `-i vacuum` option I get the following error:

```
Data loading error [Amazon](500310) Invalid operation: VACUUM cannot run inside a transaction block;
```

Redshift Cluster...
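For context, Redshift only accepts `VACUUM` outside of a transaction block, so with plain JDBC the connection has to be in autocommit mode before the statement is issued. A minimal sketch (connection details and table name are placeholders, not emr-etl-runner's actual code):

```scala
import java.sql.DriverManager

object VacuumExample {
  def main(args: Array[String]): Unit = {
    val conn = DriverManager.getConnection(
      "jdbc:redshift://example-cluster:5439/snowplow", "user", "password")
    try {
      conn.setAutoCommit(true) // VACUUM cannot run inside BEGIN ... COMMIT
      val stmt = conn.createStatement()
      stmt.execute("VACUUM atomic.events")
      stmt.close()
    } finally conn.close()
  }
}
```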
Right now the delay is always `((atomicFiles.length * 0.1 * shreddedTypes.length) + 5) seconds`, and the Loader keeps invoking `Thread.sleep` until the state is consistent or the check has run 5 times. For 500 `atomic-events`...
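A sketch of the delay calculation as described (the function and parameter names are assumptions, not the Loader's actual API):

```scala
import scala.concurrent.duration._

object ConsistencyCheck {
  // Delay grows linearly with both the number of atomic files and the number of
  // shredded types; the check itself is retried at most 5 times.
  def delay(atomicFiles: List[String], shreddedTypes: List[String]): FiniteDuration =
    ((atomicFiles.length * 0.1 * shreddedTypes.length) + 5).seconds
}

// e.g. 500 atomic files and 10 shredded types => (500 * 0.1 * 10) + 5 = 505 seconds per check
```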
Source: https://github.com/snowplow/snowplow/pull/2142 Not sure it'll ever go to master, but leaving for further exploration.
Currently we mask the username and password, but the username is not private info and is usually just `snowplow`.
Given that only RDB Loader has knowledge of the targeted database, it makes sense that it enforces the database limits (e.g. 4 MB for JSONs in Redshift).
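A hypothetical sketch of what such a limit check could look like (the names and where it would run are assumptions; only the 4 MB figure comes from the issue above):

```scala
object RedshiftLimits {
  // Maximum size of a single JSON, per the limit mentioned above: 4 MB
  val MaxJsonBytes: Int = 4 * 1024 * 1024

  // True if the serialized JSON would exceed the limit
  def oversized(json: String): Boolean =
    json.getBytes("UTF-8").length > MaxJsonBytes
}
```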
Recently we had a loader job running for 9 hours (most likely `ANALYZE`). After all steps completed successfully, RDB Loader tried to dump its log to S3 and failed with the following...
Currently we're checking consistency by comparing the list of *files* between checks, but this is (probably) too strict a check, because in the end we're loading data using the pattern `s3://shredded/good/com.acme/shredded-context/jsonschema/1-0-0/part-*`, which means...
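A hypothetical sketch of a relaxed check along these lines: instead of diffing full file listings between checks, compare only the set of prefixes that the load pattern actually targets (names are illustrative, not Loader code):

```scala
object RelaxedConsistency {
  // Reduce each key to the prefix the COPY pattern cares about (everything before "part-")
  def loadPrefixes(keys: List[String]): Set[String] =
    keys.map(_.split("/part-").head).toSet

  // Consider the state consistent if the set of prefixes is unchanged between checks
  def isConsistent(previousKeys: List[String], currentKeys: List[String]): Boolean =
    loadPrefixes(previousKeys) == loadPrefixes(currentKeys)
}
```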
Migrated from https://github.com/snowplow/snowplow/issues/3279