Anton Parkhomenko issues

Results: 165 issues of Anton Parkhomenko

RDB Shredder: make event_fingerprint mandatory

Migrated from https://github.com/snowplow/snowplow/issues/3445#issuecomment-333064293

Right now we generate a random UUID when the fingerprint is missing, which turns all natural duplicates into synthetic ones. We should throw an exception and abort shredding instead.
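A minimal sketch of the proposed behavior, in Python for brevity rather than the Shredder's own language. The names `require_fingerprint` and `MissingFingerprintError` are hypothetical and not part of the Shredder; the only thing taken from the issue is the idea of failing instead of substituting a random UUID.

```python
from typing import Mapping, Optional


class MissingFingerprintError(ValueError):
    """Raised when an enriched event carries no event_fingerprint (hypothetical)."""


def require_fingerprint(event: Mapping[str, Optional[str]]) -> str:
    """Return the event's fingerprint, or fail instead of inventing one.

    Assumption: the event is a flat mapping with `event_id` and
    `event_fingerprint` keys; the real Shredder works on a different type.
    """
    fingerprint = event.get("event_fingerprint")
    if not fingerprint:
        # No random UUID fallback: a made-up fingerprint would hide
        # natural duplicates, so abort and let the caller stop the job.
        raise MissingFingerprintError(
            f"event {event.get('event_id')!r} has no event_fingerprint"
        )
    return fingerprint
```

With this shape, a caller that wants to abort the whole job can simply let the exception propagate.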

RDB Shredder: explore a way to reject pipeline's auxiliary payloads

One of our users mistakenly sent a `com.snowplowanalytics.snowplow/contexts` payload and Loader failed with the following error:

```
Data discovery error with following issues:
JSONPath file [com.snowplowanalytics.snowplow/contexts_1.json] was not found
```

Although,...

RDB Loader: add possibility to enable transit load by default

There is a chance of a race condition that breaks the load when two pipelines are involved. With the current default behavior: 1. Two pipelines, *Big* and *Small*, are loading data to same...

RDB Loader: add option to load only shredded data

We encountered a case where a user had deleted all data from a single table. During the usual load process, Loader always checks whether `atomic` data is present and aborts if it...

RDB Loader: support gzip output compression for Postgres

Background: https://discourse.snowplowanalytics.com/t/rdbloader-postgresql-error/2059/5

We're downloading data to the Loader's node anyway, so we can decompress it on the fly.
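As an illustration of on-the-fly extraction, here is a small Python sketch that decompresses a gzip stream line by line without writing an uncompressed copy to disk. The function name `iter_gzip_lines` is hypothetical, and the Loader itself is not written in Python; this only shows the general streaming approach.

```python
import gzip
import io
from typing import BinaryIO, Iterator


def iter_gzip_lines(stream: BinaryIO) -> Iterator[str]:
    """Yield decoded lines from a gzip-compressed binary stream.

    Assumption: the payload is UTF-8 text with newline-terminated rows,
    such as tab-separated output downloaded to the Loader's node.
    """
    # gzip.open accepts an already-open file object and decompresses lazily,
    # so only a small buffer of data is held in memory at a time.
    with gzip.open(stream, mode="rt", encoding="utf-8") as lines:
        for line in lines:
            yield line.rstrip("\n")


if __name__ == "__main__":
    # Usage example with an in-memory stream standing in for a downloaded file.
    sample = io.BytesIO(gzip.compress(b"a\tb\nc\td\n"))
    for row in iter_gzip_lines(sample):
        print(row.split("\t"))
```

Each yielded row could then be fed to whatever bulk-load mechanism the target database offers.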

RDB Shredder: consider disabling validation against JSON Schema

In my experience, enriched data assumes that raw data was not just enriched but also validated: we never add invalid contexts/unstruct events to the final enriched event. Yet validation is...

RDB Loader: make dry run aware of load manifest

Until R29, `--dry-run` would work as expected even without connecting to Redshift, because its execution didn't depend on `atomic.manifest` or any other tables. However, because of #14 and #70...

RDB Loader: use prepared statements

Migrated from https://github.com/snowplow/snowplow/issues/2217

Common: explore S3Guard to improve S3 consistency

[S3Guard](https://hortonworks.com/blog/s3guard-amazon-s3-consistency/) is an experimental s3a plugin that uses DynamoDB as an intermediate metadata store. Right now it is part of Hadoop Common 2.9 and should be considered unstable. Also, I'm not...

Add support for Snowplow asset buckets mirroring

I cannot promise this will be implemented, @acgray, but what would your implementation look like? Would it be another setting in `aws.s3.buckets`?