gcp-ingestion
Documentation and implementation of telemetry ingestion on Google Cloud Platform
[Job](https://console.cloud.google.com/dataflow/jobsDetail/locations/us-west1/jobs/2019-10-21_11_50_25-1731386368813309696?project=moz-fx-data-beam-prod-11f7&folder) [Stackdriver](https://app.google.stackdriver.com/services/dataflow/moz-fx-data-beam-prod-11f7;us-west1;structured-decoded_bq-sink_9b31004-1af4aee_3;6560670?project=moz-fx-data-ingesti-prod-579d) This appears to only happen with the live bq sinks, and may or may not be related to https://bugzilla.mozilla.org/show_bug.cgi?id=1590559. The error timing corresponds precisely to when the dataflow...
Stack trace: { "attributeMap": { "args": "", "client_id": "n/a", "content_length": "1143", "document_id": "264ae3b7-5746-42c0-b4c7-55dea61f63f3", "document_namespace": "activity-stream", "document_type": "impression-stats", "document_version": "1", "error_message": "com.google.common.util.concurrent.UncheckedExecutionException: com.google.cloud.bigquery.BigQueryException: Remote host closed connection during handshake", "error_type": "KeyByBigQueryTableDestination",...
Opening this up for comments. Currently we input `{}` when there are no `additional_properties`. I propose that it is more intuitive that `additional_properties` is `NULL` when there are no elements...
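To make the proposal concrete, here is a minimal Python sketch of the two behaviors. It assumes that `additional_properties` holds a JSON-encoded string of payload fields not present in the destination table schema; the helper name `split_additional_properties` and the `known_fields` parameter are hypothetical and are not taken from the gcp-ingestion code.

```python
import json


def split_additional_properties(payload, known_fields):
    """Hypothetical helper: keep known fields as columns, pack the rest.

    Assumption: leftover fields are serialized into a JSON string column
    named additional_properties. Under the proposal, the column is None
    (NULL in BigQuery) when nothing is left over, rather than "{}".
    """
    row = {k: v for k, v in payload.items() if k in known_fields}
    extras = {k: v for k, v in payload.items() if k not in known_fields}
    # Proposed behavior: None instead of "{}" when extras is empty.
    row["additional_properties"] = (
        json.dumps(extras, sort_keys=True) if extras else None
    )
    return row
```

With this shape, consumers could filter on `additional_properties IS NOT NULL` instead of comparing against the literal string `{}`.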
We have made several changes in the past which have caused Jenkins to fail when attempting to build Dataflow templates. We should add a test to run `bin/template-build` and verify...
In https://github.com/mozilla/gcp-ingestion/pull/676#discussion_r297867953, @relud suggests moving to the following interface for configuring republishing per-namespace (`{"":[""]}`):

```
{
  "project/telemetry/topic/per-namespace-${document_namespace}": [
    "telemetry",
    "glean"
  ],
  "project/pocket/topic/namespace-pocket": [
    "pocket"
  ]
}
```

We could even collapse per-doctype...
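One way to read the per-namespace mapping proposed above is as a table from destination topic template to the namespaces routed there. The Python sketch below inverts it into a namespace-to-topic lookup, expanding `${document_namespace}` with the standard library's `string.Template`. The function names are hypothetical, and rejecting a namespace that appears under two destinations is an assumption rather than something the proposal states.

```python
from string import Template

# Mapping copied from the proposal: topic template -> namespaces.
PER_NAMESPACE_DESTINATIONS = {
    "project/telemetry/topic/per-namespace-${document_namespace}": [
        "telemetry",
        "glean",
    ],
    "project/pocket/topic/namespace-pocket": ["pocket"],
}


def build_namespace_lookup(destinations):
    """Invert the config into {namespace: resolved topic} (hypothetical)."""
    lookup = {}
    for topic_template, namespaces in destinations.items():
        for namespace in namespaces:
            # Assumption: each namespace maps to exactly one destination.
            if namespace in lookup:
                raise ValueError(f"namespace configured twice: {namespace}")
            lookup[namespace] = Template(topic_template).substitute(
                document_namespace=namespace
            )
    return lookup


def resolve_topic(lookup, attributes):
    """Return the republish topic for a message, or None if not configured."""
    return lookup.get(attributes.get("document_namespace"))
```

Building the lookup once at startup keeps the per-message work to a single dictionary access, and a message whose namespace is not listed simply resolves to `None` and is not republished.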
This was discussed briefly in our Monday meeting with the outcome being to file an issue to hammer out the details. There are a few classes of special cases where...
This includes denial of service attacks against the edge, submission of bogus data, and excessive submission due to bugs. @jasonthomas suggests: > we should talk to foxsec (:ulfr) for [iprepd](https://github.com/mozilla-services/iprepd-nginx/blob/master/README.md)...
See https://github.com/mozilla-services/mozilla-pipeline-schemas/issues/332 for what it would take to update to json schema v7 and to validate "format" specifications.
[RTBHOUSE/avro-fastserde](https://github.com/RTBHOUSE/avro-fastserde) is an alternative data writer and reader that uses just-in-time (JIT) compilation to improve serialization and deserialization of Avro data. The current Avro sink as of commit bfdfd2c runs...
With #517 we have a general mechanism to separate "debug" pings. It would be useful for consumers of this stream to have a corresponding, filtered error topic containing only errors...