gcp-ingestion
Documentation and implementation of telemetry ingestion on Google Cloud Platform
[event](https://console.cloud.google.com/monitoring/alerting/incidents/0.m4pb03h7g6io?project=moz-fx-data-ingesti-prod-579d): I presently have no idea what caused this. Ingestion traffic around that time looked normal, and the most recent edge deployment and schema updates completed more than 24h before this event....
Impression-stats and other docTypes have a top-level `release` field that should be used in the pipeline as input to `normalized_channel`. Currently, their `normalized_channel` is null.
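A minimal sketch of the proposed fallback, assuming the payload is available as a `Map` and that a channel from the usual locations may already have been found (or be null). The class and method names are hypothetical, and the set of recognized channels (with everything else mapped to `"Other"`) is an assumption for illustration rather than the pipeline's actual normalization rules.

```java
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: derive normalized_channel, falling back to a top-level "release" field. */
class ChannelFallback {
  // Assumed set of recognized channel names; anything else maps to "Other".
  private static final Set<String> KNOWN_CHANNELS =
      Set.of("release", "esr", "beta", "aurora", "nightly");

  /** Maps a raw channel string to a normalized value, or null when no channel is present. */
  static String normalizeChannel(String raw) {
    if (raw == null) {
      return null;
    }
    return KNOWN_CHANNELS.contains(raw) ? raw : "Other";
  }

  /**
   * Prefers a channel already extracted from a known location (may be null) and
   * otherwise uses the payload's top-level "release" field.
   */
  static String normalizedChannel(String channelFromKnownLocation, Map<String, Object> payload) {
    String raw = channelFromKnownLocation;
    Object release = payload.get("release");
    if (raw == null && release instanceof String) {
      raw = (String) release;
    }
    return normalizeChannel(raw);
  }
}
```

With this ordering, existing behavior is unchanged for docTypes that already yield a channel; only payloads that previously produced null pick up the `release` value.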
The GUD datasets currently include only the release version of Fenix and they ignore any data in the `org_mozilla_fenix_nightly_stable` dataset. We should probably build in that support. But it brings...
It would be useful to have generated API documentation when working across projects (in particular, to know the interfaces for the `ingestion-core` code). There's a [maven javadoc](https://maven.apache.org/plugins/maven-javadoc-plugin/usage.html) plugin that can be...
In discussion with @6a68, we've identified a subset of the FxA log data (amplitudeEvent messages) that we want to process in real time via Pub/Sub, but which don't need to...
In `ParsePayload`, we look for an OS name inside the ping using known locations for common pings, Glean pings, and core pings. Any other payload type gets a null `normalized_os` value....
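One way to generalize the lookup is to try an ordered list of candidate paths and keep the first string found. The sketch below assumes the parsed ping is a nested `Map`; the class name and the three paths (`environment.system.os.name`, `client_info.os`, `os`) are illustrative assumptions and may not match what `ParsePayload` actually checks.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: look up an OS name from a list of candidate locations in a parsed ping. */
class OsLookup {
  // Assumed candidate paths (dot-separated); the real locations used by ParsePayload may differ.
  static final List<String> CANDIDATE_PATHS =
      List.of("environment.system.os.name", "client_info.os", "os");

  /** Walks a nested map along a dot-separated path; returns null if any step is missing. */
  static Object getPath(Map<String, Object> root, String path) {
    Object current = root;
    for (String key : path.split("\\.")) {
      if (!(current instanceof Map)) {
        return null;
      }
      current = ((Map<?, ?>) current).get(key);
    }
    return current;
  }

  /** Returns the first string found at any candidate path, or null if none match. */
  static String findOsName(Map<String, Object> payload) {
    for (String path : CANDIDATE_PATHS) {
      Object value = getPath(payload, path);
      if (value instanceof String) {
        return (String) value;
      }
    }
    return null;
  }
}
```

Supporting a new payload type then becomes a matter of appending a path to the list rather than adding another special case.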
As we continue to have discussions about data retention policies, sampling may become an even more important concern where we permanently delete data after a certain period of time based...
Currently, merging code to `master` will trigger stage deploys of all Beam jobs, and then make the code eligible for manual deployment to prod. `ingestion-sink` relies on a user with...
At [`GeoCityLookup.java#L103`](https://github.com/mozilla/gcp-ingestion/blob/9ed815afe17d17715aae9b4c1cd91517dcdd6d76/ingestion-beam/src/main/java/com/mozilla/telemetry/decoder/GeoCityLookup.java#L103), the program can potentially fail to release a system resource.
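The flagged line isn't reproduced here, so this is only the general fix pattern, assuming the leaked resource is a stream opened to read a file (for example, a database file). The class and method names are hypothetical. With try-with-resources, the stream is closed automatically on every exit path, including exceptions.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical sketch of the try-with-resources pattern for loading a file-backed resource. */
class ResourceLoading {
  /**
   * Reads all bytes from the file at the given path. The stream is closed when the
   * try block exits, whether normally or by exception, so the file handle is not leaked.
   */
  static byte[] readAll(Path path) throws IOException {
    try (InputStream in = Files.newInputStream(path)) {
      return in.readAllBytes(); // requires Java 9+
    }
  }
}
```

If the stream is instead handed to a longer-lived reader object, the equivalent fix is to make that object `Closeable` and close it in the same way.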
When writing to BigQuery in `payload` format (tables in `*_live` datasets), field names are normalized to snake_case, and it's currently possible to have multiple fields that map to the same...
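To make the collision problem concrete, here is a sketch that groups field names by their normalized form and reports any group with more than one member. The `toSnakeCase` rules below are a simplified assumption (split before an uppercase letter that follows a lowercase letter or digit, map `-` and `.` to `_`, lowercase everything) and are not the pipeline's actual normalization; the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: detect field names that collide after snake_case normalization. */
class SnakeCaseCollisions {
  /** Simplified conversion; an assumption, not the pipeline's actual rules. */
  static String toSnakeCase(String name) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < name.length(); i++) {
      char c = name.charAt(i);
      char prev = i > 0 ? name.charAt(i - 1) : ' ';
      // Insert a separator at a lower-to-upper (or digit-to-upper) boundary.
      if (Character.isUpperCase(c) && (Character.isLowerCase(prev) || Character.isDigit(prev))) {
        sb.append('_');
      }
      if (c == '-' || c == '.') {
        sb.append('_');
      } else {
        sb.append(Character.toLowerCase(c));
      }
    }
    return sb.toString();
  }

  /** Groups original names by normalized name and keeps only groups with more than one member. */
  static Map<String, List<String>> findCollisions(List<String> fieldNames) {
    Map<String, List<String>> byNormalized = new HashMap<>();
    for (String name : fieldNames) {
      byNormalized.computeIfAbsent(toSnakeCase(name), k -> new ArrayList<>()).add(name);
    }
    byNormalized.values().removeIf(group -> group.size() < 2);
    return byNormalized;
  }
}
```

For example, `clientId` and `client_id` both normalize to `client_id`, so a payload containing both would produce two values for a single BigQuery column.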