gcp-ingestion
Documentation and implementation of telemetry ingestion on Google Cloud Platform
Avro schemas can be generated from [mozilla-pipeline-schemas](https://github.com/mozilla-services/mozilla-pipeline-schemas) through the [jsonschema-transpiler](https://github.com/acmiyaguchi/jsonschema-transpiler). These schemas should be made available to ingestion-beam via the `AvroSchemaStore` added in #448. This would make the pipeline capable...
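As a rough sketch of how transpiled schemas might be consumed (the `.avsc` path below is hypothetical; real schemas would be distributed via the schema store archive), loading and inspecting one takes only the Avro library:

```java
import java.io.File;
import java.io.IOException;
import org.apache.avro.Schema;

/** Minimal sketch: parse a transpiled .avsc file and inspect its fields. */
public class AvroSchemaExample {
  public static void main(String[] args) throws IOException {
    // Hypothetical path; the real layout is determined by the schema archive.
    Schema schema = new Schema.Parser().parse(new File("schemas/telemetry/main/main.4.avsc"));
    System.out.println("Loaded schema: " + schema.getFullName());
    schema.getFields()
        .forEach(field -> System.out.println("  " + field.name() + ": " + field.schema()));
  }
}
```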
We should write up some details around deduplication, including the upper bounds on how many duplicates we should expect, and ideally link to that write-up from docs.tmo.
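For context, the sketch below shows one generic way duplicates could be dropped by document ID in Beam; it uses the stock `Distinct` transform and a hypothetical `extractDocumentId` helper, and is not necessarily how the pipeline deduplicates:

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.Distinct;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.TypeDescriptors;

/** Sketch: drop duplicate payloads keyed by a representative document ID. */
public class DedupeSketch {

  // Hypothetical helper; a real pipeline would read the documentId attribute.
  static String extractDocumentId(String payload) {
    return payload;
  }

  public static void main(String[] args) {
    Pipeline pipeline = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());
    PCollection<String> payloads = pipeline.apply(Create.of("a", "b", "a"));

    // Keep only the first element seen for each document ID.
    payloads.apply(Distinct.<String, String>withRepresentativeValueFn(DedupeSketch::extractDocumentId)
        .withRepresentativeType(TypeDescriptors.strings()));

    pipeline.run().waitUntilFinish();
  }
}
```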
We should explain and record the limitations, expectations, and implications someone might need to know as a consumer of GCP Ingestion.
Gzip (built on the same DEFLATE compression as zlib) is a widely deployed and well-supported compression format. We already allow Firefox clients to send gzip-compressed telemetry payloads, and most programming languages and frameworks have good built-in...
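As an illustration of how little server-side support costs, decompressing a gzip body in Java needs only the JDK (this is a generic sketch, not the pipeline's actual decoder):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;

/** Sketch: decompress a gzip-encoded payload using only the standard library. */
public class GzipDecode {
  public static byte[] gunzip(byte[] compressed) throws IOException {
    try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed));
        ByteArrayOutputStream out = new ByteArrayOutputStream()) {
      byte[] buffer = new byte[8192];
      int read;
      while ((read = in.read(buffer)) != -1) {
        out.write(buffer, 0, read);
      }
      return out.toByteArray();
    }
  }
}
```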
The filenames we assign for file-based output include the start and end time of the window being written, and by default we use windows of size 10 minutes. The windows, however,...
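For reference, a hedged sketch of how window bounds can end up in output filenames with Beam's `FileIO`; the naming pattern and destination here are illustrative, not the pipeline's actual policy:

```java
import org.apache.beam.sdk.io.Compression;
import org.apache.beam.sdk.io.FileIO;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.transforms.windowing.BoundedWindow;
import org.apache.beam.sdk.transforms.windowing.FixedWindows;
import org.apache.beam.sdk.transforms.windowing.IntervalWindow;
import org.apache.beam.sdk.transforms.windowing.PaneInfo;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.PCollection;
import org.joda.time.Duration;

/** Sketch: write windowed text output with window start/end in the filename. */
public class WindowedFileNaming {

  // Illustrative FileNaming that embeds the window's start and end instants.
  static FileIO.Write.FileNaming windowedNaming() {
    return (BoundedWindow window, PaneInfo pane, int numShards, int shardIndex,
        Compression compression) -> {
      IntervalWindow interval = (IntervalWindow) window;
      return String.format("output-%s-%s-%05d-of-%05d.txt",
          interval.start(), interval.end(), shardIndex, numShards);
    };
  }

  static void write(PCollection<String> lines) {
    lines
        .apply(Window.<String>into(FixedWindows.of(Duration.standardMinutes(10))))
        .apply(FileIO.<String>write()
            .via(TextIO.sink())
            .to("gs://example-bucket/output")  // hypothetical destination
            .withNaming(windowedNaming())
            .withNumShards(10));
  }
}
```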
It appears that [the PR](https://github.com/apache/beam/pull/1952) resolving https://issues.apache.org/jira/browse/BEAM-1438 was intended to allow unbounded jobs to write files without specifying a number of shards, but [there is still an argument check that...
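Until that check is relaxed, the workaround is to give windowed/unbounded writes an explicit shard count; a minimal sketch using generic TextIO rather than the pipeline's actual sink:

```java
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.values.PCollection;

/** Sketch: unbounded file output currently requires an explicit shard count. */
public class UnboundedWrite {
  static void write(PCollection<String> lines) {
    lines.apply(TextIO.write()
        .to("gs://example-bucket/output")  // hypothetical destination
        .withWindowedWrites()
        // Runner-determined sharding is not yet allowed for unbounded input,
        // so a fixed number of shards must be given.
        .withNumShards(10));
  }
}
```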
The test currently has a 15-second run timer because it exercises an unbounded input, which has no natural end. When we have time, we should...
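One option when we revisit this is Beam's `TestStream`, which lets a test over unbounded-style input terminate deterministically instead of relying on a wall-clock timer; a rough sketch:

```java
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.testing.PAssert;
import org.apache.beam.sdk.testing.TestPipeline;
import org.apache.beam.sdk.testing.TestStream;
import org.apache.beam.sdk.values.PCollection;
import org.junit.Rule;
import org.junit.Test;

/** Sketch: a deterministic test over unbounded-style input using TestStream. */
public class UnboundedInputTest {

  @Rule public final transient TestPipeline pipeline = TestPipeline.create();

  @Test
  public void testProcessesStreamedElements() {
    TestStream<String> input = TestStream.create(StringUtf8Coder.of())
        .addElements("message1", "message2")
        // Advancing the watermark to infinity ends the stream, so the test
        // completes without a fixed run timer.
        .advanceWatermarkToInfinity();

    PCollection<String> output = pipeline.apply(input);
    PAssert.that(output).containsInAnyOrder("message1", "message2");

    pipeline.run().waitUntilFinish();
  }
}
```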
Perhaps this should wait for #65, but it would be good to remove some of the shim code we have for ensuring stable ObjectMapper behavior that allows us to compare...
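For context, the shim in question is about deterministic JSON serialization for comparisons; the sketch below illustrates the kind of configuration involved (sorted map keys) alongside an alternative of comparing parsed trees, which ignores key order. This is illustrative, not the existing shim code:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.SerializationFeature;

/** Sketch: two ways to compare JSON content without relying on key order. */
public class JsonComparison {

  public static void main(String[] args) throws Exception {
    // Shim-style approach: force deterministic output so serialized strings
    // can be compared directly.
    ObjectMapper stableMapper = new ObjectMapper()
        .enable(SerializationFeature.ORDER_MAP_ENTRIES_BY_KEYS);

    Map<String, Integer> first = new LinkedHashMap<>();
    first.put("b", 1);
    first.put("a", 2);
    Map<String, Integer> second = new LinkedHashMap<>();
    second.put("a", 2);
    second.put("b", 1);
    System.out.println(stableMapper.writeValueAsString(first)
        .equals(stableMapper.writeValueAsString(second)));  // true

    // Alternative: compare parsed trees; JsonNode equality ignores the order
    // of object fields, so no special ObjectMapper configuration is needed.
    JsonNode left = new ObjectMapper().readTree("{\"b\":1,\"a\":2}");
    JsonNode right = new ObjectMapper().readTree("{\"a\":2,\"b\":1}");
    System.out.println(left.equals(right));  // true
  }
}
```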