etl
M-Lab ingestion pipeline
This change adds support for filtering select client_name values from the ndt7 ClientMetadata. This change prevents the value from being published to BigQuery, making the value effectively unsearchable, though it...
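As a rough illustration of this kind of filtering, here is a minimal sketch. It assumes ClientMetadata is a list of name/value pairs and that a matching `client_name` pair is simply dropped before the row is written. The `NameValue` type, the `filterClientNames` function, and the blocked value are hypothetical and are not taken from the etl code.

```go
package main

import "fmt"

// NameValue is a hypothetical stand-in for one ClientMetadata entry.
type NameValue struct {
	Name  string
	Value string
}

// filterClientNames returns a copy of md without any client_name entry
// whose value is in the blocked set. All other entries are kept in order.
func filterClientNames(md []NameValue, blocked map[string]bool) []NameValue {
	out := make([]NameValue, 0, len(md))
	for _, nv := range md {
		if nv.Name == "client_name" && blocked[nv.Value] {
			continue // assumption: the whole pair is dropped
		}
		out = append(out, nv)
	}
	return out
}

func main() {
	// "example-client" is an illustrative value, not a real filtered name.
	blocked := map[string]bool{"example-client": true}
	md := []NameValue{
		{Name: "client_name", Value: "example-client"},
		{Name: "client_os", Value: "linux"},
	}
	fmt.Println(filterClientNames(md, blocked))
}
```

Keeping the blocked values in a set makes the lookup constant time per entry and lets the list come from configuration rather than code.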
Originally, tcpinfo snapshots were thinned 10:1, reducing what could be up to 10ms resolution to 100ms for NDT measurements. For more detailed analysis, the higher resolution may be preferable. However,...
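To make the 10:1 thinning concrete, here is a minimal sketch that keeps every n-th snapshot. The `Snapshot` type and the `thin` function are hypothetical, and keeping the final snapshot is an assumption made for this example rather than confirmed parser behavior.

```go
package main

import "fmt"

// Snapshot is a hypothetical stand-in for one tcpinfo snapshot.
type Snapshot struct {
	Seq int
}

// thin keeps every n-th snapshot, starting with the first.
// Assumption: the final snapshot is also kept so the end state survives.
func thin(snaps []Snapshot, n int) []Snapshot {
	if n <= 1 || len(snaps) == 0 {
		return snaps
	}
	out := make([]Snapshot, 0, len(snaps)/n+2)
	for i := 0; i < len(snaps); i += n {
		out = append(out, snaps[i])
	}
	// Append the last snapshot unless the loop already selected it.
	if (len(snaps)-1)%n != 0 {
		out = append(out, snaps[len(snaps)-1])
	}
	return out
}

func main() {
	raw := make([]Snapshot, 25)
	for i := range raw {
		raw[i].Seq = i
	}
	thinned := thin(raw, 10)
	fmt.Println(len(raw), len(thinned)) // prints "25 4"
}
```

With snapshots taken roughly every 10ms, `n = 10` yields roughly 100ms spacing, and `n = 1` leaves the raw resolution untouched.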
For unit testing and data analysis, it would be helpful to know what the original length of the raw snapshots was. Today there is no indication that snapshots were thinned...
Today, we manually enumerate the OAM IPs in views. Ultimately, the parser should receive a list of OAM IPs from configuration at run time and label a standard column "filter.IsOAM"...
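One way the run-time lookup could work is sketched below, assuming the OAM IPs arrive as a list of strings from configuration. `loadOAM` and `isOAM` are hypothetical names, and the addresses are documentation-range placeholders. Only the column name "filter.IsOAM" comes from the text above.

```go
package main

import (
	"fmt"
	"net/netip"
)

// loadOAM parses a configured list of OAM IPs into a lookup set.
func loadOAM(list []string) (map[netip.Addr]bool, error) {
	set := make(map[netip.Addr]bool, len(list))
	for _, s := range list {
		ip, err := netip.ParseAddr(s)
		if err != nil {
			return nil, fmt.Errorf("invalid OAM IP %q: %w", s, err)
		}
		// Unmap normalizes IPv4-mapped IPv6 addresses to plain IPv4.
		set[ip.Unmap()] = true
	}
	return set, nil
}

// isOAM computes the value a parser could store in a filter.IsOAM column.
// Unparseable addresses are treated as not OAM.
func isOAM(oam map[netip.Addr]bool, addr string) bool {
	ip, err := netip.ParseAddr(addr)
	if err != nil {
		return false
	}
	return oam[ip.Unmap()]
}

func main() {
	// Placeholder addresses from the documentation ranges.
	oam, err := loadOAM([]string{"192.0.2.10", "2001:db8::1"})
	if err != nil {
		panic(err)
	}
	fmt.Println(isOAM(oam, "192.0.2.10"), isOAM(oam, "198.51.100.7")) // prints "true false"
}
```

Parsing the addresses once at startup avoids string comparison pitfalls, such as differing textual forms of the same IPv6 address.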
The data-processing clusters in mlab-sandbox and mlab-staging are in us-east, while the archive-measurement-lab bucket is in us-central1. These clusters should be redeployed to us-central, and their output buckets recreated in...
1. start sketching out a Neubot parser 2. try to figure out how to pass a file to a "thing" that should process such file 3. decide that _probably_ etl_worker.go...
The data pipeline previously ran two versions, v1 and v2, side by side. Now that the v1 data pipeline is decommissioned, we can begin deleting code relevant only to the v1...
Recently, the `ParserFailureRateTooHighOrMissing` alert fired (https://github.com/m-lab/dev-tracker/issues/727) due to an actual spike in task errors (individual archives). Upon investigation, the cause was `ETLSourceError`, which can be due to transient...
Steps taken from: https://docs.google.com/document/d/1seI56IGAZzfIhmkZH_Pp67fU11kynyO6mwf7gU3HeiM/edit#heading=h.386cbm2zij4h

Prepare pipeline
- [x] Revert switch v2 parser from HEAD
- [x] Merge paris1 hopannotation1 changes in v1 pipeline
- [x] Deploy etl w/ hopannotation1 changes...
To conserve engineering effort, we elected to discontinue work on the v1 datatypes that were not generating new data.
* https://github.com/m-lab/etl/issues/1050

After removing most v1 logic, the v1 parser logic...