etl
M-Lab ingestion pipeline
After deploying the new alternative ETL pipeline SLIs, we found that the scamper1 datatype would report parse errors after restarting. We suspected this may be due to a temporary format...
According to Prometheus metric naming best practices (https://prometheus.io/docs/practices/naming/), accumulating counts should end with the suffix `total`. Many of the accumulating count metrics in etl (e.g., PanicCount, WorkerCount) have the...
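A minimal sketch of the naming check this issue implies, assuming the exported metric names are plain strings and that the Prometheus convention is written with an underscore, as in `_total`. The function name and the metric names below are hypothetical and are not taken from etl.

```go
package main

import (
	"fmt"
	"strings"
)

// followsCounterNaming reports whether a metric name ends in "_total",
// the suffix the Prometheus naming guide recommends for accumulating counts.
// Hypothetical helper for illustration; not part of etl.
func followsCounterNaming(name string) bool {
	return strings.HasSuffix(name, "_total")
}

func main() {
	// Hypothetical metric names; the real etl names are not shown in the issue.
	for _, name := range []string{"example_panic_count", "example_panic_total"} {
		fmt.Printf("%-22s follows convention: %v\n", name, followsCounterNaming(name))
	}
}
```

A check like this could run in a unit test over the registered counter names, so that a rename is caught before dashboards and alerts depend on the old name.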
Currently, `etl_worker` crashes in local development mode when a paris-traceroute archive is supplied as a URL. Steps to reproduce: 1. Navigate to cmd/etl_worker within the ETL project. 2. Run `go...
Since https://github.com/m-lab/etl/pull/972, the ETL `Version` and `GitCommit` are compiled in at build time. The `Version` is always a human-readable symbolic name: either the branch (e.g. sandbox-soltesz, master) or...
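For readers unfamiliar with compiling values in at build time, here is a sketch of one common Go approach: package-level string variables overridden with the linker's `-X` flag. Whether etl uses exactly this mechanism is an assumption; only the names `Version` and `GitCommit` come from the issue, and `versionString` is a hypothetical helper.

```go
package main

import "fmt"

// Defaults used when nothing is injected at build time.
// Assumed build command (not confirmed by the issue):
//
//	go build -ldflags "-X main.Version=$(git rev-parse --abbrev-ref HEAD) \
//	                   -X main.GitCommit=$(git rev-parse HEAD)"
var (
	Version   = "unknown"
	GitCommit = "unknown"
)

// versionString formats the two build-time values for logs or a status page.
func versionString() string {
	return fmt.Sprintf("%s (%s)", Version, GitCommit)
}

func main() {
	fmt.Println(versionString())
}
```

The `-X` flag only sets string variables, which is one reason a symbolic `Version` and a separate `GitCommit` are typically kept as two plain strings.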
New parsers should NOT annotate records, as they are annotated by joins in BQ. The K8S annotation-service should be shut down, and null-annotator should be used for 2.0 parsing tasks....
We are currently seeing a low rate of GCS storage errors: ``` 2021/04/13 04:54:19 rowwriter.go:119: googleapi: got HTTP response code 503 with body: Service Unavailable etl-mlab-staging ndt/ndt7/2020/08/27/20200827T170704.505210Z-ndt7-mlab3-lhr05-ndt.tgz.json textPayload: "2021/04/13 04:54:19...
https://github.com/m-lab/etl/blob/5caa9cbbd394ec4f0f7cd1e82eeec6b26a21525b/task/task.go#L66-L66 Currently, these errors are dropped - not reported to Gardener. Among these errors are GCS write errors, reported on https://github.com/m-lab/etl/blob/5caa9cbbd394ec4f0f7cd1e82eeec6b26a21525b/storage/rowwriter.go#L168-L169
I've recently updated the descriptions for fields in https://github.com/m-lab/etl/tree/master/schema/descriptions. There are a few marked `TBD` that need definitions written in the list of files below, and existing definitions should be...
The first web100 download row is 2009-07-02 and the first upload is 2009-02-18, as reported by SELECT * FROM `mlab-sandbox.inspector.union_ndt_prod_all` Also reproduced: SELECT MIN(date) FROM `measurement-lab.ndt.unified_downloads` WHERE date < '2009-09-01'...
The TCP RTT calculation changed from ms to µs. The "is bloated" test is supposed to be RTT > 1 second.
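A small sketch of how the threshold can be made unit-safe, assuming the raw RTT now arrives as an integer count of microseconds. The names `rttFromMicros`, `isBloated`, and `bloatThreshold` are hypothetical and do not come from the etl code.

```go
package main

import (
	"fmt"
	"time"
)

// bloatThreshold is the 1 second cutoff described in the issue.
const bloatThreshold = time.Second

// rttFromMicros converts a raw RTT in microseconds to a time.Duration,
// so the comparison below no longer depends on the raw unit.
func rttFromMicros(us int64) time.Duration {
	return time.Duration(us) * time.Microsecond
}

// isBloated reports whether the RTT strictly exceeds one second.
func isBloated(rtt time.Duration) bool {
	return rtt > bloatThreshold
}

func main() {
	fmt.Println(isBloated(rttFromMicros(1500)))      // 1.5 ms
	fmt.Println(isBloated(rttFromMicros(1_500_000))) // 1.5 s
}
```

Comparing `time.Duration` values rather than bare integers means a value of 1500 is not mistaken for 1.5 seconds when the unit changes from milliseconds to microseconds.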