etl

M-Lab ingestion pipeline

Results 105 etl issues

SELECT * FROM `mlab-oti.ndt.traceroute` WHERE Parseinfo.TaskFileName = "gs://archive-measurement-lab/ndt/traceroute/2019/11/02/20191102T020000.909108Z-traceroute-mlab4-lhr05-ndt.tgz" LIMIT 1000
Row | partition_date | uuid | TestTime | Parseinfo.TaskFileName | Parseinfo.ParseTime | Parseinfo.ParserVersion | Parseinfo.Filename | start_time | stop_time |...

P0
current
2020
Week 8

Gardener jobs/ already supports a BQ table, but it should also allow other kinds of destinations, e.g. GCS JSONL files. Parsers need to respect the destination requested by Gardener.

P1
current
2020
Week 10

Processed 723 files, 0 nil data, 0 rows committed, 722 failed, from gs://archive-measurement-lab/ndt/ndt5/2019/11/12/20191112T201953.499657Z-ndt5-mlab3-syd02-ndt.tgz into ndt5_20191112
Failure rate sounds very high for a single tarball.

P3
backlog
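
To make the failure rate in the log line above concrete, the following sketch parses the counters out of a summary line and computes failed/processed. The function name `failureRate` and the 0.5 alert threshold are assumptions for illustration; the line format is taken only from the single example quoted in this issue.

```go
package main

import (
	"errors"
	"fmt"
)

// failureRate extracts the counters from a "Processed ..." summary line
// and returns failed files as a fraction of processed files.
// The format string mirrors the one example line in the issue (assumption).
func failureRate(line string) (float64, error) {
	var files, nilData, committed, failed int
	_, err := fmt.Sscanf(line,
		"Processed %d files, %d nil data, %d rows committed, %d failed",
		&files, &nilData, &committed, &failed)
	if err != nil {
		return 0, err
	}
	if files == 0 {
		return 0, errors.New("no files processed")
	}
	return float64(failed) / float64(files), nil
}

func main() {
	line := "Processed 723 files, 0 nil data, 0 rows committed, 722 failed"
	rate, err := failureRate(line)
	if err != nil {
		panic(err)
	}
	fmt.Printf("failure rate: %.1f%%\n", rate*100)
	// Hypothetical threshold: flag tarballs where most files fail.
	if rate > 0.5 {
		fmt.Println("suspiciously high failure rate for a single tarball")
	}
}
```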

We expect those numbers to be very close (~95+%), but when the ratio drops too low there may be another problem with data collection, parsing, or inserting to BQ...

P1
backlog

When Gardener updates fail, the parser should start a goroutine to retry the update. Otherwise the update may be lost entirely. If the update is the state change, then the job...

P2
backlog

Trying to find a combination of build directives and docker base images that works reliably turns out to be non-trivial. I had hoped to use alpine with appropriate static linking,...

P2
backlog

Inserts are sometimes failing on tcpinfo buffers. This is likely due to the large number of snapshots in some rows. We should make two changes: 1. Limit the number of snapshots in a row. Perhaps...

P1
backlog

Part of m-lab/dev-tracker#501. Use gardener update/ to send per-task updates. Use gardener heartbeat/ to send a per-job heartbeat once per minute. This allows Gardener to detect ETL instance crashes.

P1
backlog
Q4
Q1

Deployments are failing, apparently in the schema sync stage. It appears that bq.py flags may have changed, and we need to update the scripts.

P2
backlog

There are separate Window Scale parameters for each half of the connection. They appear in tcp_info as rcv_wscale and snd_wscale. We are probably parsing both into a single field.

P1
bug
backlog
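
As an illustration of keeping the two window scale values apart, the sketch below splits a single packed byte into separate send and receive fields. It assumes the two 4-bit values share one byte with snd_wscale in the low nibble and rcv_wscale in the high nibble; that layout, and the names `WindowScale` and `splitWScale`, are assumptions for this example and should be checked against the actual parser and kernel headers.

```go
package main

import "fmt"

// WindowScale holds the scale factor for each half of the connection.
type WindowScale struct {
	Snd uint8 // snd_wscale
	Rcv uint8 // rcv_wscale
}

// splitWScale separates a packed byte into its two 4-bit scale values.
// Assumed layout: low nibble = snd_wscale, high nibble = rcv_wscale.
func splitWScale(packed uint8) WindowScale {
	return WindowScale{
		Snd: packed & 0x0F,
		Rcv: packed >> 4,
	}
}

func main() {
	ws := splitWScale(0x78)
	fmt.Printf("snd_wscale=%d rcv_wscale=%d\n", ws.Snd, ws.Rcv)
}
```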