etl

M-Lab ingestion pipeline

Results 105 etl issues

In particular:
```sql
SELECT
  count(*),
  connection_spec.client.network.asn,
  avg(8 * (web100_log_entry.snap.HCThruOctetsAcked / (web100_log_entry.snap.SndLimTimeRwin + web100_log_entry.snap.SndLimTimeCwnd + web100_log_entry.snap.SndLimTimeSnd))) AS download_Mbps,
  connection_spec.client_geolocation.city,
  connection_spec.client_geolocation.region,
  avg(connection_spec.client_geolocation.latitude) AS latitude,
  avg(connection_spec.client_geolocation.longitude) AS longitude
FROM `measurement-lab.ndt.web100`
WHERE connection_spec.client_geolocation.country_name='United States'...
```
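The `download_Mbps` expression in the query above can be sketched for a single test row as follows. This is a minimal illustration, not the pipeline's code: the function name and sample values are hypothetical, the result is in Mbit/s only under the assumption that the `SndLimTime*` counters are in microseconds, and the zero-duration guard is an addition that the query itself does not have.

```python
def download_mbps(hc_thru_octets_acked, snd_lim_time_rwin,
                  snd_lim_time_cwnd, snd_lim_time_snd):
    """Per-test download rate, mirroring the expression inside avg() in the query.

    Assumption: the SndLimTime* counters are in microseconds, so
    bits per microsecond equals Mbit/s.
    """
    total_time_us = snd_lim_time_rwin + snd_lim_time_cwnd + snd_lim_time_snd
    if total_time_us <= 0:
        # Guard added for this sketch; the SQL above has no such check.
        return None
    return 8 * hc_thru_octets_acked / total_time_us


# Hypothetical row: 12.5 MB acknowledged over 10 seconds of send time.
print(download_mbps(12_500_000, 4_000_000, 5_000_000, 1_000_000))  # 10.0
```

Averaging this per-test value over a group of rows corresponds to what the query does per ASN and location.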

P3
backlog

This would be incredibly powerful for understanding the local traffic on each machine and site, and through local network segments transiting to different ASNs. We can use it for finding cross...

P2
backlog

Almost everyone who uses the NDT data wants to find the mean download throughput, mean upload throughput, and min RTT. The min RTT is easy to find - it's just...

P2
backlog

2019/05/15 18:42:25 geo.go:88: Post http://127.0.0.1:33763/10583?: dial tcp 127.0.0.1:33763: connect: connection refused
2019/05/15 18:42:25 geo.go:258: BatchQueryAnnotationService Error: Post http://127.0.0.1:33763/10583?: dial tcp 127.0.0.1:33763: connect: connection refused

P1
bug

1. Convert to column-based partitioning on log_time
2. Change ASN field to integer
3. Move client ASN and server ASN to top-level fields, so they can be clustered....

P2
backlog

ETL pipeline has been down in staging since 4/12, but no staging alert has fired.

P2
backlog
sre/review

During the Gardener deployments, we have discovered that traceroute write performance prevents many tasks from completing, which prevents Gardener from making progress. [The reason](https://github.com/m-lab/dev-tracker/issues/126) is the combination of 1hr response...

P1
Epic
backlog
Sprint 8
2019
Sprint 7