etl
M-Lab ingestion pipeline
In particular:

```sql
SELECT
  count(*),
  connection_spec.client.network.asn,
  avg(8 * (web100_log_entry.snap.HCThruOctetsAcked /
    (web100_log_entry.snap.SndLimTimeRwin +
     web100_log_entry.snap.SndLimTimeCwnd +
     web100_log_entry.snap.SndLimTimeSnd))) AS download_Mbps,
  connection_spec.client_geolocation.city,
  connection_spec.client_geolocation.region,
  avg(connection_spec.client_geolocation.latitude) AS latitude,
  avg(connection_spec.client_geolocation.longitude) AS longitude
FROM `measurement-lab.ndt.web100`
WHERE connection_spec.client_geolocation.country_name='United States'...
```
This would be incredibly powerful for understanding the local traffic on each machine and site, and the traffic that transits local network segments to different ASNs. We can use it for finding cross...
Almost everyone who uses the NDT data wants to find the mean download throughput, mean upload throughput, and min RTT. The min RTT is easy to find - it's just...
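As a rough illustration of the mean download throughput calculation, here is a minimal Python sketch of the same expression the query above uses for `download_Mbps`. It assumes the `SndLimTime*` counters are in microseconds, which is what the `Mbps` alias implies (bits per microsecond equals megabits per second). The function names and the tuple layout of `rows` are hypothetical, and skipping zero-duration rows is a choice made here, not something the source specifies.

```python
def download_mbps(hc_thru_octets_acked, snd_lim_time_rwin,
                  snd_lim_time_cwnd, snd_lim_time_snd):
    """Throughput for one test: 8 * bytes acked / total send time.

    Assumes the three SndLimTime* counters are in microseconds, so
    bits per microsecond comes out directly as Mbps.
    """
    duration_us = snd_lim_time_rwin + snd_lim_time_cwnd + snd_lim_time_snd
    if duration_us <= 0:
        # No usable duration; skip rather than divide by zero.
        return None
    return 8 * hc_thru_octets_acked / duration_us


def mean_download_mbps(rows):
    """Average per-test throughput over rows of
    (HCThruOctetsAcked, SndLimTimeRwin, SndLimTimeCwnd, SndLimTimeSnd)."""
    rates = [download_mbps(*row) for row in rows]
    rates = [r for r in rates if r is not None]
    if not rates:
        return None
    return sum(rates) / len(rates)
```

Like the `avg(...)` in the query, this averages the per-test rates rather than dividing total bytes by total time, so short tests weigh as much as long ones.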
2019/05/15 18:42:25 geo.go:88: Post http://127.0.0.1:33763/10583?: dial tcp 127.0.0.1:33763: connect: connection refused
2019/05/15 18:42:25 geo.go:258: BatchQueryAnnotationService Error: Post http://127.0.0.1:33763/10583?: dial tcp 127.0.0.1:33763: connect: connection refused
1. Convert to column-based partitioning on log_time
2. Change ASN field to integer
3. Move client ASN and server ASN to top-level fields, so they can be clustered....
ETL pipeline has been down in staging since 4/12, but no staging alert has fired.
During the Gardener deployments, we have discovered that traceroute write performance prevents many tasks from completing, which prevents Gardener from making progress. [The reason](https://github.com/m-lab/dev-tracker/issues/126) is the combination of 1hr response...