etl
NDT parser should generate Mean{Download,Upload}ThroughputMbps columns
Almost everyone who uses the NDT data wants to find the mean download throughput, the mean upload throughput, and the min RTT. The min RTT is easy to find (it's just web100_log_entry.snap.MinRTT), but the download and upload throughputs are a pain. For the download case, we ask people to use the query:
SELECT
8 * (web100_log_entry.snap.HCThruOctetsAcked /
(web100_log_entry.snap.SndLimTimeRwin +
web100_log_entry.snap.SndLimTimeCwnd +
web100_log_entry.snap.SndLimTimeSnd)) AS MeanDownloadThroughputMbps
FROM ...
which is confusing.
Instead, the parser should create two synthetic columns: MeanDownloadThroughputMbps and MeanUploadThroughputMbps. Then the NDT data will be easy to query in the common case.
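As a rough illustration of the value the parser would precompute per row for the download case, here is a minimal Python sketch of the same arithmetic as the query above. The function name and the dict-shaped snapshot are hypothetical and are not taken from the etl code. It also assumes the SndLimTime* counters are in microseconds, which is what makes the result come out in Mbps. The upload case is left out because this issue does not spell out its formula.

```python
from typing import Mapping, Optional


def mean_download_throughput_mbps(snap: Mapping[str, int]) -> Optional[float]:
    """Return 8 * bytes acked / total send-limited time, or None if undefined.

    `snap` is a hypothetical stand-in for web100_log_entry.snap.
    """
    # Assumption: the SndLimTime* counters are in microseconds, so bits per
    # microsecond is numerically equal to megabits per second.
    duration_us = (
        snap["SndLimTimeRwin"]
        + snap["SndLimTimeCwnd"]
        + snap["SndLimTimeSnd"]
    )
    if duration_us <= 0:
        # Avoid dividing by zero for empty or malformed snapshots.
        return None
    return 8 * snap["HCThruOctetsAcked"] / duration_us


# Made-up values: 12.5 MB acked over 10 seconds of send-limited time.
example = {
    "HCThruOctetsAcked": 12_500_000,
    "SndLimTimeRwin": 1_000_000,
    "SndLimTimeCwnd": 8_000_000,
    "SndLimTimeSnd": 1_000_000,
}
print(mean_download_throughput_mbps(example))
```

Returning None for a zero duration is one possible choice; the parser could equally leave the column NULL or skip the row.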
Once this issue is resolved, we should revert https://github.com/m-lab/etl-schema/pull/37, because real columns will exist and we won't need synthetic columns.