NDT parser should generate Mean{Download,Upload}ThroughputMbps columns

Open pboothe opened this issue 5 years ago • 1 comments

Almost everyone who uses the NDT data wants to find the mean download throughput, mean upload throughput, and min RTT. The min RTT is easy to find - it's just web100_log_entry.snap.MinRTT - but the download and upload throughputs are a pain. For the download case, we ask people to use the query:

SELECT
       8 * (web100_log_entry.snap.HCThruOctetsAcked /
       (web100_log_entry.snap.SndLimTimeRwin +
        web100_log_entry.snap.SndLimTimeCwnd +
        web100_log_entry.snap.SndLimTimeSnd)) AS MeanDownloadThroughputMbps
FROM ...

which is confusing.

Instead, the parser should create two synthetic columns: MeanDownloadThroughputMbps and MeanUploadThroughputMbps. Then the NDT data will be easy to query in the common case.
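As a rough illustration of what the parser would need to compute for the download column, here is a minimal sketch of the same arithmetic as the query above. The function name and the dict-based snapshot are hypothetical stand-ins, not the actual parser API, and the sketch assumes the SndLimTime* fields are in microseconds (which is what makes bits per microsecond come out as Mbps). The issue does not give the upload formula, so only the download case is shown.

```python
from typing import Mapping, Optional


def mean_download_throughput_mbps(snap: Mapping[str, int]) -> Optional[float]:
    """Hypothetical helper mirroring the SQL query in this issue.

    `snap` stands in for the web100_log_entry.snap record. The time
    fields are assumed to be in microseconds.
    """
    # Total time the sender spent in any of the three limited states.
    send_time_us = (
        snap["SndLimTimeRwin"]
        + snap["SndLimTimeCwnd"]
        + snap["SndLimTimeSnd"]
    )
    if send_time_us <= 0:
        # Avoid dividing by zero for empty or malformed snapshots.
        return None
    # Octets -> bits (x8); bits per microsecond equals megabits per second.
    return 8 * snap["HCThruOctetsAcked"] / send_time_us


# Made-up values for illustration only: 12.5 MB acked over 10 seconds.
example_snap = {
    "HCThruOctetsAcked": 12_500_000,
    "SndLimTimeRwin": 1_000_000,
    "SndLimTimeCwnd": 8_000_000,
    "SndLimTimeSnd": 1_000_000,
}
print(mean_download_throughput_mbps(example_snap))  # 10.0
```

Returning None for a zero denominator is a design choice made for this sketch; the real parser might prefer to leave the column NULL or skip the row.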

pboothe, May 22 '19 16:05

Once this issue is resolved, we should revert https://github.com/m-lab/etl-schema/pull/37 , because real columns will exist and we won't need synthetic columns.

pboothe, May 22 '19 16:05