bigquery
bigquery copied to clipboard
Making the HTTP2 query cheaper
We have a HTTP/2 requests graph which does a look up on the $_protocol
field in the requests.payload
column. This currently costs 211TB and costs an estimated $1,058 (yes - one thousand bucks!!!) and counting, to run and is re-run every month. Which is quite frankly ridiculous. It also takes forever to run and sometimes times out.
I wanted to add an HTTP/3 graph since it's getting out there but can't justify doubling that cost! While our generous benefactor may be able to absorb that, others can't, and I think we should be setting a better examples here.
If we use the summary_requests
table and use the reqHttpVersion
, or respHttpVersion
(or both!) then the cost plummets to 363GB and or an estimated $1.77!!! And the data looks pretty similar (not exactly the same as requests
and summary_requests
look to have slight differences in number of rows, but close enough).
However, there is an issue as these fields had bad data for a long time (relevant WPT issue and was only fixed from October 2020. I would prefer to track the growth longer than that and ideally back to 2015 when HTTP/2 was launched.
So we've a few choices:
- Fix up the bad data. Ideally we'd join
requests
tosummary_requests
and update the badreqHttpVersion
, orrespHttpVersion
values to the$._protocol
field but can't figure out how to do that. - Patch the bad data by saying
ori:
,us:
,od:
,me:
or: /
values are effectively HTTP/2. This isn't always the case and there are a small number of HTTP/1.1 connections which give those values, but it's close enough and a lot easier to run this clean up than option 1 (unless there is a way to join these two tables I'm not seeing?). - Have a hacky SQL (see below) to patch it in the query instead. Seems a bit of a hack.
- Add the
protocol
column tosummary_requests table
and backfill all the old values. Seems like quite an effort. - Wait until we reorganised the tables like we've talked about.
- Leave as is and just implement HTTP/3 query in cheaper manner.
Thoughts?
#standardSQL
SELECT
SUBSTR(_TABLE_SUFFIX, 0, 10) AS date,
UNIX_DATE(CAST(REPLACE(SUBSTR(_TABLE_SUFFIX, 0, 10), '_', '-') AS DATE)) * 1000 * 60 * 60 * 24 AS timestamp,
IF(ENDS_WITH(_TABLE_SUFFIX, 'desktop'), 'desktop', 'mobile') AS client,
ROUND(SUM(IF(respHttpVersion = 'HTTP/2'
OR respHttpVersion = 'ori' -- bad value that mostly means HTTP/2 (parsed incorrectly from :authority:)
OR respHttpVersion = 'us:' -- bad value that mostly means HTTP/2 (parsed incorrectly from :status:)
OR respHttpVersion = 'od:' -- bad value that mostly means HTTP/2 (parsed incorrectly from :method:)
OR respHttpVersion = 'me:' -- bad value that mostly means HTTP/2 (parsed incorrectly from :scheme:)
OR respHttpVersion = ': /' -- bad value that mostly means HTTP/2 (parsed incorrectly from :path:)
OR reqHttpVersion = 'HTTP/2'
OR reqHttpVersion = 'ori' -- bad value that mostly means HTTP/2 (parsed incorrectly from :authority:)
OR reqHttpVersion = 'us:' -- bad value that mostly means HTTP/2 (parsed incorrectly from :status:)
OR reqHttpVersion = 'od:' -- bad value that mostly means HTTP/2 (parsed incorrectly from :method:)
OR reqHttpVersion = 'me:' -- bad value that mostly means HTTP/2 (parsed incorrectly from :scheme:)
OR reqHttpVersion = ': /' -- bad value that mostly means HTTP/2 (parsed incorrectly from :path:)
, 1, 0)) * 100 / COUNT(0), 2) AS percent
FROM
`httparchive.summary_requests.*`
GROUP BY
date,
timestamp,
client
ORDER BY
date DESC,
client
Here's the comparison of what that comes back with compared to the current production site:
date | timestamp | client | percent | curr_pct | diff |
---|---|---|---|---|---|
2021_05_01 | 1.6198E+12 | desktop | 64.55 | 64.8 | 0.25 |
2021_05_01 | 1.6198E+12 | mobile | 64.96 | 65.3 | 0.34 |
2021_04_01 | 1.6172E+12 | desktop | 68.46 | 68.6 | 0.14 |
2021_04_01 | 1.6172E+12 | mobile | 67.47 | 67.6 | 0.13 |
2021_03_01 | 1.6146E+12 | desktop | 68.5 | 68.6 | 0.1 |
2021_03_01 | 1.6146E+12 | mobile | 68.15 | 68.3 | 0.15 |
2021_02_01 | 1.6121E+12 | desktop | 68.15 | 68.3 | 0.15 |
2021_02_01 | 1.6121E+12 | mobile | 68.05 | 68.2 | 0.15 |
2021_01_01 | 1.6095E+12 | desktop | 67.19 | 67.3 | 0.11 |
67.5 | 67.5 | ||||
2020_12_01 | 1.6068E+12 | desktop | 66.75 | 66.9 | 0.15 |
2020_12_01 | 1.6068E+12 | mobile | 67.11 | 67.3 | 0.19 |
2020_11_01 | 1.6042E+12 | desktop | 65.95 | 66.1 | 0.15 |
2020_11_01 | 1.6042E+12 | mobile | 66.24 | 66.4 | 0.16 |
2020_10_01 | 1.6015E+12 | desktop | 65.57 | 65.7 | 0.13 |
2020_10_01 | 1.6015E+12 | mobile | 65.46 | 65.6 | 0.14 |
2020_09_01 | 1.5989E+12 | desktop | 63.52 | 64.8 | 1.28 |
2020_09_01 | 1.5989E+12 | mobile | 65.61 | 64.9 | -0.71 |
2020_08_01 | 1.5962E+12 | desktop | 62.53 | 63.7 | 1.17 |
2020_08_01 | 1.5962E+12 | mobile | 65.09 | 63.8 | -1.29 |
2020_07_01 | 1.5936E+12 | desktop | 62.23 | 64.2 | 1.97 |
2020_07_01 | 1.5936E+12 | mobile | 64.43 | 64.2 | -0.23 |
2020_06_01 | 1.591E+12 | desktop | 61.46 | 64.4 | 2.94 |
2020_06_01 | 1.591E+12 | mobile | 62.34 | 64.5 | 2.16 |
2020_05_01 | 1.5883E+12 | desktop | 60.63 | 63.4 | 2.77 |
2020_05_01 | 1.5883E+12 | mobile | 61.79 | 63.8 | 2.01 |
2020_04_01 | 1.5857E+12 | desktop | 59.6 | 62.2 | 2.6 |
2020_04_01 | 1.5857E+12 | mobile | 60.6 | 62.4 | 1.8 |
2020_03_01 | 1.583E+12 | desktop | 59.79 | 62.3 | 2.51 |
2020_03_01 | 1.583E+12 | mobile | 60.68 | 62.5 | 1.82 |
2020_02_01 | 1.5805E+12 | desktop | 60.32 | 63.5 | 3.18 |
2020_02_01 | 1.5805E+12 | mobile | 60.91 | 63.1 | 2.19 |
2020_01_01 | 1.5778E+12 | desktop | 55.1 | 59.2 | 4.1 |
2020_01_01 | 1.5778E+12 | mobile | 55.11 | 59.3 | 4.19 |
2019_12_01 | 1.5752E+12 | desktop | 54.37 | 58.9 | 4.53 |
2019_12_01 | 1.5752E+12 | mobile | 54.27 | 58.9 | 4.63 |
2019_11_01 | 1.5726E+12 | desktop | 47.22 | 58 | 10.78 |
2019_11_01 | 1.5726E+12 | mobile | 53.51 | 58.2 | 4.69 |
2019_10_01 | 1.5699E+12 | desktop | 52.55 | 57.1 | 4.55 |
2019_10_01 | 1.5699E+12 | mobile | 52.43 | 56.9 | 4.47 |
2019_09_01 | 1.5673E+12 | desktop | 51.8 | 56.2 | 4.4 |
2019_09_01 | 1.5673E+12 | mobile | 53.47 | 56 | 2.53 |
2019_08_01 | 1.5646E+12 | desktop | 51.4 | 55.7 | 4.3 |
2019_08_01 | 1.5646E+12 | mobile | 55.16 | 55.5 | 0.34 |
2019_07_01 | 1.5619E+12 | desktop | 51.81 | 54.9 | 3.09 |
2019_07_01 | 1.5619E+12 | mobile | 54.53 | 54.8 | 0.27 |
2019_06_01 | 1.5593E+12 | desktop | 50.83 | 53.8 | 2.97 |
2019_06_01 | 1.5593E+12 | mobile | 50.21 | 53.3 | 3.09 |
2019_05_01 | 1.5567E+12 | desktop | 48.16 | 53.1 | 4.94 |
2019_05_01 | 1.5567E+12 | mobile | 47.38 | 52.6 | 5.22 |
2019_04_01 | 1.5541E+12 | desktop | 45.57 | 52.3 | 6.73 |
2019_04_01 | 1.5541E+12 | mobile | 44.19 | 52 | 7.81 |
2019_03_01 | 1.5514E+12 | desktop | 48.49 | 50.6 | 2.11 |
2019_03_01 | 1.5514E+12 | mobile | 47.34 | 50.7 | 3.36 |
2019_02_01 | 1.549E+12 | desktop | 49.63 | 49.7 | 0.07 |
2019_02_01 | 1.549E+12 | mobile | 49.79 | 49.8 | 0.01 |
48.3 | 48.3 | ||||
48.3 | 48.3 | ||||
2018_12_15 | 1.5448E+12 | desktop | 32.8 | 47.8 | 15 |
2018_12_15 | 1.5448E+12 | mobile | 36.73 | 48.9 | 12.17 |
49.1 | 49.1 | ||||
48.8 | 48.8 | ||||
2018_11_15 | 1.5422E+12 | desktop | 46.92 | 48.4 | 1.48 |
2018_11_15 | 1.5422E+12 | mobile | 46.87 | 48.4 | 1.53 |
2018_11_01 | 1.541E+12 | desktop | 46.27 | 47.8 | 1.53 |
47.5 | 47.5 | ||||
2018_10_15 | 1.5396E+12 | desktop | 45.53 | 46.5 | 0.97 |
2018_10_15 | 1.5396E+12 | mobile | 45.13 | 46.2 | 1.07 |
2018_10_01 | 1.5384E+12 | desktop | 45.89 | 46 | 0.11 |
2018_10_01 | 1.5384E+12 | mobile | 45.53 | 45.5 | -0.03 |
2018_09_15 | 1.537E+12 | desktop | 45.66 | 45.8 | 0.14 |
2018_09_15 | 1.537E+12 | mobile | 45.19 | 45.2 | 0.01 |
2018_09_01 | 1.5358E+12 | desktop | 44.83 | 45 | 0.17 |
2018_09_01 | 1.5358E+12 | mobile | 44.6 | 44.6 | 0 |
2018_08_15 | 1.5343E+12 | desktop | 44.65 | 44.8 | 0.15 |
44.9 | 44.9 | ||||
2018_08_01 | 1.5331E+12 | desktop | 44.26 | 44.4 | 0.14 |
2018_08_01 | 1.5331E+12 | mobile | 44.61 | 44.6 | -0.01 |
2018_07_15 | 1.5316E+12 | desktop | 43.77 | 44 | 0.23 |
2018_07_15 | 1.5316E+12 | mobile | 44.3 | 44.3 | 0 |
2018_07_01 | 1.5304E+12 | desktop | 43.42 | 43.6 | 0.18 |
2018_07_01 | 1.5304E+12 | mobile | 41.37 | 41.6 | 0.23 |
2018_06_15 | 1.529E+12 | desktop | 38.59 | 38.8 | 0.21 |
2018_06_15 | 1.529E+12 | mobile | 40.36 | 40.6 | 0.24 |
2018_06_01 | 1.5278E+12 | desktop | 38.17 | 38.2 | 0.03 |
2018_06_01 | 1.5278E+12 | mobile | 39.9 | 40.1 | 0.2 |
2018_05_15 | 1.5263E+12 | desktop | 38.16 | 38.3 | 0.14 |
2018_05_15 | 1.5263E+12 | mobile | 39.56 | 39.7 | 0.14 |
2018_05_01 | 1.5251E+12 | desktop | 37.94 | 38 | 0.06 |
2018_05_01 | 1.5251E+12 | mobile | 39.21 | 39.4 | 0.19 |
2018_04_15 | 1.5238E+12 | desktop | 37.59 | 37.6 | 0.01 |
2018_04_15 | 1.5238E+12 | mobile | 39.16 | 39.4 | 0.24 |
37.1 | 37.1 | ||||
38.7 | 38.7 | ||||
2018_03_15 | 1.5211E+12 | desktop | 36.67 | 36.8 | 0.13 |
2018_03_15 | 1.5211E+12 | mobile | 37.82 | 38 | 0.18 |
2018_03_01 | 1.5199E+12 | desktop | 35.9 | 35.9 | 0 |
2018_03_01 | 1.5199E+12 | mobile | 37.1 | 37.3 | 0.2 |
2018_02_15 | 1.5187E+12 | desktop | 35.46 | 35.5 | 0.04 |
2018_02_15 | 1.5187E+12 | mobile | 36.39 | 36.5 | 0.11 |
2018_02_01 | 1.5174E+12 | desktop | 35.23 | 35.3 | 0.07 |
2018_02_01 | 1.5174E+12 | mobile | 35.98 | 36.1 | 0.12 |
2018_01_15 | 1.516E+12 | desktop | 33.9 | 34 | 0.1 |
2018_01_15 | 1.516E+12 | mobile | 34.69 | 34.8 | 0.11 |
2018_01_01 | 1.5148E+12 | desktop | 33.3 | 33.7 | 0.4 |
2018_01_01 | 1.5148E+12 | mobile | 34.3 | 34.7 | 0.4 |
2017_12_15 | 1.5133E+12 | desktop | 33 | 33.4 | 0.4 |
2017_12_15 | 1.5133E+12 | mobile | 34.03 | 34.4 | 0.37 |
2017_12_01 | 1.5121E+12 | desktop | 31.92 | 32.4 | 0.48 |
2017_12_01 | 1.5121E+12 | mobile | 32.58 | 33.1 | 0.52 |
2017_11_15 | 1.5107E+12 | desktop | 31.39 | 31.8 | 0.41 |
32.6 | 32.6 | ||||
2017_11_01 | 1.5095E+12 | desktop | 31.11 | 31.5 | 0.39 |
2017_11_01 | 1.5095E+12 | mobile | 31.76 | 32.4 | 0.64 |
2017_10_15 | 1.508E+12 | desktop | 30.19 | 30.6 | 0.41 |
2017_10_15 | 1.508E+12 | mobile | 31.06 | 31.5 | 0.44 |
2017_10_01 | 1.5068E+12 | desktop | 29.89 | 30.2 | 0.31 |
2017_10_01 | 1.5068E+12 | mobile | 30.54 | 31.1 | 0.56 |
2017_09_15 | 1.5054E+12 | desktop | 28.88 | 29.2 | 0.32 |
2017_09_15 | 1.5054E+12 | mobile | 29.43 | 30 | 0.57 |
2017_09_01 | 1.5042E+12 | desktop | 28.21 | 0 | -28.21 |
2017_09_01 | 1.5042E+12 | mobile | 29 | 0.1 | -28.9 |
2017_08_15 | 1.5028E+12 | desktop | 27.25 | 0 | -27.25 |
2017_08_15 | 1.5028E+12 | mobile | 28.07 | 0 | -28.07 |
2017_08_01 | 1.5015E+12 | desktop | 26.76 | 0 | -26.76 |
2017_08_01 | 1.5015E+12 | mobile | 27.41 | 0 | -27.41 |
2017_07_15 | 1.5001E+12 | desktop | 26.63 | 26.5 | -0.13 |
2017_07_15 | 1.5001E+12 | mobile | 27.02 | 27.1 | 0.08 |
2017_07_01 | 1.4989E+12 | desktop | 26.14 | 26 | -0.14 |
2017_07_01 | 1.4989E+12 | mobile | 26.44 | 26.5 | 0.06 |
2017_06_15 | 1.4975E+12 | desktop | 25.29 | 25.2 | -0.09 |
2017_06_15 | 1.4975E+12 | mobile | 25.88 | 26 | 0.12 |
2017_06_01 | 1.4963E+12 | desktop | 25.05 | 25 | -0.05 |
2017_06_01 | 1.4963E+12 | mobile | 25.47 | 25.7 | 0.23 |
2017_05_15 | 1.4948E+12 | desktop | 25.02 | 24.9 | -0.12 |
2017_05_15 | 1.4948E+12 | mobile | 25.29 | 25.5 | 0.21 |
2017_05_01 | 1.4936E+12 | desktop | 24.87 | 23.9 | -0.97 |
2017_05_01 | 1.4936E+12 | mobile | 24.49 | 23.8 | -0.69 |
2017_04_15 | 1.4922E+12 | desktop | 25.12 | 24.9 | -0.22 |
2017_04_15 | 1.4922E+12 | mobile | 25.41 | 25.2 | -0.21 |
2017_04_01 | 1.491E+12 | desktop | 24.55 | 24.7 | 0.15 |
2017_04_01 | 1.491E+12 | mobile | 24.69 | 24.9 | 0.21 |
2017_03_15 | 1.4895E+12 | desktop | 23.78 | 24 | 0.22 |
2017_03_15 | 1.4895E+12 | mobile | 23.69 | 23.9 | 0.21 |
2017_03_01 | 1.4883E+12 | desktop | 23.4 | 23.4 | 0 |
2017_03_01 | 1.4883E+12 | mobile | 23.3 | 23.4 | 0.1 |
2017_02_15 | 1.4871E+12 | desktop | 23.07 | 23.1 | 0.03 |
2017_02_15 | 1.4871E+12 | mobile | 22.91 | 23.1 | 0.19 |
2017_02_01 | 1.4859E+12 | desktop | 22.74 | 22.8 | 0.06 |
2017_02_01 | 1.4859E+12 | mobile | 22.85 | 22.9 | 0.05 |
22 | 22 | ||||
2017_01_15 | 1.4844E+12 | mobile | 22 | 22 | 0 |
21.3 | 21.3 | ||||
2017_01_01 | 1.4832E+12 | mobile | 21.58 | 21.6 | 0.02 |
2016_12_15 | 1.4818E+12 | desktop | 19.68 | 20.9 | 1.22 |
21.3 | 21.3 | ||||
20.7 | 20.7 | ||||
21.2 | 21.2 | ||||
2016_11_15 | 1.4792E+12 | desktop | 20.54 | 20.3 | -0.24 |
2016_11_15 | 1.4792E+12 | mobile | 20.55 | 20.6 | 0.05 |
2016_11_01 | 1.478E+12 | desktop | 20.25 | 20.3 | 0.05 |
2016_11_01 | 1.478E+12 | mobile | 19.91 | 20 | 0.09 |
2016_10_15 | 1.4765E+12 | desktop | 18.66 | 18.6 | -0.06 |
2016_10_15 | 1.4765E+12 | mobile | 19.37 | 19.7 | 0.33 |
2016_10_01 | 1.4753E+12 | desktop | 18.5 | 18.7 | 0.2 |
2016_10_01 | 1.4753E+12 | mobile | 19.32 | 19.5 | 0.18 |
2016_09_15 | 1.4739E+12 | desktop | 17.11 | 17.4 | 0.29 |
2016_09_15 | 1.4739E+12 | mobile | 17.29 | 17.5 | 0.21 |
2016_09_01 | 1.4727E+12 | desktop | 16.45 | 16.5 | 0.05 |
2016_09_01 | 1.4727E+12 | mobile | 16.66 | 16.5 | -0.16 |
2016_08_15 | 1.4712E+12 | desktop | 16.49 | 16.5 | 0.01 |
2016_08_15 | 1.4712E+12 | mobile | 16.4 | 16.4 | 0 |
2016_08_01 | 1.47E+12 | desktop | 16.36 | 16.4 | 0.04 |
16.2 | 16.2 | ||||
2016_07_15 | 1.4685E+12 | desktop | 15.9 | 0 | -15.9 |
0 | 0 | ||||
2016_07_01 | 1.4673E+12 | desktop | 15.47 | 0 | -15.47 |
0 | 0 | ||||
2016_06_15 | 1.4659E+12 | desktop | 15.16 | 0 | -15.16 |
0 | 0 | ||||
2016_06_01 | 1.4647E+12 | desktop | 13.72 | 0 | -13.72 |
0 | 0 | ||||
2016_05_15 | 1.4633E+12 | desktop | 13.15 | 0 | -13.15 |
0 | 0 | ||||
2016_05_01 | 1.4621E+12 | desktop | 0 | 0 | 0 |
0 | 0 | ||||
2016_04_15 | 1.4607E+12 | desktop | 0 | 0 | 0 |
0 | 0 | ||||
2016_04_01 | 1.4595E+12 | desktop | 0 | 0 | 0 |
0 | 0 | ||||
2016_03_15 | 1.458E+12 | desktop | 0 | 0 | 0 |
0 | 0 | ||||
2016_03_01 | 1.4568E+12 | desktop | 0 | 0 | 0 |
2016_03_01 | 1.4568E+12 | mobile | 0 | 0 | 0 |
2016_02_15 | 1.4555E+12 | desktop | 0 | 0 | 0 |
2016_02_15 | 1.4555E+12 | mobile | 0 | 0 | 0 |
2016_02_01 | 1.4543E+12 | desktop | 0 | 0 | 0 |
2016_02_01 | 1.4543E+12 | mobile | 0 | 0 | 0 |
2016_01_15 | 1.4528E+12 | desktop | 0 | 0 | 0 |
2016_01_15 | 1.4528E+12 | mobile | 0 | 0 | 0 |
2016_01_01 | 1.4516E+12 | desktop | 0 | 0 | 0 |
2016_01_01 | 1.4516E+12 | mobile | 0 | 0 | 0 |
It should be ok to rely on the pages
dataset for reports, but I think the problem is that our timeseries queries require processing historical data every single time and this dataset is very expensive (like lighthouse
). I think the first step to resolving this in the general case is to change our timeseries queries to be more iterative and only process the most recent month and append that to the JSON. We'd also need to handle when the data already exists in the JSON, eg if the script ran twice by accident or we had to backfill bad data in an older crawl.
Proposed a fix for incremental runs in #115
I still think we should consider patching the old data, and then separately also correcting the historic data.
The incorrect data is absolutely meaningless so pointless to keep around. And I dislike having such an expensive query exposed on the website incase anyone attempts to run it for their own reasons.