cwv-tech-report
cwv-tech-report copied to clipboard
Automate the update of the CWV Tech Report
Each month I manually run a query to append the latest data to the httparchive:core_web_vitals.technologies
table.
The query have a few places that need to be updated to reflect the latest dataset ("2021-07-01" and "2021_07_01" below). The query also depends on two tables that are made available at different times: the HA technologies.YYYY_MM_DD_*
tables are made available at the end of the month corresponding to the release date, while the CrUX materialzied.device_summary
table isn't updated until the second Tuesday of the month following the release date. If that's confusing, for example the 2021-07-01 (July 2021) dataset was released today on August 10, the second Tuesday of August.
I can help to publish a Pub/Sub topic when the CrUX release is available on BigQuery. This project would need to subscribe to that topic and kick off a process to run a query like the one below for the given release and append the results to the CWV table.
CREATE TEMP FUNCTION IS_GOOD(good FLOAT64, needs_improvement FLOAT64, poor FLOAT64) RETURNS BOOL AS (
good / (good + needs_improvement + poor) >= 0.75
);
CREATE TEMP FUNCTION IS_NON_ZERO(good FLOAT64, needs_improvement FLOAT64, poor FLOAT64) RETURNS BOOL AS (
good + needs_improvement + poor > 0
);
SELECT
date,
ARRAY_TO_STRING(ARRAY_AGG(DISTINCT category IGNORE NULLS ORDER BY category), ', ') AS categories,
app,
client,
COUNT(DISTINCT url) AS origins,
COUNT(DISTINCT IF(good_fid, url, NULL)) AS origins_with_good_fid,
COUNT(DISTINCT IF(good_cls, url, NULL)) AS origins_with_good_cls,
COUNT(DISTINCT IF(good_lcp, url, NULL)) AS origins_with_good_lcp,
COUNT(DISTINCT IF(any_fid, url, NULL)) AS origins_with_any_fid,
COUNT(DISTINCT IF(any_cls, url, NULL)) AS origins_with_any_cls,
COUNT(DISTINCT IF(any_lcp, url, NULL)) AS origins_with_any_lcp,
COUNT(DISTINCT IF(good_cwv, url, NULL)) AS origins_with_good_cwv,
COUNT(DISTINCT IF(any_lcp AND any_cls, url, NULL)) AS origins_eligible_for_cwv,
SAFE_DIVIDE(COUNTIF(good_cwv), COUNTIF(any_lcp AND any_cls)) AS pct_eligible_origins_with_good_cwv
FROM (
SELECT
date,
CONCAT(origin, '/') AS url,
IF(device = 'desktop', 'desktop', 'mobile') AS client,
IS_NON_ZERO(fast_fid, avg_fid, slow_fid) AS any_fid,
IS_GOOD(fast_fid, avg_fid, slow_fid) AS good_fid,
IS_NON_ZERO(small_cls, medium_cls, large_cls) AS any_cls,
IS_GOOD(small_cls, medium_cls, large_cls) AS good_cls,
IS_NON_ZERO(fast_lcp, avg_lcp, slow_lcp) AS any_lcp,
IS_GOOD(fast_lcp, avg_lcp, slow_lcp) AS good_lcp,
(IS_GOOD(fast_fid, avg_fid, slow_fid) OR fast_fid IS NULL) AND
IS_GOOD(small_cls, medium_cls, large_cls) AND
IS_GOOD(fast_lcp, avg_lcp, slow_lcp) AS good_cwv
FROM
`chrome-ux-report.materialized.device_summary`
WHERE
date = '2021-07-01' AND
device IN ('desktop', 'phone')
) JOIN (
SELECT DISTINCT
CAST('2021-07-01' AS DATE) AS date,
category,
app,
_TABLE_SUFFIX AS client,
url
FROM
`httparchive.technologies.2021_07_01_*`
WHERE
app IS NOT NULL AND
app != ''
) USING (date, url, client)
GROUP BY
date,
app,
client
FYI latest SQL is actually here: https://github.com/HTTPArchive/cwv-tech-report/blob/main/sql/monthly.sql
Resolved in https://github.com/HTTPArchive/dataform/pull/3