Capabilities Queries for 2024
I hope this looks right. @tunetheweb I remember you did the 2022, so I basically copied your PR. 🫥
Actually, we're looking to convert these queries to the new data model.
I've done that for you if you could add to the branch? Here's the first query:
#standardSQL
CREATE TEMP FUNCTION getFuguAPIs(data STRING)
RETURNS ARRAY<STRING>
LANGUAGE js AS '''
const $ = JSON.parse(data);
return Object.keys($);
''';
WITH fuguapis AS (
SELECT
date,
client,
page,
fuguAPI
FROM
`httparchive.all.pages`,
UNNEST(getFuguAPIs(JSON_QUERY(custom_metrics, '$."fugu-apis"'))) AS fuguAPI
WHERE
date = '2024-06-01' AND
JSON_QUERY(custom_metrics, '$."fugu-apis"') != '[]'
),
totals AS (
SELECT
date,
client,
COUNT(0) AS total
FROM
`httparchive.all.pages`
WHERE
date = '2024-06-01'
GROUP BY
date,
client
)
SELECT
client,
fuguAPI,
COUNT(DISTINCT page) AS pages,
total,
COUNT(DISTINCT page) / total AS pct,
ARRAY_TO_STRING(ARRAY_AGG(DISTINCT page LIMIT 50), ' ') AS sample_urls
FROM
fuguapis
JOIN
totals
USING
(client, date)
GROUP BY
fuguAPI,
client,
total
HAVING
COUNT(DISTINCT page) >= 10
ORDER BY
pct DESC,
client;
And here's the second:
#standardSQL
CREATE TEMP FUNCTION getFuguAPIs(data STRING)
RETURNS ARRAY<STRING>
LANGUAGE js AS '''
const $ = JSON.parse(data);
return Object.keys($);
''';
SELECT
client,
page,
COUNT(DISTINCT fuguAPI) AS fuguAPIs
FROM
`httparchive.all.pages`,
UNNEST(getFuguAPIs(JSON_QUERY(custom_metrics, '$."fugu-apis"'))) AS fuguAPI
WHERE
date = '2024-06-01' AND
JSON_QUERY(custom_metrics, '$."fugu-apis"') != '[]'
GROUP BY
client,
page
HAVING
COUNT(DISTINCT fuguAPI) >= 1
ORDER BY
fuguAPIs DESC,
page,
client
LIMIT 100;
Bahh, thank you. I'm away from my computer but will update the PRs tonight (I have one for PWAs too)
@tunetheweb can this land?
I made some tweaks to handle secondary pages better in https://github.com/HTTPArchive/almanac.httparchive.org/pull/3747/commits/af0f96a1bb77926f1bab338bccb100dd9d704f04
Also added the results to the sheet: https://docs.google.com/spreadsheets/d/1Ig-821tyjr897i8QqPvXiRMY9o444qsFAmZt4AFyBjk/edit?gid=931311410#gid=931311410
Thank you @tunetheweb. I can work on the draft now 🥳