almanac.httparchive.org icon indicating copy to clipboard operation
almanac.httparchive.org copied to clipboard

Capabilities Queries for 2024

Open MichaelSolati opened this issue 1 year ago • 3 comments

I hope this looks right. @tunetheweb I remember you did the 2022, so I basically copied your PR. 🫥

MichaelSolati avatar Aug 21 '24 21:08 MichaelSolati

Actually, we're looking to convert these queries to the new data model.

I've done that for you if you could add to the branch? Here's the first query:

#standardSQL
CREATE TEMP FUNCTION getFuguAPIs(data STRING)
RETURNS ARRAY<STRING>
LANGUAGE js AS '''
const $ = JSON.parse(data);
return Object.keys($);
''';

WITH fuguapis AS (
  SELECT
    date,
    client,
    page,
    fuguAPI
  FROM
    `httparchive.all.pages`,
    UNNEST(getFuguAPIs(JSON_QUERY(custom_metrics, '$."fugu-apis"'))) AS fuguAPI
  WHERE
    date = '2024-06-01' AND
    JSON_QUERY(custom_metrics, '$."fugu-apis"') != '[]'
),

totals AS (
  SELECT
    date,
    client,
    COUNT(0) AS total
  FROM
    `httparchive.all.pages`
  WHERE
    date = '2024-06-01'
  GROUP BY
    date,
    client
)

SELECT
  client,
  fuguAPI,
  COUNT(DISTINCT page) AS pages,
  total,
  COUNT(DISTINCT page) / total AS pct,
  ARRAY_TO_STRING(ARRAY_AGG(DISTINCT page LIMIT 50), ' ') AS sample_urls
FROM
  fuguapis
JOIN
  totals
USING
  (client, date)
GROUP BY
  fuguAPI,
  client,
  total
HAVING
  COUNT(DISTINCT page) >= 10
ORDER BY
  pct DESC,
  client;

tunetheweb avatar Aug 21 '24 22:08 tunetheweb

And here's the second:

#standardSQL
CREATE TEMP FUNCTION getFuguAPIs(data STRING)
RETURNS ARRAY<STRING>
LANGUAGE js AS '''
const $ = JSON.parse(data);
return Object.keys($);
''';

SELECT
  client,
  page,
  COUNT(DISTINCT fuguAPI) AS fuguAPIs
  FROM
    `httparchive.all.pages`,
    UNNEST(getFuguAPIs(JSON_QUERY(custom_metrics, '$."fugu-apis"'))) AS fuguAPI
WHERE
    date = '2024-06-01' AND
    JSON_QUERY(custom_metrics, '$."fugu-apis"') != '[]'
GROUP BY
  client,
  page
HAVING
  COUNT(DISTINCT fuguAPI) >= 1
ORDER BY
  fuguAPIs DESC,
  page,
  client
LIMIT 100;

tunetheweb avatar Aug 21 '24 22:08 tunetheweb

Bahh, thank you. I'm away from my computer but will update the PRs tonight (I have one for PWAs too)

MichaelSolati avatar Aug 22 '24 01:08 MichaelSolati

@tunetheweb can this land?

MichaelSolati avatar Sep 02 '24 20:09 MichaelSolati

I made some tweaks to handle secondary pages better in https://github.com/HTTPArchive/almanac.httparchive.org/pull/3747/commits/af0f96a1bb77926f1bab338bccb100dd9d704f04

tunetheweb avatar Sep 05 '24 12:09 tunetheweb

Also added the results to the sheet: https://docs.google.com/spreadsheets/d/1Ig-821tyjr897i8QqPvXiRMY9o444qsFAmZt4AFyBjk/edit?gid=931311410#gid=931311410

tunetheweb avatar Sep 05 '24 13:09 tunetheweb

Thank you @tunetheweb. I can work on the draft now 🥳

MichaelSolati avatar Sep 05 '24 21:09 MichaelSolati