almanac.httparchive.org icon indicating copy to clipboard operation
almanac.httparchive.org copied to clipboard

Sustainability 2022 Queries

Open camcash17 opened this issue 3 years ago • 5 comments

Progress on #2910

Contents of PR are duplicated from the Google doc outline

Hosting

  • [x] % of “green hosted” sites
  • [ ] CDN usage

General

  • [x] Co2e per page load (Page weight)
  • [x] Request distribution
  • [x] Requests by type

Cache

  • [ ] Cache adoption
  • [ ] Caching by resource type

Image Optimization

  • [x] Lazy loading
  • [ ] Native lazy loading v. JS implementation
  • [x] Adoption of formats
  • [ ] Image quality
  • [ ] Image size

JS & CSS

  • [x] Compression
  • [x] Minification
  • [ ] Unused Code
  • [ ] Inline v. external

Fonts

  • [x] Requests per page
  • [ ] Format adoption
  • [ ] Unused font requests

Video

  • [ ] Preload
  • [ ] Autoplay

Animation

  • [ ] CSS v. JS animation
  • [ ] CPU usage by animations

Third Parties

  • [x] Green hosting
  • [ ] Co2e from third parties
  • [ ] Co2e by third-party category

Dark Mode

  • [ ]

Embedded content

  • [ ] Transfer size of embedded content
  • [ ] TBT of embedded content

IFrames

  • [ ] How many per page?
  • [ ] Weight of iFrame content
  • [ ] Adoption of native lazy loading for iFrames

Resource Hints

  • [ ] Preload
  • [ ] Prefetch

Platform Summary

  • [x] CMS
  • [x] eCommerce
  • [x] Jamstack

Performance & Sustainability

  • [ ] Page Co2e & FCP/LCP

camcash17 avatar Jun 23 '22 00:06 camcash17

@tunetheweb could we find some time later this week to look at https://github.com/HTTPArchive/almanac.httparchive.org/pull/2989/commits/42714b0d44ae0cb59d97a339dd2a5fa875f9104b#diff-e62b9f849c03e2bbbcb42e98e4ecdfc786dea8b2ce842a87892bb286604dacd5

The query we've got currently gives a total percentage, but I'd also like to break it down by % of top 1000, 10000, 100000 sites.

fershad avatar Jul 02 '22 03:07 fershad

Details here: https://github.com/HTTPArchive/almanac.httparchive.org/wiki/Analysts'-Guide#rank

So this works:

#standardSQL
# What percentage of URLs are hosted on a known green web hosting provider?

WITH green AS (
  SELECT
    NET.HOST(url) AS host,
    TRUE AS is_green
  FROM
    `httparchive.almanac.green_web_foundation`
  WHERE
    date = '2022-06-01'
),

pages AS (
  SELECT
    _TABLE_SUFFIX AS client,
    NET.HOST(url) AS host,
    rank
  FROM
    `httparchive.summary_pages.2022_06_01_*`
)

SELECT
  client,
  rank_grouping,
  COUNTIF(is_green) AS total_green,
  COUNT(0) AS total_sites,
  COUNTIF(is_green) / COUNT(0) AS pct_green
FROM
  pages
LEFT JOIN
  green
USING
  (host),
UNNEST([1000, 10000, 100000, 1000000, 10000000]) AS rank_grouping
WHERE
  rank <= rank_grouping
GROUP BY
  client,
  rank_grouping
ORDER BY
  client,
  rank_grouping

tunetheweb avatar Jul 02 '22 07:07 tunetheweb

Hi @camcash17 & @4upz

Here are the queries that are still required (or topics that still require queries). For each I have linked to the section of the Outline that mentions them so you can have some background (if available) on what the author wants to cover.

You should be able to find the query for these in the notes further down the document.

I don't think we could find a previous query to reference for this, though you can check again. It might be one we need to write from scratch.

This is a new query as well. It could be worth checking with Laurent if we really need a figure for this.

You should be able to find the query for these in the notes further down the document.

I don't think we could find a previous query to reference for this, though you can check again. It might be one we need to write from scratch.

You should be able to find the query for these in the notes further down the document.

You should be able to find the query for these in the notes further down the document.

For this, we can look to add the overall page Lighthouse Performance Score (mobile & desktop) to the current page_bytes_per_type.sql query.

fershad avatar Jul 08 '22 09:07 fershad

@camcash17 looks like there are some new linting errors, can you take a look?

rviscomi avatar Jul 29 '22 16:07 rviscomi

What's the latest on this? Are we ready for another review or is it still being worked on?

tunetheweb avatar Aug 08 '22 19:08 tunetheweb

I've still some open comments on this PR, but think most of the queries are in a reasonably fit state. I would suggest starting to run them, and save the data to the sheet, so you can see what the data looks like, while also addressing the comments I've made. I'm a little uncertain with exactly what we hope to get our of some of the queries, but maybe once we see the data it will make more sense (or you'll all see the query perhaps doesn't make as much sense as you thought).

We'll still need this PR reviewed and merged, but as long as most of the queries are OK, it might just need a few rerunning once the review has identified corrections. You also may find some need slight tweaks as you run them.

tunetheweb avatar Aug 14 '22 17:08 tunetheweb

I've nuked the Green Third Parties query since it was returning some pretty strange results. I've rewritten it (f3dffe7) to produce results that look more right and give us something to talk to.

fershad avatar Aug 17 '22 08:08 fershad

@tunetheweb I've updated the checklist at the top. We've got one query on Font format adoption that has been missed, but I'm guessing the Fonts chapter has data on this that we can use.

fershad avatar Aug 19 '22 13:08 fershad