almanac.httparchive.org
almanac.httparchive.org copied to clipboard
Sustainability 2022 Queries
Progress on #2910
Contents of PR are duplicated from the Google doc outline
Hosting
- [x] % of “green hosted” sites
- [ ] CDN usage
General
- [x] Co2e per page load (Page weight)
- [x] Request distribution
- [x] Requests by type
Cache
- [ ] Cache adoption
- [ ] Caching by resource type
Image Optimization
- [x] Lazy loading
- [ ] Native lazy loading v. JS implementation
- [x] Adoption of formats
- [ ] Image quality
- [ ] Image size
JS & CSS
- [x] Compression
- [x] Minification
- [ ] Unused Code
- [ ] Inline v. external
Fonts
- [x] Requests per page
- [ ] Format adoption
- [ ] Unused font requests
Video
- [ ] Preload
- [ ] Autoplay
Animation
- [ ] CSS v. JS animation
- [ ] CPU usage by animations
Third Parties
- [x] Green hosting
- [ ] Co2e from third parties
- [ ] Co2e by third-party category
Dark Mode
- [ ]
Embedded content
- [ ] Transfer size of embedded content
- [ ] TBT of embedded content
IFrames
- [ ] How many per page?
- [ ] Weight of iFrame content
- [ ] Adoption of native lazy loading for iFrames
Resource Hints
- [ ] Preload
- [ ] Prefetch
Platform Summary
- [x] CMS
- [x] eCommerce
- [x] Jamstack
Performance & Sustainability
- [ ] Page Co2e & FCP/LCP
@tunetheweb could we find some time later this week to look at https://github.com/HTTPArchive/almanac.httparchive.org/pull/2989/commits/42714b0d44ae0cb59d97a339dd2a5fa875f9104b#diff-e62b9f849c03e2bbbcb42e98e4ecdfc786dea8b2ce842a87892bb286604dacd5
The query we've got currently gives a total percentage, but I'd also like to break it down by % of top 1000, 10000, 100000 sites.
Details here: https://github.com/HTTPArchive/almanac.httparchive.org/wiki/Analysts'-Guide#rank
So this works:
#standardSQL
# What percentage of URLs are hosted on a known green web hosting provider?
WITH green AS (
SELECT
NET.HOST(url) AS host,
TRUE AS is_green
FROM
`httparchive.almanac.green_web_foundation`
WHERE
date = '2022-06-01'
),
pages AS (
SELECT
_TABLE_SUFFIX AS client,
NET.HOST(url) AS host,
rank
FROM
`httparchive.summary_pages.2022_06_01_*`
)
SELECT
client,
rank_grouping,
COUNTIF(is_green) AS total_green,
COUNT(0) AS total_sites,
COUNTIF(is_green) / COUNT(0) AS pct_green
FROM
pages
LEFT JOIN
green
USING
(host),
UNNEST([1000, 10000, 100000, 1000000, 10000000]) AS rank_grouping
WHERE
rank <= rank_grouping
GROUP BY
client,
rank_grouping
ORDER BY
client,
rank_grouping
Hi @camcash17 & @4upz
Here are the queries that are still required (or topics that still require queries). For each I have linked to the section of the Outline that mentions them so you can have some background (if available) on what the author wants to cover.
- [x] Unused CSS & JS
You should be able to find the query for these in the notes further down the document.
I don't think we could find a previous query to reference for this, though you can check again. It might be one we need to write from scratch.
This is a new query as well. It could be worth checking with Laurent if we really need a figure for this.
You should be able to find the query for these in the notes further down the document.
- [x] Inline JS & CSS
I don't think we could find a previous query to reference for this, though you can check again. It might be one we need to write from scratch.
- [x] CDN usage
You should be able to find the query for these in the notes further down the document.
- [x] Caching
You should be able to find the query for these in the notes further down the document.
- [ ] Web performance
For this, we can look to add the overall page Lighthouse Performance Score (mobile & desktop) to the current page_bytes_per_type.sql query.
@camcash17 looks like there are some new linting errors, can you take a look?
What's the latest on this? Are we ready for another review or is it still being worked on?
I've still some open comments on this PR, but think most of the queries are in a reasonably fit state. I would suggest starting to run them, and save the data to the sheet, so you can see what the data looks like, while also addressing the comments I've made. I'm a little uncertain with exactly what we hope to get our of some of the queries, but maybe once we see the data it will make more sense (or you'll all see the query perhaps doesn't make as much sense as you thought).
We'll still need this PR reviewed and merged, but as long as most of the queries are OK, it might just need a few rerunning once the review has identified corrections. You also may find some need slight tweaks as you run them.
I've nuked the Green Third Parties query since it was returning some pretty strange results. I've rewritten it (f3dffe7) to produce results that look more right and give us something to talk to.
@tunetheweb I've updated the checklist at the top. We've got one query on Font format adoption that has been missed, but I'm guessing the Fonts chapter has data on this that we can use.