almanac.httparchive.org
Jamstack 2022 Queries
Progress on #2898
- Jamstack sites are fast
- [ ] Lighthouse Performance score
- Have distribution of Lighthouse performance scores; median score is 0.31
- [ ] Largest Contentful Paint
- Have distribution of LCP times; median is 6.6 seconds
- Jamstack sites are resilient (and pre-rendered)
- [ ] Rendered content sizes (reuse SEO chapter query https://github.com/HTTPArchive/custom-metrics/blob/main/dist/wpt_bodies.js)
- [ ] How much the content changes post-load, i.e. Cumulative Layout Shift
- Have distribution of CLS scores; median is 0.059
- Jamstack sites are cached for a long time
- [ ] Age header
- Have distribution of Age headers; median is 1 day
- The Jamstack category is growing
- [ ] Percentage of "new" sites in crawl that meet the defined criteria of "fast", "resilient" and "cached"
- Candidate sites
- Have a join of all URLs that are better than the median on all 4 current queries; this matches 10k sites.
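The "candidate sites" join can be sketched in Python, assuming each site's four metrics are already in memory. The metric names, the `better_than_median` helper and the sample sites are all hypothetical. The real queries are SQL run against HTTP Archive data. Treating a longer Age header as "better" is our reading of "cached for a long time". Taking the intersection of the per-metric passing sets is what the join across the queries does.

```python
from statistics import median

# Hypothetical metric names; True means a higher value is better.
HIGHER_IS_BETTER = {
    "lighthouse_perf": True,  # Lighthouse performance score (0..1)
    "lcp_ms": False,          # Largest Contentful Paint, milliseconds
    "cls": False,             # Cumulative Layout Shift
    "age_s": True,            # Age header in seconds (assumed: longer is better)
}


def better_than_median(sites: dict[str, dict[str, float]]) -> set[str]:
    """Return URLs strictly better than the median on every metric.

    Assumes every site has a value for every metric (an inner join).
    """
    candidates = set(sites)
    for metric, higher in HIGHER_IS_BETTER.items():
        mid = median(m[metric] for m in sites.values())
        passing = {
            url
            for url, m in sites.items()
            if (m[metric] > mid if higher else m[metric] < mid)
        }
        candidates &= passing  # intersect: must pass every metric
    return candidates


# Hypothetical sample data, not real crawl results.
sites = {
    "a.example": {"lighthouse_perf": 0.9, "lcp_ms": 1200, "cls": 0.01, "age_s": 200000},
    "b.example": {"lighthouse_perf": 0.2, "lcp_ms": 9000, "cls": 0.2, "age_s": 10},
    "c.example": {"lighthouse_perf": 0.5, "lcp_ms": 5000, "cls": 0.05, "age_s": 86400},
}
print(better_than_median(sites))  # {'a.example'}
```

In this sample, `c.example` sits exactly at the median on every metric and is dropped by the strict comparison. That small case shows why "must exceed median" turns out to be a stringent filter.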
@seldo @GregBrimble any progress on this? I'm getting quite concerned about the Jamstack chapter falling behind. Let me know if there’s anything I can do to help.
Sorry Barry, as far as I'm concerned this was done and simply not merged. My plan next week was to actually run these queries against some previous years. Is there more you'd expect prior to merging this?
Ah that’s great news.
If you could add the queries to this branch when you get a chance, we’ll give them a quick code review first, to save you having to rerun if we spot any problems.
Sure thing. I'm assuming what we should be doing is adding SQL files to sql/2022/jamstack along the lines of https://github.com/HTTPArchive/almanac.httparchive.org/blob/main/sql/2021/javascript/frameworks_libraries.sql or is there a more sophisticated way to do this? (And is it okay if that's next week?)
Yup, that’s it. Next week is more than fine now that I know you’ve actually worked/are working on them. I was just a bit worried that it was all too quiet and that nothing was happening.
So the next milestone is 1st August: have them added, reviewed, merged and run. And then comes the writing about all that lovely data!
Okay, I've done the initial queries and created a combined query that finds all the URLs matching each of our criteria, plus a join that finds the sites matching all of them. I've been inspecting the resulting URLs and they certainly all seem very Jamstack-y! There are quite a small number of them, so I imagine requiring sites to "exceed the median" is quite stringent (it excludes anybody at, or close to, the median on one or more metrics).
Next steps:
- Examine the matching URLs for false positives (expecting very few)
- Try relaxing one or more parameters to include URLs close to the thresholds and see when we start getting a lot of false positives
- Having picked thresholds we like, do a longitudinal pass across previous years to see if this group is growing or shrinking and how fast
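The relaxation step in these next steps could be sketched as a tolerance sweep. The sketch assumes a site "passes" a metric if it is at the median, better than it, or worse by at most a fraction `tolerance` of the median. The metric names, helpers and sample data are hypothetical, and the multiplicative tolerance assumes every median is positive. Counting matches at each tolerance shows where the candidate set starts to grow quickly, which is where false positives would be worth checking by hand.

```python
from statistics import median

# Hypothetical metric names; True means a higher value is better.
HIGHER_IS_BETTER = {
    "lighthouse_perf": True,
    "lcp_ms": False,
    "cls": False,
    "age_s": True,  # assumed: longer cache lifetime is better
}


def matching_sites(sites: dict[str, dict[str, float]], tolerance: float) -> set[str]:
    """URLs at or better than the median, or within `tolerance` of it, on every metric.

    tolerance=0.0 keeps sites exactly at the median (unlike a strict comparison).
    Assumes every median is positive so the multiplicative slack makes sense.
    """
    result = set(sites)
    for metric, higher in HIGHER_IS_BETTER.items():
        mid = median(m[metric] for m in sites.values())
        if higher:
            threshold = mid * (1 - tolerance)
            ok = {u for u, m in sites.items() if m[metric] >= threshold}
        else:
            threshold = mid * (1 + tolerance)
            ok = {u for u, m in sites.items() if m[metric] <= threshold}
        result &= ok
    return result


def sweep(sites: dict[str, dict[str, float]], tolerances: list[float]) -> dict[float, int]:
    """Number of matching sites at each tolerance."""
    return {t: len(matching_sites(sites, t)) for t in tolerances}


# Hypothetical sample data, not real crawl results.
sites = {
    "a.example": {"lighthouse_perf": 0.9, "lcp_ms": 1200, "cls": 0.01, "age_s": 200000},
    "b.example": {"lighthouse_perf": 0.2, "lcp_ms": 9000, "cls": 0.2, "age_s": 10},
    "c.example": {"lighthouse_perf": 0.5, "lcp_ms": 5000, "cls": 0.05, "age_s": 86400},
    "d.example": {"lighthouse_perf": 0.45, "lcp_ms": 5400, "cls": 0.055, "age_s": 80000},
}
print(sweep(sites, [0.0, 0.1, 0.5]))  # {0.0: 2, 0.1: 3, 0.5: 3}
```

For the longitudinal pass, the same `matching_sites` call with the chosen tolerance could be run on each year's crawl, comparing the share of matching sites year over year.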
Hey folks, sorry, I've been pretty preoccupied with other stuff this year, and @seldo has already done a great job with the queries so far. Looks like it's pretty much just going to be relatively subjective tweaking of thresholds now. If it's alright, I'll bow out and let you finish them off. Sorry again, and good luck with the chapter!
@seldo any update on this? We're now nearly a week past when analysis was due so a little concerned that we're not going to make the publish date...