almanac.httparchive.org Jamstack 2022 Queries

Progress on #2898

Jamstack sites are fast

[ ] Lighthouse Performance score
- Have distribution of Lighthouse performance scores; median score is 0.31
[ ] Largest Contentful Paint
- Have distribution of LCP times; median is 6.6 seconds

Jamstack sites are resilient (and pre-rendered)

[ ] Rendered content sizes (reuse SEO chapter query https://github.com/HTTPArchive/custom-metrics/blob/main/dist/wpt_bodies.js)
[ ] How much the content changes post-load, i.e. Cumulative Layout Shift
- Have distribution of CLS scores; median is 0.059

Jamstack sites are cached for a long time

[ ] Age header
- Have distribution of Age headers; median is 1 day

The Jamstack category is growing

[ ] Percentage of "new" sites in crawl that meet the defined criteria of "fast", "resilient" and "cached"

Candidate sites

Have a join that selects all URLs that are better than the median on all 4 current queries; this matches 10k sites.
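
To illustrate the selection logic for reviewers, here is a minimal Python sketch of a "better than median on every metric" join. It is not the BigQuery SQL from this PR. The URLs, the metric values, the field names (`lighthouse`, `lcp_s`, `cls`, `age_s`) and the choice of which direction counts as "better" are all assumptions made for the example.

```python
from statistics import median

# Hypothetical per-URL metrics; every value here is invented for illustration.
SITES = {
    "https://a.example/": {"lighthouse": 0.90, "lcp_s": 1.8, "cls": 0.010, "age_s": 200_000},
    "https://b.example/": {"lighthouse": 0.31, "lcp_s": 6.6, "cls": 0.059, "age_s": 86_400},
    "https://c.example/": {"lighthouse": 0.20, "lcp_s": 9.0, "cls": 0.200, "age_s": 0},
    "https://d.example/": {"lighthouse": 0.80, "lcp_s": 2.5, "cls": 0.020, "age_s": 10},
    "https://e.example/": {"lighthouse": 0.10, "lcp_s": 12.0, "cls": 0.300, "age_s": 100_000},
}

# Assumed direction of "better": higher Lighthouse score and Age,
# lower LCP and CLS.
HIGHER_IS_BETTER = {"lighthouse": True, "lcp_s": False, "cls": False, "age_s": True}


def better_than_median(sites, higher_is_better=HIGHER_IS_BETTER):
    """Return the sorted URLs that strictly beat the median on every metric."""
    # One median per metric, computed over all sites.
    medians = {m: median(row[m] for row in sites.values()) for m in higher_is_better}

    def passes(row):
        # A site sitting exactly on a median is excluded (strict comparison).
        return all(
            row[m] > medians[m] if up else row[m] < medians[m]
            for m, up in higher_is_better.items()
        )

    return sorted(url for url, row in sites.items() if passes(row))


candidates = better_than_median(SITES)
```

In this toy data only `https://a.example/` survives: `b` sits exactly on every median and `d` misses on the Age header alone, which shows how a strict all-metrics join shrinks the candidate set.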

Jun 27 '22 09:06 whitep4nth3r

@seldo @GregBrimble any progress on this? Getting quite concerned about the Jamstack chapter falling behind. Let me know if there’s anything I can do to help.

Jul 13 '22 20:07 tunetheweb

Sorry Barry, this was done as far as I'm concerned, and simply not merged -- my plan next week was to actually run these queries against some previous years. Is there more you'd expect prior to merging this?

Jul 13 '22 21:07 seldo

Ah that’s great news.

If you could add the queries to this branch when you get a chance, we’ll give them a quick code review first, to save you having to rerun if we spot any problems.

Jul 13 '22 22:07 tunetheweb

Sure thing. I'm assuming what we should be doing is adding SQL files to sql/2022/jamstack along the lines of https://github.com/HTTPArchive/almanac.httparchive.org/blob/main/sql/2021/javascript/frameworks_libraries.sql or is there a more sophisticated way to do this? (And is it okay if that's next week?)

Jul 13 '22 22:07 seldo

Yup, that’s it. Next week is more than fine now I know you’ve actually worked/are working on them. I was just a bit worried that it was all too quiet and nothing was happening.

So next milestone is 1st August to have them added, reviewed, merged and run. And then comes the writing about all that lovely data!

Jul 13 '22 22:07 tunetheweb

Okay, I've done the initial queries and created a combined query that finds all the URLs that match each of our criteria, plus a join that finds all the sites that match all of them. I've been inspecting the resulting URLs and they certainly all seem very jamstacky! There are quite a small number, so I imagine picking "must exceed median" is quite stringent (as it excludes anybody at, or close to, the median on one or more metrics).

Next steps:

- Examine the matching URLs for false positives (expecting very few)
- Try relaxing one or more parameters to include URLs close to the thresholds and see when we start getting a lot of false positives
- Having picked thresholds we like, do a longitudinal pass across previous years to see if this group is growing or shrinking and how fast
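
As a rough sketch of the threshold-relaxing step, the Python below counts matches at a few tolerance levels. It is an illustration only, not the SQL used here: the data, the field names, and the idea of expressing the slack as a fraction of each metric's median (`tolerance`) are all assumptions.

```python
from statistics import median

# Hypothetical per-URL metrics; every value here is invented for illustration.
SITES = {
    "https://a.example/": {"lighthouse": 0.90, "lcp_s": 1.8, "cls": 0.010, "age_s": 200_000},
    "https://b.example/": {"lighthouse": 0.31, "lcp_s": 6.6, "cls": 0.059, "age_s": 86_400},
    "https://c.example/": {"lighthouse": 0.20, "lcp_s": 9.0, "cls": 0.200, "age_s": 0},
    "https://d.example/": {"lighthouse": 0.80, "lcp_s": 2.5, "cls": 0.020, "age_s": 80_000},
    "https://e.example/": {"lighthouse": 0.10, "lcp_s": 12.0, "cls": 0.300, "age_s": 100_000},
}

# Assumed direction of "better": +1 means higher is better, -1 means lower is better.
DIRECTIONS = {"lighthouse": +1, "lcp_s": -1, "cls": -1, "age_s": +1}


def matches(sites, tolerance=0.0):
    """Return sorted URLs that beat each median, or miss it by less than the slack.

    The slack for a metric is `tolerance` times that metric's median
    (a hypothetical definition chosen for this sketch).
    """
    medians = {m: median(row[m] for row in sites.values()) for m in DIRECTIONS}
    selected = []
    for url, row in sites.items():
        ok = True
        for metric, sign in DIRECTIONS.items():
            slack = tolerance * abs(medians[metric])
            # A positive margin means the site is better than the median.
            margin = sign * (row[metric] - medians[metric])
            if not margin > -slack:
                ok = False
                break
        if ok:
            selected.append(url)
    return sorted(selected)


# Count how many sites qualify as the thresholds are relaxed.
counts = {t: len(matches(SITES, t)) for t in (0.0, 0.05, 0.10)}
```

With `tolerance=0.0` this reduces to the strict "must exceed median" rule. Watching how `counts` grows as the tolerance increases, and spot-checking the newly admitted URLs, is one way to see where false positives start to appear.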

Jul 21 '22 21:07 seldo

Hey folks, sorry, I've been pretty preoccupied with other stuff this year, and @seldo has already done a great job with the queries so far. Looks like it's pretty much just going to be relatively subjective tweaking of thresholds now. If it's alright, I'll bow out and let you finish them off. Sorry again, and good luck with the chapter!

Jul 24 '22 19:07 GregBrimble

@seldo any update on this? We're now nearly a week past when analysis was due so a little concerned that we're not going to make the publish date...

Aug 07 '22 14:08 tunetheweb
