almanac.httparchive.org

Jamstack 2022 Queries

Open whitep4nth3r opened this issue 3 years ago • 8 comments

Progress on #2898

  1. Jamstack sites are fast
  • [ ] Lighthouse Performance score
    • Have distribution of Lighthouse performance scores; median score is 0.31
  • [ ] Largest Contentful Paint
    • Have distribution of LCP times; median is 6.6 seconds
  2. Jamstack sites are resilient (and pre-rendered)
  • [ ] Rendered content sizes (reuse SEO chapter query https://github.com/HTTPArchive/custom-metrics/blob/main/dist/wpt_bodies.js)
  • [ ] How much the content changes post-load, i.e. Cumulative Layout Shift
    • Have distribution of CLS scores; median is 0.059
  3. Jamstack sites are cached for a long time
  • [ ] Age header
    • Have distribution of Age headers; median is 1 day
  4. The Jamstack category is growing
  • [ ] Percentage of "new" sites in crawl that meet the defined criteria of "fast", "resilient" and "cached"
  5. Candidate sites
  • Have a join between all URLs better than median on all 4 current queries; this matches 10k sites.
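
For illustration only, here is a minimal Python sketch of what "better than median on all 4 current queries" could look like as a set intersection. It is not the actual query (that would presumably be SQL against the HTTP Archive tables). The URLs, the metric values, the field names (`lighthouse`, `lcp_s`, `cls`, `age_s`) and the direction assumed for each metric are all made up for the example.

```python
from statistics import median

# Hypothetical per-URL metrics (assumption: invented sample data, not crawl results).
SITES = {
    "https://a.example/": {"lighthouse": 0.90, "lcp_s": 1.8, "cls": 0.01, "age_s": 200_000},
    "https://b.example/": {"lighthouse": 0.25, "lcp_s": 7.9, "cls": 0.20, "age_s": 0},
    "https://c.example/": {"lighthouse": 0.40, "lcp_s": 5.0, "cls": 0.30, "age_s": 90_000},
    "https://d.example/": {"lighthouse": 0.31, "lcp_s": 6.6, "cls": 0.059, "age_s": 86_400},
    "https://e.example/": {"lighthouse": 0.10, "lcp_s": 9.0, "cls": 0.05, "age_s": 100},
}

# Assumed direction of "better": True means a higher value is better.
HIGHER_IS_BETTER = {"lighthouse": True, "lcp_s": False, "cls": False, "age_s": True}


def better_than_median(sites, metric, higher_is_better):
    """Return the URLs whose value for `metric` is strictly better than the median."""
    cutoff = median(m[metric] for m in sites.values())
    if higher_is_better:
        return {url for url, m in sites.items() if m[metric] > cutoff}
    return {url for url, m in sites.items() if m[metric] < cutoff}


def candidate_sites(sites):
    """Intersect the per-metric sets, which plays the role of the join."""
    per_metric = [
        better_than_median(sites, metric, direction)
        for metric, direction in HIGHER_IS_BETTER.items()
    ]
    return set.intersection(*per_metric)


if __name__ == "__main__":
    print(sorted(candidate_sites(SITES)))
```

Because the comparison is strict, a site sitting exactly on the median for any metric (like `d.example` in the sample data) drops out of the result.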

whitep4nth3r avatar Jun 27 '22 09:06 whitep4nth3r

@seldo @GregBrimble any progress on this? Getting quite concerned with the Jamstack chapter falling behind. Let me know if there’s anything I can do to help.

tunetheweb avatar Jul 13 '22 20:07 tunetheweb

Sorry Barry, this was done as far as I'm concerned, and simply not merged -- my plan next week was to actually run these queries against some previous years. Is there more you'd expect prior to merging this?

seldo avatar Jul 13 '22 21:07 seldo

Ah that’s great news.

If you could add the queries to this branch when you get a chance, we’ll give them a quick code review first, to save you having to rerun if we spot any problems.

tunetheweb avatar Jul 13 '22 22:07 tunetheweb

Sure thing. I'm assuming what we should be doing is adding SQL files to sql/2022/jamstack along the lines of https://github.com/HTTPArchive/almanac.httparchive.org/blob/main/sql/2021/javascript/frameworks_libraries.sql or is there a more sophisticated way to do this? (And is it okay if that's next week?)

seldo avatar Jul 13 '22 22:07 seldo

Yup that’s it. Next week is more than fine now I know you’ve actually worked/are working on them. Was just a bit worried it was all too quiet and that nothing was happening.

So next milestone is 1st August to have them added, reviewed, merged and run. And then comes the writing about all that lovely data!

tunetheweb avatar Jul 13 '22 22:07 tunetheweb

Okay, I've done the initial queries and created a combined query that allows us to find all the URLs that match our various criteria and then a join that finds all the sites that match all of them. I've been inspecting the resulting URLs and they certainly all seem very jamstacky! There are quite a small number so I imagine picking "must exceed median" is quite stringent (as it excludes anybody at, or close to, the median on one or more metrics).

Next steps:

  • Examine the matching URLs for false positives (expecting very few)
  • Try relaxing one or more parameters to include URLs close to the thresholds and see when we start getting a lot of false positives
  • Having picked thresholds we like, do a longitudinal pass across previous years to see if this group is growing or shrinking and how fast
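
As a rough illustration of the "relax the parameters" step, here is a small Python sketch that counts how many sites match as a slack factor widens each threshold. The medians are the ones reported in this thread (with the 1 day Age median written as 86,400 seconds). Everything else is an assumption: the sample sites, the field names, and the choice of a multiplicative slack applied equally to every metric. It is not the combined query itself.

```python
# Medians reported in this thread; Age is expressed in seconds (1 day = 86,400).
MEDIANS = {"lighthouse": 0.31, "lcp_s": 6.6, "cls": 0.059, "age_s": 86_400}

# Assumed direction of "better": True means a higher value is better.
HIGHER_IS_BETTER = {"lighthouse": True, "lcp_s": False, "cls": False, "age_s": True}

# Hypothetical sites (assumption: invented sample data, not crawl results).
SITES = {
    "https://a.example/": {"lighthouse": 0.90, "lcp_s": 1.8, "cls": 0.01, "age_s": 200_000},
    "https://b.example/": {"lighthouse": 0.25, "lcp_s": 7.9, "cls": 0.20, "age_s": 0},
    "https://d.example/": {"lighthouse": 0.31, "lcp_s": 6.6, "cls": 0.059, "age_s": 86_400},
    "https://f.example/": {"lighthouse": 0.28, "lcp_s": 7.0, "cls": 0.07, "age_s": 80_000},
}


def passes(metrics, slack):
    """True if every metric beats its median once the cutoff is relaxed by `slack`.

    With slack=0 this is the strict "must exceed median" rule.
    """
    for name, cutoff in MEDIANS.items():
        value = metrics[name]
        if HIGHER_IS_BETTER[name]:
            if value <= cutoff * (1 - slack):
                return False
        elif value >= cutoff * (1 + slack):
            return False
    return True


def sweep(sites, slacks):
    """Return (slack, number of matching sites) pairs, one per slack value."""
    return [
        (slack, sum(passes(m, slack) for m in sites.values()))
        for slack in slacks
    ]


if __name__ == "__main__":
    for slack, count in sweep(SITES, [0.0, 0.1, 0.25]):
        print(f"slack={slack:.2f} matches={count}")
```

Running the same sweep against each year's data would give one way to approach the longitudinal pass, though a per-metric slack may turn out to be more useful than a single shared one.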

seldo avatar Jul 21 '22 21:07 seldo

Hey folks, sorry, I've been pretty preoccupied with other stuff this year, and @seldo has already done a great job with the queries so far. Looks like it's pretty much just going to be relatively subjective tweaking of thresholds now. If it's alright, I'll bow out and let you finish them off. Sorry again, and good luck with the chapter!

GregBrimble avatar Jul 24 '22 19:07 GregBrimble

@seldo any update on this? We're now nearly a week past when analysis was due so a little concerned that we're not going to make the publish date...

tunetheweb avatar Aug 07 '22 14:08 tunetheweb