Unexpected changes in main thread time reported around March 8th

Open AntonioGargaro opened this issue 3 months ago • 10 comments

Summary

Hey all, I have been investigating the numbers reported by the PageSpeed Insights API (and therefore Lighthouse) for my company's SDK. A third-party website that tracks how metrics such as main thread time change over time shows a massive spike in that metric for our SDK.

Based on the below, has a change been released to PageSpeed Insights that could explain these differences? I believe v11.6.0 would have been deployed around that date, so could this contribute to these changes in reporting?

I appreciate Econify is a third-party platform, but we have confirmed that they use PageSpeed Insights API under the hood.

Screenshot 2024-03-27 at 10 04 39 https://www.econify.com/performance/vendor/permutive-app

Interestingly enough, the main thread time varies substantially between a few other providers too, where some improve drastically and others appear to worsen.

Screenshot 2024-03-27 at 10 09 37 https://www.econify.com/performance/vendor/contextweb?date=Mar+26%2C+2024&device=mobile&type=article&range=1m

Screenshot 2024-03-27 at 10 13 04 https://www.econify.com/performance/vendor/google-analytics

More vendors:

  • https://www.econify.com/performance/vendor/appnexus
  • https://www.econify.com/performance/vendor/the-trade-desk
  • https://www.econify.com/performance/vendor/sharethrough
  • https://www.econify.com/performance/vendor/smart-adserver
  • https://www.econify.com/performance/vendor/openx
  • https://www.econify.com/performance/vendor/speedcurve-rum
  • https://www.econify.com/performance/vendor/triplelift

AntonioGargaro avatar Mar 27 '24 10:03 AntonioGargaro

Following up a little more on this investigation, I have profiled https://nypost.com with Lighthouse, and the report attributes the longest-running task to our script, which doesn't make sense to me. I have uploaded the assets here for local inspection. I have also added a video to the drive that runs through what I'm seeing and why it doesn't make sense.

From a commercial perspective at Permutive, this is causing upset with our customers around the performance of our script, which we haven't been able to identify yet internally with our metrics or profiling of publisher sites. I hope we can identify a change somewhere that may be affecting the attribution of blocking time to our script erroneously, or at the very least, validate how to inspect these profiles and reports correctly!

AntonioGargaro avatar Mar 27 '24 14:03 AntonioGargaro

Noting that this Chromium report also seems to be describing stacked tasks which were not present before Chromium v122.

https://issues.chromium.org/issues/329678173

A theory based on this is that other scripts' evaluation time is being attributed to our SDK instead of their own evaluation, which may explain why we see such a drop in main-thread time for them and an increase in ours.

AntonioGargaro avatar Mar 27 '24 15:03 AntonioGargaro

We have run experiments behind the scenes at Calibre and were able to observe Total Blocking Time (TBT) increases between Chrome versions. It seems that Lighthouse with Chrome 122 (& 123) reliably reports higher TBT than previously observed on 120 or 121.

Here’s what we saw:

  • TBT increase not present on Chrome 120
  • TBT increase not present on Chrome 121
  • TBT increase present on Chrome 122
  • TBT increase present on Chrome 123 (currently unreleased, tested in our pre-release environment)
  • Chrome 122 & Chrome 123 TBT measurements are consistent with each other

benschwarz avatar Mar 27 '24 23:03 benschwarz

Looking into this. I don't think we updated the Lighthouse version around March 8 (PSI is currently on 11.5.0). So I think it's more likely to be a performance regression in Chrome as @benschwarz's investigation seems to indicate.
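For anyone wanting to confirm which Lighthouse and Chrome versions produced a given PSI result, here is a minimal sketch. It assumes the PSI v5 `runPagespeed` endpoint and the response fields `lighthouseResult.lighthouseVersion`, `lighthouseResult.environment.hostUserAgent`, and the `total-blocking-time` and `mainthread-work-breakdown` audits; the helper names `build_psi_url` and `summarize_environment` are hypothetical, and the sketch does not perform the HTTP request itself.

```python
from urllib.parse import urlencode

# Assumed PSI v5 endpoint; an API key may be needed for regular use.
PSI_ENDPOINT = "https://www.googleapis.com/pagespeedonline/v5/runPagespeed"


def build_psi_url(page_url: str, strategy: str = "mobile") -> str:
    """Build the request URL for one page (hypothetical helper)."""
    query = urlencode({"url": page_url, "strategy": strategy})
    return f"{PSI_ENDPOINT}?{query}"


def summarize_environment(psi_response: dict) -> dict:
    """Extract the versions and main-thread metrics from a decoded PSI response.

    Field names are assumptions about the response shape; missing fields
    come back as None rather than raising.
    """
    result = psi_response.get("lighthouseResult", {})
    audits = result.get("audits", {})
    return {
        "lighthouse_version": result.get("lighthouseVersion"),
        # The host user agent string includes the Chrome version used for the run.
        "host_user_agent": result.get("environment", {}).get("hostUserAgent"),
        "total_blocking_time_ms": audits.get("total-blocking-time", {}).get("numericValue"),
        "main_thread_work_ms": audits.get("mainthread-work-breakdown", {}).get("numericValue"),
    }
```

Logging these values alongside each measurement makes it easier to tell a Lighthouse upgrade apart from a Chrome upgrade when a metric shifts.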

I'm going to try bisecting this issue in Chrome. If y'all could provide several specific URLs that showed a clear performance regression, that would be super helpful in investigating this problem further.

adamraine avatar Mar 28 '24 00:03 adamraine

@adamraine Looking at our historic metrics, I didn't see any notable change from LH 11.4.0 to 11.6.0; the change appeared to be purely Chrome based.

benschwarz avatar Mar 28 '24 00:03 benschwarz

@adamraine The reference URL we have been using for NY Post is https://nypost.com/2017/05/10/walt-disneys-original-disneyland-map-could-sell-for-1m/ which is the same URL Econify is reporting increases on.

We also noticed this jump in TBT in Calibre for https://www.businessinsider.com.

Screenshot 2024-03-28 at 09 41 59

I believe this TBT increase is likely caused by the regression in Chrome, where it is nesting macrotasks under other macrotasks. This is likely why TBT shows the most obvious increase: what were small tasks are becoming long tasks.
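To illustrate why nesting would inflate TBT, here is a simplified sketch. It assumes the usual definition where each main-thread task contributes only the part of its duration above 50 ms, and it ignores the FCP-to-TTI window that Lighthouse applies. The task durations are made up for illustration.

```python
BLOCKING_THRESHOLD_MS = 50.0


def total_blocking_time(task_durations_ms):
    """Sum the portion of each task that exceeds the 50 ms threshold."""
    return sum(max(0.0, d - BLOCKING_THRESHOLD_MS) for d in task_durations_ms)


# Three hypothetical short tasks, each under the threshold.
separate = [30.0, 40.0, 45.0]
# The same work reported as a single stacked task.
stacked = [sum(separate)]

print(total_blocking_time(separate))  # 0.0
print(total_blocking_time(stacked))   # 65.0
```

The total work is identical in both cases (115 ms), but only the stacked version crosses the threshold, so the blocking time lands on whichever script owns the outer task.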

AntonioGargaro avatar Mar 28 '24 09:03 AntonioGargaro

Hi folks, thanks for your efforts on this.

I just want to reiterate Toni's point that the root cause of this issue appears to be the following regression in Chromium: https://issues.chromium.org/issues/329678173.

Furthermore, the Chromium bug appears to have been incorrectly triaged as not being a regression, which appears to have lessened its priority. If others agree, then perhaps some further encouragement on the Chromium bug that this is a regression would be valuable.

In terms of replication and bisecting the issue in Chrome, I think the thing to look for is stacked macrotasks in the Chrome performance profiler (i.e. multiple concurrent grey rows), examples of which can be seen in the OP of the Chromium bug. I suspect that any build exhibiting stacked macrotasks in the Chrome performance profiler will exhibit the spurious main thread measurements.
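A rough way to check a saved trace for stacked tasks without eyeballing the profiler is sketched below. It assumes trace events in the Chrome trace event format, where top-level tasks appear as complete events (`"ph": "X"`) named `RunTask` with `ts` and `dur` fields and `pid`/`tid` identifying the thread. The function name `find_nested_run_tasks` is hypothetical, and real traces may need extra filtering.

```python
def find_nested_run_tasks(trace_events):
    """Return RunTask events that start inside another RunTask on the same thread."""
    by_thread = {}
    for event in trace_events:
        if event.get("name") == "RunTask" and event.get("ph") == "X":
            key = (event.get("pid"), event.get("tid"))
            by_thread.setdefault(key, []).append(event)

    nested = []
    for events in by_thread.values():
        # Earliest start first; for equal starts, the longer (outer) task first.
        events.sort(key=lambda e: (e["ts"], -e.get("dur", 0)))
        open_ends = []  # end timestamps of tasks that are still running
        for event in events:
            # Drop enclosing tasks that finished before this one started.
            while open_ends and open_ends[-1] <= event["ts"]:
                open_ends.pop()
            if open_ends:
                nested.append(event)
            open_ends.append(event["ts"] + event.get("dur", 0))
    return nested
```

If a build before the regression returns an empty list for a given page and a later build does not, that would be consistent with the stacked-task behaviour described in the Chromium bug.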

joshdifabio avatar Apr 02 '24 10:04 joshdifabio

> Furthermore, the Chromium bug appears to have been incorrectly triaged as not being a regression, which appears to have lessened its priority. If others agree, then perhaps some further encouragement on the Chromium bug that this is a regression would be valuable.

Yes, I agree. Having spent several days investigating on my side, I believe the Chromium bug to be a clear-cut regression. In testing before the issue (Chrome 120, 121) and after (Chrome 122, 123) we've seen a clear rise of TBT measurement (and importantly, not TTI). Tasks are up to 2X longer in a lot of the cases I've observed, which aligns with the report of stacked micro tasks.

I've shared some of my findings with the Lighthouse team privately and have also posted on the Chromium issue.

benschwarz avatar Apr 04 '24 06:04 benschwarz

Hey @adamraine, I've noticed the fix has been released. Do you know when this will make it into PageSpeed Insights?

AntonioGargaro avatar May 02 '24 13:05 AntonioGargaro

12.0 should be in PSI sometime early next week.

connorjclark avatar May 02 '24 19:05 connorjclark