lighthouse icon indicating copy to clipboard operation
lighthouse copied to clipboard

`lantern-idle-callback-short` is flaky

Open adamraine opened this issue 3 years ago • 12 comments
trafficstars

Been seeing this a lot

https://github.com/GoogleChrome/lighthouse/runs/7755431040?check_suite_focus=true

adamraine avatar Aug 09 '22 21:08 adamraine

Also legacy nav: https://github.com/GoogleChrome/lighthouse/runs/7755430152?check_suite_focus=true

adamraine avatar Aug 09 '22 21:08 adamraine

The extra blocking time appears to come from an event like this

{
  "args": {},
  "cat": "disabled-by-default-devtools.timeline",
  "dur": 77866,
  "name": "EvaluateScript",
  "ph": "X",
  "pid": 3620,
  "tdur": 50575,
  "tid": 1,
  "ts": 450686353,
  "tts": 1161020
},

Doing some digging where this event is emitted in v8, it looks like it's from the poller script injected by Puppeteer via Runtime.evaluate. Unfortunately, there isn't anything to identify this script in the trace event so we can't easily ignore it.

Would it be ok to ignore any EvaluateScript events emitted on category disabled-by-default-devtools.timeline?

adamraine avatar Sep 19 '22 21:09 adamraine

Puppeteer via Runtime.evaluate

I think this is just and CDP Runtime.evaluate fwiw, so applies to legacy runner / all our page function code.

Even tho legacy happens less often, it could be good to figure out what code we're running in the page context that emits these long tasks.

for FR, we can either find out how to disable the culprit or ignore those tasks in various places in LH (seems like it could be hard to get right)

connorjclark avatar Sep 19 '22 22:09 connorjclark

I was doing some testing with the latest version of Puppeteer (18.x.x) and it seems like the long EvaluateScript from the injected code isn't super long anymore. I guessed https://github.com/puppeteer/puppeteer/pull/8946 may have fixed it but not certain. See https://github.com/GoogleChrome/lighthouse/pull/14385 which was failing lantern-idle-callback-short most of the time but stopped failing once Puppeteer was bumped to 18.x.x.

We should still emit a warning if Runtime.evaluate calls block the main thread too much. I will continue to investigate the impact of Puppeteer 18.x.x.

adamraine avatar Sep 19 '22 23:09 adamraine

A bisect supports https://github.com/puppeteer/puppeteer/pull/8946 as the fix for this issue. Puppeteer util injection no longer returns by value on the protocol.

adamraine avatar Sep 21 '22 20:09 adamraine

So upgrading Puppeteer in #14385 did help for the reasons I mentioned above, but we are still seeing failures in the DevTools smoke runner sometimes.

Example https://github.com/GoogleChrome/lighthouse/actions/runs/3185898767/jobs/5196014984

adamraine avatar Oct 04 '22 22:10 adamraine

I did a little investigation, the new failures seem to be from the idle callbacks taking too long on the main thread:

Screen Shot 2022-10-05 at 4 40 44 PM

Unfortunately, I still can't get a local repro. Current theory is that the extra setup for DT smokes (loading the page initially, opening DT panel, etc.) uses too much processing power and slows down the idle callback tasks.

adamraine avatar Oct 05 '22 23:10 adamraine

Gonna increase DT retries

adamraine avatar Oct 18 '22 19:10 adamraine

lantern-idle-callback-short is testing ric-shim?short which.. is a fairly awkward situation that's arguably an edge case.

We can improve it to be a bit more true to rIC usage in the wild or remove it. As of now, it's understandable this is flaky.

paulirish avatar Mar 07 '23 20:03 paulirish

according to https://chromestatus.com/metrics/feature/timeline/popularity/954, 30-40% of page loads include a requestIdleCallback call. If we continue to shim it (which currently seems like the only option if using simulated throttling), it does feel like we should have enough confidence in it that this test can reliably run, whether that requires a change in the shim or in establishing expected variance and adding that to the test expectations.

brendankenny avatar Mar 07 '23 20:03 brendankenny

I don't think the variance in the test case comes from our requestIdleCallback shim. When running the test locally it passes 100% of the time and fails 100% of the time if I remove our shim.

I believe the variance comes from CPU slowdown on CI bots that causes tasks to randomly take longer than normal. I think we need a different expectation than TBT < 100ms which can reliably verify that our shim is working properly, while allowing more variance in blocking time that is caused by CI bots being slow.

adamraine avatar Mar 07 '23 20:03 adamraine

Gonna try reproducing in PSI and/or on a page with lots of oopifs

adamraine avatar Mar 21 '23 19:03 adamraine