Missing Chrome and Firefox stable results for July 19, 2022

Open jcscottiii opened this issue 3 years ago • 4 comments

Hi @foolip and @jgraham, I was checking the ecosystem dashboard and saw that the stable runs circle is red.

Would you happen to have any insight on why the Chrome and Firefox results did not come in for July 19?

Many thanks, James S.

jcscottiii avatar Jul 20 '22 15:07 jcscottiii

Results came in the following day.

jcscottiii avatar Jul 29 '22 20:07 jcscottiii

@jcscottiii it looks like this has been happening with some regularity, judging by https://wpt.fyi/runs?label=master&label=stable&max-count=100&product=chrome&product=firefox&product=safari

Here's how I'd investigate what went wrong. I'll use the runs missing in 6808a6b as the example, as that's more recent. First click "6808a6b" to get to this view: https://wpt.fyi/results/?sha=6808a6b426&label=master&label=stable&max-count=1&product=chrome&product=firefox&product=safari

Then, clicking "6808a6b" under Safari will get you to GitHub: https://github.com/web-platform-tests/wpt/commit/6808a6b426

There's a red x next to "Fix expectations for contain-intrinsic-size-028.html" that you can click to expand the list of checks that passed and failed. Chrome and Firefox are run on Taskcluster, so click the first failing Taskcluster check to get here: https://github.com/web-platform-tests/wpt/runs/7512548164

Clicking through some more gets us to the task group: https://community-tc.services.mozilla.com/tasks/groups/JqrwqVINQQyGb0F3opyVjw

But it looks like the Chrome and Firefox tasks all passed, right? The problem is then most likely with the wpt.fyi processor.

Maybe the processor requires all tasks to pass, so a failure of any task prevents processing of results. I don't think this is the case, but it would explain it.

Since this continues to happen, I'll reopen. @jcscottiii do you know where in GCP to find processor logs to dig into what went wrong?

foolip avatar Aug 01 '22 11:08 foolip

@foolip thanks so much for these steps. I can take a look into the processor's logs in GCP.

jcscottiii avatar Aug 01 '22 15:08 jcscottiii

Some raw notes from the investigation:

{
insertId: "14iltpfgnk6as"
logName: "projects/wptdashboard/logs/request_log_entries"
receiveTimestamp: "2022-07-26T05:05:49.245405560Z"
resource: {2}
severity: "ERROR"
textPayload: "Failed to fetch check runs for suite 7515393036: GET https://api.github.com/repos/web-platform-tests/wpt/check-suites/7515393036/check-runs?page=7&per_page=25: 502 Server Error []"
timestamp: "2022-07-26T05:05:48.234525276Z"
trace: "projects/wptdashboard/traces/ce97c386a47ff612fd5163f70a13d189"
}

{
insertId: "14iltpfgnk6at"
logName: "projects/wptdashboard/logs/request_log_entries"
receiveTimestamp: "2022-07-26T05:05:49.245405560Z"
resource: {2}
severity: "ERROR"
textPayload: "GET https://api.github.com/repos/web-platform-tests/wpt/check-suites/7515393036/check-runs?page=7&per_page=25: 502 Server Error []"
timestamp: "2022-07-26T05:05:48.234573772Z"
trace: "projects/wptdashboard/traces/ce97c386a47ff612fd5163f70a13d189"
}

That corresponds to this code:

https://github.com/web-platform-tests/wpt.fyi/blob/2bb8884902f15c98bbe4a431e6db22ad7eeb4159/api/taskcluster/webhook.go#L133-L137

	runs, err := api.ListCheckRuns(owner, repo, checkSuite.GetCheckSuite().GetID())
	if err != nil {
		log.Errorf("Failed to fetch check runs for suite %v: %s", checkSuite.GetCheckSuite().GetID(), err.Error())
		return EventInfo{}, err
	}

which is called from here:

		event, err = GetCheckSuiteEventInfo(checkSuite, log, api)
	}
	if err != nil {
		log.Errorf("%v", err)
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return

I also analyzed today's run, since 6808a6b appears to be missing as well.

It shows the same error:

{
insertId: "968z1tfcgdau0"
logName: "projects/wptdashboard/logs/request_log_entries"
receiveTimestamp: "2022-08-01T05:31:22.118640003Z"
resource: {2}
severity: "ERROR"
textPayload: "Failed to fetch check runs for suite 7600616016: GET https://api.github.com/repos/web-platform-tests/wpt/check-suites/7600616016/check-runs?page=4&per_page=25: 502 Server Error []"
timestamp: "2022-08-01T05:31:21.911628683Z"
trace: "projects/wptdashboard/traces/8bafec85f590002ba7f0e54037f32549"
}

{
insertId: "968z1tfcgdau1"
logName: "projects/wptdashboard/logs/request_log_entries"
receiveTimestamp: "2022-08-01T05:31:22.118640003Z"
resource: {2}
severity: "ERROR"
textPayload: "GET https://api.github.com/repos/web-platform-tests/wpt/check-suites/7600616016/check-runs?page=4&per_page=25: 502 Server Error []"
timestamp: "2022-08-01T05:31:21.911653794Z"
trace: "projects/wptdashboard/traces/8bafec85f590002ba7f0e54037f32549"
}
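
To compare entries like these across many days, it may help to pull the suite ID, page number, and HTTP status out of each `textPayload`. The sketch below assumes the payload format shown in the entries above stays stable. `parseFailure` and `fetchFailure` are hypothetical names introduced for illustration, not part of wpt.fyi.

```go
package main

import (
	"fmt"
	"regexp"
)

// payloadRE matches the "Failed to fetch check runs" line logged by the webhook.
// It captures the suite ID, the page query parameter, and the HTTP status code.
var payloadRE = regexp.MustCompile(
	`Failed to fetch check runs for suite (\d+): GET \S+[?&]page=(\d+)\S*: (\d{3}) `)

// fetchFailure holds the fields extracted from one log line.
type fetchFailure struct {
	SuiteID string
	Page    string
	Status  string
}

// parseFailure reports false when the payload is not in the expected format.
func parseFailure(textPayload string) (fetchFailure, bool) {
	m := payloadRE.FindStringSubmatch(textPayload)
	if m == nil {
		return fetchFailure{}, false
	}
	return fetchFailure{SuiteID: m[1], Page: m[2], Status: m[3]}, true
}

func main() {
	// Payloads copied from the two "Failed to fetch" log entries above.
	payloads := []string{
		"Failed to fetch check runs for suite 7515393036: GET https://api.github.com/repos/web-platform-tests/wpt/check-suites/7515393036/check-runs?page=7&per_page=25: 502 Server Error []",
		"Failed to fetch check runs for suite 7600616016: GET https://api.github.com/repos/web-platform-tests/wpt/check-suites/7600616016/check-runs?page=4&per_page=25: 502 Server Error []",
	}
	for _, p := range payloads {
		if f, ok := parseFailure(p); ok {
			fmt.Printf("suite=%s page=%s status=%s\n", f.SuiteID, f.Page, f.Status)
		}
	}
}
```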

Initial Diagnosis

I traced the errors back to the point where the GitHub webhook is called for Taskcluster.

It looks like calls to the GitHub API fail intermittently. I still need to find out why that stops both browsers' results from being uploaded. It may be as you said, @foolip, but I need to confirm.

jcscottiii avatar Aug 01 '22 20:08 jcscottiii