wpt-metadata: Programmatic mapping of Chrome failures to existing monorail bugs

In https://github.com/web-platform-tests/wpt-metadata/issues/481, one of the programmatic imports we did took a set of Chrome-specific failures on wpt.fyi and matched them against existing monorail bugs by searching for the test filenames.

It makes sense to do this for all Chrome failures, not just Chrome-specific failures, so let's do that! I'm going to use this issue to track it :).

Jan 30 '21 01:01 stephenmcgruer

Initial request, for the wpt.fyi query none(triaged:chrome) chrome:!pass chrome:!ok:

curl 'https://wpt.fyi/api/search' \
  -H 'authority: wpt.fyi' \
  -H 'user-agent: Mozilla/5.0 (X11; CrOS x86_64 13505.111.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.152 Safari/537.36' \
  -H 'content-type: text/plain;charset=UTF-8' \
  -H 'accept: */*' \
  -H 'origin: https://wpt.fyi' \
  -H 'sec-fetch-site: same-origin' \
  -H 'sec-fetch-mode: cors' \
  -H 'sec-fetch-dest: empty' \
  -H 'referer: https://wpt.fyi/results/?label=master&label=experimental&aligned&q=none%28triaged%3Achrome%29%20chrome%3A%21pass%20chrome%3A%21ok' \
  -H 'accept-language: en-US,en;q=0.9,en-CA;q=0.8' \
  -H 'cookie: _ga=GA1.2.1092915640.1598030877; session=MTYxMTI0MjQ5NnxaRmFBcXJ5enh4djdPZDdCTXpqT3FiMHJJZ0xHcHNEN25JQWlzRi0xSjlTRWUyMi1XQ2I5MG1vRWRDQmw5OFVfcmxsTHliaTVaYTBzcFE4NWZTWWc0NFRvMk9Qc0twY2Y0UXZaeldPWlBLT1ZkZThjVmlLUE1XWWk4M0wtRWlkdW43M2xubU92X0RUeGx3ZWpkN21PY0hJUlVKc3huaHlrNmQzSzJERGJHblRpSHA1VGhUY0hvck9CWXdfcTNpcXlBWW04Z2dOM3M1V2NDNmprUldPSjFkSFFCMG82UHJZbGdwcEdQRlFjdlhDa2tINFRpeWlweFpra1ZBWDV2WnBIRDlqaGhUYzZ4Wm5mc1E9PXw7TL71UAi2DVHJma8h7VTzNBJCJXlXxUaX3e1iIKYj3A==; _gid=GA1.2.1974819826.1611662905; _gat=1' \
  --data-binary '{"run_ids":[5734059885985792,4806348191563776,5731688392949760,5701194796236800],"query":{"and":[{"none":[{"triaged":"chrome"}]},{"exists":[{"product":"chrome","status":{"not":"PASS"}}]},{"exists":[{"product":"chrome","status":{"not":"OK"}}]}]}}' \
  --compressed

EDIT: In retrospect, I should also have specified chrome:!missing.
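For reference, here is a minimal sketch of how the request body could be built with the extra chrome:!missing clause. It mirrors the structure of the --data-binary payload above; the helper name build_search_body is hypothetical, and the status string "MISSING" is an assumption made by analogy with "PASS" and "OK", not something confirmed against the wpt.fyi API.

```python
import json


def build_search_body(run_ids, product="chrome",
                      excluded_statuses=("PASS", "OK", "MISSING")):
    """Build a search body shaped like the --data-binary payload above.

    "MISSING" is an assumed status string (by analogy with "PASS"/"OK").
    """
    # Untriaged for this product: none(triaged:chrome).
    clauses = [{"none": [{"triaged": product}]}]
    # One "exists" clause per excluded status: chrome:!pass, chrome:!ok, ...
    for status in excluded_statuses:
        clauses.append(
            {"exists": [{"product": product, "status": {"not": status}}]}
        )
    return {"run_ids": list(run_ids), "query": {"and": clauses}}


# Run IDs copied from the request above.
body = build_search_body(
    [5734059885985792, 4806348191563776, 5731688392949760, 5701194796236800]
)
print(json.dumps(body))
```

The printed JSON could then be passed to curl via --data-binary in place of the hand-written payload.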

Jan 30 '21 01:01 stephenmcgruer

Flattened the results.json into a list of tests:

import json
with open('results.json', 'r') as f:
  results = json.load(f)

tests = results['results']
with open('tests.txt', 'w') as f:
  for test in tests:
    f.write(test['test'])
    f.write('\n')

And sorted it:

sort -o tests.txt tests.txt

Jan 30 '21 01:01 stephenmcgruer

And then a blinkpy script to search monorail:

import sys

from blinkpy.w3c.monorail import MonorailAPI, MonorailIssue
from blinkpy.common.net.luci_auth import LuciAuth
from blinkpy.common.host import Host

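# Import the submodule explicitly: a bare `import googleapiclient` does not
# guarantee that `googleapiclient.errors` is loaded.
import googleapiclient.errors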

host = Host()
token = LuciAuth(host).get_access_token()
api = MonorailAPI(access_token=token)

# A cache in case the runs break halfway. They did.
processed_tests = set()
with open('processed-tests.txt', 'r') as f:
    for line in f:
        line = line.strip()
        if 'ERRORED' in line:
            continue
        processed_tests.add(line.split(' ')[0])

with open('tests.txt', 'r') as f:
    tests = [line.strip() for line in f]

issues = api.api.issues()

def log(msg):
    # Flush after every line so progress survives a crash mid-run.
    print(msg)
    sys.stdout.flush()

log("Processing %s tests" % len(tests))
for test in tests:
    if test in processed_tests:
        continue
    try:
        # Search open chromium bugs for the test path.
        resp = issues.list(projectId='chromium', q=test, can='open').execute()
        items = resp['items'] if resp['totalResults'] > 0 else []
        bug_ids = [str(item['id']) for item in items]
        log("%s => [%s]" % (test, ','.join(bug_ids)))
    except googleapiclient.errors.HttpError:
        log("%s ERRORED" % test)

Jan 30 '21 01:01 stephenmcgruer

And the results: processed-tests.txt

(Note: I did a pass through processed-tests.txt and removed 626703, as that is a known meta-bug of zero value.)
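As an illustration of that cleanup step, here is a rough sketch that parses lines of the form printed by the script above ("<test> => [id,id]"), drops the meta-bug, and picks out tests left with exactly one bug. The function names and the sample test paths are hypothetical, and the line format is assumed from the log() calls rather than checked against the real processed-tests.txt.

```python
import re

# Bug IDs to ignore; 626703 is the meta-bug mentioned above.
META_BUGS = {"626703"}

# Assumed line shape, taken from the log() format: "<test> => [id1,id2]".
LINE_RE = re.compile(r"^(?P<test>\S+) => \[(?P<ids>[\d,]*)\]$")


def parse_processed(lines, ignored=META_BUGS):
    """Map each test path to its list of bug IDs, minus ignored bugs."""
    mapping = {}
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # Skips "ERRORED" lines and anything unexpected.
        ids = [i for i in match.group("ids").split(",")
               if i and i not in ignored]
        mapping[match.group("test")] = ids
    return mapping


def single_bug_tests(mapping):
    """Keep only tests that matched exactly one bug."""
    return {test: ids[0] for test, ids in mapping.items() if len(ids) == 1}


# Made-up sample lines, for illustration only.
sample = [
    "/a/b.html => [123,626703]",
    "/a/c.html => []",
    "/a/d.html ERRORED",
    "/a/e.html => [1,2]",
]
mapping = parse_processed(sample)
print(single_bug_tests(mapping))
```

Restricting to single-bug tests is one possible filter; tests with several candidate bugs would need manual review either way.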

Next step, turn this into a series of wpt-metadata PRs (using the same golang script as before), and let @foolip sort through which are junk and which are useful ;)

Jan 30 '21 01:01 stephenmcgruer

Golang script for updating wpt-metadata: https://gist.github.com/stephenmcgruer/0b84c426f2840003c542bcb25740a9d8

Jan 30 '21 01:01 stephenmcgruer

@foolip - I've sent you a few PRs now; consider them a sample of the output of this methodology. I'd like those reviewed first to see if the rest of the data is worth uploading or not (the PRs so far cover ~30% of the ~800 total tests that fail in wpt.fyi and have exactly one bug when you search their test path in monorail).

Jan 30 '21 15:01 stephenmcgruer

https://github.com/web-platform-tests/wpt-metadata/pull/804#issuecomment-770963731 has some bits useful for reviewing these PRs:

bug-titles.txt (faux-CSV file; split on the first ',' only).

Also as a spreadsheet: https://docs.google.com/spreadsheets/d/1AJFl3gLfVFjOXRAir9g2BsesdF3fGp9LJHnDtWQLW-c/edit#gid=0
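Since bug titles can themselves contain commas, a regular CSV parser would mis-split bug-titles.txt. A small sketch of reading it by splitting on the first ',' only follows; it assumes the first field is the bug ID and the remainder is the title, which is a guess at the column layout, and parse_bug_titles is a hypothetical helper name.

```python
def parse_bug_titles(lines):
    """Parse "<bug id>,<title>" lines, splitting on the first comma only."""
    titles = {}
    for line in lines:
        line = line.rstrip("\n")
        # partition() splits at the first ',' and leaves later commas intact.
        bug_id, sep, title = line.partition(",")
        if not sep:
            continue  # No comma at all: blank or malformed line.
        titles[bug_id.strip()] = title.strip()
    return titles


# Made-up example line; the title keeps its internal commas.
titles = parse_bug_titles(["123,Foo, bar, and baz\n", "\n"])
print(titles)
```

With a real file this would be called as parse_bug_titles(open('bug-titles.txt')).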

Feb 02 '21 09:02 foolip
