probe scraper unable to download file for tree: integration/mozilla-inbound
revision in tree integration/mozilla-inbound isn't available outside of probe-scraper's cache:
Retreiving Buildhub results for channel nightly
4645 revisions found
...
Downloading files for revision number 494/4645 - revision: 46fe2115d46a5bb40523b8466341d8f9a26e1bdf, tree: integration/mozilla-inbound, version: 49.0a1
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/local/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/app/probe_scraper/runner.py", line 833, in <module>
    main(
  File "/app/probe_scraper/runner.py", line 647, in main
    upload_paths += load_moz_central_probes(
  File "/app/probe_scraper/runner.py", line 323, in load_moz_central_probes
    revision_data = moz_central_scraper.scrape_channel_revisions(
  File "/app/probe_scraper/scrapers/moz_central_scraper.py", line 207, in scrape_channel_revisions
    files = download_files(
  File "/app/probe_scraper/scrapers/moz_central_scraper.py", line 123, in download_files
    raise Exception(
Exception: Request returned status 404 for https://hg.mozilla.org/releases/integration/mozilla-inbound/raw-file/46fe2115d46a5bb40523b8466341d8f9a26e1bdf/toolkit/components/telemetry/Histograms.json
This is locally reproducible for me by running:
python3 -m probe_scraper.runner --out-dir=temp/probe_data --cache-dir temp/probe_cache --moz-central --firefox-version=49 --firefox-channel=nightly
and is fixed by manually downloading s3://telemetry-airflow-cache/cache/probe-scraper/hg/46fe2115d46a5bb40523b8466341d8f9a26e1bdf/toolkit/components/telemetry/Histograms.json into my local cache.
I modified probe_scraper/scrapers/moz_central_scraper.py to try and find all missing revisions, and this appears to be the only one.
my changes:
diff --git a/probe_scraper/scrapers/moz_central_scraper.py b/probe_scraper/scrapers/moz_central_scraper.py
index 61dea29..4c5ed1f 100644
--- a/probe_scraper/scrapers/moz_central_scraper.py
+++ b/probe_scraper/scrapers/moz_central_scraper.py
@@ -194,25 +194,34 @@ def scrape_channel_revisions(
     print(" " + str(num_revisions) + " revisions found")
+    trees = set()
     for i, rd in enumerate(revision_dates):
-        revision = rd["revision"]
+        if rd["tree"] not in trees:
+            if rd["tree"] != "integration/mozilla-inbound":
+                trees.add(rd["tree"])
-        print(
-            (
-                f" Downloading files for revision number {str(i+1)}/{str(num_revisions)}"
-                f" - revision: {revision}, tree: {rd['tree']}, version: {str(rd['version'])}"
+            revision = rd["revision"]
+
+            print(
+                (
+                    f" Downloading files for revision number {str(i+1)}/{str(num_revisions)}"
+                    f" - revision: {revision}, tree: {rd['tree']}, version: {str(rd['version'])}"
+                )
             )
-        )
-        version = extract_major_version(rd["version"])
-        files = download_files(
-            channel, revision, folder, error_cache, version, tree=rd["tree"]
-        )
-
-        results[channel][revision] = {
-            "date": rd["date"],
-            "version": version,
-            "registries": files,
-        }
-        save_error_cache(folder, error_cache)
+            version = extract_major_version(rd["version"])
+            try:
+                files = download_files(
+                    channel, revision, folder, error_cache, version, tree=rd["tree"]
+                )
+
+                results[channel][revision] = {
+                    "date": rd["date"],
+                    "version": version,
+                    "registries": files,
+                }
+            except Exception:
+                import traceback
+                traceback.print_exc()
+            save_error_cache(folder, error_cache)
     return results
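For anyone who wants to repeat the survey without patching the scraper, the same idea can be sketched as a standalone loop that collects failures instead of printing tracebacks. This is only an illustration: `find_failing_revisions` and `fake_download` are hypothetical names, and `fake_download` stands in for the real `download_files` call so the snippet runs on its own.

```python
def find_failing_revisions(revision_dates, download):
    """Call download(rd) for every entry and collect the ones that raise."""
    failures = []
    for rd in revision_dates:
        try:
            download(rd)
        except Exception as exc:  # broad on purpose: this is a survey, not a fix
            failures.append((rd["tree"], rd["revision"], str(exc)))
    return failures


def fake_download(rd):
    # Hypothetical stand-in for download_files; simulates the 404 seen above.
    if rd["tree"] == "integration/mozilla-inbound":
        raise Exception("Request returned status 404")
    return {}


sample = [
    {"revision": "46fe2115d46a5bb40523b8466341d8f9a26e1bdf",
     "tree": "integration/mozilla-inbound", "version": "49.0a1"},
    # Placeholder revision, made up for the example.
    {"revision": "0" * 40, "tree": "mozilla-central", "version": "49.0a1"},
]

failures = find_failing_revisions(sample, fake_download)
for tree, revision, message in failures:
    print(f"{tree} {revision}: {message}")
```

Unlike the diff above, this tries every entry rather than one revision per tree, so against the real downloader it would be slower but would not miss a second bad revision in another tree.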
for now I've asked Data SRE to copy the missing cache file to the new cache location, https://mozilla-hub.atlassian.net/browse/DSRE-1001?focusedCommentId=590672, but idk if there's a long-term solution needed here.
cc @chutten
...why are we pulling mozilla-inbound? Surely we only care about mozilla-central? Branches on /integration/ don't ship binaries we'd expect to receive data from, so we shouldn't need to care much about what is or isn't present on them.
we're pulling from that tree because it's listed by buildhub. we don't (currently) filter what buildhub returns for firefox versions when scraping legacy telemetry in prod. specifically for firefox nightly 49.0a1, buildhub returns a list that includes revision: 46fe2115d46a5bb40523b8466341d8f9a26e1bdf, tree: integration/mozilla-inbound
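if filtering turns out to be the long-term fix, a minimal sketch could look like the following. It assumes the buildhub results are a list of dicts with a `tree` key (as `rd["tree"]` in the diff above suggests). The helper name `filter_shipping_trees` and the `integration/` prefix rule are hypothetical, not existing probe-scraper code.

```python
def filter_shipping_trees(revision_dates, excluded_prefixes=("integration/",)):
    """Drop entries whose tree starts with one of the excluded prefixes.

    Hypothetical helper: the prefix rule is an assumption for illustration.
    """
    return [
        rd for rd in revision_dates
        if not rd["tree"].startswith(excluded_prefixes)
    ]


# Entries shaped like the ones the scraper iterates over.
revision_dates = [
    {"revision": "46fe2115d46a5bb40523b8466341d8f9a26e1bdf",
     "tree": "integration/mozilla-inbound", "version": "49.0a1"},
    # Placeholder revision, made up for the example.
    {"revision": "0" * 40, "tree": "mozilla-central", "version": "49.0a1"},
]

kept = filter_shipping_trees(revision_dates)
print([rd["tree"] for rd in kept])
```

whether dropping these entries is safe depends on whether any shipped nightly was ever built from an integration tree, which this sketch does not check.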