
Filter out missing revisions

Open fbertsch opened this issue 5 years ago • 9 comments

Somewhere along the line we are retrieving revisions for hashes which don't exist on hg.mozilla.org. How or where this is happening should be investigated, and those revisions stripped out.
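As a rough illustration of the "stripped out" step, here is a minimal sketch that splits a list of revision URLs into those that resolve and those that return 404. The names `url_exists` and `split_missing` are hypothetical and not part of probe-scraper; the sketch assumes a plain HTTP HEAD request is enough to tell whether a revision exists on the server.

```python
from typing import Callable, Iterable, List, Tuple
import urllib.error
import urllib.request


def url_exists(url: str, timeout: float = 10.0) -> bool:
    """Return True if a HEAD request succeeds, False on a 404 (hypothetical helper)."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        # Other HTTP errors (e.g. 5xx) say nothing about existence, so re-raise.
        raise


def split_missing(
    urls: Iterable[str],
    exists: Callable[[str], bool] = url_exists,
) -> Tuple[List[str], List[str]]:
    """Partition revision URLs into (kept, missing), preserving input order."""
    kept: List[str] = []
    missing: List[str] = []
    for url in urls:
        (kept if exists(url) else missing).append(url)
    return kept, missing
```

Passing `exists` as a parameter keeps the filtering logic testable without network access, and the `missing` list can be logged to help trace where the bad hashes come from.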

fbertsch avatar Apr 06 '20 21:04 fbertsch

Here is the list of invalid URLs being fetched by probe-scraper: https://gist.github.com/fbertsch/f0d27f697dec888e1e7ed88a048b2ad3

fbertsch avatar Apr 09 '20 14:04 fbertsch

cc @mdboom: you mentioned your team was interested in working on bugs here. Is this something you all would have the bandwidth to take on?

fbertsch avatar Apr 09 '20 14:04 fbertsch

Sure. I filed a Bugzilla bug that points to this issue (which will make it easier for my team not to lose track of it): https://bugzilla.mozilla.org/show_bug.cgi?id=1628725

mdboom avatar Apr 09 '20 14:04 mdboom

I'm having trouble reproducing this. Does the probe-scraper deployment cache the repository? Maybe the cache is getting stale or broken?

mdboom avatar Apr 27 '20 15:04 mdboom

Indeed it does. The cache is here: https://github.com/mozilla/probe-scraper/blob/master/probe_scraper/runner.py#L300

fbertsch avatar Apr 27 '20 15:04 fbertsch

So maybe forcing a clean checkout of m-c in the cache would fix the problem? (Of course, that won't help us understand how we got to this bug in the first place...)
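A minimal sketch of what "forcing a clean checkout" could look like at the filesystem level, assuming the cache is an ordinary local directory. `reset_cache_dir` is a hypothetical helper, not probe-scraper's actual cache handling (which lives in `runner.py`).

```python
import shutil
from pathlib import Path
from typing import Union


def reset_cache_dir(cache_dir: Union[str, Path]) -> Path:
    """Delete cache_dir if it exists, then recreate it empty (hypothetical helper)."""
    path = Path(cache_dir)
    if path.exists():
        # Remove the whole tree so no stale or partially written entries survive.
        shutil.rmtree(path)
    path.mkdir(parents=True)
    return path
```

Keeping a copy of the old cache before wiping it would preserve the evidence needed to work out how the invalid revisions got in.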

mdboom avatar Apr 27 '20 15:04 mdboom

Agreed. Running P-S in a fresh cache location would be a good start. IIRC this should take 5-6 hours and will hit M-C a bunch. Are you up for that, Mike?

fbertsch avatar Apr 27 '20 15:04 fbertsch

Well, I already did that locally last week (in fairness, I was running it for the first time on a new machine) and wasn't able to reproduce the bug. Where do I start with trying to do that in deployment?

mdboom avatar Apr 27 '20 15:04 mdboom

Ah, gotcha. You'd need AWS creds to run this with a separate bucket. Sounds like this investigation needs to happen more on the ops side.

fbertsch avatar Apr 27 '20 19:04 fbertsch