Frank Bertsch
This will also require reading from new streaming data sources.
@Dexterp37 I do! https://github.com/mozilla/probe-scraper/pull/214
Here is the list of invalid URLs being fetched by probe-scraper: https://gist.github.com/fbertsch/f0d27f697dec888e1e7ed88a048b2ad3
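(For anyone who wants to re-check that gist later, here's a rough sketch of how one could flag dead URLs; the `urls.txt` input file is an assumption, not how probe-scraper actually stores its repository list.)

```python
# Sketch: flag URLs that no longer resolve. The flat urls.txt input is
# an assumption; probe-scraper's real repository config looks different.
import requests

def find_invalid_urls(urls, timeout=10):
    invalid = []
    for url in urls:
        try:
            resp = requests.head(url, allow_redirects=True, timeout=timeout)
            if resp.status_code >= 400:
                invalid.append((url, resp.status_code))
        except requests.RequestException as exc:
            invalid.append((url, str(exc)))
    return invalid

if __name__ == "__main__":
    with open("urls.txt") as f:
        urls = [line.strip() for line in f if line.strip()]
    for url, reason in find_invalid_urls(urls):
        print(f"{url}\t{reason}")
```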
cc @mdboom, you mentioned your team was interested in working on bugs here. Is this something you all would have the bandwidth to take on?
Indeed it does. The cache is here: https://github.com/mozilla/probe-scraper/blob/master/probe_scraper/runner.py#L300
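Conceptually it's just "don't re-fetch a file we've already pulled for a given commit." A simplified sketch (not the actual implementation in `runner.py`; the cache location and key format below are assumptions):

```python
# Simplified sketch of a commit-keyed file cache; the real cache in
# probe_scraper/runner.py differs in layout and naming.
import hashlib
import json
from pathlib import Path

CACHE_DIR = Path("/tmp/probe-scraper-cache")  # assumed location

def cache_path(url, commit):
    key = hashlib.sha1(f"{url}@{commit}".encode()).hexdigest()
    return CACHE_DIR / f"{key}.json"

def fetch_with_cache(url, commit, fetch_fn):
    """Return cached content if present, otherwise fetch and store it."""
    path = cache_path(url, commit)
    if path.exists():
        return json.loads(path.read_text())
    content = fetch_fn(url, commit)
    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(content))
    return content
```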
Agreed. Running probe-scraper in a fresh cache location would be a good start (IIRC this should take 5-6 hours), and will hit mozilla-central a _bunch_. Are you up for that,...
Ah, gotcha. You'd need AWS creds to run this with a separate bucket. Sounds like this investigation needs to happen more on the ops side.
> Do I need to provide a repo if there isn't a metrics.yaml? What if it's a mercurial repo instead of Git? Does probe-scraper know about pings.yaml files? We can...
Currently we ignore historical `metrics.yaml` files that are not compatible with the current version of `glean_parser`.
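Concretely, "ignore" here just means we catch the parse failure and move on; something like the following sketch, where `parse_metrics_file` is a stand-in for the actual glean_parser invocation in probe-scraper:

```python
# Sketch: skip historical metrics.yaml revisions that the current
# glean_parser can't handle. `parse_metrics_file` is a hypothetical
# stand-in for the real parsing entry point.
def collect_metrics(revisions, parse_metrics_file):
    metrics_by_revision = {}
    skipped = []
    for revision, path in revisions:
        try:
            metrics_by_revision[revision] = parse_metrics_file(path)
        except Exception as exc:
            # Incompatible with the current glean_parser schema; ignore it.
            skipped.append((revision, str(exc)))
    return metrics_by_revision, skipped
```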
@mdboom that does make sense. What I would prefer is for the probe-scraper not to have to deal with any of that, and instead have the `metrics.yaml` contain an optional...