Print some stats before saving probe data to disk
To make it easier to judge whether things worked correctly, it would be great to print some basic stats before saving the outputs to disk in runner.py. E.g.:
- how many probes of each type we have (histograms, scalars, events, ...)
- how many versions/revisions we extracted data for
I want to work on this. Could you guide me? This is my first bug.
Hey @sannanansari, hi and welcome!
One possible approach for doing this is the following:
- Extract the number of revisions/versions for each channel by counting the entries in each key of revisions, here.
- Change the transform method so that it counts the number of probes per type, per channel and returns a structure with the data.
- Print all the gathered statistics.
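As a rough illustration of the second and third steps, here is a minimal sketch of counting probes per type, per channel, and printing the result. The names count_probes_by_type and print_probe_stats, the input layout (channel name mapped to a list of probe dicts with a "type" key), and the sample data are all assumptions made for this example; the real transform method in probe-scraper may structure its data differently.

```python
from collections import Counter


def count_probes_by_type(probes_by_channel):
    """Return {channel: Counter({probe_type: count})}.

    Assumes (hypothetically) that each channel maps to a list of
    probe dicts, each carrying a "type" key.
    """
    return {
        channel: Counter(probe["type"] for probe in probes)
        for channel, probes in probes_by_channel.items()
    }


def print_probe_stats(stats):
    """Print one block per channel, with one line per probe type."""
    for channel, counts in sorted(stats.items()):
        print(f"Channel {channel}:")
        for probe_type, count in sorted(counts.items()):
            print(f"  {probe_type}: {count}")


# Hypothetical sample data, not taken from the real scraper output.
sample = {
    "nightly": [
        {"name": "A", "type": "histogram"},
        {"name": "B", "type": "histogram"},
        {"name": "C", "type": "scalar"},
    ],
    "beta": [{"name": "D", "type": "event"}],
}

stats = count_probes_by_type(sample)
print_probe_stats(stats)
```

Returning the counts as a plain structure, rather than printing inside the counting function, keeps the statistics reusable if they later need to be logged or saved.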
Does this make sense?
Sorry for the late reply. Could you elaborate on the first point?
revisions is a dictionary.
We want to count the number of entries (or keys) in it (so we can later print it).
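To make the first point concrete, here is a small sketch of counting the entries per channel. The shape of revisions shown here (channel name mapped to a dictionary keyed by revision) and the sample values are assumptions for illustration only; the helper name count_revisions is hypothetical as well.

```python
def count_revisions(revisions):
    """Return {channel: number of entries stored under that channel}."""
    return {channel: len(entries) for channel, entries in revisions.items()}


# Hypothetical sample data; the real dictionary is built by the scraper.
revisions = {
    "nightly": {"rev-a": {}, "rev-b": {}, "rev-c": {}},
    "beta": {"rev-d": {}},
}

revision_counts = count_revisions(revisions)
for channel, count in sorted(revision_counts.items()):
    print(f"{channel}: {count} revisions")
```

len() on a dictionary returns its number of keys, so no explicit loop over the entries is needed.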
I am getting some error.
ERROR:--
Unable to parse whitelist (C:\Users\Sannan Ansari\Documents\GitHub\probe-scraper\probe_scraper\parsers\third_party\histogram-whitelists.json). Assuming all histograms are acceptable.
Traceback (most recent call last):
File "runner.py", line 19, in
All git commands will error until this is rectified.
This initial warning can be silenced or aggravated in the future by setting the $GIT_PYTHON_REFRESH environment variable. Use one of the following values:
- quiet|q|silence|s|none|n|0: for no warning or exception
- warn|w|warning|1: for a printed warning
- error|e|raise|r|2: for a raised exception
Example: export GIT_PYTHON_REFRESH=quiet
The first part ("unable to parse whitelist") is just a warning and expected.
The second part about git is the important one. It sounds like you either 1) don't have git installed or 2) it's not in the path, as pointed out in the message.
You could:
- just run this with: python probe_scraper/runner.py --only-moz-central-probes --dry-run
- or... install git and make sure it's exposed through your $PATH etc.
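For the second option, one quick way to check whether git is reachable is to look it up on the path from Python. This is only a sketch using the standard library; the helper name find_git is made up for this example and is not part of probe-scraper.

```python
import shutil


def find_git():
    """Return the full path of the git executable, or None if it is not on PATH."""
    return shutil.which("git")


git_path = find_git()
if git_path is None:
    print("git was not found on PATH; install it or add its folder to PATH.")
else:
    print(f"git found at {git_path}")
```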
There is no file named histogram-whitelists.json. So how can it take a file that is not there as input? What is a whitelist?
You can ignore that first part about "histogram-whitelists.json"; it is expected, and the code will run fine without this file. A whitelist is a general concept. Here it is used to allow some backward compatibility for specific entries only, and only when this runs as part of building Firefox. As mentioned, for this issue you don't need to worry about the whitelist.
I made a pull request. You can review it.
https://github.com/sannanansari/probe-scraper/commit/dd26b344eef523c8a2cbfad5c15b4d132b917495
Ok, great. Can you do a pull request against this mozilla repository? You can read about the steps here.