probe-scraper

Print some stats before saving probe data to disk

Open · georgf opened this issue 9 years ago · 11 comments

To make it easier to judge whether things worked correctly, it would be great to print some basic stats in runner.py before saving the outputs to disk, e.g.:

  • how many probes of each type we have (histograms, scalars, events, ...)
  • how many versions/revisions we extracted data for

georgf · Apr 11 '17 10:04

I'd like to work on this. Could you guide me? This is my first bug.

sannanansari · Jul 15 '18 05:07

Hey @sannanansari, hi and welcome!

One possible approach for doing this is the following:

  • Extract the number of revisions/versions for each channel by counting the entries under each key of revisions, here.
  • Change the transform method so that it counts the number of probes per type, per channel and returns a structure with the data.
  • Print all the gathered statistics.
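A minimal sketch of steps 2 and 3, assuming (hypothetically) that the transformed probe data is a dict mapping each channel name to a dict of probe entries, each with a `"type"` field. The actual structure produced by `transform` in probe-scraper may differ, and `count_probes_per_type` and `print_stats` are illustrative names, not existing functions in the project.

```python
from collections import Counter


def count_probes_per_type(probes_by_channel):
    """Return {channel: Counter({probe_type: count})}.

    Assumes (hypothetically) that probes_by_channel maps a channel name to a
    dict of probe_id -> probe_info, where probe_info has a "type" key.
    """
    stats = {}
    for channel, probes in probes_by_channel.items():
        # Counter tallies how many probes share each type value.
        stats[channel] = Counter(info["type"] for info in probes.values())
    return stats


def print_stats(revision_counts, probe_stats):
    """Print the revision count and per-type probe counts for each channel."""
    for channel in sorted(probe_stats):
        print("Channel: {}".format(channel))
        print("  revisions: {}".format(revision_counts.get(channel, 0)))
        for probe_type, count in sorted(probe_stats[channel].items()):
            print("  {}: {}".format(probe_type, count))


if __name__ == "__main__":
    # Made-up sample data, only to illustrate the shape of the output.
    sample = {
        "nightly": {
            "histogram/GC_MS": {"type": "histogram"},
            "scalar/a11y.instantiators": {"type": "scalar"},
            "histogram/CYCLE_COLLECTOR": {"type": "histogram"},
        },
        "release": {
            "histogram/GC_MS": {"type": "histogram"},
        },
    }
    print_stats({"nightly": 2, "release": 1}, count_probes_per_type(sample))
```

Returning a `Counter` per channel keeps the counting separate from the printing, so the same structure could later be logged or checked in a test instead of printed.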

Does this make sense?

Dexterp37 · Jul 16 '18 12:07

Sorry for the late reply. Could you elaborate on the first point?

sannanansari · Jul 21 '18 16:07

revisions is a dictionary. We want to count the number of entries (i.e. keys) in it so that we can print that count later.
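Counting the entries of a dictionary is just `len()` on it. The sketch below assumes, hypothetically, that `revisions` maps each channel name to a nested dict keyed by revision; the real shape in runner.py should be checked first. `count_revisions` is an illustrative helper, not part of probe-scraper.

```python
def count_revisions(revisions):
    """Return {channel: number_of_revisions}.

    Assumes (hypothetically) that `revisions` looks like
    {channel: {revision: revision_data, ...}, ...}.
    """
    # len() of a dict is the number of keys it holds.
    return {channel: len(entries) for channel, entries in revisions.items()}


# Made-up revision ids, for illustration only.
revisions = {
    "nightly": {"abc123": {}, "def456": {}},
    "beta": {"0a1b2c": {}},
}
print(count_revisions(revisions))
```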

georgf · Jul 24 '18 08:07

I am getting an error:

    ERROR:-- Unable to parse whitelist (C:\Users\Sannan Ansari\Documents\GitHub\probe-scraper\probe_scraper\parsers\third_party\histogram-whitelists.json). Assuming all histograms are acceptable.
    Traceback (most recent call last):
      File "runner.py", line 19, in <module>
        from scrapers import git_scraper, moz_central_scraper
      File "C:\Users\Sannan Ansari\Documents\GitHub\probe-scraper\probe_scraper\scrapers\git_scraper.py", line 6, in <module>
        from git import Repo
      File "C:\Python27\lib\site-packages\git\__init__.py", line 85, in <module>
        raise ImportError('Failed to initialize: {0}'.format(exc))
    ImportError: Failed to initialize: Bad git executable.
    The git executable must be specified in one of the following ways:
        - be included in your $PATH
        - be set via $GIT_PYTHON_GIT_EXECUTABLE
        - explicitly set via git.refresh()

    All git commands will error until this is rectified.

    This initial warning can be silenced or aggravated in the future by setting the
    $GIT_PYTHON_REFRESH environment variable. Use one of the following values:
        - quiet|q|silence|s|none|n|0: for no warning or exception
        - warn|w|warning|1: for a printed warning
        - error|e|raise|r|2: for a raised exception

    Example:
        export GIT_PYTHON_REFRESH=quiet

sannanansari · Jul 26 '18 17:07

The first part ("unable to parse whitelist") is just a warning and expected.

The second part, about git, is the important one. It sounds like either 1) you don't have git installed, or 2) it's not in your PATH, as the message points out.

You could:

  • just run this with: python probe_scraper/runner.py --only-moz-central-probes --dry-run
  • or... install git and make sure it's exposed through your $PATH etc.
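To check the second option, the snippet below roughly approximates how GitPython locates git: it looks at `GIT_PYTHON_GIT_EXECUTABLE` first (one of the options the error message lists) and otherwise searches the PATH. It uses `shutil.which`, which requires Python 3.3 or later; the traceback above comes from Python 2.7, where it is not available. `find_git` is a hypothetical helper, not part of probe-scraper or GitPython.

```python
import os
import shutil


def find_git():
    """Return a path to a git executable, or None if none can be found.

    Checks $GIT_PYTHON_GIT_EXECUTABLE first, then searches $PATH.
    """
    explicit = os.environ.get("GIT_PYTHON_GIT_EXECUTABLE")
    if explicit:
        return explicit
    # shutil.which returns the full path of the executable, or None.
    return shutil.which("git")


if __name__ == "__main__":
    path = find_git()
    if path is None:
        print("git not found: install it or add it to your PATH")
    else:
        print("Using git at: {}".format(path))
```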

georgf · Jul 27 '18 16:07

There is no file named histogram-whitelists.json, so how can it take a file that isn't there as input? What is a whitelist?

sannanansari · Jul 29 '18 07:07

You can ignore that first part about "histogram-whitelists.json": it is expected, and the code runs fine without this file. A whitelist is a general concept. Here it is used to allow some backward compatibility for specific entries only, and only when this code runs as part of building Firefox. As mentioned, you don't need to worry about the whitelist for this issue.
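The warning in the log above ("Unable to parse whitelist ... Assuming all histograms are acceptable") reflects a common optional-file pattern: if the whitelist can't be loaded, allow everything. A minimal sketch of that pattern, assuming a hypothetical JSON list format; `load_whitelist` and `is_allowed` are illustrative names, not probe-scraper's actual parser code.

```python
import json
import logging


def load_whitelist(path):
    """Return a set of whitelisted names, or None if the file can't be read.

    Assumes (hypothetically) that the file holds a JSON list of names.
    """
    try:
        with open(path) as f:
            return set(json.load(f))
    except (IOError, OSError, ValueError):
        # Missing or malformed file: warn, then signal "no whitelist".
        logging.warning("Unable to parse whitelist (%s). "
                        "Assuming all histograms are acceptable.", path)
        return None


def is_allowed(name, whitelist):
    """With no whitelist loaded, every entry is accepted."""
    return whitelist is None or name in whitelist
```

Returning `None` rather than an empty set keeps "no whitelist" distinct from "a whitelist that allows nothing".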

georgf · Jul 30 '18 13:07

I made a pull request. You can review it.

sannanansari · Jul 30 '18 15:07

https://github.com/sannanansari/probe-scraper/commit/dd26b344eef523c8a2cbfad5c15b4d132b917495

sannanansari · Jul 30 '18 15:07

OK, great. Can you open a pull request against this Mozilla repository? You can read about the steps here.

georgf · Jul 31 '18 12:07