rosie
Add command line option to limit dataset years
Depends on datasciencebr/serenata-toolbox#97. Tests are still missing; I'll add them once the other pull request gets merged.
LGTM… waiting for https://github.com/datasciencebr/serenata-toolbox/pull/97 then.
Hello everyone, I'm getting back to this PR :)
I tested the command with --years, and it seems to be working, but I'll take a closer look:
(serenata_rosie) ➜ rosie git:(irio-limit-years) ✗ python rosie.py run chamber_of_deputies --years 2017 /tmp/
2017-12-08 13:58:34,684 - root - INFO - Merging all datasets…
2017-12-08 13:58:34,684 - root - INFO - Loading reimbursements-2017.xz…
2017-12-08 13:58:37,153 - root - INFO - Dropping rows without document_value or reimbursement_number…
2017-12-08 13:58:37,845 - root - INFO - Grouping dataset by applicant_id, document_id and year…
2017-12-08 13:58:37,846 - root - INFO - Gathering all reimbursement numbers together…
2017-12-08 13:58:40,804 - root - INFO - Summing all net values together…
2017-12-08 13:58:40,826 - root - INFO - Summing all reimbursement values together…
2017-12-08 13:58:40,852 - root - INFO - Generating the new dataset…
2017-12-08 13:58:41,999 - root - INFO - Casting changes to a new DataFrame…
2017-12-08 13:58:41,999 - root - INFO - Writing it to file…
2017-12-08 13:59:00,764 - root - INFO - Done.
Downloading 2016-09-03-companies.xz: 100%|██████████████████████████████████████████████████████████████| 4.84M/4.84M [00:03<00:00, 1.30Mb/s]
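
For readers following along, here is a minimal sketch of how an option like --years could be parsed and then used to pick which yearly files to load. It is not taken from this PR: the `parse_years`, `build_parser` and `dataset_names` helpers are hypothetical, the comma-separated format for multiple years is an assumption, and only the `reimbursements-2017.xz` naming pattern comes from the log above.

```python
import argparse


def parse_years(value):
    """Turn a string such as "2016,2017" into a list of ints.

    The comma-separated format is an assumption made for this sketch.
    """
    return [int(year) for year in value.split(",") if year.strip()]


def build_parser():
    """Build a hypothetical parser for the arguments that follow `run`."""
    parser = argparse.ArgumentParser(prog="rosie.py run")
    parser.add_argument("module")
    parser.add_argument("target_directory")
    # When --years is omitted, args.years stays None (meaning: no limit).
    parser.add_argument("--years", type=parse_years, default=None)
    return parser


def dataset_names(years):
    """Return one file name per year, following the pattern seen in the log."""
    return ["reimbursements-{}.xz".format(year) for year in years]


# Same arguments as in the command above, minus the leading `run`.
args = build_parser().parse_args(
    ["chamber_of_deputies", "--years", "2017", "/tmp/"]
)
print(args.module, args.years, args.target_directory)
print(dataset_names(args.years))
```

Keeping the year parsing in its own small function makes it easy to test without touching any download or merge code.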