nyc-stabilization-unit-counts
Added a special parser for unpaid charges
It appears that the current parse.py is heavily intertwined with special cases. It may be better to have individual scripts that each parse out a particular part of the data. For example, unpaid charges could be its own script, so I created one.
Also, I'd like to run this against all the data. It appears taxbills.nyc already has all of it. How big is the entire data directory?
Thanks for opening the PR!
I noticed that parse_unpaid.py appears to be a copy of parse.py with modifications. Instead of copying the file and modifying it, could you please apply the modifications to the original? Otherwise it is very difficult for me to see what you added.
I understand what you're saying about having separate functionality in separate scripts. In that case, you should confine all your new functionality to a separate file and import what it needs from parse.py, instead of copying everything over. If I accepted this PR as is, I would be adding ~500 lines of duplicate code.
The data directory is several hundred GB. I can't remember off the top of my head.
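If an exact figure is needed, running `du -sh` on the directory on the machine is the quickest check. A portable alternative is a short Python sketch like the one below; `directory_size_bytes` and `human_readable` are hypothetical helpers, not part of the repo, and the `"data"` path in the usage line is an assumption about where the files live.

```python
from pathlib import Path


def directory_size_bytes(root):
    """Sum the sizes of all regular files under root, recursively."""
    total = 0
    for path in Path(root).rglob("*"):
        # Only count regular files; directories contribute no bytes here.
        if path.is_file():
            total += path.stat().st_size
    return total


def human_readable(num_bytes):
    """Format a byte count using binary units (B, KiB, MiB, GiB, TiB)."""
    size = float(num_bytes)
    for unit in ["B", "KiB", "MiB", "GiB", "TiB"]:
        # Stop at the first unit that keeps the number below 1024,
        # or at TiB as the largest unit we report.
        if size < 1024 or unit == "TiB":
            return f"{size:.1f} {unit}"
        size /= 1024
```

Usage: `print(human_readable(directory_size_bytes("data")))`.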
Yes, I will make parse.py and parse_unpaid.py more modular.
The reason I did it this way is that there are a lot of special cases, such as:
if i == 0:
    continue
that do not apply in all cases.
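One way to keep special cases like `if i == 0: continue` out of duplicated code is to turn them into parameters of a single shared function, so parse.py and parse_unpaid.py run the same loop with different options. This is a minimal sketch under that assumption; `iter_rows`, `skip_first`, and `keep` are hypothetical names, not functions that exist in parse.py.

```python
from typing import Callable, Iterable, Iterator, Optional


def iter_rows(
    lines: Iterable[str],
    skip_first: bool = False,
    keep: Optional[Callable[[str], bool]] = None,
) -> Iterator[str]:
    """Yield stripped, non-empty lines, applying optional special cases.

    skip_first replaces a hard-coded `if i == 0: continue`;
    keep is an optional predicate deciding which lines are yielded.
    """
    for i, line in enumerate(lines):
        # Special case made opt-in instead of always applied.
        if skip_first and i == 0:
            continue
        line = line.strip()
        if not line:
            continue
        if keep is not None and not keep(line):
            continue
        yield line
```

parse.py could call `iter_rows(lines)`, while parse_unpaid.py imports it and calls `iter_rows(lines, skip_first=True, keep=lambda s: "Unpaid" in s)`. The `"Unpaid"` match is purely illustrative; the real filter depends on the tax bill layout.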
For the data directory, could we zip up only the .TXT files? I would volunteer to do that if you give me read access to the files on the machine. I believe it would take a long time to do it over the internet.
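Archiving only the .TXT files on the machine could look like the sketch below, using the standard library's `zipfile`. `zip_txt_files` is a hypothetical helper, and matching the extension case-insensitively (so `.txt` is included too) is an assumption about how the files are named.

```python
import zipfile
from pathlib import Path


def zip_txt_files(src_dir, archive_path):
    """Write every .TXT file under src_dir into a ZIP, keeping relative paths.

    Returns the number of files added.
    """
    src = Path(src_dir)
    count = 0
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(src.rglob("*")):
            # Match the extension case-insensitively (.TXT, .txt, ...).
            if path.is_file() and path.suffix.lower() == ".txt":
                zf.write(path, arcname=path.relative_to(src).as_posix())
                count += 1
    return count
```

`zipfile` enables ZIP64 by default, so an archive larger than 4 GiB is not a problem.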