nyc-stabilization-unit-counts

Added a special parser for unpaid charges

Open fedex1 opened this issue 8 years ago • 2 comments

It appears that the current parse.py is heavily intertwined with special cases. It may be better to have individual scripts that each parse a particular part of the data. For example, unpaid charges could be its own script, so I created one.

Also, I'd like to run this against all the data. It appears taxbills.nyc already has all of it. How big is the entire data directory?

fedex1 • Jun 02 '16 03:06

Thanks for opening the PR!

I noticed that parse_unpaid.py appears to be a copy of parse.py with modifications. Instead of copying the file and modifying it, could you please apply the modifications to the original? Otherwise it is very difficult for me to see what you added.

I understand what you're saying about having separate functionality in separate scripts. In that case, you should confine all your new functionality to a separate file and import the requirements from parse.py, instead of copying everything over. If I accepted this PR as is, I would be adding ~500 lines of duplicate code.
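A minimal sketch of that layout, shown as one runnable file for brevity. The helper `split_amount`, the function `parse_unpaid`, and the `description  $amount` line format are all hypothetical illustrations; none of them are taken from the actual parse.py or the real tax bill text.

```python
# --- shared helper: hypothetical stand-in for something parse.py might export ---
def split_amount(line):
    """Split 'description  $1,234.56' into (description, amount as float)."""
    description, _, amount = line.rpartition("$")
    return description.strip(), float(amount.replace(",", ""))


# --- parse_unpaid.py: only the new logic would live here ---
# In a two-file layout this file would start with:
#     from parse import split_amount
def parse_unpaid(lines):
    """Return (description, amount) pairs for lines that contain a dollar amount."""
    return [split_amount(line) for line in lines if "$" in line]
```

With this split, a diff of the new file shows only the unpaid-charges logic, and the shared helper exists in one place.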

The data directory is several hundred GB; I can't remember the exact size off the top of my head.

talos • Jun 02 '16 11:06

Yes, I will make parse and parse_unpaid more modular.

The reason I did it this way is that there are a lot of special cases, such as:


    if i == 0:
        continue

that do not apply in all cases.
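One way a shared helper could accommodate both callers is to make such a skip opt-in rather than hard-coded. This is only a sketch: `iter_rows` and its `skip_first` parameter are hypothetical names, not part of parse.py.

```python
def iter_rows(lines, skip_first=False):
    """Yield stripped lines, optionally skipping the first one (index 0)."""
    for i, line in enumerate(lines):
        if skip_first and i == 0:
            continue
        yield line.strip()
```

A caller that needs the skip passes `skip_first=True`; a caller that does not simply omits the argument.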

For the data directory, could we zip up only the .TXT files? I would volunteer to do that if you give me read access to the files on the machine. I believe it would take a long time to do over the internet.
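A rough sketch of how that could be done with Python's standard `zipfile` module. The function name `zip_txt_files` and both path arguments are placeholders, and the case-insensitive match on the extension is an assumption about how the files are named.

```python
import os
import zipfile


def zip_txt_files(data_dir, archive_path):
    """Add every .TXT file under data_dir to a zip archive; return the file count."""
    count = 0
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for root, _dirs, files in os.walk(data_dir):
            for name in files:
                # Assumption: match the extension case-insensitively.
                if name.upper().endswith(".TXT"):
                    full_path = os.path.join(root, name)
                    # Store paths relative to data_dir so the archive is relocatable.
                    archive.write(full_path, os.path.relpath(full_path, data_dir))
                    count += 1
    return count
```

Running it on the machine that holds the data avoids transferring the non-text files at all; only the resulting archive would need to be downloaded.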

fedex1 • Jun 02 '16 11:06