Jim Counts

17 comments by Jim Counts

As long as we can opt out of normalization in the files written to disk. I've had scenarios where downstream systems would break if the line endings were not exactly...
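A minimal sketch of what an opt-out could look like, assuming a hypothetical `prepare_text` helper with a `normalize` flag (neither name comes from the library under discussion). With `normalize=False` the text passes through with its original line endings untouched:

```python
def normalize_line_endings(text: str, newline: str = "\n") -> str:
    """Convert CRLF and lone CR to LF, then LF to the requested newline."""
    unified = text.replace("\r\n", "\n").replace("\r", "\n")
    return unified if newline == "\n" else unified.replace("\n", newline)


def prepare_text(text: str, normalize: bool = True, newline: str = "\n") -> str:
    """Hypothetical hook: return the text that would be written to disk.

    Callers whose downstream systems need exact line endings pass
    normalize=False to get the input back unchanged.
    """
    return normalize_line_endings(text, newline) if normalize else text


def write_text(path: str, text: str, normalize: bool = True) -> None:
    # newline="" stops Python from translating "\n" on write,
    # so the bytes on disk match what prepare_text returned.
    with open(path, "w", newline="", encoding="utf-8") as handle:
        handle.write(prepare_text(text, normalize))
```

Passing `newline=""` to `open` matters here: without it, Python would apply its own newline translation on some platforms and defeat the opt-out.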

So what do you have in mind for the reporting? Most reporters expect files to exist on the disk. Normalizing the received file is easy enough since it is temporary....

My thoughts? I usually recall a coworker who once said something like "Regular expressions can't parse every address, because addresses aren't a regular language." Nevertheless, my goal in keeping this...

I think the regex assumes city names don't contain numbers. (Of course, some in Puerto Rico do; again, not regular.)

Pretty sure the solution we (I) came up with for Brooklyn was to match an "Avenue \w" pattern before trying to parse the address, then use a customized version of the...
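A rough sketch of that pre-check, assuming the idea is to route lettered avenues ("Avenue U", "Avenue J") to a special case before the general parser runs. The function names and the routing labels are hypothetical, and the pattern is only one reading of "Avenue \w":

```python
import re

# "Avenue" followed by a single word character that stands alone,
# e.g. "Avenue U". The trailing \b keeps "Avenue Road" from matching.
LETTERED_AVENUE = re.compile(r"\bAvenue \w\b", re.IGNORECASE)


def has_lettered_avenue(address: str) -> bool:
    """Return True when the address contains an 'Avenue <letter>' street."""
    return LETTERED_AVENUE.search(address) is not None


def choose_parser(address: str) -> str:
    """Hypothetical routing step: pick which parsing path to use."""
    return "lettered-avenue" if has_lettered_avenue(address) else "general"
```

Note that `\w` also matches digits and underscores, so "Avenue 5" would take the special path too. Whether that is desirable depends on the data.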

We can certainly add some tests at a more fine-grained unit level, but the integrated performance matters more. The addresses come in as a blob, so it's the parsing of...

Sorry, I didn't mean computational performance. I meant "ability to correctly extract addresses" as the measure of performance. So we are on the same page. I will scaffold out the...
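To make "ability to correctly extract addresses" concrete, here is one possible way to score it, assuming a set of raw inputs paired with hand-verified expected results. The `extraction_accuracy` name and the exact-match comparison are assumptions for illustration, not something agreed in the thread:

```python
from typing import Callable, Iterable, Tuple, TypeVar

T = TypeVar("T")


def extraction_accuracy(
    parse: Callable[[str], T],
    samples: Iterable[Tuple[str, T]],
) -> float:
    """Fraction of samples where parse(raw) equals the expected result.

    Returns 0.0 for an empty sample set rather than dividing by zero.
    """
    total = 0
    correct = 0
    for raw, expected in samples:
        total += 1
        if parse(raw) == expected:
            correct += 1
    return correct / total if total else 0.0
```

Exact match is a strict measure. A per-field comparison (street, city, state, zip) would show where the parser fails rather than only how often.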

I added some todos to the OP and set up the first 2 items; I just need to see why it is failing on AppVeyor. Gathering the 10k sample address...

I did spend a minute looking at this last night and it appears to only contain "City/State/Zip". Is that correct or did I just not look deeply enough?

As mentioned previously in this thread: https://results.openaddresses.io/ Don't worry about it though. I've already downloaded the US datasets from that site and I'm looking at the data format now.