Update the zipcode database monthly

Open seanpianka opened this issue 4 years ago • 7 comments

Currently, the zipcode database can be out-of-sync because no one has made manual updates to the zipcodes.json data-file (which contains the zipcode data available in this package).

Goal: When https://www.unitedstateszipcodes.org releases an updated zipcode dataset, create a new release of this package with the updated dataset.

Solution: Create a cronjob to perform the following steps monthly.

$ git clone https://github.com/seanpianka/Zipcodes
$ cd Zipcodes
$ python ci/__init__.py
$ bzip2 zips.json
$ mv zips.json.bz2 zipcodes/
$ bash scripts/get-next-patch-version "${current_version}"
$ bash scripts/create-new-python-wheel-release
$ bash scripts/add-to-git-and-publish-to-pypi
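The version-bump step above could be sketched in Python as follows. This is only an illustration: `next_patch_version` is a hypothetical helper, it assumes a plain `MAJOR.MINOR.PATCH` version string, and it is not the contents of `scripts/get-next-patch-version`.

```python
def next_patch_version(current_version: str) -> str:
    """Return current_version with its patch component incremented by one.

    Assumes a plain "MAJOR.MINOR.PATCH" string (an assumption for this sketch).
    """
    major, minor, patch = current_version.strip().split(".")
    return f"{major}.{minor}.{int(patch) + 1}"


if __name__ == "__main__":
    # Example input is made up for illustration.
    print(next_patch_version("1.2.3"))
```

A version string with a pre-release or build suffix would need extra handling that this sketch does not attempt.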

seanpianka, Mar 28 '20 01:03

@seanpianka - are you looking for a volunteer to create an automated job that runs the steps described above and pushes a PR to the repo each month with an updated zip code database? If you are, I could most likely build and deliver this. I just ran into an issue where the database is out of date: a zipcode is failing that I assume would pass if this library were current. Let me know.

Ken

kenvenner, Jan 20 '22 21:01

Yes, I'm certainly open to pull requests that can automate this! As you know, it's important that it's updated regularly, but I don't have time to do so manually. A GitHub Actions pipeline that does this would be a great help!

seanpianka, Jan 20 '22 23:01

Great - I assume you are pulling the source data from USPS as an individual, using the free version? I plan on doing the same.

kenvenner, Jan 21 '22 20:01

The ci folder does not appear to be checked in to the repo, so this step cannot run: python ci/__init__.py

kenvenner, Jan 21 '22 20:01

Yes, that's the db I've used the last few times. Additionally, the script for building the dataset merges in GPS data (lat/lon) from a separate dataset focused on GPS accuracy.

This script can be found in scripts/; I think I removed the ci/ folder in a recent commit.

seanpianka, Jan 21 '22 23:01

There are two data sources in your scripts:

The base dataset is obtained from https://www.unitedstateszipcodes.org/zip-code-database/ and is loaded via base_zipcodes_filename = "scripts/data/zip_code_database.csv"

I am not sure what the data source is for this file: gps_zipcodes_filename = "scripts/data/zip-codes-database-FREE.csv"

Can you tell me where this file comes from?

kenvenner, Jan 25 '22 15:01

I am honestly not sure where I downloaded this from, and it seems I neglected to document it anywhere.

The goal here is to have an alternate zipcode dataset that we can use to update/override the lat/lon values in the unitedstateszipcodes.org dataset. The following sources should be suitable enough for this purpose:

https://www.uszipcodeslist.com/
https://simplemaps.com/data/us-zips

The script that generates the final dataset makes a best-effort attempt to update the existing zipcodes with available lat/lon data from the other dataset. If one dataset does not include a zipcode present in the other dataset, it is fine to simply skip that value and leave the lat/lon data as-is.
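A minimal sketch of that best-effort merge, for anyone picking this up. It is not the actual script in scripts/: the function name `merge_lat_lon` and the column names `zip`, `lat`, and `lon` are assumptions, and the real CSV headers in either dataset may differ.

```python
def merge_lat_lon(base_rows, gps_rows, zip_key="zip", lat_key="lat", lon_key="lon"):
    """Return copies of base_rows with lat/lon overridden where gps_rows has data.

    Column names are hypothetical defaults; pass the real header names as needed.
    """
    # Index the GPS-focused dataset by zipcode, keeping only complete coordinates.
    gps_by_zip = {}
    for row in gps_rows:
        lat, lon = row.get(lat_key), row.get(lon_key)
        if lat and lon:
            gps_by_zip[row[zip_key]] = (lat, lon)

    merged = []
    for row in base_rows:
        updated = dict(row)  # copy so the input rows are not mutated
        match = gps_by_zip.get(row[zip_key])
        if match is not None:
            updated[lat_key], updated[lon_key] = match
        # No match: leave the existing lat/lon as-is.
        merged.append(updated)
    return merged


if __name__ == "__main__":
    # Placeholder rows with made-up values, for illustration only.
    base = [
        {"zip": "00001", "lat": "10.0", "lon": "20.0"},
        {"zip": "00002", "lat": "30.0", "lon": "40.0"},
    ]
    gps = [
        {"zip": "00001", "lat": "10.1234", "lon": "20.5678"},
        {"zip": "00003", "lat": "50.0", "lon": "60.0"},
    ]
    for row in merge_lat_lon(base, gps):
        print(row)
```

Zipcodes that appear only in the GPS dataset are ignored here, and zipcodes that appear only in the base dataset keep their original coordinates, which matches the skip behavior described above. Rows could be loaded with `csv.DictReader` before being passed in.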

seanpianka, Jan 26 '22 00:01