Consider alternative method to mapping ZCTAs to counties

Open anth-volk opened this issue 1 year ago • 0 comments

The issue

In our underlying data packages, we distribute population based on ZIP Code Tabulation Areas (ZCTAs), which apply a rough geographical mapping to ZIP codes. ZIP codes and ZCTAs are postal distribution tools, not legally defined zones within the standard US administrative hierarchy of state/county/municipality, and they sometimes cross county lines and even state lines.

In the data package, we take ZCTAs and map them to counties. If a ZCTA spans more than one county, we randomly assign each household to one of those counties, then assign the household to the ZCTA's state.

However, this creates an edge-case error. Each ZCTA is assigned to a single state, but a ZCTA doesn't always lie within a single state. For example, the image below shows ZCTA 19973, Seaford, Delaware. This ZCTA contains two counties: Kent, Delaware, and Dorchester, Maryland. All households in this ZCTA are assigned a state value of Delaware. They're then randomly distributed between the county of Kent (which is fine) and the county of Dorchester. Since there's no Dorchester County, Delaware, households assigned there end up with an invalid state/county pair, which is the bug.
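
To make the failure mode concrete, here is a minimal Python sketch. It is not the data package's actual code: the lookup tables, function names, and uniform random choice are all assumptions made for illustration, and the county data is copied from the example above. The first function mimics the behavior described in this issue; the second shows one possible alternative, taking the state from the chosen county rather than from the ZCTA.

```python
import random

# Hypothetical lookup tables, filled in with the example from this issue.
# Each ZCTA maps to the (county, state) pairs it overlaps.
ZCTA_TO_COUNTIES = {
    "19973": [("Kent", "DE"), ("Dorchester", "MD")],
}
# Each ZCTA is also assigned exactly one state.
ZCTA_TO_STATE = {"19973": "DE"}


def assign_naive(zcta, rng):
    """Pick any overlapping county, but keep the ZCTA's single state."""
    county, _county_state = rng.choice(ZCTA_TO_COUNTIES[zcta])
    return county, ZCTA_TO_STATE[zcta]


def assign_consistent(zcta, rng):
    """Pick an overlapping county and take the state from that county."""
    county, county_state = rng.choice(ZCTA_TO_COUNTIES[zcta])
    return county, county_state


def is_valid_pair(zcta, county, state):
    """True if (county, state) is one of the pairs the ZCTA overlaps."""
    return (county, state) in ZCTA_TO_COUNTIES[zcta]
```

With this setup, `assign_naive` can return `("Dorchester", "DE")`, which fails `is_valid_pair`, while `assign_consistent` only returns pairs that exist. Another option would be to restrict the random choice to counties inside the ZCTA's assigned state; which is preferable depends on whether household state values are allowed to change.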

Why this matters

The more county-level programming we introduce, the more this bug will become a problem on the microsim side. ACA and Medicaid rating areas are defined by county, so I'd expect us to generate microsim bugs for these edge cases, of which there are at least 10.
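
One way to enumerate the affected ZCTAs would be to scan the ZCTA-to-county crosswalk for ZCTAs whose counties fall in more than one state. The sketch below assumes the crosswalk is available as `(zcta, county, state)` rows; that row layout and the function name are hypothetical, and the second sample ZCTA is a made-up placeholder, not real data.

```python
from collections import defaultdict


def cross_state_zctas(rows):
    """Return the sorted ZCTAs whose counties span more than one state.

    `rows` is an iterable of (zcta, county, state) tuples; this layout is
    an assumption, not the data package's actual crosswalk format.
    """
    states_by_zcta = defaultdict(set)
    for zcta, _county, state in rows:
        states_by_zcta[zcta].add(state)
    return sorted(z for z, states in states_by_zcta.items() if len(states) > 1)


# Sample rows: 19973 comes from this issue; "00000" is a placeholder.
sample_rows = [
    ("19973", "Kent", "DE"),
    ("19973", "Dorchester", "MD"),
    ("00000", "Example County", "XX"),
]
flagged = cross_state_zctas(sample_rows)
```

Running a check like this against the real crosswalk would give an exact list of edge cases to test against.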

For individual household-level calculations, this bug is currently not an issue. However, if we introduce ZIP code inputs and continue to map ZCTAs to counties the way we do now, it will lead to potentially confusing outputs and to bugs in Medicaid calculations (due to how small rating areas can be). For users in edge-case ZIPs, it could either crash the calculation entirely or create significant confusion.

Mar 14 '25 17:03 anth-volk