
Incorrect county FIPS code for Bedford, VA

Open gschivley opened this issue 10 months ago • 10 comments

Describe the bug

The addfips package is labeling Bedford, VA as '51515', which is the code for Bedford City. It should actually be '51019' (Bedford County). See their list of FIPS codes.

Bug Severity

How badly is this bug affecting you? Medium: I was able to identify and fix the bug in my own workflow but it might affect other people.

To Reproduce

I found the error in the core_eia861__yearly_service_territory table. Census population files do not have the FIPS code 51515.
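A minimal sketch of the kind of check that surfaces this, assuming the service territory rows and the Census FIPS codes are already loaded. The sample rows, the `county_id_fips` key, and the `find_unmatched_fips` helper are illustrative assumptions, not the actual PUDL code:

```python
# Hypothetical sample data. In practice these would come from the
# core_eia861__yearly_service_territory table and a Census population file.
service_territory_rows = [
    {"county": "Bedford", "state": "VA", "county_id_fips": "51515"},
    {"county": "Fairfax", "state": "VA", "county_id_fips": "51059"},
]
# FIPS codes present in the (hypothetical) Census extract.
census_fips = {"51019", "51059"}


def find_unmatched_fips(rows, valid_fips, fips_key="county_id_fips"):
    """Return the rows whose FIPS code is absent from the reference set."""
    return [row for row in rows if row[fips_key] not in valid_fips]


unmatched = find_unmatched_fips(service_territory_rows, census_fips)
for row in unmatched:
    print(row["county"], row["state"], row[ "county_id_fips"])
```

Any row this returns is a code that cannot be joined to the Census data and needs a manual look.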

gschivley avatar Apr 02 '24 19:04 gschivley

https://github.com/fitnr/addfips/issues/8 It looks like this particular issue was flagged by @TrentonBush almost a year ago, and addressed in the underlying package but not released. So perhaps we just need to bug them to cut a new release.

e-belfer avatar Apr 03 '24 13:04 e-belfer

Update just got pushed! Should be a simple matter of updating dependencies, I'll throw this issue into this sprint.

e-belfer avatar Jun 19 '24 18:06 e-belfer

As far as I can tell we're still waiting on the maintainer to merge their fix commit which apparently didn't make it into the release. I'll bump them again.

jdangerx avatar Sep 10 '24 14:09 jdangerx

I guess we could also pin to their fix/8 branch but we'll see if they respond to my nag, first.

jdangerx avatar Sep 10 '24 14:09 jdangerx

addfips exists to do one job and it fails to do it. Considering the whole package is like 300 lines and the maintainer doesn't maintain it, I think we should replace it. One option is to simply vendor it, another would be to replace it with something like Google's geocoder, which is much more powerful. I have used Google geocoder in a client project for years with good results.

TrentonBush avatar Sep 16 '24 16:09 TrentonBush

By Google's geocoder, you mean https://geocoder.readthedocs.io/index.html? Just poking around, it seems like you'd need a TAMU key to pull FIPS codes out of county names. But it also seems like there are some federal APIs we could hit to get the FIPS codes?

jdangerx avatar Sep 17 '24 16:09 jdangerx

I meant Google Maps Platform's Geocoding API. IMO the primary advantages are that:

  1. they already implemented fuzzy matching (good for manually entered data with misspellings)
  2. it can handle any granularity from street address or lat/lon up to country name.

The disadvantages I am aware of are:

  • I don't think you can select a historical map to reference
  • it will update the reference maps on its schedule, not yours
  • if you're running it on every automated build, you'll need to make a caching layer or suffer network latency and per-call costs.

I use a cache layer and my usage always fits in the (generous) free tier. Occasionally cache invalidation issues cause minor annoyance, but it is easy to fix with a refresh.

TrentonBush avatar Sep 18 '24 02:09 TrentonBush

Ah sweet! What do you do for a caching layer?

I also just spent a few minutes poking around at the documentation and couldn't see where a FIPS code would get returned, unless that gets returned as the short_name of an administrative_area_level_2 address component. Has that been your experience?

jdangerx avatar Sep 18 '24 13:09 jdangerx

Ah ya I use this as a cleaning/standardization function to convert dirty inputs to the official county names. Then you can do a simple join against the official Census data to get FIPS codes. But you need both!
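Roughly, the two steps look like the sketch below. This is not the linked code: `standardize_county` is a hypothetical stand-in for the geocoder call (it only normalizes whitespace and case, and naively assumes a bare name means a county), and `CENSUS_COUNTIES` is a hand-typed excerpt standing in for the official Census table.

```python
# Hand-typed excerpt standing in for an official Census county table.
CENSUS_COUNTIES = {
    ("VA", "Bedford County"): "51019",
    ("VA", "Fairfax County"): "51059",
}


def standardize_county(raw_name, state):
    """Hypothetical placeholder for the geocoder-backed cleaning step.

    A real implementation would send the dirty input to the geocoder and
    read back the official county name. `state` is unused in this stub.
    """
    name = " ".join(raw_name.split()).title()
    # Naive assumption: a bare name refers to a county, not a city.
    if not name.endswith(("County", "City")):
        name += " County"
    return name


def lookup_fips(raw_name, state):
    """Clean the name, then join against the Census table. None if no match."""
    official = standardize_county(raw_name, state)
    return CENSUS_COUNTIES.get((state.upper(), official))


print(lookup_fips("  bedford ", "va"))
```

The join is the only place FIPS codes come from, so the reference table can be pinned to whatever Census vintage you need.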

Also I now realize the work I was referencing is actually public, so I'll just link to it. Sorry in advance for the data scientist quality code 😇

The row-level memory cache saves duplicate API calls per session (e.g., looking up the same county 1000 times), and the dataframe-level disk cache saves duplicate calls between runs (when a source dataset is unchanged).

I didn't automate the cache invalidation, I just do it manually because updates are infrequent. But the free tier resets each month, so a monthly clear could make sense.

TrentonBush avatar Sep 19 '24 01:09 TrentonBush