
Bug: dtype changes on opa_id after running `council_dists` in new ETL pipeline

Open adamzev opened this issue 10 months ago • 2 comments

After running council_dists in the new pipeline, opa_id values go from being str to int and leading zeros are removed.

I can't tell whether the type conversion currently causes any data issues, but it makes it less clear what type to document opa_id as, and it may lead to unexpected behavior down the line.

Expected behavior: consistent types for opa_id (preferably always a str, since it is an ID).
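To make the concern concrete, here is a minimal sketch of why a str to int coercion is lossy for an ID. The values below are made up for illustration and are not taken from the pipeline's data.

```python
# Hypothetical opa_id values; the first one has a leading zero.
opa_ids = ["012345678", "881234500"]

# Coercing to int drops the leading zero...
as_int = [int(v) for v in opa_ids]

# ...and converting back to str does not restore it.
round_tripped = [str(v) for v in as_int]

for original, restored in zip(opa_ids, round_tripped):
    print(original, "->", restored, "(changed)" if original != restored else "(same)")
```

Once the zero is gone, the original ID can only be recovered if the expected width is known, which is one more reason to keep the column as str throughout.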

adamzev, Mar 20 '25 14:03

@adamzev Yeah, opa_id should always be a string, without losing those leading 0s. So this should be investigated.

nlebovits, Mar 20 '25 14:03

It looks like there are places in both spatial_join and opa_join (for feature layers) where opa_id is being coerced into an integer.

Both are called by a number of services across the pipeline, so was there a reason for this coercion to begin with (for example, some inconsistency in the IDs across the different datasets)? If not, it seems we can just take it out and add a data QC check that opa_id always remains a string going forward.
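As a rough idea of what such a QC check could look like, here is a minimal sketch. The function name `find_bad_opa_ids` is hypothetical, and the rule that a valid opa_id is a digit-only string is an assumption rather than something confirmed by the datasets. It accepts any iterable of values, so it could be passed a DataFrame column as well as a plain list.

```python
def find_bad_opa_ids(values):
    """Return every value that is not a digit-only string.

    Assumption: a valid opa_id is a str made up only of digits.
    Integers (the coerced case from this issue), None, and empty
    strings are all reported as bad.
    """
    return [v for v in values if not (isinstance(v, str) and v.isdigit())]


# Hypothetical values: what the column should look like, and what it
# looks like after being coerced to int.
ids_ok = ["012345678", "881234500"]
ids_coerced = [12345678, 881234500]

print(find_bad_opa_ids(ids_ok))       # nothing flagged
print(find_bad_opa_ids(ids_coerced))  # every value flagged
```

Running a check like this after each service would point to the step where the type first changes.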

cfreedman, Apr 10 '25 15:04