Handle case where pre- and post-directional are the same
Heya,
Today I noticed a street name in the San Diego file, "S 39TH ST S", which has the "South" directional added twice:
cat us_ca_san_diego-addresses-county.geojson \
| grep 'S 39TH ST S' \
| jq '.properties.street'
"S 39TH ST S"
It seems that the error is caused by the source data including both pre (addrpdir) and post (addrpostd) directional columns with the value 'S':
ogr2ogr -f CSV /vsistdout/ addrapn_datasd.dbf \
| xsv search -s 'objectid' '854155' \
| xsv table
objectid addrnmbr addrfrac addrpdir addrname addrpostd addrsfx addrunit addrzip add_type roadsegid apn asource plcmt_loc community parcelid usng
854155 1261 S 39TH S ST 92113 0 5512003800 M C SAN DIEGO 11648 11S MS 89683 17286
Would it be possible to add a check in machine which only adds one of these values to the street field when both are present?
🤔 Are these one-offs in the data set? Maybe we should ask the county to fix the data?
It's definitely uncommon in OA, at least I've never noticed it before. Within this one file, though, it happens a lot:
ogr2ogr -f CSV /vsistdout/ addrapn_datasd.dbf \
| awk -F, '{ if($4 && $4==$6) {print $0} }' \
| xsv count
3595
Looking at the source, it could also be that addrpdir isn't what we think it is?
The post field is named addrpostd, so I would expect the pre field to be called addrpred, but it's called addrpdir 🤷‍♂️.
It might still be a good idea to add some logic in machine to catch this.
I think whenever the pre and post directional are identical it should always be considered an error?
Only one directional string should be added to the street string in this case.
[edit] If I were to choose which one, I'd favour keeping the post since it's much easier for consumers of the data to detect post-directionals than pre-directionals.
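As a rough illustration of the rule proposed above, here is a minimal Python sketch. The function name `drop_duplicate_pre_directional` is hypothetical and is not part of machine's actual code; it assumes the two directional values have already been read from the source columns (addrpdir and addrpostd in the San Diego file).

```python
def drop_duplicate_pre_directional(pre, post):
    """Return (pre, post) with the pre-directional blanked when it
    duplicates the post-directional.

    Hypothetical helper for illustration only, not machine's API.
    """
    pre = (pre or "").strip()
    post = (post or "").strip()
    # Identical, non-empty directionals are treated as a source error:
    # keep only the post-directional, as suggested in the comment above.
    if pre and pre.upper() == post.upper():
        return "", post
    return pre, post


# Values from objectid 854155: addrpdir='S', addrpostd='S'
print(drop_duplicate_pre_directional("S", "S"))
```

Rows where the two values differ, or where only one is present, pass through unchanged.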
FWIW there are other logical errors in the San Diego geojson file, also because the source file is messy.
One thing I noticed is that machine inserts a space when the field is empty, so in these cases where there is no addrsfx we see a double space.
cat us_ca_san_diego-addresses-county.geojson \
| jq -r '.properties.street' \
| grep -E '^[NSEW]\s.{1,3}\s\s[NSEW]$'
W E  W
W E  W
E AVE  E
W E  W
W E  W
E AVE  E
E AVE  E
W E  W
E AVE  E
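One way to avoid the double space is to drop empty components before joining, rather than joining every field unconditionally. The sketch below is only an assumption about how that could look; `join_street_parts` is a hypothetical name, not a function in machine, and the sample values are illustrative rather than taken from a specific source row.

```python
def join_street_parts(*parts):
    """Join street components with single spaces, skipping any that are
    empty, None, or whitespace-only.

    Hypothetical helper for illustration only, not machine's API.
    """
    return " ".join(p.strip() for p in parts if p and p.strip())


# Illustrative row with an empty suffix field (pre, name, suffix, post)
print(repr(join_street_parts("W", "E", "", "W")))
```

Filtering before joining means an empty addrsfx no longer contributes a separator of its own.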