Espirito Santo geocoding issue

Open · AnyaLindstromBattle opened this issue 4 years ago · 0 comments

Chris and I noticed that quite a few cases ingested by the Espirito Santo parser are geocoded to other places in Brazil (approximately 2,000 at last count). I've looked more closely at the cases that aren't geocoded to Espirito Santo, and most of them seem to be genuine locations in other Brazilian states. For example, 'Campos dos Goytacazes' is a real location in Rio de Janeiro, and no location of that name exists in Espirito Santo.

The problem is that, unlike most Brazilian datasets, the Espirito Santo dataset has no 'state' field. We therefore cannot exclude, during parsing, cases that appear in it even though, for one reason or another, they come from another state.

I don't see a parser-level solution to this. To me, it looks like something the data sense-checking step we've discussed should handle: the step we need between parsing and uploading 'final' data into the database (for example, a step after geocoding that checks each case's location against the expected state). Does anyone have other ideas?
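A minimal sketch of what such a post-geocoding state check could look like in Python, assuming the geocoder returns a first-level administrative region (state) for each case. The `Case` record, its `admin1` field, and the `split_by_state` and `normalize` helpers are hypothetical names for illustration, not part of the existing pipeline. Out-of-state cases are set aside for review rather than silently deleted, and state names are compared case- and accent-insensitively so that 'Espirito Santo' matches 'Espírito Santo'.

```python
import unicodedata
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Case:
    case_id: str
    location_name: str
    admin1: str  # hypothetical: state name returned by the geocoder


# Hypothetical: the state this parser's source is expected to cover.
EXPECTED_STATE = "Espírito Santo"


def normalize(name: str) -> str:
    """Strip accents, casefold, and trim so spelling variants compare equal."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return without_accents.casefold().strip()


def split_by_state(
    cases: Iterable[Case], expected: str = EXPECTED_STATE
) -> Tuple[List[Case], List[Case]]:
    """Partition geocoded cases into (in_state, out_of_state) by their admin1."""
    target = normalize(expected)
    in_state: List[Case] = []
    out_of_state: List[Case] = []
    for case in cases:
        if normalize(case.admin1) == target:
            in_state.append(case)
        else:
            # Keep these for manual review instead of discarding them outright.
            out_of_state.append(case)
    return in_state, out_of_state


if __name__ == "__main__":
    cases = [
        Case("1", "Vitória", "Espírito Santo"),
        Case("2", "Campos dos Goytacazes", "Rio de Janeiro"),
    ]
    ok, flagged = split_by_state(cases)
    print([c.case_id for c in ok], [c.case_id for c in flagged])
```

Keeping the flagged cases as a separate list, rather than dropping them, leaves room to decide later whether they should be excluded, reassigned to the correct state's dataset, or re-geocoded.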

AnyaLindstromBattle · Jan 25 '21 12:01