
Google style geocoder returning inconsistent results to same query

Open estiens opened this issue 11 years ago • 7 comments

When querying cities in Canada, the Google-style geocoder occasionally returns results outside Canada, such as Turkey. This seems to happen randomly. For example, querying "100 Duncan St Toronto ON Canada" through the web interface, I could press refresh and flip back and forth between the following two results:

[{"address_components":[{"short_name":"20","types":["administrative_area_level_1","political"],"long_name":"20"},{"short_name":"tr","types":["country","political"],"long_name":"Turkey"}],"types":["administrative_area_level_1","political"],"geometry":{"location_type":"APPROXIMATE","location":{"lat":38.9167,"lng":40.3},"viewport":{"southwest":{"lat":37.9167,"lng":39.3},"northeast":{"lat":39.9167,"lng":41.3}}}}]

[{"geometry":{"location_type":"APPROXIMATE","location":{"lng":-79.4163,"lat":43.70011},"viewport":{"southwest":{"lng":-79.6427230835,"lat":43.5466194153},"northeast":{"lng":-79.2320251465,"lat":43.8083610535}}},"types":["locality","political"],"address_components":[{"short_name":"Toronto","long_name":"Toronto, ON, CA","types":["locality","political"]},{"short_name":"CA","long_name":"Canada","types":["country","political"]}]}]
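To tell the two responses apart at a glance, something like the following could reduce each one to a country code and a coordinate pair. This is a minimal sketch, assuming the response is the bare list shown above; `summarize` is a hypothetical helper name, not part of DSTK.

```python
def summarize(results):
    """Return (country, lat, lng) for the first result, or None if there is none.

    `results` is assumed to be the list pasted above: each entry has
    "address_components" and "geometry" keys in the Google-style layout.
    """
    if not results:
        return None
    first = results[0]
    country = None
    for component in first.get("address_components", []):
        if "country" in component.get("types", []):
            # The two samples differ in case ("tr" vs "CA"), so normalize.
            country = component.get("short_name", "").upper() or None
            break
    location = first.get("geometry", {}).get("location", {})
    return (country, location.get("lat"), location.get("lng"))
```

Applied to the two results above, the first element of the tuple alone is enough to flag the bad response.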

estiens avatar Oct 08 '13 17:10 estiens

This appears to happen only with cities in Canada. It is not limited to Toronto: it also occurs when geocoding some addresses in Vancouver.

estiens avatar Oct 08 '13 17:10 estiens

Thanks Eric. The geocoder isn't extended to street-level addresses in Canada yet, but I would expect it to pick up Toronto in that address, and the Turkey result is clearly wrong. To help me reproduce it, is this the right URL for the API call? http://www.datasciencetoolkit.org/maps/api/geocode/json?address=100%20Duncan%20St%20Toronto%20ON%20Canada&callback=jQuery1506013870353344828_1381279072437&_=1381279083648

If you're using the web interface in Chrome, you'll see this if you open View->Developer->Developer Tools and then select the Network tab before sending a query. I appreciate your help tracking this down!

petewarden avatar Oct 09 '13 00:10 petewarden

http://www.datasciencetoolkit.org/maps/api/geocode/json?address=100%20Duncan%20St%20Toronto%20ON%20Canada&callback=jQuery1508765704047400504_1381282583344&_=1381282780552 was parsed correctly. I'm trying to reproduce the incorrect parsing now. The only thing I can offer so far is that it happens intermittently for any of our Canadian addresses that hit the API.

estiens avatar Oct 09 '13 01:10 estiens

This resulted in a Turkish location just now

http://www.datasciencetoolkit.org/maps/api/geocode/json?address=20%20Duncan%20St%20Toronto%20ON%20Canada&callback=jQuery1508765704047400504_1381282583345&_=1381282897790

estiens avatar Oct 09 '13 01:10 estiens

And then correctly located it in Canada (same query)

http://www.datasciencetoolkit.org/maps/api/geocode/json?address=20%20Duncan%20St%20Toronto%20ON%20Canada&callback=jQuery1508765704047400504_1381282583349&_=1381282954221

estiens avatar Oct 09 '13 01:10 estiens

Just wondering whether any workarounds exist for this? We're still getting very inconsistent geocoding for all of our Canada locations (mostly ending up in different countries) for queries of the form "Street" "City" "Province" "Canada".

estiens avatar Oct 17 '13 21:10 estiens

I've had a chance to dig into this, and here's what appears to be happening:

  • The main /api/geocode/json endpoint checks with the TwoFishes geocoder, and then falls back to other methods.
  • The TwoFishes server process has been failing under heavy load, and then the daemon that should be able to restart it has failed due to lack of free memory.
  • There are multiple DSTK servers behind a load balancer servicing the main www.datasciencetoolkit.org endpoint. When one of the back end servers has a dead TwoFishes server process, but the others are still running, you'll see inconsistent results, depending on which server the load balancer directs your request to.
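The effect described in the points above can be reproduced with a toy model. This is only a sketch under stated assumptions: the backends are stand-in functions rather than real DSTK servers, the round-robin policy is assumed (the actual load-balancing policy isn't stated here), and the "TR" fallback value simply mirrors the Turkey result reported earlier in this issue.

```python
import itertools
from collections import Counter

def make_backend(twofishes_alive):
    """Build a stand-in backend; not a real DSTK server."""
    def geocode(address):
        if twofishes_alive:
            return "CA"  # TwoFishes resolves the Toronto address
        return "TR"      # fallback path, mirroring the wrong result in this issue
    return geocode

def simulate(backends, address, requests):
    """Send `requests` identical queries round-robin and tally the answers."""
    rotation = itertools.cycle(backends)
    return Counter(next(rotation)(address) for _ in range(requests))

# Three backends, one of which has a dead TwoFishes process.
backends = [make_backend(True), make_backend(True), make_backend(False)]
tally = simulate(backends, "20 Duncan St Toronto ON Canada", 9)
```

With one unhealthy backend out of three, a third of identical requests come back wrong, which matches the "press refresh and it flips" behavior reported above.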

I don't have a fix for the underlying problem of the TwoFishes server process failing, but I have added a new Pingdom alert for the TwoFishes endpoint. I've restarted TF on all the servers, and now I should be able to catch problems soon after they happen, and hopefully get a clearer idea of what's going on.
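For anyone who wants a similar check on the client side, here is a rough sketch of a probe that geocodes a known address and verifies the country. It is not the Pingdom configuration, just an illustration: `probe` and `fetch_json` are hypothetical names, the URL is the public endpoint from earlier in this thread, and the code accepts either a bare result list or a `{"results": [...]}` wrapper because the exact response envelope is an assumption.

```python
import json
import urllib.parse
import urllib.request

GEOCODE_URL = "http://www.datasciencetoolkit.org/maps/api/geocode/json"

def fetch_json(address, timeout=10):
    """Fetch the geocoder response for `address` and decode it as JSON."""
    query = urllib.parse.urlencode({"address": address})
    with urllib.request.urlopen(f"{GEOCODE_URL}?{query}", timeout=timeout) as resp:
        return json.load(resp)

def probe(address, expected_country, fetch=fetch_json):
    """Return True only if the first result's country matches `expected_country`."""
    try:
        payload = fetch(address)
    except (OSError, ValueError):
        # Network failures and undecodable bodies both count as a failed probe.
        return False
    results = payload.get("results", []) if isinstance(payload, dict) else payload
    if not isinstance(results, list) or not results:
        return False
    for component in results[0].get("address_components", []):
        if "country" in component.get("types", []):
            return component.get("short_name", "").upper() == expected_country
    return False
```

Because each request may land on a different backend, a single passing probe does not prove every server is healthy; running it repeatedly (or against each backend directly, if reachable) gives a better signal.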

I suspect it might be the separate processes fighting over available memory, in which case I might need to look into something like Linux Control Groups to ensure there's enough memory reserved to restart the TwoFishes process if it does ever fail.

petewarden avatar Oct 24 '13 19:10 petewarden