geocoder
Multiple Lookup Strategy
We need a way to effectively use multiple geocoding APIs. Situations that need to be handled:
- The primary API is unavailable and we'd like to automatically fall back to another.
- The primary API is returning bad/unexpected results (or quota is reached) and we'd like to automatically fall back to another.
- We'd like to use different APIs for different queries, depending on whether the query is an IP address or a physical address, but also if the query is for a certain part of the world where one API is significantly better than another (eg: Yandex in Russia).
Ideas for implementation:
- Set :lookup to an array of preferred APIs (descending preference).
- Set :lookup to a block which takes a query and returns a lookup name, or an array of names (as in #1).
Need to determine:
- How to "fall back" to a less-preferable API. On timeout? On any API-related exception? On any Geocoder exception? On a particular API response? On a nil result? And how configurable does it need to be?
- How to handle IP vs. physical address lookups in the config. Currently they are configured separately (:lookup and :ip_lookup). Is there an advantage to combining them?
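To make the fallback question concrete, here is a minimal sketch of the "array of preferred APIs" idea. It is not part of the gem: `search_with_fallback` and `DEFAULT_SEARCHER` are hypothetical names, and the failure rule (any `StandardError`, or a nil/empty result) is just one of the options listed above. The searcher is injectable so the logic can be tried without network access; the default assumes `Geocoder.search` is called with an explicit `lookup:` option.

```ruby
# Hypothetical helper, not part of the Geocoder gem.
# The default searcher delegates to Geocoder.search with an explicit lookup.
DEFAULT_SEARCHER = ->(query, lookup) { Geocoder.search(query, lookup: lookup) }

# Try each lookup in order of preference and return the first usable result.
def search_with_fallback(query, lookups, searcher: DEFAULT_SEARCHER)
  lookups.each do |lookup|
    begin
      results = searcher.call(query, lookup)
      # Assumption: a nil or empty result set counts as a failure.
      return { lookup: lookup, results: results } unless results.nil? || results.empty?
    rescue StandardError => e
      # Assumption: any error from this lookup triggers the fallback.
      warn "#{lookup} failed: #{e.class}: #{e.message}"
    end
  end
  { lookup: nil, results: [] }
end
```

Usage would look like `search_with_fallback("Moscow", [:google, :yandex])`. Whether an empty result should really count as a failure is exactly the open question above, so that condition is the part most likely to need configuration.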
We'd like to use different APIs for different queries, depending on whether the query is an IP address or a physical address, but also if the query is for a certain part of the world where one API is significantly better than another (eg: Yandex in Russia).
Using a different service for IP and street-address requests is easy to do. But selecting the best service for a certain part of the world means determining the country/state of a query. This means that one has to find:
- 'US' for 'New York city' query
but also:
- 'RU' for 'У́лица Арба́т Москва'
- 'RU' for 'San Petersburgo'
Yes. The Geocoder gem will NOT itself determine what API should be used for a query, but should allow a developer to write code that examines the query and decides which API is best.
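As an illustration of developer-written code that examines the query, here is a rough sketch. The helper names `ip_query?` and `choose_lookup` are hypothetical, and the Cyrillic check is only a crude heuristic: it would route 'У́лица Арба́т Москва' to Yandex but would miss 'San Petersburgo', which is the difficulty raised above.

```ruby
require "ipaddr"

# Hypothetical helper: true when the query parses as an IPv4 or IPv6 address.
def ip_query?(query)
  IPAddr.new(query.to_s.strip)
  true
rescue IPAddr::Error
  false
end

# Hypothetical chooser: returns a lookup name for the given query.
def choose_lookup(query)
  return :maxmind if ip_query?(query)
  # Crude heuristic (assumption): Cyrillic text suggests a Russian address.
  return :yandex if query.to_s.match?(/\p{Cyrillic}/)
  :google
end
```

The returned name could then be passed along with the query, for example as the `lookup:` option of `Geocoder.search`.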
It would be ideal if we could simply supply the lookup as part of the Geocoder.search method's options. If the first lookup method fails, I'd personally prefer to handle it on my own programmatically, as I have some narrow requirements I need to match against.
Would this be accepted as a PR? It'd literally be a one line change and a test.
@sgonyea (and anyone else who's interested in this): could you describe the logic you will use to fall back to different lookups? (1: what defines failure? and 2: how to choose the next lookup) If I can get a sense for what everyone's needs are I may be able to add functionality to Geocoder that reduces the amount of code you need to write and which, presumably, would be useful to others.
How about providing these functionalities through lookup strategy classes?
I.e. something like this:
# Fail over to maxmind if google times out
failover = FailoverLookupStrategy.new([:google, :maxmind], :failover_on_timeout => true, :some_other => :baz)
# Use google for US, yandex for RU, else maxmind
country_based = CountryBasedLookupStrategy.new({'US' => :google, 'RU' => :yandex}, :maxmind)
# Use google for addresses, maxmind for IPs
type_based = TypeBasedLookupStrategy.new(:address => :google, :ip => :maxmind)
Then one could also combine various lookup strategies.
combined = TypeBasedLookupStrategy.new(:address => country_based, :ip => failover)
The code above is just a rough idea; there might be smarter ways. The symbols :google, :yandex and so on would probably be actual lookups. It might be complicated to configure in a declarative way, but it looks very flexible to me. Am I missing something?
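To show how such strategies might compose, here is a small runnable sketch. Everything in it is hypothetical (the `LookupStrategy` module, the `lookups_for` method, the constructor signatures); none of it exists in the gem. Each strategy only answers "which lookups, in which order, for this query?" and leaves the actual searching to the caller.

```ruby
require "ipaddr"

module LookupStrategy
  # Expand a node (a lookup name or a nested strategy) into an ordered list of names.
  def self.resolve(node, query)
    node.respond_to?(:lookups_for) ? node.lookups_for(query) : [node]
  end
end

# Hypothetical: try the given lookups (or nested strategies) in order.
class FailoverLookupStrategy
  def initialize(nodes)
    @nodes = nodes
  end

  def lookups_for(query)
    @nodes.flat_map { |node| LookupStrategy.resolve(node, query) }.uniq
  end
end

# Hypothetical: pick a branch depending on whether the query is an IP address.
class TypeBasedLookupStrategy
  def initialize(address:, ip:)
    @address = address
    @ip = ip
  end

  def lookups_for(query)
    LookupStrategy.resolve(ip?(query) ? @ip : @address, query)
  end

  private

  def ip?(query)
    IPAddr.new(query.to_s.strip)
    true
  rescue IPAddr::Error
    false
  end
end
```

With this shape, `TypeBasedLookupStrategy.new(address: FailoverLookupStrategy.new([:google, :yandex]), ip: :maxmind)` yields an ordered list of candidates for any query, and a separate piece of code decides when to move from one candidate to the next.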
@pascalbetz I agree that that approach looks very flexible yet maybe complicated to configure. It's a really good suggestion. Thanks!
@alexreisner Thanks.
Some more lookup strategies could include "round robin" (to use multiple free plans) and crazy stuff where you select a lookup by time. If geocoder provides an interface and a configuration mechanism, then anyone could easily implement their own strategies.
If one can set the strategy manually in code in an initializer, then configuration could even be pushed back a little while.
I would like to give it a try but am buried in work that I have to do first. I'll check back when I have time.
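For reference, the "round robin" idea mentioned above could be sketched roughly as follows. The class name and `next_lookup` method are hypothetical; the mutex is there only so that the rotation stays consistent if several threads share one instance.

```ruby
# Hypothetical round-robin strategy: hands out lookups in rotation,
# e.g. to spread queries across several free plans.
class RoundRobinLookupStrategy
  def initialize(lookups)
    raise ArgumentError, "at least one lookup is required" if lookups.empty?
    @lookups = lookups
    @index = 0
    @mutex = Mutex.new
  end

  # Return the next lookup name and advance the rotation.
  def next_lookup
    @mutex.synchronize do
      lookup = @lookups[@index]
      @index = (@index + 1) % @lookups.size
      lookup
    end
  end
end
```

A caller would ask for `next_lookup` before each query and pass the result along as the lookup to use.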
Regardless of what strategies get internalized, I still think a very simple way should exist for specifying which Geocoder backend to use. This is a hard problem to solve in precisely the way people want it.
I personally am geocoding against both Google and Bing. If the coordinates they supply are too far away from each other, I don't use any of the results (Haversine). Etc.
@alexreisner Oh, I just realized you asked me a question and I missed it. I have a table that stores the lookups I've performed for each address. Addresses can have some funky forms. But basically, in that table I store the response from Google and Bing. I then do some checks on the results (i.e., how accurate they claim to be; "Do they have a zip?" is an important question). I also make sure that the results appear within the boundaries of the city (or district) that I'm expecting them to be in (via BorderPatrol).
I also compare the distance between the results they both provide, as "123 Foo St." can be turned into "N Foo St." by one and "S. Foo St." by the other. If they disagree, I discard both results and leave the address as un-geocoded. It then gets revisited on a frequent basis (in a batch job). Eventually Google and Bing will sort out their disagreement.
Anyway, so that's how I use it. I want to call both API services at all times; I do not want to call one (or the other) conditionally... etc. I may then throw out both results.
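The "call both and compare" check described above might look something like this sketch. It is not the poster's actual code: the function names and the 0.5 km threshold are assumptions chosen for illustration, and coordinates are plain `[latitude, longitude]` pairs in degrees.

```ruby
EARTH_RADIUS_KM = 6371.0

# Great-circle distance between two points using the Haversine formula.
def haversine_km(lat1, lon1, lat2, lon2)
  to_rad = Math::PI / 180
  dlat = (lat2 - lat1) * to_rad
  dlon = (lon2 - lon1) * to_rad
  a = Math.sin(dlat / 2)**2 +
      Math.cos(lat1 * to_rad) * Math.cos(lat2 * to_rad) * Math.sin(dlon / 2)**2
  # Clamp to 1.0 to guard against floating-point overshoot before asin.
  2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt([a, 1.0].min))
end

# Hypothetical check: keep the first coordinates only if both providers agree
# to within max_km; otherwise return nil so the address stays un-geocoded.
def agreed_coordinates(first, second, max_km: 0.5)
  return nil if first.nil? || second.nil?
  return nil if haversine_km(*first, *second) > max_km
  first
end
```

Returning nil on disagreement matches the workflow above, where the address is left un-geocoded and retried later in a batch job.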
@pascalbetz's Strategy implementation sounds like it would be the most flexible for the most people but maybe more complicated to implement.
I am mostly doing reverse lookup of street addresses and my needs would be met by either being able to configure a failover lookup service to use when the primary is down or over limit or something like that, or by being able to pass the lookup service to Geocoder.search and then I could handle failures in my own code.
Google seems to be the only one providing the county, and I run into problems when I go over the query limit. However, a failover strategy can help, at least for coordinates.
This has been working well for me.
https://gist.github.com/phallstrom/85670d895b3629e481a7
It does mean having to do the geocoding yourself (vs automatic hooks in AR, etc.) but it seems to work okay.
@phallstrom thanks for sharing this code.
Coming from the python world, it seems like the problem here is the usage of static methods. For comparison, we can refer to sample code at http://code.google.com/p/geopy/wiki/GettingStarted, where users can quite easily implement their own fallback strategies.
I blame the Rails culture :stuck_out_tongue:
@phallstrom I don't think that would be thread-safe.
Hi @alexreisner, any news about this? Also, I saw you closed #178 but couldn't find any example about multiple geocoders configuration. Could you point me to some documentation or sample?
@sayap I think this can already be achieved by instantiating the lookups yourself. The static references are just needed for the "hey, it works out of the box" stuff like the geocode extension to AR and so on.
@LucaRonin Sorry, the README has been lagging a bit. I just added some documentation: ebc7f1510. Does that help?
Thank you @alexreisner, that's great!
Hello @alexreisner I'm loving this gem and, referencing your third bullet point, I need to use different APIs for different queries. In particular, I'd like to use LocationIQ for the front-end because it's cheap but inaccurate, and Google for the admin back-end because it's expensive but very accurate. In other words:
One cheap service for lots of queries that don't need much precision, and one expensive service for a few queries that require great precision.
At the current state of the gem, how can one achieve this please?
It would be ideal to be able to specify which strategy/API to use for each lookup, defaulting to the first configured service.
Update: never mind, just found out that we can call the lookup strategy directly, for instance: Geocoder.search('address...', lookup: :google)
Here is how I implemented an easy fallback strategy:
geocoded_by :full_address, lookup: lambda { |obj| obj.geocoder_lookup }

# Returns the name of the lookup to use for this record.
def geocoder_lookup
  if Geocoder.search(full_address, lookup: :nominatim).present?
    :nominatim
  else
    :google # fallback lookup; could also be :esri, etc.
  end
end
The geocoder_lookup method tries your first option and then falls back to the second. If it gets anything, the results are cached by Geocoder in Redis, which makes the second lookup cheap when geocode is called again. It makes at least 2 Redis requests per call, though, so it's not perfect.