
Plans to open the PlusCodes short-coding blackbox?

Open ppKrauss opened this issue 1 year ago • 5 comments

PlusCodes has a "plus algorithm" for name-to-prefix conversion:

[flowchart: translating a place name into an OLC prefix]

The illustration uses the short code "598P+Q36 Itagüí, Antioquia, Colombia", which PlusCodes translates to 67R6598P+Q36, where 67R6 is the prefix resolved from "Itagüí, Antioquia, Colombia" and 598P+Q36 is the original suffix.

The encoding and decoding of "short codes" is well covered by the OLC specification, but that is not enough to "emulate" PlusCodes: to be an "open algorithm", it also needs some kind of lookup table that transforms place names into OLC prefixes, in exactly the same way that PlusCodes does.
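
For the suffix-recovery half, the open source library already suffices. A minimal sketch, where the Itagüí coordinates are approximate and stand in for the missing name-to-location lookup:

from openlocationcode import openlocationcode as olc

short_code = '598P+Q36'                # the suffix the user types
itagui_lat, itagui_lng = 6.18, -75.61  # assumed reference point for Itagüí

# prepends the prefix derived from the reference location; any point within
# roughly half a degree of the cell yields 67R6598P+Q36
print(olc.recoverNearest(short_code, itagui_lat, itagui_lng))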

I can't find the dataset or the algorithm that PlusCodes used... Since this is a Google project, do you have any plans to publish it?
Or maybe an algorithm description (a high-level abstraction) that can be adapted to another big database, like Nominatim.

ppKrauss avatar Sep 17 '22 22:09 ppKrauss

All of that is part of Google's Geocoding API. If your geocoding request is a plus code, then the response will have lat/lng coordinates as part of its geometry - and if your reverse geocoding request contains coordinates, the response will contain a plus code.

Using that API is not always free (as in beer), but it exists and can be used.
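
For illustration, a hedged sketch of that round trip; the endpoint and field names follow Google's public Geocoding API documentation as far as I know, and YOUR_API_KEY is a placeholder for a billing-enabled key:

import requests

GEOCODE_URL = 'https://maps.googleapis.com/maps/api/geocode/json'
API_KEY = 'YOUR_API_KEY'

def plus_code_to_latlng(plus_code):
  # forward geocode a (global or compound) plus code to coordinates
  response = requests.get(GEOCODE_URL,
                          params={'address': plus_code, 'key': API_KEY},
                          timeout=10).json()
  location = response['results'][0]['geometry']['location']
  return location['lat'], location['lng']

def latlng_to_plus_code(lat, lng):
  # reverse geocode coordinates; the response carries a plus_code object
  response = requests.get(GEOCODE_URL,
                          params={'latlng': f'{lat},{lng}', 'key': API_KEY},
                          timeout=10).json()
  return response['plus_code']['global_code']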

Regarding your second question, I believe such a dataset, while easy enough to describe, is not actually that trivial to generate or maintain. Basically, what we want is that for every point on the globe there exists one, or at most a few, place names with one set of canonical coordinates each, so that getting those canonical coordinates from a place name and using them as the reference location for the recoverNearest function results in the original coordinates that went into generating the plus code earlier.

In the best case, this dataset not only needs to be complete, in the sense that at least one name exists for every point on the globe; it also needs to be sensible, in the sense that it avoids returning place names across (city/state/national) borders, even in a world where those borders change over time.
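
In code, that round-trip requirement looks roughly like the sketch below; the one-entry table of "canonical coordinates" is made up for illustration:

from openlocationcode import openlocationcode as olc

# stand-in for the real dataset: place name -> canonical coordinates
CANONICAL = {
    'NYC': (40.71275, -74.00816),
}

def round_trips(full_code, place_name):
  lat, lng = CANONICAL[place_name]
  short_code = olc.shorten(full_code, lat, lng)
  return olc.recoverNearest(short_code, lat, lng) == full_code

print(round_trips('87G8P2C2+H8', 'NYC'))  # True while the canonical point stays close enough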

bocops avatar Sep 18 '22 10:09 bocops

Re algorithm description: Nominatim or another geocoder can be used with the following Python example (modulo the issues bocops mentioned around returning place names across borders):

from openlocationcode import openlocationcode as olc


# returns coordinates of the given place name
def geocode(place_name):
  # TODO: Support other places
  if place_name == 'NYC':
    return (40.71275,-74.00816)


# returns a city name and coordinates near the given point
def reverse_geocode(latlng):
  # TODO: Support other places
  return ('NYC', (40.71275,-74.00816))


# converts a compound code, e.g. "P2C2+H8 NYC", into a global code, e.g.
# "87G8P2C2+H8", using a geocoder
def global_code_for_compound_code(compound_code):
  # parse into short code and reference location name
  short_code, reference_location_name = compound_code.split(' ', 1)

  # look up coordinates of reference location name (geocode)
  reference_lat, reference_lng = geocode(reference_location_name)

  # use library function to recover global code
  return olc.recoverNearest(short_code, reference_lat, reference_lng)


print(f"""global_code_for_compound_code('P2C2+H8 NYC)': {
    global_code_for_compound_code('P2C2+H8 NYC')}""")
print(f"""global_code_for_compound_code('PXCX+9J9 NYC)': {
    global_code_for_compound_code('PXCX+9J9 NYC')}""")
print()


# converts a global code, e.g. "87G8P2C2+H8", into a compound code, e.g.
# "P2C2+H8 NYC", using a reverse geocoder.
def compound_code_for_global_code(global_code):
  # parse global code using open source library, get its coordinates
  global_code_lat, global_code_lng = olc.decode(global_code).latlng()

  # look up nearby place name for global code's coordinates (reverse geocode)
  reference_location_name, (reference_lat, reference_lng) = reverse_geocode(
      (global_code_lat, global_code_lng))

  # use library function to shorten to local code
  local_code = olc.shorten(global_code, reference_lat, reference_lng)

  # optionally, if we've shortened too much, restore some of the prefix from the
  # global code
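  # e.g. if local_code is 'C2+H8' (two chars before '+'), global_code[4:6] == 'P2' gets prepended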
  chars_before_plus = local_code.index('+')
  if chars_before_plus < 4:
    local_code = global_code[4:8-chars_before_plus] + local_code

  # combine with reference location name
  return f'{local_code} {reference_location_name}'


print(f"""compound_code_for_global_code('87G8P2C2+H8)': {
    compound_code_for_global_code('87G8P2C2+H8')}""")
print(f"""compound_code_for_global_code('87G7PXCX+9J9)': {
    compound_code_for_global_code('87G7PXCX+9J9')}""")
print(f"""compound_code_for_global_code('87G7PX7R+3PWX)': {
    compound_code_for_global_code('87G7PX7R+3PWX')}""")
print(f"""compound_code_for_global_code('87G8V36G+92)': {
    compound_code_for_global_code('87G8V36G+92')}""")

bilst avatar Sep 19 '22 22:09 bilst

Re algorithm description, Nominatim or another geocoder can be used

Hi @bilst, thanks for the Python description. You translated my FlowChart into a global_code_for_compound_code() Python function. Both my FlowChart and your Python stub describe exactly the same algorithm... and the same blackbox.

In your function, the blackbox is geocode(place_name), which returns the reference lat/lng.

PS: ~2.5 years ago I mentioned the recoverNearest() function and a simple CSV dataset here, but no one understood, so nowadays I prefer the FlowChart to explain the blackbox problem.

We need a big dataset, but we have open data like OpenStreetMap, so it is not impossible... To "emulate" PlusCodes it is enough to reduce the scope to a small country, for example Cape Verde, where the first doubts arose.
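
As a sketch of the "simple CSV dataset" idea, reduced to Cape Verde scope; the sample rows and approximate coordinates are for illustration only, not data published by PlusCodes:

import csv
import io

from openlocationcode import openlocationcode as olc

# pretend this came from an OpenStreetMap extract or similar open data
SAMPLE_CSV = """name,lat,lng
Mindelo,16.886,-24.988
Praia,14.918,-23.509
"""

LOOKUP = {row['name']: (float(row['lat']), float(row['lng']))
          for row in csv.DictReader(io.StringIO(SAMPLE_CSV))}

def geocode(place_name):
  # the "blackbox": place name -> reference lat/lng, here just a table lookup
  return LOOKUP[place_name]

lat, lng = geocode('Praia')
print(olc.recoverNearest('WF9R+', lat, lng))  # with these sample coordinates: 796RWF9R+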

ppKrauss avatar Sep 19 '22 23:09 ppKrauss

Using that API is not always free (as in beer), but it exists and can be used.

A beer, plus a beer, plus another... Each one doesn't look like much, just like the terms of this sum.
The total doesn't look like it will grow, but it is infinite, despite intuition saying otherwise.

Although Google's salespeople have good arguments, we prefer a reproducible algorithm, one that does not leave us hostage to our intuition, nor hostage to the service provider: reproducibility is simply a guarantee that the beer will remain good and reasonably priced in the future.


Regarding your second question, I believe such a dataset, while easy enough to describe, is not actually that trivial to generate or maintain (...)

I commented above that we can reduce the scope to a small country. It is important that we can reproduce PlusCodes and discuss the whole algorithm, not just OLC... Well, it is important if you want to say that PlusCodes is FOSS, open software... I will try to reinforce and explain this point better.

... More than forty-five years ago, N. Wirth showed that "algorithms + data structures = programs".
Today, in data science and geoprocessing, we can add datasets to the equation. So a free program, that is, FOSS software, is not 100% open if its data structures and (good) sample datasets are not also open. We need datasets to test with and to show that the algorithm is reproducible.

The PlusCodes presentation page says that the code is open, but only OLC is 100% open.
The PlusCodes technology consists of OLC + name_resolution.

ppKrauss avatar Sep 20 '22 01:09 ppKrauss

My earlier point regarding the information you call the dataset was not meant to convey that it is simply "too big" to be shared after being generated from some map, but rather that this dataset, on a global scale and across time, is so complex that it is the map (or, rather, the online API that always gets you up-to-date information).

For what it's worth, it is not true that we can just reduce scope to some islands in the Atlantic Ocean and pretend that whatever conclusions we reach based on that can be extended to any globally working implementation of Open Location Code with shortening, such as Google's plus codes. For example, if you are just interested in short codes working for Cape Verde, you're already getting two digits for free, because all of Cape Verde exists in 79000000+. You might be able to get the other two digits with a very small lookup table with size ~ number of islands, because each individual island seems to be about the right size for that to work.
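
A quick check of the "two digits for free" observation; the town coordinates below are approximate and only for illustration:

from openlocationcode import openlocationcode as olc

points = {
    'Mindelo (São Vicente)': (16.886, -24.988),
    'Praia (Santiago)': (14.918, -23.509),
}

for name, (lat, lng) in points.items():
  # both full codes start with the shared "79" prefix of the 20x20 degree cell
  print(name, olc.encode(lat, lng))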

However, just because "V3FC+ São Vicente" is a code that would work perfectly well in the context of Cape Verde, it doesn't mean that it would work globally - because there's a ton of places with that name.

It is also not true that you need Google's exact plus codes dataset to be able to test and show reproducibility of the Open Location Code algorithm. For that, you just need some dataset that you might as well get elsewhere, or create yourself.

Now, if you actually want to argue that "Google Plus Codes" is not the same and/or not as free as "Open Location Code" because of short code compatibility issues, you can of course do that and might even have a point under specific circumstances. I'd just say that doing it in the OLC repository by asking for an algorithm to be used with data from anywhere else than Google Maps is neither the best nor the most honest way to do so if you are actually looking for something completely different.

bocops avatar Sep 20 '22 08:09 bocops

From (newly updated) https://github.com/google/open-location-code/wiki/FAQ#reference-location-dataset : The open source libraries support conversion to/from addresses using the latlng of the reference location. Callers will need to convert place names to/from latlng using a geocoding system.

Providing a global dataset isn't within scope of this project. For a potential free alternative, see Open Street Map and derived geocoding service Nominatim.
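
For example, a hedged sketch of plugging Nominatim in as the geocoder, using its public search endpoint (subject to the Nominatim usage policy; pick a meaningful User-Agent):

import requests

from openlocationcode import openlocationcode as olc

def geocode(place_name):
  # place name -> (lat, lng) via Nominatim's public search API; first match only
  response = requests.get(
      'https://nominatim.openstreetmap.org/search',
      params={'q': place_name, 'format': 'jsonv2', 'limit': 1},
      headers={'User-Agent': 'olc-compound-code-example'},
      timeout=10,
  )
  response.raise_for_status()
  results = response.json()
  if not results:
    raise ValueError(f'no geocoding result for {place_name!r}')
  return float(results[0]['lat']), float(results[0]['lon'])

def global_code_for_compound_code(compound_code):
  # same wiring as the earlier example, with Nominatim as the geocode() blackbox
  short_code, reference_name = compound_code.split(' ', 1)
  lat, lng = geocode(reference_name)
  return olc.recoverNearest(short_code, lat, lng)

# e.g. global_code_for_compound_code('598P+Q36 Itagüí, Antioquia, Colombia')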

bilst avatar Oct 05 '22 16:10 bilst