open-location-code Short codes with padding are unintentionally considered valid

The Open Location Code specification mentions that full codes with padding cannot be shortened, which would imply that a short code with padding is invalid. However, the API reference for code validation does not specify anything about allowing or disallowing short codes with padding. And as such, most (if not all) of the implementations will determine short codes with padding as being valid (and short), and even allow them to be recovered. Because of this gap in the specification there are inconsistencies and unexpected results between the different implementations and how short codes with padding are recovered.

Theoretically, recovering a short code with padding can sensibly result in a full code with the same padding. Some implementations (such as JavaScript) do maintain the padding when recovering shortened codes as one might expect. Whereas other implementations (such as Java) do not and always result in a full code with 8 digits (non-padded) where the digits that replace the padding are determined by the recovered coordinates which may be unexpected.

Example:

// Throws exception since specification disallows shortening padded codes
OpenLocationCode.shorten("8FWC2300+", 48.025, 8.075);

// JavaScript returns: "8FWC2300+"  (Padded 6 digit code)
// Java returns:       "8FWC23GG+"  (Non-padded 8 digit code as center of 6 digit code area)
OpenLocationCode.recoverNearest("2300+", 48.025, 8.075);

To resolve this issue, I think the following needs to be done:

The Open Location Code specification should be updated to define short codes with padding as either being explicitly disallowed (consistent with shorten() spec), or explicitly allow it if it makes sense and can reasonably be recovered/shortened.
Update the implementations to handle the change to short code validation and/or short code recovery (dependent on the decision to allow/disallow)
Update the test_data/validationTests.csv to include a padded short code to test that all implementations now conform to the updated spec and handle padded short codes appropriately.

I think the easiest and probably best option is to just explicitly disallow padded short codes to be consistent with disallowing shortening of padded full codes. This would also indirectly resolve these inconsistencies assuming that short code validation is fixed in each implementation.

Mar 24 '19 05:03 JonMcPherson

ping @drinckes

Mar 27 '19 01:03 JonMcPherson

Thanks for this, I'm going to take a look at it in the next couple of days.

Mar 27 '19 13:03 drinckes

This PR ^ resolves this issue

Mar 30 '19 03:03 JonMcPherson

@drinckes @zongweil Is this project still being maintained?

Apr 10 '19 00:04 JonMcPherson

Hey, yes, sorry for the delay Jon. I'm going to let @drinckes review this, as it involves changing the spec.

Apr 10 '19 04:04 zongweil

Yes, my fault. I'm going to send this today to the public mailing list as well as some other interested parties to give them a chance to provide feedback.

Anything we do will require a modification to the spec, even if it's just clarification, so what I propose is that I summarise the questions here, we get some feedback, and then proceed from there to a consensual decision.

Question: What should the spec say about:

Should shortening of codes with only eight digits be allowed? E.g. 8FVC9G8F+
Should shortening of codes with six digits be allowed? E.g., 8FVC9G00+
Should shortening of codes with two or four digits be allowed? E.g., 8FVC0000+

Recommendation

I recommend that the answer to all of these is yes, with the requirement that the resulting code always includes a minimum of two significant digits.

So for example, shortening 8FVC0000+ with relation to Zurich would return VC0000+. Shortening 8F000000+ with relation to Zurich will return 8F000000+ (because it must have two digits).

This basically says that you can shorten (or attempt to shorten) any code with relation to a location, and you the recover method will return the original code. The number of digits doesn't make this either work/not work.

Comments please! If I receive comments from interested parties without github accounts I'll post them here under my name.

Apr 10 '19 07:04 drinckes

I think a suggestion could be added to the spec that codes SHOULD NOT be padded and shortened at the same time: what's the use case for "VC0000+ Zurich", if that code describes an area much bigger than Zurich itself (~10% the size of whole Switzerland) and if the code isn't even shorter than the full code "8FVC0000+"?

That said, I don't see any reason for artificially removing the ability to recover a short code with padding while that is strictly possible.

It should just be ensured that padding characters are kept intact while recovering a full code. @JonMcPherson mentioned that the Java implementation recovers short code "2300+" to "????23GG+". Inflating a code's precision like that seems like a bug to me.

This bug is currently shared by maps.google.com, by the way. Searching for "8FVC0000+" on Maps leads to a map pin at "GG22+22, Niederglatt, Switzerland".

Apr 10 '19 08:04 bocops

Yea after giving it more thought, I think that the better option is to not restrict and to explicitly allow padded codes (with 2, 4, or 6 digits) to be shortened and recovered. I'm all for avoiding unnecessary exceptional cases and having one less precondition or thrown exception. Basically since the resolution of the code can be maintained when shortening and recovering, then a code of any resolution should be allowed to be shortened and recovered.

However, I think if the API spec for shorten() is going to modified, we should consider addressing this other issue which I think is valid and relevant to this issue.

The API spec states:

If the code cannot be shortened, the original full code should be returned.

This makes the return type ambiguous as it can be either a short or full code. Because of this ambiguity, the caller has to know to check if the returned code isShort() or isFull() (or is equal to the original full code). I think that any ambiguity in whether a code is short or full should be avoided because a short code cannot be decoded and can only be recovered. In other words, short codes cannot be used unless they are recovered, and so they should be dealt with and even represented separately from full codes.

This is why in the v2.0 release of my C# implementation, I refactored the class structure so that OpenLocationCode accepts only full codes, and a separate OpenLocationCode.ShortCode class was added which accepts only short codes. This approach avoids any ambiguity and the need for throwing InvalidOperationException such as when calling decode() on an OpenLocationCode that is a short code.

Additionally, there are inconsistencies in how the different implementations handle this. The Java implementation does not follow the spec and actually throws an IllegalArgumentException stating that the reference point is too far from the center of the code area when the code couldn't be shortened. Whereas most other implementations follow the spec and just return the original full code when it couldn't be shortened.

I think that if shorten() couldn't shorten the code, then it should communicate that to the caller in some way. Whether that be by throwing an exception or just simply returning null, I think that is better than returning a full code from shorten() making it ambiguous and even confusing. Maybe language features such as Optional types can be used for some of the implementations to better represent that the shorten() method will attempt to shorten the full code and thus a returned short code is optional.

Apr 11 '19 05:04 JonMcPherson

Should shortening of codes with only eight digits be allowed? E.g. 8FVC9G8F+ Should shortening of codes with six digits be allowed? E.g., 8FVC9G00+ Should shortening of codes with two or four digits be allowed? E.g., 8FVC0000+

In my opinion, the answer is yes to all three, and without restriction (no minimum significant digits requirement).

For the reason that they are still meaningful, even if they may not always be considered particularly useful. E.g. 0000+, San Francisco can be recovered to a valid plus code, so why restrict it unnecessarily? I think in general that the spec should focus on the algorithm (objectively: can this code be meaningfully resolved) and avoid making usability judgments (subjectively: do we think people should actually use this form). There may be future use-cases we have not thought of, and restricting them unnecessarily doesn't add much value to me.

Case in point: I really like that the spec today doesn't set short codes to be say "exactly 6 digits" (e.g. RG9C+VH). While this is the common size of shortened code we see (as it maps nicely to greater city areas), other use-cases for even shorter codes, say at a neighborhood level (e.g. 9C+VH), are still viable. This flexibility allows people to innovate on top of plus codes, taking advantage of their useful inherent properties.

I also prefer this approach as it keeps the spec simpler and with less edge cases.

Apr 19 '19 16:04 WilliamDenniss

@WilliamDenniss

If the proposal is that it is acceptable to remove all significant digits from these codes, then is it also acceptable to remove all significant digits from 8FVC9G8F+6W, or the first 10 digits from 8FVC9G8F+6WG?

If yes, then we will need to decide how to represent these currently invalid strings (00000000+00G or just +00G).

If no, then shortening padded codes is introducing a new edge case anyway.

Shortening the codes is done to make them easier for people to use in the context of addressing, by reducing the number of digits someone has to remember.

I think we risk defining a mathematically elegant but not practically useful solution here.

Apr 21 '19 09:04 drinckes

My approach would be similar to your comment on the uneven length codes: "be conservative in what you do, be liberal in what you accept from others", which I would apply in a way that "do" are the codes we generate in the API, and "accept" are codes that are considered valid according to the spec (which is possibly not the original intent of the phrase).

I don't honestly see the harm in accepting as valid a code that can be meaningfully resolved. As a straw-person, codes like +00G could even have some application for finding locations at a very small scale like rooms in a house, or smaller (though I'm not advocating for this use).

I totally agree that the API should not by default generate codes that are hard to use, I think this kind of specialty use would be left for experts to determine.

The API design for short codes already exhibit this principle. shorten will return short codes with an included safety margin, with usability in mind. But recoverNearest will just resolve the code to the nearest. This guides users to make the right decision but doesn't stop an expert user say further truncating short codes themselves when they are confident in their own application (i.e. if I'm generating a POI map of SF, I can by-pass shorten completely and just truncate by a fixed amount).

Apr 30 '19 16:04 WilliamDenniss

@JonMcPherson Hi, this question has nothing to do with this issue. I was trying to find a way to contact you to ask you regarding the parse live query for .NET and Unity. I downloaded your repo and placed them within my Assets folder but I tried to create a parse live query client and connecting it to the URI of the parse server (using the API Back4App) but when testing with a test callback I tried to log that I have entered the callback but it did not work. So I wanted to ask you, is there maybe something missing within the code and it is not a finished project yet or could I possibly be doing something wrong?

Jul 23 '20 10:07 Evilc06

Recommended tag: specification

Apr 25 '21 04:04 fulldecent

This issue, and #462, is fixed by https://github.com/google/open-location-code/pull/463

It is specified:

Plus Codes with code length less than 8 SHALL NOT be represented as short codes.
If both distances are less than 10 degrees, digit places 1–2 MAY be omitted.
If both distances are less than 0.5 degrees, digit places 1–2 or 1–4 MAY be omitted.
If both distances are less than 1/40 degrees, digit places 1–2, 1–4 or 1–6 MAY be omitted.

This incorporates existing specifications at https://github.com/google/open-location-code/blob/main/docs/specification.md#short-codes.

Jun 04 '21 18:06 fulldecent

Awesome. I'll update the C# implementation after the Java implementation is updated to use as a reference for consistency.

Jun 04 '21 19:06 JonMcPherson