feat(server): significantly improve Australian reverse geocoding accuracy

Open lhjt opened this issue 1 year ago • 10 comments

Description

Many Australian suburbs are recorded as PPLXs instead of PPLs in the cities500 geonames dataset.

The current implementation of Immich specifically excludes PPLXs. This leads to the majority of photos taken in Australia having a reverse geocoded suburb that is completely off.

This PR allows PPLXs only from Australia to be ingested into the reverse geocoding table.
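A minimal sketch of the kind of filter described above, not the actual diff in this PR. The names `GeonamesRow`, `parseGeonamesRow`, `shouldIngest` and `PPLX_ALLOWED_COUNTRIES` are hypothetical; the column positions assume the standard tab-separated GeoNames dump layout (name, latitude, longitude, feature code and country code in columns 2, 5, 6, 8 and 9).

```typescript
// Hypothetical illustration only; Immich's real import code may look different.
interface GeonamesRow {
  name: string;
  latitude: number;
  longitude: number;
  featureCode: string; // e.g. "PPL" (populated place) or "PPLX" (section of populated place)
  countryCode: string; // ISO 3166-1 alpha-2, e.g. "AU"
}

// Countries whose PPLX rows are kept. Per this PR's description, only Australia.
const PPLX_ALLOWED_COUNTRIES: ReadonlySet<string> = new Set(["AU"]);

// Parse one tab-separated line of a GeoNames dump such as cities500.txt.
function parseGeonamesRow(line: string): GeonamesRow {
  const columns = line.split("\t");
  return {
    name: columns[1] ?? "",
    latitude: Number(columns[4]),
    longitude: Number(columns[5]),
    featureCode: columns[7] ?? "",
    countryCode: columns[8] ?? "",
  };
}

// Keep every non-PPLX row; keep PPLX rows only for allow-listed countries.
function shouldIngest(row: GeonamesRow): boolean {
  if (row.featureCode !== "PPLX") {
    return true;
  }
  return PPLX_ALLOWED_COUNTRIES.has(row.countryCode);
}
```

Keeping the allow-list in a set means another country could be added later without touching the filter logic, if that ever turns out to be useful.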

How Has This Been Tested?

  • [x] Built the server and ran it on my own instance. Uploaded a photo taken in Australia and observed that the reverse geocoded suburb was much more accurate.

Checklist:

  • [x] I have performed a self-review of my own code
  • [x] I have made corresponding changes to the documentation if applicable

lhjt avatar Jul 26 '24 11:07 lhjt

Hey, thanks for this. Would you be able to add an e2e test in our e2e folder that tests a location in Australia that previously resolved incorrectly and now resolves as you would expect?

zackpollard avatar Jul 26 '24 11:07 zackpollard

I can try and have a look - I have a few test cases for coordinates here that I have gathered:

-33.85897705866313, 151.27849073027048 - The Rocks (should be Vaucluse)
-33.844817409674775, 151.28264632160358 - The Rocks (should be Watsons Bay)
-37.76573239917475, 144.7524531648833 - St Albans (should be Ravenhall)
-31.894346156789997, 115.75761710390464 - Trigg (should be Scarborough)
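These cases could be captured as a small table-driven check, roughly like the sketch below. The `GeocodeCase` type, the `ReverseGeocode` signature and `findMismatches` are made up for illustration and are not part of Immich's e2e suite; a real test would call the server instead of a plain function.

```typescript
// Hypothetical sketch; the coordinates and suburb names are the ones listed above.
interface GeocodeCase {
  latitude: number;
  longitude: number;
  previous: string; // what the coordinates resolved to before this PR
  expected: string; // what they should resolve to
}

const australianCases: GeocodeCase[] = [
  { latitude: -33.85897705866313, longitude: 151.27849073027048, previous: "The Rocks", expected: "Vaucluse" },
  { latitude: -33.844817409674775, longitude: 151.28264632160358, previous: "The Rocks", expected: "Watsons Bay" },
  { latitude: -37.76573239917475, longitude: 144.7524531648833, previous: "St Albans", expected: "Ravenhall" },
  { latitude: -31.894346156789997, longitude: 115.75761710390464, previous: "Trigg", expected: "Scarborough" },
];

// Stand-in for whatever actually performs the lookup (assumed signature).
type ReverseGeocode = (latitude: number, longitude: number) => string | undefined;

// Return every case whose resolved suburb differs from the expected one.
function findMismatches(cases: GeocodeCase[], reverseGeocode: ReverseGeocode): GeocodeCase[] {
  return cases.filter((c) => reverseGeocode(c.latitude, c.longitude) !== c.expected);
}
```

An empty result from `findMismatches(australianCases, lookup)` would mean all four locations resolve as expected.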

@zackpollard which e2e test suite should I add these cases to?

lhjt avatar Jul 26 '24 12:07 lhjt

I don't know if this is related, but I've been having the same issue with photos taken in Japan.

For example:

Komae, Tokyo is resolved from 35.5770, 139.5760

When in theory it should be something like Miyamae-ku, Kawasaki (or something else, Kawasaki).

Is there an easy way to test if this is the case?

danada avatar Jul 26 '24 12:07 danada

@danada, I have noticed the same issue with my photos from Japan, but Japan's zone structure seems to be a lot more complex than what is available in the cities500 dataset. From my understanding, Immich uses the geonames database cities500 (cities where population >= 500) and your coordinates map to the closest PPL (populated place) Komae.

If you check this link, are any of the items in the blue box (when you click on it) close to what you would expect to appear as the name? None of them seem to list Kawasaki, but the closest zone, Arima, seems to match Google Maps.

lhjt avatar Jul 26 '24 14:07 lhjt

@lhjt thanks for that link.

Arima is definitely more accurate than Komae, which is in a different prefecture (whereas Arima is an adjacent town(?)).

Fwiw, Google Photos identifies the location as just Kawasaki which is really coarse, whereas Arima (a town in Miyamae-ku, which is a ward of Kawasaki) is much finer, but not completely accurate.

danada avatar Jul 26 '24 14:07 danada

@danada, no problem. Based on the coordinates you provided though it appears that they are located within Arima(?), at least based on the boundaries that geonames has recorded for it:

[Images: two screenshots, titled "GeoNames.org" and "35°34'37.2"N 139°34'33.6"E"]

If this is wrong, then it might be due to geonames having an incorrect boundary for the PPL. Even though this may be correct, unfortunately it will not be fixable by adding in JP to the check in this PR. This is because Arima does not seem to have a recorded population in the geonames database, which means it will not be included in the cities500 dataset.

My hunch is that to get around this, the user would have to upload a custom dataset to the /build/geodata/cities500.txt file that includes further PPLs to get an accurate geocoding, unless the Immich project is willing to replace the dataset with one that has more PPLs (not limited by population size, for example).

lhjt avatar Jul 26 '24 15:07 lhjt

Ah yeah, I noticed a bit of a difference with the area provided by cities500. Here are the correct boundaries from Google Maps (hard to compare, but the shape difference should be obvious).

Screenshot_20240727-001002~2.png

Perhaps the geometry can't be too complex in the dataset.

Either way, it's good to know more about how the geolocation works in immich :)

danada avatar Jul 26 '24 15:07 danada

Our current implementation doesn't work on boundaries, it works based on coordinates, so you're probably closer to the center of the other place than to the one you're actually in 😅
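To illustrate why a nearest-point lookup can pick the "wrong" place, here is a rough sketch that chooses the closest stored coordinate by great-circle (haversine) distance. This is not Immich's actual query; `Place`, `haversineKm` and `nearestPlace` are hypothetical names, and the mean Earth radius of 6371 km is the usual approximation.

```typescript
// Hypothetical sketch of a nearest-point lookup; boundaries are never consulted.
interface Place {
  name: string;
  latitude: number;
  longitude: number;
}

const EARTH_RADIUS_KM = 6371; // mean Earth radius, an approximation

const toRadians = (degrees: number): number => (degrees * Math.PI) / 180;

// Great-circle distance between two points using the haversine formula.
function haversineKm(lat1: number, lon1: number, lat2: number, lon2: number): number {
  const dLat = toRadians(lat2 - lat1);
  const dLon = toRadians(lon2 - lon1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRadians(lat1)) * Math.cos(toRadians(lat2)) * Math.sin(dLon / 2) ** 2;
  return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
}

// Return the place whose stored point is closest to the query, or undefined if there are none.
function nearestPlace(places: Place[], latitude: number, longitude: number): Place | undefined {
  let best: Place | undefined;
  let bestDistance = Infinity;
  for (const place of places) {
    const distance = haversineKm(latitude, longitude, place.latitude, place.longitude);
    if (distance < bestDistance) {
      best = place;
      bestDistance = distance;
    }
  }
  return best;
}
```

With two toy places, "A" at (0, 0) and "B" at (0, 1), a photo taken at (0, 0.6) resolves to "B" even if A's real boundary happens to extend past that point, because only the stored points are compared.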

zackpollard avatar Jul 26 '24 15:07 zackpollard

Ooh thanks for confirming - guess that makes sense given there are only coordinates stored in the database. Is immich considering moving to a boundary based approach?

lhjt avatar Jul 26 '24 15:07 lhjt

> Ooh thanks for confirming - guess that makes sense given there are only coordinates stored in the database. Is immich considering moving to a boundary based approach?

I've actually been working on that but had to put it down for a bit. It will likely land in one of the next few releases but it probably won't be the default. The data requirements for having polygons for the entire world are significantly higher, currently looking at around 6-7GB in your postgres database.

zackpollard avatar Jul 26 '24 16:07 zackpollard