feat(server): significantly improve Australian reverse geocoding accuracy
Description
Many Australian suburbs are recorded as PPLX (section of populated place) instead of PPL (populated place) in the cities500 geonames dataset.
The current implementation of Immich specifically excludes PPLXs. This leads to the majority of photos taken in Australia having a reverse geocoded suburb that is completely off.
This PR allows PPLXs only from Australia to be ingested into the reverse geocoding table.
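As a rough illustration of the rule described above (not the actual diff in this PR), the ingestion filter could look something like the sketch below. The names `GeonamesRow`, `parseGeonamesLine` and `shouldIngest` are hypothetical. The column positions assume the standard tab-separated geonames dump layout.

```typescript
// Hypothetical sketch of the filter described above; not Immich's actual code.
// Column positions assume the tab-separated geonames dump layout.
interface GeonamesRow {
  name: string;
  latitude: number;
  longitude: number;
  featureCode: string; // e.g. PPL, PPLX
  countryCode: string; // ISO 3166-1 alpha-2, e.g. AU
}

function parseGeonamesLine(line: string): GeonamesRow {
  const cols = line.split('\t');
  return {
    name: cols[1],
    latitude: Number(cols[4]),
    longitude: Number(cols[5]),
    featureCode: cols[7],
    countryCode: cols[8],
  };
}

// Keep every row except PPLX rows, unless the PPLX row belongs to Australia.
function shouldIngest(row: GeonamesRow): boolean {
  if (row.featureCode !== 'PPLX') {
    return true;
  }
  return row.countryCode === 'AU';
}
```

Keeping the country check in one small predicate would make it easy to extend to other countries later, if that turns out to be useful.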
How Has This Been Tested?
- [x] Built the server and ran it on my own instance. Uploaded a photo taken in Australia and observed that the reverse geocoded suburb was much more accurate.
Checklist:
- [x] I have performed a self-review of my own code
- [x] I have made corresponding changes to the documentation if applicable
Hey, thanks for this. Would you be able to add an e2e test in our e2e folder that tests a location in Australia that previously resolved incorrectly and now resolves as you would expect?
I can try and have a look - I have a few test cases for coordinates here that I have gathered:
-33.85897705866313, 151.27849073027048 - The Rocks (should be Vaucluse)
-33.844817409674775, 151.28264632160358 - The Rocks (should be Watsons Bay)
-37.76573239917475, 144.7524531648833 - St Albans (should be Ravenhall)
-31.894346156789997, 115.75761710390464 - Trigg (should be Scarborough)
@zackpollard which e2e test suite should I add these cases to?
I don't know if this is related, but I've been having the same issue with photos taken in Japan.
For example:
Komae, Tokyo is resolved from 35.5770, 139.5760
When in theory it should be something like Miyamae-ku, Kawasaki (or something else, Kawasaki).
Is there an easy way to test if this is the case?
@danada, I have noticed the same issue with my photos from Japan, but Japan's zone structure seems to be a lot more complex than what is available in the cities500 dataset. From my understanding, Immich uses the geonames cities500 dataset (cities where population >= 500), and your coordinates map to the closest PPL (populated place), Komae.
If you check this link, are any of the items in the blue box (when you click on it) close to what you would expect to appear as the name? None of them seem to list Kawasaki, but the closest zone, Arima, seems to match Google Maps.
@lhjt thanks for that link.
Arima is definitely more accurate than Komae, which is in a different prefecture (whereas Arima is an adjacent town(?)).
Fwiw, Google Photos identifies the location as just Kawasaki which is really coarse, whereas Arima (a town in Miyamae-ku, which is a ward of Kawasaki) is much finer, but not completely accurate.
@danada, no problem. Based on the coordinates you provided though it appears that they are located within Arima(?), at least based on the boundaries that geonames has recorded for it:
[Image: the boundary geonames has recorded for Arima]
If this is wrong, then it might be due to geonames having an incorrect boundary for the PPL. Even though this may be correct, unfortunately it will not be fixable by adding in JP to the check in this PR. This is because Arima does not seem to have a recorded population in the geonames database, which means it will not be included in the cities500 dataset.
My hunch is that to get around this, the user would have to upload a custom dataset to the /build/geodata/cities500.txt file that includes further PPLs to get an accurate geocoding, unless the Immich project is willing to replace the dataset with one that has more PPLs (not limited by population size, for example).
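To make the custom dataset idea a bit more concrete, here is a minimal sketch of how such a file could be filtered out of a larger geonames dump. Everything here is an assumption: the function names are made up, the population cut-off is a simplification of how cities500 is described in this thread, and I have not checked how Immich validates the file.

```typescript
// Hypothetical sketch; column positions assume the tab-separated geonames dump layout.
const FEATURE_CLASS = 6; // 'P' marks populated places
const COUNTRY_CODE = 8;
const POPULATION = 14;

// Simplified version of the cities500 cut-off as described in this thread.
function passesPopulationCutoff(line: string): boolean {
  const cols = line.split('\t');
  return cols[FEATURE_CLASS] === 'P' && Number(cols[POPULATION]) >= 500;
}

// Assumed custom rule: keep every populated place in one country,
// even when no population is recorded.
function passesCountryRule(line: string, countryCode: string): boolean {
  const cols = line.split('\t');
  return cols[FEATURE_CLASS] === 'P' && cols[COUNTRY_CODE] === countryCode;
}

// Returns the lines that would go into the custom dataset file.
function buildCustomDataset(dump: string, countryCode: string): string {
  return dump
    .split('\n')
    .filter((line) => line.length > 0)
    .filter((line) => passesPopulationCutoff(line) || passesCountryRule(line, countryCode))
    .join('\n');
}
```

A place with no recorded population, like Arima as discussed above, would fail the first check but pass the second when its country code is requested.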
Ah yeah, I noticed a bit of a difference with the area provided by cities500. Here are the correct boundaries from Google Maps (hard to compare, but the shape difference should be obvious).
Perhaps the geometry can't be too complex in the dataset.
Either way, it's good to know more about how the geolocation works in immich :)
Our current implementation doesn't work on boundaries, it works based on coordinates, so you are probably closer to the center of the other place than to the center of the one you're actually in 😅
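For anyone following along, the effect can be sketched with a plain nearest-centre lookup. This is only an illustration of the idea, not the query Immich actually runs. `Place`, `haversineKm` and `nearestPlace` are made-up names, and the distance uses the haversine formula with a mean Earth radius of 6371 km.

```typescript
// Illustrative nearest-centre lookup; not Immich's actual implementation.
interface Place {
  name: string;
  latitude: number;
  longitude: number;
}

const EARTH_RADIUS_KM = 6371;
const toRadians = (deg: number): number => (deg * Math.PI) / 180;

// Great-circle distance between two coordinates using the haversine formula.
function haversineKm(lat1: number, lon1: number, lat2: number, lon2: number): number {
  const dLat = toRadians(lat2 - lat1);
  const dLon = toRadians(lon2 - lon1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRadians(lat1)) * Math.cos(toRadians(lat2)) * Math.sin(dLon / 2) ** 2;
  return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
}

// Returns the place whose centre point is closest to the given coordinates,
// or undefined when the list is empty. Boundaries are never consulted.
function nearestPlace(latitude: number, longitude: number, places: Place[]): Place | undefined {
  let best: Place | undefined;
  let bestDistance = Infinity;
  for (const place of places) {
    const distance = haversineKm(latitude, longitude, place.latitude, place.longitude);
    if (distance < bestDistance) {
      best = place;
      bestDistance = distance;
    }
  }
  return best;
}
```

Because only centre points are compared, a photo taken near the edge of a large suburb can land in a neighbouring place whose centre happens to be closer.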
Ooh thanks for confirming - guess that makes sense given there are only coordinates stored in the database. Is immich considering moving to a boundary based approach?
I've actually been working on that but had to put it down for a bit. It will likely land in one of the next few releases, but it probably won't be the default. The data requirements for having polygons for the entire world are significantly higher, currently looking at around 6-7GB in your postgres database.