[FR] better offline geocoding solution

Open farfromrefug opened this issue 4 years ago • 23 comments

Right now the geocoding packages generated using https://github.com/nutiteq/mobile-sdk-scripts are not very useful. They often don't return what you expect, are missing a lot of data, and don't support autocomplete.

It would be best to have something like photon https://photon.komoot.io/ However, we can't actually use photon, as it is not written in C and depends on Elasticsearch, which is also not written in C. Maybe we could look at how organicmaps does it: https://github.com/organicmaps/organicmaps/tree/master/search

farfromrefug avatar Jul 06 '21 18:07 farfromrefug

Found typesense, which seems pretty good: https://github.com/typesense/typesense

farfromrefug avatar Jul 06 '21 20:07 farfromrefug

Can you give specific examples? In principle, autocomplete should work, but it needs to be switched on via API (setAutocomplete in geocoding service class). Regarding missing data, it could be either missing due to import scripts or missed during lookup. So if you could give some trivial examples and give access to your packages, I could take a closer look.

Regarding 3rd party solutions, if there were any complete solutions available, we could take a closer look. The non-trivial requirements are:

  • OSM import pipeline
  • possibility to split up geographical areas (individual packages) and then merge them during runtime
  • relatively compact search indices
  • C/C++ API, actively maintained

mtehver avatar Jul 13 '21 07:07 mtehver

@mtehver I did not know about setAutocomplete, will try it. But generally it does not find what I am looking for (restaurants, peaks). Will try to share more details.

Now, about 3rd party solutions: I agree with your points. I also found this last week: https://github.com/dunkelstern/osmgeocoder. It seems easy to test and to split by packages. The only issues I see:

  • we need to translate the Postgres database to SQLite. ogr2ogr should be able to do that with SpatiaLite.
  • there is no C/C++ API, as it is written in Python. However, it is only a few lines of SQL queries.

Will try it, check the SQLite size, and compare it with our current packages.

farfromrefug avatar Jul 13 '21 07:07 farfromrefug

If you are interested in specific POIs, then I suggest looking at 2 mapping files in the mobile-sdk-scripts repo:

  • https://github.com/nutiteq/mobile-sdk-scripts/blob/master/data/osm_category_map.json
  • https://github.com/nutiteq/mobile-sdk-scripts/blob/master/data/osm_tag_list.json

It seems that peaks are simply missing from these files. These mapping files come from the Pelias project. Pelias was a geocoder project from Mapzen that we originally used for the import pipeline.

Regarding third party solutions, we are only looking for ones that do not need porting and maintenance from our side. Also, spatialite as a dependency is a blocker for us, due to its size.

mtehver avatar Jul 13 '21 07:07 mtehver

@mtehver yes, I already changed those files. I have even rewritten them in a new format to keep the original OSM tags for categories. Now I understand your point about 3rd party solutions. So I ran more tests on our current solution:

  • autocomplete does not seem to work.
  • we don't have fuzzy matching, which would be very good to have.
  • I see that most POIs don't get an address. Should the wof database give addresses to those POIs?

EDIT: this shows what I am talking about. I search for "bd stock" while around Grenoble, France. The result (seen at the top of the screen) does not have a street or house number. But I can get it by using reverse geocoding (same nutigeodb), as seen at the bottom of the screen. BTW it seems house numbers are not in the nutigeodb. Shouldn't we add them? (Screenshot_1626188822)

farfromrefug avatar Jul 13 '21 15:07 farfromrefug

Could you share your imported database file and give a few other failing or suspicious search results?

Regarding POIs with missing addresses: at first glance, I would suspect the OSM address tags are missing, but then reverse geocoding should not work either. So I have no idea what is happening and need to check this.

House numbers are present in the database, but they are binary encoded in 'housenumbers' field of 'entities' table. Multiple house numbers are packed into one row.

mtehver avatar Jul 14 '21 11:07 mtehver

@mtehver so:

  • I was wrong about autocomplete. It works! The issue was on my side.
  • here is an example nutigeodb. It is built with my fork, but the fork only removes unused data, so it should not make a difference for the issues we are talking about, like missing addresses.
  • about POIs and the missing address tags: you are totally correct. I was thinking the address was still set while importing in the script (using reverse geocoding).
  • I see a lot of duplicates in the names table. They should be unique for a smaller db (will try and create a PR). EDIT: it seems this is because they have different types, so pretty normal.
  • I see an issue with names like L'impertinence. You can't find them by typing impertinence, while it works if the name is L impertinence. Even with autocomplete.
  • I was wrong about house numbers. It works! The issue is on my side.

farfromrefug avatar Jul 14 '21 13:07 farfromrefug

Thanks for the sample, I will do some testing later today.

mtehver avatar Jul 15 '21 07:07 mtehver

@mtehver by any chance, would you have a direct link to the carto nutigeodb for europe/france/rhone_alpes? I would like to compare it to mine.

farfromrefug avatar Jul 15 '21 09:07 farfromrefug

@farfromrefug You can find it here: https://storage.googleapis.com/mobile_nutiteqosmtiles/carto-streets-geocode/data/2/FR-V.nutigeodb

But it is really old, so not sure it is very useful.

mtehver avatar Jul 16 '21 08:07 mtehver

@farfromrefug I just updated geocode package generation scripts. Now full address info should be available for POIs plus L' prefix should be handled better. Have not had time to test these changes, though.

mtehver avatar Jul 16 '21 10:07 mtehver

@mtehver awesome, will test it! Just concerning the skip token for l', I think the fix should be generalized. We could get:

  • Rue de l'amadou
  • Rue d'amadou
  • ...

Shouldn't we just replace all occurrences of ['"»”‹›] with a space?
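
A minimal sketch of that proposal in Python, for illustration only. The helper names `normalize_name` and `tokens` are hypothetical and are not part of mobile-sdk-scripts; the character class is copied from the comment above.

```python
import re

# Characters listed in the comment above; this set is the proposal, not
# what the import scripts actually use.
_SEPARATORS = re.compile("['\"»”‹›]")


def normalize_name(name: str) -> str:
    """Replace each quote-like character with a space and collapse whitespace."""
    spaced = _SEPARATORS.sub(" ", name)
    return " ".join(spaced.split())


def tokens(name: str) -> list:
    """Hypothetical tokenizer: normalize, lowercase, split on whitespace."""
    return normalize_name(name).lower().split()
```

With this normalization, "L'impertinence" yields the tokens `l` and `impertinence`, so a search for `impertinence` has a token to match.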

farfromrefug avatar Jul 16 '21 12:07 farfromrefug

@mtehver I am working on handling ' and " and I think I know how to do it. Your change correctly adds POI addresses, but it does not work for some of them. Not sure why yet. Thank you.

Found another issue which makes the results weird. The rank does not take into account the distance to your search location, so you get odd rankings. This is the result from photon: (Screenshot_1626447638) And this is the result for the carto geocoding, where you would expect Grenoble results to be first: (Screenshot_1626447056)

farfromrefug avatar Jul 16 '21 15:07 farfromrefug

@farfromrefug Are you setting the 'location' attribute in GeocodingRequest? Assuming you do, there are 2 custom parameters that may need tweaking (via the setCustomParameter method). The first one is 'ranking.location_sigma'. The actual ranking is calculated as a normal distribution based on the distance from the request location to the specific feature. The formula is something like this: exp(-0.5*(distance_from_location_to_feature/location_sigma)^2). The default value is 100km (100000.0), but you may want to change it based on your zoom level. For example, proportional to 6400*100*2*pi/2^zoom.

The second custom parameter you may want to tweak is 'ranking.location_weight'. This gives relative weight of distance based rank to matching rank. The default value is 1.0.
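
For illustration, the distance term described above can be sketched in Python. This only restates the formula from the comment; it is not the SDK's code. The function names and the `scale` factor are hypothetical, and how the SDK combines this value with the matching rank through 'ranking.location_weight' is not shown because the thread does not spell it out.

```python
import math


def location_rank(distance_m: float, location_sigma_m: float = 100000.0) -> float:
    """Gaussian falloff from the comment: exp(-0.5 * (distance / sigma)^2)."""
    return math.exp(-0.5 * (distance_m / location_sigma_m) ** 2)


def sigma_for_zoom(zoom: float, scale: float = 1.0) -> float:
    """Sigma proportional to 6400*100*2*pi/2^zoom, as suggested in the comment.

    `scale` is a hypothetical proportionality constant to tune per app.
    """
    return scale * 6400 * 100 * 2 * math.pi / 2 ** zoom
```

A feature at the request location gets rank 1.0, a feature one sigma away gets about 0.61, and each extra zoom level halves the suggested sigma, so nearby results dominate more strongly when zoomed in.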

mtehver avatar Jul 19 '21 13:07 mtehver

@mtehver awesome, love the fact that I can tweak it all! Will report back.

I used a different approach for handling ' and ": https://github.com/farfromrefug/mobile-sdk-scripts/commit/9863c9ddc5ef05b41009ebb634980f337f91a129 It works really well. Now I found more issues with autocomplete:

  • greno returns Grenoble: good.
  • impert does not return anything. It should return impertinence, right? (Typing impertinence in full does return it.)

farfromrefug avatar Jul 19 '21 13:07 farfromrefug

@mtehver have you looked at FTS3/4 https://www.sqlite.org/fts3.html ? It seems to be much faster and gives fuzzy matching. FTS4 seems to bring even more features when used with a spellfix1 table https://www.sqlite.org/spellfix1.html
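
As a rough illustration of what FTS4 offers out of the box, here is a small Python sketch using the standard `sqlite3` module. It assumes the underlying SQLite build has FTS3/4 enabled. The table layout and the helper names `build_index` and `prefix_search` are hypothetical and unrelated to the nutigeodb schema. Note that this shows prefix (autocomplete-style) matching only; typo-tolerant matching would need the separate spellfix1 extension, which is not shown.

```python
import sqlite3


def build_index(names):
    """Create an in-memory FTS4 table holding one name per row."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE VIRTUAL TABLE names USING fts4(name)")
    conn.executemany("INSERT INTO names(name) VALUES (?)", [(n,) for n in names])
    return conn


def prefix_search(conn, prefix):
    """Return names containing a token that starts with `prefix`.

    A trailing * makes the term a prefix query in FTS3/4. The input is
    assumed to be a single plain word without query syntax characters.
    """
    rows = conn.execute(
        "SELECT name FROM names WHERE names MATCH ? ORDER BY name",
        (prefix + "*",),
    )
    return [row[0] for row in rows]
```

The default "simple" tokenizer treats the apostrophe as a separator, so `prefix_search(conn, "impert")` can find "L'impertinence" without extra preprocessing.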

farfromrefug avatar Aug 22 '21 19:08 farfromrefug

@farfromrefug Thanks for the link. It would likely provide better fuzzy matching for tokens, but would be a bit slower when integrated. The reason is that token/word matching is only the first step in the geocoder. In the geocoder, tokens are grouped into names, and names are matched against entities (addresses, streets, states, etc). For longer names there are probably hundreds of queries to the database.

mtehver avatar Aug 23 '21 07:08 mtehver

@mtehver I built a geocoding package for the whole of France. It is 1.5 GB. The thing is that with such a file the search queries are dead slow. Any way to make it faster? Here is the link to it if you want to test. I was thinking I could use the PackageManager, but I use it offline (from a local directory). Is that currently supported?

farfromrefug avatar Oct 23 '21 11:10 farfromrefug

@farfromrefug Is it slow during the 'warm up' phase (the first few requests) or does it stay slow afterwards? Perhaps county-level packages could improve the performance when used with a 'location radius' limit (this can be set in GeocodingRequest).

I am not sure I fully understand the offline question. It is possible to 'import' already loaded file into PackageManager database via startPackageImport method.

mtehver avatar Oct 26 '21 10:10 mtehver

@mtehver it seems to be slow all the way. BTW forgot to put the link to the file http://gofile.me/4pKGL/5SnOrpgI5

My offline question is: is there a way to use PackageManager with local files (on the sdcard, for example)? That way I could split the file into French regions to get smaller nutigeodb files, which could make requests faster. I looked at startPackageImport but, to be honest, I don't understand what it is importing. Is it "loading" local files to make them available through the PackageManagerGeocodingService?

farfromrefug avatar Oct 26 '21 13:10 farfromrefug

@farfromrefug startPackageImport will copy local file into package manager folder and update package manager database. Once the import is complete, the original file can be deleted and the package will work the same as 'downloaded' packages. For example, for geocoding packages, PackageManagerGeocodingService can be used.

mtehver avatar Oct 27 '21 07:10 mtehver

@mtehver ok, I get it now. Maybe a dumb question, but couldn't we use the local db files directly instead of copying them? (The reason is also to keep them on the sdcard, as you almost always have more space there.) I might have to create a new "PackageManager" which works with a local package structure.

farfromrefug avatar Oct 27 '21 08:10 farfromrefug

@mtehver I am still thinking about this one. I want to point out a project I am following closely: https://github.com/meilisearch/milli. They split out the core of meilisearch so that it can be used as a lib directly inside apps. They even have an example with geo data. I might try to run some tests to see:

  • how well it works
  • how fast it works
  • size of the "database"

farfromrefug avatar Apr 11 '22 14:04 farfromrefug