geocoder-nlp

Database format revision

Open rinigus opened this issue 2 years ago • 3 comments

As highlighted by an issue that was ultimately caused by Kyotocabinet (https://github.com/rinigus/osmscout-server/issues/419), Geocoder NLP depends on a library that is no longer maintained (see the message at https://dbmx.net/kyotocabinet/). So it makes sense to look for alternatives.

Currently, we use 3 data files to store one region:

  • SQLite database with the main record data
  • MARISA https://github.com/s-yata/marisa-trie for text search
  • Kyotocabinet for linking MARISA IDs to the corresponding records in the SQLite db. Here, one MARISA ID is mapped to multiple record IDs in the SQLite database, saved as a key-value pair whose value is a binary string containing the multiple record IDs.

On top of those, SQLite stores the links between records that form hierarchies, as well as the spatial search index.

In principle, we can swap the full set.
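
To make the current three-file layout concrete, here is a minimal sketch of the lookup flow in Python. It is not the Geocoder NLP implementation: plain dicts stand in for the MARISA trie and the Kyotocabinet file, the `record` table schema is hypothetical, and the 32-bit little-endian packing of record IDs is an assumption made only for illustration.

```python
import sqlite3
import struct

# Assumed encoding for this sketch: record IDs packed as 32-bit
# little-endian unsigned ints. The real on-disk format may differ.
def pack_ids(ids):
    return struct.pack(f"<{len(ids)}I", *ids)

def unpack_ids(blob):
    return list(struct.unpack(f"<{len(blob) // 4}I", blob))

# Stand-in for the MARISA trie: normalized search string -> integer ID.
# A real trie also supports prefix search; a dict is enough to show the flow.
trie = {"main street": 0, "main square": 1}

# Stand-in for the Kyotocabinet file: trie ID -> packed record IDs.
id_map = {0: pack_ids([10, 11]), 1: pack_ids([12])}

# Stand-in for the SQLite record database (hypothetical, simplified schema).
# The parent column sketches the hierarchy links kept in SQLite.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE record (id INTEGER PRIMARY KEY, name TEXT, parent INTEGER)")
db.executemany(
    "INSERT INTO record VALUES (?, ?, ?)",
    [
        (1, "City A", None),
        (2, "City B", None),
        (10, "Main Street", 1),
        (11, "Main Street", 2),
        (12, "Main Square", 1),
    ],
)

def lookup(text):
    """Resolve a search string to full records via the three stores."""
    trie_id = trie.get(text)
    if trie_id is None:
        return []
    record_ids = unpack_ids(id_map[trie_id])
    marks = ",".join("?" * len(record_ids))
    rows = db.execute(
        f"SELECT id, name, parent FROM record WHERE id IN ({marks}) ORDER BY id",
        record_ids,
    )
    return rows.fetchall()
```

Calling `lookup("main street")` walks all three stores in turn, which is why replacing any one of them only requires keeping the ID mapping between the stages intact.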

rinigus, Feb 18 '23 11:02

Kyotocabinet could perhaps be replaced by a plain table in SQLite, using the same format as in Kyotocabinet:

(primary index): BLOB string of ints
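
A minimal sketch of what such a table could look like, using Python's standard `sqlite3` module. The table name `marisa_to_record`, its column names, and the 32-bit little-endian packing of the integers are all assumptions for illustration; the actual Kyotocabinet value format may use a different integer width or byte order.

```python
import sqlite3
import struct

# Assumed encoding: record IDs as 32-bit little-endian unsigned ints.
def pack_ids(ids):
    return struct.pack(f"<{len(ids)}I", *ids)

def unpack_ids(blob):
    return list(struct.unpack(f"<{len(blob) // 4}I", blob))

db = sqlite3.connect(":memory:")

# Hypothetical table: the integer key is the primary index,
# the value is a BLOB holding the packed record IDs.
db.execute(
    "CREATE TABLE marisa_to_record ("
    " marisa_id INTEGER PRIMARY KEY,"
    " record_ids BLOB NOT NULL)"
)

def put(db, marisa_id, record_ids):
    """Store (or overwrite) the record IDs linked to one MARISA ID."""
    db.execute(
        "INSERT OR REPLACE INTO marisa_to_record VALUES (?, ?)",
        (marisa_id, pack_ids(record_ids)),
    )

def get(db, marisa_id):
    """Return the record IDs linked to one MARISA ID, or an empty list."""
    row = db.execute(
        "SELECT record_ids FROM marisa_to_record WHERE marisa_id = ?",
        (marisa_id,),
    ).fetchone()
    return unpack_ids(row[0]) if row else []
```

Because an `INTEGER PRIMARY KEY` column is an alias for the SQLite rowid, a lookup by key is a single B-tree search, and the mapping would live in the same file as the main record data.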

rinigus, Feb 18 '23 11:02

Kyotocabinet replacement:

LMDB http://www.lmdb.tech/doc/

rinigus, Feb 18 '23 11:02

Kyotocabinet replacement:

RocksDB: https://rocksdb.org/

It has spatial indexing as well, see https://rocksdb.org/blog/2015/07/17/spatial-indexing-in-rocksdb.html . However, it sounds as if it is tuned for showing tiles.

rinigus, Feb 18 '23 11:02