dragonfly icon indicating copy to clipboard operation
dragonfly copied to clipboard

is support geo commands like GEOADD、GEODIST

Open trj832000 opened this issue 1 year ago • 7 comments

Redis GEO command GEOADD、GEOPOS、GEODIST、GEORADIUS

trj832000 avatar Apr 25 '23 06:04 trj832000

not supported yet. why them and not tile38 ?

romange avatar Apr 25 '23 07:04 romange

@romange the biggest problem with tile38 is... try to use it on kubernetes, in production... unfortunately, is not too safe... The leader of tule38 is aways the tile38-0, if this pod gets down, you are lost... Here where i work, we did some tests with til38 on gke, after some weeks, on some pods rebalance, we lost a lot of messages because of pod leader get rebalanced :/ So, tile38 is not build for HA

bencz avatar May 07 '23 16:05 bencz

Dealing with geographic processing, geolocation indexing, and so on, is quite complex... Redis handles this in a very efficient way, using score division, but... if there are many elements in a sorted set, the search can become quite slow. Up to now, the most efficient geoprocessing system I've seen was from ArangoDB, which uses Google's s2 library.

bencz avatar May 07 '23 16:05 bencz

https://blog.nobugware.com/post/2016/geo_db_s2_geohash_database/

bencz avatar May 07 '23 16:05 bencz

@Pothulapati @ashotland - we just recently talked about how k8s update scheme does not fit stateful workloads.

romange avatar May 08 '23 08:05 romange

@bencz I know about s2. Can you elaborate about "score division"? how do you use sorted sets together with geo queries?

romange avatar May 08 '23 08:05 romange

@romange Sure! The command "GEOADD", creates a sortedset on redis, the GEOADD calculates a "geohash", the algorithm that redis uses, geohash is called score.

Here a sample image

The "member" is the "key" and the score is the geohash

Anyway.... about the geosearch ( most complex part ) One way to find nearby points based on geohashes is by using an ordered list of integer-based geohashes ( what the current redis do... ). To do this, you must determine the nearest neighbor geohash ranges for a specific geohash resolution that approximates your distance criteria. You can then query these ranges to get a list of nearby points.

To learn more about the method, visit this two link:

  • https://github.com/yinqiwen/ardb/wiki/Spatial-Index
  • https://github.com/yinqiwen/geohash-int ( this is the base algorithm, that Redis uses )
  • https://github.com/redis/redis/blob/unstable/src/geohash.c

Here are the key steps involved:

  • Store all geohashed points in an ordered set (i.e., zset in redis) in the resolution you prefer. This is usually a 64-bit integer or a 52-bit integer for javascript (if available), rather than the more common base32 geohashes.
  • Identify the bit depth/resolution that corresponds to your search radius. This value must be less than or equal to your stored geohash bit depth. You can use a table on the linked site to determine how the bit depth of a geohash corresponds to its bounding box area in meters.
  • Rehash your original coordinate at this lower resolution.
  • Find the 8 neighboring geohash areas at the lower resolution.
  • Add your coordinate's geohash from step 3 to the list.
  • Build a range of geohash values to search within, covering the 9 areas. You should have an array of 9 ranges, each with a lower limit and an upper geohash limit (18 geohashes in total) in that lower resolution from step 2.
  • Convert all 18 of these geohashes to the bit depth/resolution in which you have stored all your geohashes in your database, usually by bitshifting.
  • Perform a range query for points within these 9 ranges. Since there is no overlap, you only need to perform pure range queries, which are very fast (e.g., in redis: ZRANGEBYSCORE zsetname lowerLimit upperLimit over the 9 ranges produced in this step).

You can further optimize this process by:

  • Identifying where the 9 ranges from step 6 overlap and reducing them into 4 or 5 ranges to reduce query time.
  • Storing the final ranges for reuse.
  • Combining the queries into a MULTI/EXEC if you're using redis to improve performance.
  • Distributing steps 2-7 across clients instead of doing all the computation in one place to reduce CPU load in situations with high request volume.

To enhance accuracy, you can use a circle distance/haversine type function on the returned results.

Here's another technique using ordinary base32 geohashes and a SQL query instead of redis: https://github.com/davetroy/geohash-js.

If you're interested, there is a nodejs&redis module available that simplifies implementation. The code can be found here: https://github.com/arjunmehta/node-georedis.

Ps: I'm also doing some research using Redis + Uber H3, for that, I'm doing a mix between GEOSEACH + Uber's geoindexing, and I'm having a surreal, surprising result 😮

bencz avatar May 08 '23 17:05 bencz

Closing as completed via #1592 and #1594.

talbii avatar Aug 10 '23 19:08 talbii