dice icon indicating copy to clipboard operation
dice copied to clipboard

GEORADIUSBYMEMBER command

Open benbarten opened this issue 1 year ago • 21 comments

Implementation of the GEORADIUSBYMEMBER command

  • [x] Basic functionality to query all members within the radius of a given member of a key
  • [x] Add support for WITHDIST, WITHHASH, WITHCOORD, COUNT parameters
  • [x] Replaced dependency github.com/mmcloughlin/geohash with a redis-like implementation of geohash to make results comparable
  • [x] Add support for STORE, STOREDIST parameters
  • [x] Add unit and integration tests
  • [x] Benchmark and potentially optimize the code

Resolves #1252.

benbarten avatar Nov 18 '24 07:11 benbarten

Hi @benbarten. I have pushed validation logics for parsing optional parameters, fixed bugs in COUNT logic and updated response structure when WITHHASH/WITHCOORD/WITHDIST are used. Please review the changes.

And geohash area filter is returning elements that are outside the specified radius as well. Below example with a 100 km radius filter. But member at a 134 km distance is also being returned. Can you please look into it?

  • Dice
127.0.0.1:7379> GEOADD Sicily 13.583333 37.316667 "Agrigento"
(integer) 1
127.0.0.1:7379> GEOADD Sicily 13.361389 38.115556 "Palermo" 15.087269 37.502669 "Catania"
(integer) 2
127.0.0.1:7379> GEORADIUSBYMEMBER Sicily Agrigento 100 km WITHDIST
1) 1) "Agrigento"
   2) (integer) 0
2) 1) "Palermo"
   2) "90.97804072951635"
3) 1) "Catania"
   2) "134.46977809328365"
  • Redis
redis> GEOADD Sicily 13.583333 37.316667 "Agrigento"
(integer) 1
redis> GEOADD Sicily 13.361389 38.115556 "Palermo" 15.087269 37.502669 "Catania"
(integer) 2
redis> GEORADIUSBYMEMBER Sicily Agrigento 100 km WITHDIST
1) 1) "Agrigento"
   2) "0.0000"
2) 1) "Palermo"
   2) "90.9778"

Thanks.

c-harish avatar Nov 23 '24 09:11 c-harish

Hey @c-harish,

Thank you for pointing out those issues. I should have tested this more thoroughly beforehand. The implementation has a couple of different issues which I resolved in my last commit:

  1. The github.com/mmcloughlin/geohash library we used calculates geohashes in a latitude range from -90 to 90 while redis does not accept values closes to the poles. This resulted in different hashes and made it not only hard to compare between redis and dicedb. I replaced this library with a redis-like implementation of geohashes.
  2. Geohashes are not linear, so range queries on a sorted set can return values which are higher than the given radius. I added a post-filter step as in redis to filter out any points which exceed the radius, but match the geohash area.
  3. I updated the GetMemberScoresInRange method of the sorted set to a half-open interval as in redis.

Please, feel free to test again. How should we split up the remaining tasks?

benbarten avatar Nov 27 '24 06:11 benbarten

@c-harish I would continue working on the STORE and STOREDIST options.

benbarten avatar Nov 27 '24 07:11 benbarten

Tremendous job, @benbarten. 🚀 Looks solid to me, at first glance.

c-harish avatar Nov 27 '24 09:11 c-harish

@c-harish I looked into the implementation of STORE and STOREDIST this morning and I will need a few more days to finalize it.

STORE and STOREDIST store the result of the GEORADIUSBYMEMBER command for a specified key. Hence, they turn the GEORADIUSBYMEMBER command into a MultiShard command that needs to be decomposed. Since this command can be a single or a multi-shard command depending on the given options, there is no out-of-the-box way to handle this in the code.

My idea is to decompose the command into a GEORADIUSBYMEMBER command as it is right now and a GEOADD command. This will be similar to how the COPY command is implemented. The crux here is that only if those args are given we want to do a preprocessing step. I need to think about a nice way to implement this without convoluting the existing logic. I'll get back here once I figure that out.

benbarten avatar Nov 28 '24 07:11 benbarten

Hi @benbarten, I don't think we need to decompose the command or preprocess responses here, since GEOADD is a singleshard command and GEORADIUS commands also deals with a single key (so all data we need is also essentially from a single shard). We just store the response again in a single shard, if STORE options are set.

@AshwinKul28, Can you please confirm?

c-harish avatar Nov 28 '24 10:11 c-harish

As far as I understand, the shard is determined by the key.

The STORE and STOREDIST options take the result of GEORADIUSBYMEMBER and store it for a different key. Hence, we would need to access a different shard than the one that executes GEORADIUSBYMEMBER. e.g. the COPY command is similar in this regard since it takes two keys and moves data between them.

Maybe my understanding is wrong here, but I tested locally to store data for a different key directly using the Store object in evalGEORADIUSBYMEMBER and I wasn't able to then retrieve the stored set by using the new key.

benbarten avatar Nov 28 '24 16:11 benbarten

Added the unit tests this morning. Will do benchmarking tomorrow and then we can decide whether to merge like this or add the STORE and STOREDIST options as part of this PR.

benbarten avatar Nov 29 '24 07:11 benbarten

Benchmarks

Base command

Running tool: /opt/homebrew/bin/go test -benchmem -run=^$ -bench ^BenchmarkGEORADIUSBYMEMBER$ github.com/dicedb/dice/internal/eval

goos: darwin
goarch: arm64
pkg: github.com/dicedb/dice/internal/eval
cpu: Apple M2
=== RUN   BenchmarkGEORADIUSBYMEMBER
BenchmarkGEORADIUSBYMEMBER
=== RUN   BenchmarkGEORADIUSBYMEMBER/Small
BenchmarkGEORADIUSBYMEMBER/Small
BenchmarkGEORADIUSBYMEMBER/Small-8                462044              2393 ns/op           1048 B/op          39 allocs/op
=== RUN   BenchmarkGEORADIUSBYMEMBER/Medium
BenchmarkGEORADIUSBYMEMBER/Medium
BenchmarkGEORADIUSBYMEMBER/Medium-8               106651             27096 ns/op           4920 B/op         100 allocs/op
=== RUN   BenchmarkGEORADIUSBYMEMBER/Large
BenchmarkGEORADIUSBYMEMBER/Large
BenchmarkGEORADIUSBYMEMBER/Large-8                 26185            214080 ns/op          27480 B/op         352 allocs/op
PASS
ok      github.com/dicedb/dice/internal/eval    10.258s

Different Radii

Running tool: /opt/homebrew/bin/go test -benchmem -run=^$ -bench ^BenchmarkGEORADIUSBYMEMBER_DifferentRadii$ github.com/dicedb/dice/internal/eval

goos: darwin
goarch: arm64
pkg: github.com/dicedb/dice/internal/eval
cpu: Apple M2
=== RUN   BenchmarkGEORADIUSBYMEMBER_DifferentRadii
BenchmarkGEORADIUSBYMEMBER_DifferentRadii
=== RUN   BenchmarkGEORADIUSBYMEMBER_DifferentRadii/SmallRadius
BenchmarkGEORADIUSBYMEMBER_DifferentRadii/SmallRadius
BenchmarkGEORADIUSBYMEMBER_DifferentRadii/SmallRadius-8                    38484            30515 ns/op     1288 B/op         47 allocs/op
=== RUN   BenchmarkGEORADIUSBYMEMBER_DifferentRadii/MediumRadius
BenchmarkGEORADIUSBYMEMBER_DifferentRadii/MediumRadius
BenchmarkGEORADIUSBYMEMBER_DifferentRadii/MediumRadius-8                   41575            28804 ns/op    13992 B/op        197 allocs/op
=== RUN   BenchmarkGEORADIUSBYMEMBER_DifferentRadii/LargeRadius
BenchmarkGEORADIUSBYMEMBER_DifferentRadii/LargeRadius
BenchmarkGEORADIUSBYMEMBER_DifferentRadii/LargeRadius-8                    15754            76014 ns/op    95817 B/op       1470 allocs/op
PASS
ok      github.com/dicedb/dice/internal/eval    5.211s

Different options

Running tool: /opt/homebrew/bin/go test -benchmem -run=^$ -bench ^BenchmarkGEORADIUSBYMEMBER_DifferentOptions$ github.com/dicedb/dice/internal/eval

goos: darwin
goarch: arm64
pkg: github.com/dicedb/dice/internal/eval
cpu: Apple M2
=== RUN   BenchmarkGEORADIUSBYMEMBER_DifferentOptions
BenchmarkGEORADIUSBYMEMBER_DifferentOptions
=== RUN   BenchmarkGEORADIUSBYMEMBER_DifferentOptions/NoOptions
BenchmarkGEORADIUSBYMEMBER_DifferentOptions/NoOptions
BenchmarkGEORADIUSBYMEMBER_DifferentOptions/NoOptions-8                   259896             3899 ns/op     3848 B/op         76 allocs/op
=== RUN   BenchmarkGEORADIUSBYMEMBER_DifferentOptions/WithCoord
BenchmarkGEORADIUSBYMEMBER_DifferentOptions/WithCoord
BenchmarkGEORADIUSBYMEMBER_DifferentOptions/WithCoord-8                   303004             3948 ns/op     4080 B/op         80 allocs/op
=== RUN   BenchmarkGEORADIUSBYMEMBER_DifferentOptions/WithDist
BenchmarkGEORADIUSBYMEMBER_DifferentOptions/WithDist
BenchmarkGEORADIUSBYMEMBER_DifferentOptions/WithDist-8                    297511             4064 ns/op     4056 B/op         79 allocs/op
=== RUN   BenchmarkGEORADIUSBYMEMBER_DifferentOptions/WithHash
BenchmarkGEORADIUSBYMEMBER_DifferentOptions/WithHash
BenchmarkGEORADIUSBYMEMBER_DifferentOptions/WithHash-8                    299065             3982 ns/op     4064 B/op         80 allocs/op
=== RUN   BenchmarkGEORADIUSBYMEMBER_DifferentOptions/AllOptions
BenchmarkGEORADIUSBYMEMBER_DifferentOptions/AllOptions
BenchmarkGEORADIUSBYMEMBER_DifferentOptions/AllOptions-8                  294674             4034 ns/op     4152 B/op         82 allocs/op
PASS
ok      github.com/dicedb/dice/internal/eval    6.275s

benbarten avatar Dec 01 '24 18:12 benbarten

Hi, @apoorvyadav1111. Can you please trigger CI check and review? Thanks.

c-harish avatar Dec 01 '24 18:12 c-harish

Hey @apoorvyadav1111,

any update on the review? I'm keeping it up-to-date with the latest changes, but let us know if we should request another reviewer :) We could unblock @c-harish and also add the STORE/STOREDIST options.

benbarten avatar Dec 08 '24 18:12 benbarten

Hi @benbarten! Thanks for all the work on this feature, looks like its close to completion. Apologies for the delayed reviews, the team is now returning from vacation and we've started reviewing the PRs again.

In the meantime could you please resolve the merge conflicts that seem to be happening?

JyotinderSingh avatar Jan 07 '25 10:01 JyotinderSingh

Hey @benbarten , are you working on it actively? Asking it since read only part of command is blocked on this PR.

iamskp11 avatar Jan 21 '25 04:01 iamskp11

Hey @JyotinderSingh ,

sorry for the late response here as well. I got caught up at work after new year and missed the notification. I'm picking the work on this up again by resolving the merge conflicts. Afterwards, I'll create a follow up issue to implement this enable the STORE and STOREDIST options by making this a multi shard command.

benbarten avatar Jan 30 '25 19:01 benbarten

Hey @JyotinderSingh,

I have resolved the merge conflicts. As described above, I still need to implement the STORE and STOREDIST options. Based on my understanding, they will require to make this command a multi-shard command, since both options take the command's output and store it in a new sorted set with a new key.

I will implement this in the upcoming days. In the meantime, feel free to merge this part already, as it will unblock @iamskp11 on his work.

benbarten avatar Jan 31 '25 19:01 benbarten

Hey @JyotinderSingh,

I have resolved the merge conflicts. As described above, I still need to implement the STORE and STOREDIST options. Based on my understanding, they will require to make this command a multi-shard command, since both options take the command's output and store it in a new sorted set with a new key.

I will implement this in the upcoming days. In the meantime, feel free to merge this part already, as it will unblock @iamskp11 on his work.

Hi @benbarten, thanks a lot for picking this up again! We're in the middle of another large refactor, which has changed the architecture of the database quite a bit (and makes it a lot faster and memory efficient). You may want to catch up on somw of those changes. Thanks again!

JyotinderSingh avatar Feb 04 '25 12:02 JyotinderSingh

Hey @JyotinderSingh,

I already read about the ironhawk engine on discord, but haven't had time to look at how to migrate existing commands.

I'm close to finishing the STORE/STOREDIST options in presumably the old way. Once I'm done, I'll have a look at the new engine and try to migrate the current implementation.

benbarten avatar Feb 05 '25 19:02 benbarten

Hey @JyotinderSingh,

I already read about the ironhawk engine on discord, but haven't had time to look at how to migrate existing commands.

I'm close to finishing the STORE/STOREDIST options in presumably the old way. Once I'm done, I'll have a look at the new engine and try to migrate the current implementation.

awesome, thank you!

JyotinderSingh avatar Feb 06 '25 01:02 JyotinderSingh

Hey @JyotinderSingh,

I have now implemented the STORE/STOREDIST options with the old engine. This made the GEORADIUSBYMEMBER command a multi-shard command, but also caused a couple of new merge conflicts due to the recent refactor. I had a quick look at the new ironhawk engine and how to port commands to it. It seems there is no way to support multi-shard commands yet.

Do you want me to implement it as part of this PR?

benbarten avatar Feb 06 '25 08:02 benbarten

Hey @JyotinderSingh,

I have now implemented the STORE/STOREDIST options with the old engine. This made the GEORADIUSBYMEMBER command a multi-shard command, but also caused a couple of new merge conflicts due to the recent refactor. I had a quick look at the new ironhawk engine and how to port commands to it. It seems there is no way to support multi-shard commands yet.

Do you want me to implement it as part of this PR?

Thanks for the update @benbarten. We're in the process of implementing a multi-shard strategy. I would recommend making the port once it's in place. I'll ping on this PR once it's ready.

JyotinderSingh avatar Feb 06 '25 09:02 JyotinderSingh

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

:white_check_mark: benbarten
:x: harish-sa
You have signed the CLA already but the status is still pending? Let us recheck it.

CLAassistant avatar Mar 20 '25 11:03 CLAassistant