kvrocks icon indicating copy to clipboard operation
kvrocks copied to clipboard

Memory issue of kvrocks2redis detected by ASan/TSan

Open Zakelly opened this issue 1 year ago • 12 comments

Search before asking

  • [X] I had searched in the issues and found no similar issues.

Motivation

Currently, the Kvrocks2redis CI fails with TSAN/ASAN or in macos env. An example for this: https://github.com/Zakelly/kvrocks/actions/runs/8344973371/job/22838786340 The value mismatch in destination redis server. More investigation is needed.

Solution

No response

Are you willing to submit a PR?

  • [ ] I'm willing to submit a PR!

Zakelly avatar Mar 23 '24 15:03 Zakelly

Thank you @Zakelly ! The action log will be deleted by GitHub after some days. Can you download the log file and upload?

jihuayu avatar Mar 23 '24 22:03 jihuayu

@jihuayu Sure thing. Log attached. logs_21881009491.zip

Zakelly avatar Mar 24 '24 05:03 Zakelly

IMHO we should spend some time on investigating this issue and find a solution before next release. cc @git-hulk

PragmaTwice avatar Apr 20 '24 13:04 PragmaTwice

@PragmaTwice I may take some time to investigate this after resolving #2253, and it'd be great if other guys would like to dive deep into this issue.

git-hulk avatar Apr 20 '24 13:04 git-hulk

@PragmaTwice I may take some time to investigate this after resolving #2253, and it'd be great if other guys would like to dive deep into this issue.

I think this issue is related to https://github.com/apache/kvrocks/issues/1988 which makes kvrocks2redis crash. It is important to keep kvrocks2redis usable, without crashing.

PragmaTwice avatar Apr 20 '24 13:04 PragmaTwice

@PragmaTwice I may take some time to investigate this after resolving #2253, and it'd be great if other guys would like to dive deep into this issue.

@git-hulk Hi, have you started studying this core? If not I kind of want to try it.

AntiTopQuark avatar Apr 22 '24 03:04 AntiTopQuark

@AntiTopQuark Not yet, thank you!

git-hulk avatar Apr 22 '24 03:04 git-hulk

@Zakelly Hi, I've investigated this issue and successfully reproduced it on my computer.

I suspect it's not a memory issue, but rather that compiling with ASAN and TSAN significantly impacts the performance of kvrocks2redis. This leads to slower data synchronization in kvrocks2redis.

And a simple sleep(0.02)(url) in the check_consistency.py test script might not give Redis enough time to sync the corresponding results, resulting in execution errors.

As shown in the diagram below, although the test failed, the corresponding keys and values can still be found in Redis afterward. image

Or we can try to make the test script sleep longer?

AntiTopQuark avatar Apr 23 '24 10:04 AntiTopQuark

@Zakelly Hi, I've investigated this issue and successfully reproduced it on my computer.

I suspect it's not a memory issue, but rather that compiling with ASAN and TSAN significantly impacts the performance of kvrocks2redis. This leads to slower data synchronization in kvrocks2redis.

And a simple sleep(0.02)(url) in the check_consistency.py test script might not give Redis enough time to sync the corresponding results, resulting in execution errors.

As shown in the diagram below, although the test failed, the corresponding keys and values can still be found in Redis afterward. image

Or we can try to make the test script sleep longer?

cc @git-hulk

AntiTopQuark avatar Apr 24 '24 11:04 AntiTopQuark

@AntiTopQuark Thanks for your great investigation. Can we relax the check condition? like check it N times with a fixed interval and return once it's passed. cc @PragmaTwice What do you think?

git-hulk avatar Apr 24 '24 13:04 git-hulk

Yeah we can adjust 0.02 to see if it works.

PragmaTwice avatar May 05 '24 16:05 PragmaTwice

Happy to see the great investigation! Thanks @AntiTopQuark

Zakelly avatar May 10 '24 07:05 Zakelly