mini-redis icon indicating copy to clipboard operation
mini-redis copied to clipboard

The server test key_value_timeout has a race condition causing hang

Open Hodkinson opened this issue 4 years ago • 6 comments

For some reason this doesn't happen on a Linux VM, but my laptop reproduces it consistently.

This hangs almost straight away on my machine: while cargo test key_value_timeout; do :; done

I added some prints and can see that when the test hangs, the background purge task was awakened at the time for the key to be purged, the key is purged, and then the request is made for the key, but it no longer exists as it just got purged. The test then hangs as read_exact is waiting for a longer response.

It seems to be related to the auto-advancing of the clock when paused, but I am unsure where to go from here to investigate further.

https://github.com/tokio-rs/tokio/blob/8f10d816131868030996b5079fbc632750e69332/tokio/src/time/clock.rs#L83-L84

Hodkinson avatar Jul 13 '21 19:07 Hodkinson

I remember reproducing this when you posted it, but I can't reproduce it anymore. It's possible that it was fixed via #76. Can you test it again?

Darksonn avatar Jul 20 '21 13:07 Darksonn

It still fails for me unfortunately.

I know there are reasons for the timer auto-advancing, but from the point of view of the test writer it's surprising. I would expect that once tokio::time::pause() is called in the test, the test is then fully responsible for controlling the advancement of the timer and avoiding deadlocks. Maybe a tokio::time::pause_no_advance() would be useful.

Hodkinson avatar Jul 20 '21 18:07 Hodkinson

This is still happening. My toolchain

➜  mini-redis git:(master) ✗ rustup show  
Default host: x86_64-apple-darwin
rustup home:  /Users/tonyx/.rustup

installed toolchains
--------------------

stable-x86_64-apple-darwin (default)
nightly-x86_64-apple-darwin

active toolchain
----------------

stable-x86_64-apple-darwin (default)
rustc 1.58.0 (02072b482 2022-01-11)

I tried to debugged it a bit and found the code hangs in waiting to read the "world" back. Under further investigation I realized that the key in the DB is gone so the server returned -1 and that's why the read_exact call hangs (expecting more bytes).

   stream.read_exact(&mut response).await.unwrap();

I was not able to come up with a solution but I would really appreciate if we can fix this. mini-redis is still the tutorial used by tokio official website. Newcomers would love to play around with it :) let's fix it!

Tony-X avatar Apr 01 '22 00:04 Tony-X

For the record, this test works if executed alone. It still hangs with --test-threads=1. So apparently tests within this file can affect each other.

Tony-X avatar Apr 01 '22 01:04 Tony-X