redis
redis copied to clipboard
Dead lock if cancellation is requested while there is a PING request on the fly
I'm not sure but problem seems to be here.
If I am not mistaken you used close_on_run_completion
flag exactly to prevent this situation but I don't know why it leads to dead lock any way.
You can reproduce it with a loop like:
for (;;)
{
auto timer = asio::steady_timer{ executor, std::chrono::seconds{ 1 } };
co_await (connection.async_run(endpoint, asio::use_awaitable) || timer.async_wait(asio::use_awaitable));
}
if you change timer to something shorter than ping time like 800ms it will not lead to dead lock.
Can you please try what I said here https://github.com/mzimbres/aedis/issues/28#issuecomment-1255214596 and let me know if the problem persists.
I can't reproduce this behaviour. I have used this code https://github.com/mzimbres/aedis/issues/28#issuecomment-1256921527 with a timeout of 1s
and 2s
and it keeps printing retrying. A deadlock may occur for example when Redis for any reason sends unsolicited messages and you don't consume them, can you please add a coroutine to consume them, for example
net::awaitable<void> push_receiver(std::shared_ptr<connection> db)
{
for (std::vector<node<std::string>> resp;;) {
co_await db->async_receive_push(adapt(resp));
print_push(resp);
resp.clear();
}
}
Adding more connections increases the chance:
#include <boost/asio.hpp>
#include <boost/asio/experimental/awaitable_operators.hpp>
#include <aedis.hpp>
#include <chrono>
#include <iostream>
namespace net = boost::asio;
using namespace net::experimental::awaitable_operators;
using aedis::endpoint;
using connection = aedis::connection<>;
net::awaitable< void >
reconnect()
{
auto db1 = connection{ co_await net::this_coro::executor };
auto db2 = connection{ co_await net::this_coro::executor };
auto db3 = connection{ co_await net::this_coro::executor };
auto db4 = connection{ co_await net::this_coro::executor };
auto timer = net::steady_timer{ co_await net::this_coro::executor };
for (int i = 0;; i++)
{
timer.expires_after(std::chrono::milliseconds{ 1005 }); // Note extra 5ms
endpoint ep{ "127.0.0.1", "6379" };
co_await (db1.async_run(ep, net::use_awaitable) || db2.async_run(ep, net::use_awaitable) ||
db3.async_run(ep, net::use_awaitable) || db4.async_run(ep, net::use_awaitable) ||
timer.async_wait(net::use_awaitable));
std::cout << i << std::endl;
}
}
int
main()
{
net::io_context ioc;
net::co_spawn(ioc, reconnect(), net::detached);
ioc.run();
}
Does this trigger a sanitizer check on your machine?
Yes alloc-dealloc-mismatch
and use-of-uninitialized-value
with a gigantic error messages.
alloc-dealloc-mismatch.txt
use-of-uninitialized-value.txt
I have made many improvements in cancellation in master. I can't reproduce this problem anymore, so it might have been fixed.
Thanks for you work, I can still reproduce it with the following code: (delay 1 ms)
#include <boost/asio.hpp>
#include <boost/asio/experimental/awaitable_operators.hpp>
#include <aedis.hpp>
#include <chrono>
#include <iostream>
namespace net = boost::asio;
using namespace net::experimental::awaitable_operators;
using aedis::endpoint;
using connection = aedis::connection<>;
net::awaitable< void >
reconnect()
{
auto db1 = connection{ co_await net::this_coro::executor };
auto db2 = connection{ co_await net::this_coro::executor };
auto db3 = connection{ co_await net::this_coro::executor };
auto db4 = connection{ co_await net::this_coro::executor };
auto timer = net::steady_timer{ co_await net::this_coro::executor };
for (int i = 0;; i++)
{
timer.expires_after(std::chrono::milliseconds{ 1 }); // Note 1 ms
endpoint ep{ "172.17.0.1", "6379" };
co_await (db1.async_run(ep, {}, net::use_awaitable) || db2.async_run(ep, {}, net::use_awaitable) ||
db3.async_run(ep, {}, net::use_awaitable) || db4.async_run(ep, {}, net::use_awaitable) ||
timer.async_wait(net::use_awaitable));
std::cout << i << std::endl;
}
}
int
main()
{
net::io_context ioc;
net::co_spawn(ioc, reconnect(), net::detached);
ioc.run();
}
Thanks for insisting :) I learned many sutilities in the Asio cancellation mechanism. This commit should fix the problem https://github.com/mzimbres/aedis/commit/bac27c1770a14c47cd802eb556983d14e93df013. Let me know if it works for you.
Seems that fixed the problem, thanks :partying_face: