cpp-driver

Driver churns ports & connections if the cluster isn't answering

Open levkk opened this issue 4 years ago • 3 comments

If the driver tries to create a connection pool to a cluster that is not answering, the pool keeps retrying until it exhausts all available outgoing ports. For this test we artificially limited the driver to 1000 ports for the shard-aware connection pool.

This is what we got after a couple of minutes:

1631141762.557 [WARN] (connection_pool.cpp:378:void datastax::internal::core::ConnectionPool::on_reconnect(datastax::internal::core::DelayedConnector*)): Connection pool was unable to reconnect to host 52.72.17.40 because of the following error: Connection timeout
terminate called after throwing an instance of 'std::runtime_error'
  what():  ShardPortCalculator: cannot find free outgoing port
Aborted (core dumped)

The driver needs to reuse outgoing ports or re-connect more gracefully.

levkk avatar Sep 08 '21 23:09 levkk

This is caused by ShardPortCalculator not having any way to mark ports as free. To fix it, it would help to be able to reproduce the issue. Could you please give more specific steps? I tried giving the driver a small range of ports (10-100) and then blocking access to port 19042 on one of the nodes using iptables (tried with both DROP and REJECT). I did this both before starting the driver and after it had established a session and was doing work, but couldn't get this error :(
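To make the "mark ports as free" idea concrete, here is a rough sketch of a port pool that can hand ports back. It is not the driver's actual ShardPortCalculator: the class name `ReleasablePortPool` and its `acquire`/`release` methods are hypothetical, and it assumes the shard-aware convention that the target shard is the source port modulo the shard count. It also returns an empty optional instead of throwing when no port is free, so a caller could back off rather than abort.

```cpp
#include <cstdint>
#include <mutex>
#include <optional>
#include <set>

// Hypothetical sketch only; names and behavior are assumptions,
// not the driver's real ShardPortCalculator interface.
class ReleasablePortPool {
public:
  // Ports are handed out from the inclusive range [lo, hi].
  ReleasablePortPool(std::uint16_t lo, std::uint16_t hi) : lo_(lo), hi_(hi) {}

  // Returns the lowest free port p in [lo, hi] with p % shard_count == shard_id,
  // or std::nullopt if every matching port is currently in use.
  std::optional<std::uint16_t> acquire(unsigned shard_id, unsigned shard_count) {
    if (shard_count == 0 || shard_id >= shard_count) return std::nullopt;
    std::lock_guard<std::mutex> guard(mutex_);
    // First port >= lo_ that is congruent to shard_id modulo shard_count.
    unsigned first =
        lo_ + (shard_id + shard_count - lo_ % shard_count) % shard_count;
    for (unsigned p = first; p <= hi_; p += shard_count) {
      // insert() reports whether the port was previously free.
      if (in_use_.insert(static_cast<std::uint16_t>(p)).second) {
        return static_cast<std::uint16_t>(p);
      }
    }
    return std::nullopt;
  }

  // Marks a port as free again, e.g. after a connection attempt
  // times out or the connection is closed.
  void release(std::uint16_t port) {
    std::lock_guard<std::mutex> guard(mutex_);
    in_use_.erase(port);
  }

private:
  std::uint16_t lo_;
  std::uint16_t hi_;
  std::mutex mutex_;
  std::set<std::uint16_t> in_use_;
};
```

With something like this, the reconnect path would call `release(port)` whenever a connection attempt fails, so repeated timeouts against a dead node would cycle through the same few ports instead of draining the whole range.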

Lorak-mmk avatar Nov 04 '21 15:11 Lorak-mmk

@levkk

Lorak-mmk avatar Nov 08 '21 15:11 Lorak-mmk

I think DROP will make the driver hang until TCP keep-alive expires. Try rejecting the connection explicitly or better yet just shut down Scylla on the other end and run the test again.

levkk avatar Feb 13 '22 16:02 levkk