python-bitcoinlib icon indicating copy to clipboard operation
python-bitcoinlib copied to clipboard

retry _call if connection times out

Open stefanlenoach opened this issue 8 years ago • 13 comments

stefanlenoach avatar Nov 15 '16 17:11 stefanlenoach

@petertodd Bitcoin closes its connection after about 30s and python-bitcoinlib doesn't notice. For users of python-bitcoinlib who need a long-running connection to Bitcoin, this patch automatically retries, so handles connection drops.

mcelrath avatar Nov 16 '16 15:11 mcelrath

Note that you're adding another dependency here, the retrying library.

Can this be rewritten to not have that dependency?

Also, are we 100% sure that this won't cause us to also retry even if the call actually succeeded? For instance, imagine a sendpayment RPC call is in fact successfully sent to bitcoind, but the connection drops before the reply gets back.

petertodd avatar Nov 19 '16 18:11 petertodd

Hey Peter,

Yes, the instance you mentioned seems like it would retry even if the call succeeded. Ideally we would like to keep the connection instantiated on line 136 of bitcoin/rpc.py open indefinitely. It looks like changing the timeout value doesn't actually have an effect; even if I set timeout=1000 it still seems to take 30 seconds for the connection to drop. Any suggestions? Thanks so much for the feedback and for writing and maintaining all of this!

Best, Stefan

On Sat, Nov 19, 2016 at 1:28 PM, Peter Todd [email protected] wrote:

Note that you're adding another dependency here, the retrying library.

Can this be rewritten to not have that dependency?

Also, are we 100% sure that this won't cause us to also retry even if the call actually succeeded? For instance, imagine a sendpayment RPC call is in fact successfully sent to bitcoind, but the connection drops before the reply gets back.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/petertodd/python-bitcoinlib/pull/128#issuecomment-261730838, or mute the thread https://github.com/notifications/unsubscribe-auth/AQ_WP6c3eBgxoe3Oi8wcwxb_YOHcNqcRks5q_z_NgaJpZM4KyyRx .

stefanlenoach avatar Nov 21 '16 20:11 stefanlenoach

I would not recommend using retry or any kind of retrying behavior on bitcoind RPC requests. Connection timeout can happen for many reasons, even at the "end" of an RPC call. Retrying the same RPC call could cause buggy behavior such as "sending money twice". Bitcoin RPC protocol is not designed to prevent replays.

https://botbot.me/freenode/bitcoin-core-dev/2016-11-21/?msg=76819688&page=2

kanzure avatar Nov 21 '16 21:11 kanzure

Well the real bug is that bitcoind closes the connection and python-bitcoinlib doesn't notice. The next time the client tries to make an rpc call, it throws an exception. As far as I'm concerned this is a bug to pass responsibility for reopening connections to the caller.

@stefanlenoach what do you think about: Upon proxy.call, check if the connection is open. If not, open a new one and issue the call. This avoids potentially problematic retrying, but won't handle flaky connections -- if the connection is terminated in an unusual manner, it's still the caller's logic that decides whether to resubmit the transaction, or re-request data.

The case we're concerned about here is not an unusual close, it's the regular and well behaved close by bitcoind when it times out.

mcelrath avatar Nov 21 '16 21:11 mcelrath

@mcelrath I think your suggestion will work well enough for our application. Ideally we would be able to stop the connection from timing out, unfortunately it looks like this might not be possible.

On Mon, Nov 21, 2016 at 4:56 PM, Bob McElrath [email protected] wrote:

Well the real bug is that bitcoind closes the connection and python-bitcoinlib doesn't notice. The next time the client tries to make an rpc call, it throws an exception. As far as I'm concerned this is a bug to pass responsibility for reopening connections to the caller.

@stefanlenoach https://github.com/stefanlenoach what do you think about: Upon proxy.call, check if the connection is open. If not, open a new one and issue the call. This avoids potentially problematic retrying, but won't handle flaky connections -- if the connection is terminated in an unusual manner, it's still the caller's logic that decides whether to resubmit the transaction, or re-request data.

The case we're concerned about here is not an unusual close, it's the regular and well behaved close by bitcoind when it times out.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/petertodd/python-bitcoinlib/pull/128#issuecomment-262080241, or mute the thread https://github.com/notifications/unsubscribe-auth/AQ_WP-OG3PXo0qb33VN-H0p8mj9tr-MGks5rAhOkgaJpZM4KyyRx .

stefanlenoach avatar Nov 21 '16 22:11 stefanlenoach

Check here, it looks like this was fixed in bitcoin core:

https://github.com/bitcoin/bitcoin/blob/master/qa/rpc-tests/test_framework/authproxy.py#L134

mcelrath avatar Nov 21 '16 22:11 mcelrath

We're still having a heck of a time with keeping the RPC connection to bitcoind active.

Here is yet another solution that uses epoll to detect the connection closing, and works quite well without retrying, and efficiently re-uses connections to bitcoind.

The behavior I observe is that when bitcoind closes the connection, I get a EPOLLIN event, indicating data is available. I then recv that data, observing there is no data, which indicates the connection is closed. I then close python-bitcoinlib's side of the connection, which causes the subsequent call to self.__conn.request to open a new connection.

mcelrath avatar Jan 23 '17 20:01 mcelrath

@petertodd We have a more complete solution to this here, using epoll, if the system supports it.

This is effective and doesn't incur the problem of possibly sending transactions twice.

mcelrath avatar Feb 02 '17 16:02 mcelrath

Having hit this problem AGAIN because epoll is not supported on macosx, it seems clear to me that the right answer is to rewrite this patch to use aiohttp, and catch asyncio.CancelledError and aiohttp.ClientDisconnectedError which generalize linux's platform-specific epoll.

mcelrath avatar May 30 '17 21:05 mcelrath

we just substituted connection object in BaseProxy with our own wrapper class, that handles reconnections itself.

https://github.com/petertodd/python-bitcoinlib/pull/188

retrying bitcoind RPC calls is not an issue for us, because we use it only to get transaction data and send raw transactions. Of course using auto-reconnecting connection with sendtoaddress() and the like is a bad idea.

dgpv avatar Sep 17 '18 13:09 dgpv

I have forward-ported this patch to python-bitcoinlib 0.11:

https://github.com/mcelrath/python-bitcoinlib/tree/epoll

I am unsure of the status of this patch and have not tested it yet. I've lost the development stack we were using when it was originally created and no longer have write access to the repository from which this PR is sourced. I think they've stopped using bitcoin entirely and the PR is stale. @stefanlenoach if you're still working there can you comment on the disposition of this PR? I'll close and reopen this PR if people agree EPOLL is a good direction.

So is EPOLL a good direction... I'm not a fan of closing and re-opening connections as @petertodd mentions above, this can lead to unexpected behavior, and requests getting dropped in between. It appears that the current code does a retry and reopen if the connection is dropped, which is the approach @petertodd disagreed with above (and I agree with him). You REALLY don't want to use sendtoaddress() across a connection drop and then retry.

So this needs further testing. I'm not 100% certain of the behavior of the current code WRT retries, but I don't like retries. I think if EPOLL works and is superior, the suggestion in this comment is probably a way to move this forward.

But for now, I forward ported it in case anyone else wants to test.

mcelrath avatar Mar 13 '20 23:03 mcelrath

Something like @mcelrath's epoll patch + @dgpv's "pass in a connection object" might be a sufficient solution. I would want to see tests before merging, though. In other words, maybe epoll work doesn't belong in BaseProxy itself.

kanzure avatar Aug 02 '20 21:08 kanzure