bitcoin
bitcoin copied to clipboard
test: adds outbound eviction functional tests, updates comment in ConsiderEviction
Motivation
While checking the outbound eviction code I realized a case was not considered within the comments, which in turn made me realize we had no functional tests for the outbound eviction case (when I went to check/add the test case).
This PR updates the aforementioned comment and adds functional tests to cover the outbound eviction logic, in addition to the existing unit tests found at src/test/denialofservice_tests.cpp
.
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.
Code Coverage
For detailed information about the code coverage, see the test coverage report.
Reviews
See the guideline for information on the review process.
Type | Reviewers |
---|---|
ACK | davidgumberg, cbergqvist, tdb3, achow101 |
Concept ACK | sipa, stratospher, marcofleon |
If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.
Conflicts
No conflicts as of last run.
While the tests seem to run fine, I noticed a warning that appears every now and then:
TestFramework.p2p (WARNING): Connection lost to 0:0 due to [Errno 54] Connection reset by peer
I'm not sure if something unintended is happening under the hood. Disconnections are expected, but it's our node who is supposed to trigger them, so maybe I'm missing something.
Rebased to fix CI
Rebased. This has failed CI once when waiting for the disconnection after:
cur_mock_time += (HEADERS_RESPONSE_TIME + 1)
I've tried to reproduce it but I cannot in my local setup, the only thing that comes to mind is that mock time may be too close to the timeout (only one second ahead), so I gave it a delta of 10 seconds instead.
PS: For context, you can check the failure here: https://cirrus-ci.com/task/4624744360706048?logs=ci#L6520. See how the peer is given the final notice by sending a getheaders message, but the check is performed before the peer gets disconnected.
I redefined how peers are split in p2p_eviction:test_outbound_eviction_protected_mixed
to address https://github.com/bitcoin/bitcoin/pull/29122#discussion_r1485375517. Both commits can be squashed (leaving them as they are for now in case someone does not agree with the approach)
Rebased. This has failed CI once when waiting for the disconnection after:
cur_mock_time += (HEADERS_RESPONSE_TIME + 1)
I've tried to reproduce it but I cannot in my local setup, the only thing that comes to mind is that mock time may be too close to the timeout (only one second ahead), so I gave it a delta of 10 seconds instead.
PS: For context, you can check the failure here: https://cirrus-ci.com/task/4624744360706048?logs=ci#L6520. See how the peer is given the final notice by sending a getheaders message, but the check is performed before the peer gets disconnected.
I've been looking more into this, given an error showed up again on CI. I think the issue arose from the initial mock time not being properly set. I've reset the increments to be just one second apart, plus set the initial mock time before the tests were run.
Also, I've moved the outbound eviction tests to their own file, given p2p_eviction
was being described as testing inbound eviction, plus used some custom classes that were only relevant for inbounds.
🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the documentation.
Possibly this is due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.
Leave a comment here, if you need help tracking down a confusing failure.
Debug: https://github.com/bitcoin/bitcoin/runs/21495911628
Rebased. This has failed CI once when waiting for the disconnection after:
cur_mock_time += (HEADERS_RESPONSE_TIME + 1)
I've tried to reproduce it but I cannot in my local setup, the only thing that comes to mind is that mock time may be too close to the timeout (only one second ahead), so I gave it a delta of 10 seconds instead. PS: For context, you can check the failure here: https://cirrus-ci.com/task/4624744360706048?logs=ci#L6520. See how the peer is given the final notice by sending a getheaders message, but the check is performed before the peer gets disconnected.
I've been looking more into this, given an error showed up again on CI. I think the issue arose from the initial mock time not being properly set. I've reset the increments to be just one second apart, plus set the initial mock time before the tests were run.
Also, I've moved the outbound eviction tests to their own file, given
p2p_eviction
was being described as testing inbound eviction, plus used some custom classes that were only relevant for inbounds.
(now non-stale) ACK for f06b50dfd45000f34bf258d48101718b4b5f9772.
Pulled. Built. Ran p2p_eviction.py
and p2p_outbound_eviction.py
. Both passed.
Couldn't reproduce the CI timeout issue btw. But I have encountered the warning:
2024-02-22T22:01:47.354000Z TestFramework.p2p (WARNING): Connection lost to 0:0 due to [Errno 104] Connection reset by peer
Thanks for the reviews @cbergqvist and @marcofleon, I've addressed the comments of both.
I also squashed the two commits from @tdb3 given looks like no one ha opposed to those
What do you think about testing that block-relay-only nodes get evicted and aren't eligible for protection?
For example:
# Ensure that block-relay-only peers are not granted protection and ARE evicted
self.log.info("Create an outbound connection to a peer that shares our tip but is block-relay only, so it does not get protection")
test_node = node.add_outbound_p2p_connection(P2PInterface(), p2p_idx=1, connection_type="block-relay-only")
test_node.send_and_ping(msg_headers([from_hex(CBlockHeader(), node.getblockheader(node.getbestblockhash(), False))]))
self.log.info("Mine a new block and sync with our peer")
self.generateblock(node, output="raw(42)", transactions=[])
test_node.sync_with_ping()
self.log.info("Let enough time pass for the timeouts to go off")
# Trigger the timeouts and check how we are still connected
cur_mock_time += (CHAIN_SYNC_TIMEOUT + 1)
node.setmocktime(cur_mock_time)
test_node.sync_with_ping()
test_node.wait_for_getheaders()
cur_mock_time += (HEADERS_RESPONSE_TIME + 1)
node.setmocktime(cur_mock_time)
self.log.info("Test that the block-relay-only peer gets evicted")
test_node.wait_for_disconnect()
Otherwise, reviewed ConsiderEviction
and the tests added here provide good coverage. I ran the p2p_outbound_eviction.py
tests and they passed, I was unable to reproduce the connection reset mentioned above.
ACK https://github.com/bitcoin/bitcoin/pull/29122/commits/bffbb176d8a4f545c26ff7e437ab33cb004852bb
re ACK bffbb176d8a4f545c26ff7e437ab33cb004852bb
What do you think about testing that block-relay-only nodes get evicted and aren't eligible for protection?
This was already pointed out by @tdb3 in his review: https://github.com/bitcoin/bitcoin/pull/29122#discussion_r1486369882
TL;DR; Protected peers imply outbound-full-relay
, but given this seems to be a recurring question I'll add it to the test. I guess it wouldn't hurt having it to make sure we don't change the logic without noticing it.
I've added a test to cover for @tdb3 and @davidgumberg comments on non-outbound-full-relay peers eviction, plus amended the original commit to rename test_outbound_eviction_mixed
and added a missing comment to the method.
🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the documentation.
Possibly this is due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.
Leave a comment here, if you need help tracking down a confusing failure.
Debug: https://github.com/bitcoin/bitcoin/runs/22178847360
ACK https://github.com/bitcoin/bitcoin/pull/29122/commits/33353b7723d0704d64c0dfc1b0f5d072692e68b9
Tested by modifying bool IsOutboundOrBlockRelayConn()
, and CHAIN_SYNC_TIMEOUT
and this test catches. It would be nice if we also had a test that made sure we don't punish peers and set the timeout to HEADERS_RESPONSE_TIME
before we've given them CHAIN_SYNC_TIMEOUT
to catch up, but I think this PR looks great as it stands!
Rebased to deal with merge conflicts
reACK https://github.com/bitcoin/bitcoin/commit/59affd0bf9042d746a6b32739edb0cd45d28d569
re ACK for 59affd0bf9042d746a6b32739edb0cd45d28d569. Built and re-ran functional tests locally. All passed.
Rebased to fix CI due to https://github.com/bitcoin/bitcoin/pull/29610
reACK https://github.com/bitcoin/bitcoin/pull/29122/commits/cd5b7a11ee55b1f86bb7662c4fc83616ffa735b1
$ git range-diff a945f09...59affd0 015ac13...cd5b7a1
1: a12a4dfa90 = 1: 8fa272dccc test: adds outbound eviction functional tests, updates comment in ConsiderEviction
2: 59affd0bf9 = 2: cd5b7a11ee test: adds outbound eviction tests for non outbound-full-relay peers
$ python ./test/functional/test_runner.py
ALL | ✓ Passed | 1350 s (accumulated)
Concept ACK
Rebased + addressed comments from @stratospher. I had to slightly reshape test_outbound_eviction_unprotected
to fit https://github.com/bitcoin/bitcoin/pull/29122#discussion_r1535542317 given some time mocks + waits were unnecessary
reACK for 368389d973a85d21aa9ebf030ef5e7f7ac5343de. Pulled, built, ran all unit/functional tests (all passed).
Rebased on top of https://github.com/bitcoin/bitcoin/pull/29736. Now this is using wait_for_getheaders
to reduce the boilerplate of having to manually pop the results from last_message
.
cc/ @stratospher
🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the documentation.
Possibly this is due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.
Leave a comment here, if you need help tracking down a confusing failure.
Debug: https://github.com/bitcoin/bitcoin/runs/24266342152
ACK https://github.com/bitcoin/bitcoin/pull/29122/commits/be2c5510521081119f62ef3b29f76bb9097d0f34
Re-reviewed code, and attempted to break tests by making ConsiderEviction
misbehave.
It is worth noting that these tests will not catch if outbound eviction logic gets triggered too early. e.g.:
/** How long to wait for a peer to respond to a getheaders request */
static constexpr auto HEADERS_RESPONSE_TIME{0s};
// [...]
/** Timeout for (unprotected) outbound peers to sync to our chainwork */
static constexpr auto CHAIN_SYNC_TIMEOUT{0s};
will pass. That seems fine to me, maybe something for a follow up.
ACK be2c551 Re-reviewed code, and attempted to break tests by making
ConsiderEviction
misbehave.It is worth noting that these tests will not catch if outbound eviction logic gets triggered too early. e.g.:
/** How long to wait for a peer to respond to a getheaders request */ static constexpr auto HEADERS_RESPONSE_TIME{0s}; // [...] /** Timeout for (unprotected) outbound peers to sync to our chainwork */ static constexpr auto CHAIN_SYNC_TIMEOUT{0s};
will pass. That seems fine to me, maybe something for a follow-up.
I don't think I follow you here. HEADERS_RESPONSE_TIME
and CHAIN_SYNC_TIMEOUT
are constants hardcoded in net_processing.cpp
(and mapped here). I would expect unit test to fail if those are updated.
Updated to address https://github.com/bitcoin/bitcoin/pull/29122#discussion_r1580221364
reACK https://github.com/bitcoin/bitcoin/commit/d53d84834747c37f4060a9ef379e0a6b50155246
I don't think I follow you here.
HEADERS_RESPONSE_TIME
andCHAIN_SYNC_TIMEOUT
are constants hardcoded innet_processing.cpp
(and mapped here). I would expect unit test to fail if those are updated.
I didn't mean to say that these tests should be responsible for enforcing the particular values of HEADERS_RESPONSE_TIME
and CHAIN_SYNC_TIMEOUT
, just that they would not catch a regression where outbound eviction logic gets triggered earlier than expected on peers, which can be demonstrated by setting both values to zero.
Anyways, I think if what I'm saying isn't confused in some way, it is more pedantic than useful.