bitcoin icon indicating copy to clipboard operation
bitcoin copied to clipboard

test: adds outbound eviction functional tests, updates comment in ConsiderEviction

Open sr-gi opened this issue 1 year ago • 24 comments

Motivation

While checking the outbound eviction code I realized a case was not considered within the comments, which in turn made me realize we had no functional tests for the outbound eviction case (when I went to check/add the test case).

This PR updates the aforementioned comment and adds functional tests to cover the outbound eviction logic, in addition to the existing unit tests found at src/test/denialofservice_tests.cpp.

sr-gi avatar Dec 20 '23 16:12 sr-gi

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage

For detailed information about the code coverage, see the test coverage report.

Reviews

See the guideline for information on the review process.

Type Reviewers
ACK davidgumberg, cbergqvist, tdb3, achow101
Concept ACK sipa, stratospher, marcofleon

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

No conflicts as of last run.

DrahtBot avatar Dec 20 '23 16:12 DrahtBot

While the tests seem to run fine, I noticed a warning that appears every now and then:

TestFramework.p2p (WARNING): Connection lost to 0:0 due to [Errno 54] Connection reset by peer

I'm not sure if something unintended is happening under the hood. Disconnections are expected, but it's our node who is supposed to trigger them, so maybe I'm missing something.

sr-gi avatar Dec 20 '23 16:12 sr-gi

Rebased to fix CI

sr-gi avatar Jan 29 '24 22:01 sr-gi

Rebased. This has failed CI once when waiting for the disconnection after:

cur_mock_time += (HEADERS_RESPONSE_TIME + 1)

I've tried to reproduce it but I cannot in my local setup, the only thing that comes to mind is that mock time may be too close to the timeout (only one second ahead), so I gave it a delta of 10 seconds instead.

PS: For context, you can check the failure here: https://cirrus-ci.com/task/4624744360706048?logs=ci#L6520. See how the peer is given the final notice by sending a getheaders message, but the check is performed before the peer gets disconnected.

sr-gi avatar Feb 06 '24 14:02 sr-gi

I redefined how peers are split in p2p_eviction:test_outbound_eviction_protected_mixed to address https://github.com/bitcoin/bitcoin/pull/29122#discussion_r1485375517. Both commits can be squashed (leaving them as they are for now in case someone does not agree with the approach)

sr-gi avatar Feb 12 '24 16:02 sr-gi

Rebased. This has failed CI once when waiting for the disconnection after:

cur_mock_time += (HEADERS_RESPONSE_TIME + 1)

I've tried to reproduce it but I cannot in my local setup, the only thing that comes to mind is that mock time may be too close to the timeout (only one second ahead), so I gave it a delta of 10 seconds instead.

PS: For context, you can check the failure here: https://cirrus-ci.com/task/4624744360706048?logs=ci#L6520. See how the peer is given the final notice by sending a getheaders message, but the check is performed before the peer gets disconnected.

I've been looking more into this, given an error showed up again on CI. I think the issue arose from the initial mock time not being properly set. I've reset the increments to be just one second apart, plus set the initial mock time before the tests were run.

Also, I've moved the outbound eviction tests to their own file, given p2p_eviction was being described as testing inbound eviction, plus used some custom classes that were only relevant for inbounds.

sr-gi avatar Feb 12 '24 21:02 sr-gi

🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the documentation.

Possibly this is due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

Leave a comment here, if you need help tracking down a confusing failure.

Debug: https://github.com/bitcoin/bitcoin/runs/21495911628

DrahtBot avatar Feb 12 '24 21:02 DrahtBot

Rebased. This has failed CI once when waiting for the disconnection after:

cur_mock_time += (HEADERS_RESPONSE_TIME + 1)

I've tried to reproduce it but I cannot in my local setup, the only thing that comes to mind is that mock time may be too close to the timeout (only one second ahead), so I gave it a delta of 10 seconds instead. PS: For context, you can check the failure here: https://cirrus-ci.com/task/4624744360706048?logs=ci#L6520. See how the peer is given the final notice by sending a getheaders message, but the check is performed before the peer gets disconnected.

I've been looking more into this, given an error showed up again on CI. I think the issue arose from the initial mock time not being properly set. I've reset the increments to be just one second apart, plus set the initial mock time before the tests were run.

Also, I've moved the outbound eviction tests to their own file, given p2p_eviction was being described as testing inbound eviction, plus used some custom classes that were only relevant for inbounds.

(now non-stale) ACK for f06b50dfd45000f34bf258d48101718b4b5f9772. Pulled. Built. Ran p2p_eviction.py and p2p_outbound_eviction.py. Both passed.

tdb3 avatar Feb 13 '24 00:02 tdb3

Couldn't reproduce the CI timeout issue btw. But I have encountered the warning:

2024-02-22T22:01:47.354000Z TestFramework.p2p (WARNING): Connection lost to 0:0 due to [Errno 104] Connection reset by peer

cbergqvist avatar Feb 22 '24 22:02 cbergqvist

Thanks for the reviews @cbergqvist and @marcofleon, I've addressed the comments of both.

I also squashed the two commits from @tdb3 given looks like no one ha opposed to those

sr-gi avatar Feb 28 '24 21:02 sr-gi

What do you think about testing that block-relay-only nodes get evicted and aren't eligible for protection?

For example:

# Ensure that block-relay-only peers are not granted protection and ARE evicted
self.log.info("Create an outbound connection to a peer that shares our tip but is block-relay only, so it does not get protection")
test_node = node.add_outbound_p2p_connection(P2PInterface(), p2p_idx=1, connection_type="block-relay-only")

test_node.send_and_ping(msg_headers([from_hex(CBlockHeader(), node.getblockheader(node.getbestblockhash(), False))]))

self.log.info("Mine a new block and sync with our peer")
self.generateblock(node, output="raw(42)", transactions=[])
test_node.sync_with_ping()

self.log.info("Let enough time pass for the timeouts to go off")
# Trigger the timeouts and check how we are still connected
cur_mock_time += (CHAIN_SYNC_TIMEOUT + 1)
node.setmocktime(cur_mock_time)
test_node.sync_with_ping()
test_node.wait_for_getheaders()
cur_mock_time += (HEADERS_RESPONSE_TIME + 1)
node.setmocktime(cur_mock_time)
self.log.info("Test that the block-relay-only peer gets evicted")
test_node.wait_for_disconnect()

Otherwise, reviewed ConsiderEviction and the tests added here provide good coverage. I ran the p2p_outbound_eviction.py tests and they passed, I was unable to reproduce the connection reset mentioned above.

ACK https://github.com/bitcoin/bitcoin/pull/29122/commits/bffbb176d8a4f545c26ff7e437ab33cb004852bb

davidgumberg avatar Mar 01 '24 07:03 davidgumberg

re ACK bffbb176d8a4f545c26ff7e437ab33cb004852bb

tdb3 avatar Mar 01 '24 12:03 tdb3

What do you think about testing that block-relay-only nodes get evicted and aren't eligible for protection?

This was already pointed out by @tdb3 in his review: https://github.com/bitcoin/bitcoin/pull/29122#discussion_r1486369882

TL;DR; Protected peers imply outbound-full-relay, but given this seems to be a recurring question I'll add it to the test. I guess it wouldn't hurt having it to make sure we don't change the logic without noticing it.

sr-gi avatar Mar 01 '24 15:03 sr-gi

I've added a test to cover for @tdb3 and @davidgumberg comments on non-outbound-full-relay peers eviction, plus amended the original commit to rename test_outbound_eviction_mixed and added a missing comment to the method.

sr-gi avatar Mar 01 '24 16:03 sr-gi

🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the documentation.

Possibly this is due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

Leave a comment here, if you need help tracking down a confusing failure.

Debug: https://github.com/bitcoin/bitcoin/runs/22178847360

DrahtBot avatar Mar 01 '24 17:03 DrahtBot

ACK https://github.com/bitcoin/bitcoin/pull/29122/commits/33353b7723d0704d64c0dfc1b0f5d072692e68b9

Tested by modifying bool IsOutboundOrBlockRelayConn(), and CHAIN_SYNC_TIMEOUT and this test catches. It would be nice if we also had a test that made sure we don't punish peers and set the timeout to HEADERS_RESPONSE_TIME before we've given them CHAIN_SYNC_TIMEOUT to catch up, but I think this PR looks great as it stands!

davidgumberg avatar Mar 09 '24 01:03 davidgumberg

Rebased to deal with merge conflicts

sr-gi avatar Mar 11 '24 15:03 sr-gi

reACK https://github.com/bitcoin/bitcoin/commit/59affd0bf9042d746a6b32739edb0cd45d28d569

davidgumberg avatar Mar 12 '24 02:03 davidgumberg

re ACK for 59affd0bf9042d746a6b32739edb0cd45d28d569. Built and re-ran functional tests locally. All passed.

tdb3 avatar Mar 12 '24 22:03 tdb3

Rebased to fix CI due to https://github.com/bitcoin/bitcoin/pull/29610

sr-gi avatar Mar 15 '24 21:03 sr-gi

reACK https://github.com/bitcoin/bitcoin/pull/29122/commits/cd5b7a11ee55b1f86bb7662c4fc83616ffa735b1

$ git range-diff a945f09...59affd0 015ac13...cd5b7a1                                              
1:  a12a4dfa90 = 1:  8fa272dccc test: adds outbound eviction functional tests, updates comment in ConsiderEviction     
2:  59affd0bf9 = 2:  cd5b7a11ee test: adds outbound eviction tests for non outbound-full-relay peers 

$ python ./test/functional/test_runner.py
ALL                                                    | ✓ Passed  | 1350 s (accumulated) 

davidgumberg avatar Mar 16 '24 12:03 davidgumberg

Concept ACK

sipa avatar Mar 17 '24 16:03 sipa

Rebased + addressed comments from @stratospher. I had to slightly reshape test_outbound_eviction_unprotected to fit https://github.com/bitcoin/bitcoin/pull/29122#discussion_r1535542317 given some time mocks + waits were unnecessary

sr-gi avatar Mar 25 '24 16:03 sr-gi

reACK for 368389d973a85d21aa9ebf030ef5e7f7ac5343de. Pulled, built, ran all unit/functional tests (all passed).

tdb3 avatar Mar 28 '24 15:03 tdb3

Rebased on top of https://github.com/bitcoin/bitcoin/pull/29736. Now this is using wait_for_getheaders to reduce the boilerplate of having to manually pop the results from last_message.

cc/ @stratospher

sr-gi avatar Apr 25 '24 18:04 sr-gi

🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the documentation.

Possibly this is due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

Leave a comment here, if you need help tracking down a confusing failure.

Debug: https://github.com/bitcoin/bitcoin/runs/24266342152

DrahtBot avatar Apr 25 '24 18:04 DrahtBot

ACK https://github.com/bitcoin/bitcoin/pull/29122/commits/be2c5510521081119f62ef3b29f76bb9097d0f34 Re-reviewed code, and attempted to break tests by making ConsiderEviction misbehave.

It is worth noting that these tests will not catch if outbound eviction logic gets triggered too early. e.g.:

/** How long to wait for a peer to respond to a getheaders request */
static constexpr auto HEADERS_RESPONSE_TIME{0s};
// [...]
/** Timeout for (unprotected) outbound peers to sync to our chainwork */
static constexpr auto CHAIN_SYNC_TIMEOUT{0s};

will pass. That seems fine to me, maybe something for a follow up.

davidgumberg avatar Apr 25 '24 23:04 davidgumberg

ACK be2c551 Re-reviewed code, and attempted to break tests by making ConsiderEviction misbehave.

It is worth noting that these tests will not catch if outbound eviction logic gets triggered too early. e.g.:

/** How long to wait for a peer to respond to a getheaders request */
static constexpr auto HEADERS_RESPONSE_TIME{0s};
// [...]
/** Timeout for (unprotected) outbound peers to sync to our chainwork */
static constexpr auto CHAIN_SYNC_TIMEOUT{0s};

will pass. That seems fine to me, maybe something for a follow-up.

I don't think I follow you here. HEADERS_RESPONSE_TIME and CHAIN_SYNC_TIMEOUT are constants hardcoded in net_processing.cpp (and mapped here). I would expect unit test to fail if those are updated.

sr-gi avatar Apr 26 '24 15:04 sr-gi

Updated to address https://github.com/bitcoin/bitcoin/pull/29122#discussion_r1580221364

sr-gi avatar Apr 26 '24 15:04 sr-gi

reACK https://github.com/bitcoin/bitcoin/commit/d53d84834747c37f4060a9ef379e0a6b50155246

I don't think I follow you here. HEADERS_RESPONSE_TIME and CHAIN_SYNC_TIMEOUT are constants hardcoded in net_processing.cpp (and mapped here). I would expect unit test to fail if those are updated.

I didn't mean to say that these tests should be responsible for enforcing the particular values of HEADERS_RESPONSE_TIME and CHAIN_SYNC_TIMEOUT, just that they would not catch a regression where outbound eviction logic gets triggered earlier than expected on peers, which can be demonstrated by setting both values to zero.

Anyways, I think if what I'm saying isn't confused in some way, it is more pedantic than useful.

davidgumberg avatar Apr 26 '24 23:04 davidgumberg