go-libp2p-pubsub
go-libp2p-pubsub copied to clipboard
Go IDONTWANT
Sending IDONTWANT
- [x] Implement a smart queue
- [x] Add priorities to the smart queue
- [x] Put IDONTWANT packets into the smart priority queue as soon as the node gets the packets
Handling IDONTWANT
- [x] Use a map to remember the message ids whose IDONTWANT packets have been received
- [x] Implement
max_idontwant_messages(ignore the IDONWANT packets if the max is reached) - [x] Clear the message IDs from the cache after 3 heartbeats
- [x] Hash the message IDs before putting them into the cache.
Spec: https://github.com/libp2p/specs/pull/548
However, I think using a slice to back the queue is problematic at many levels as it might leak space with a growing array, and also hang on to too much space.
The unused space will be freed eventually https://stackoverflow.com/a/26863706
I'm not really sure if we need dfd81e80848e5ce4e64fbf30cfdaafb4dc56b9d3 or not. I guess adding a non-standard flag to pb/trace.proto would break some users of the tracers?
I'm not really sure if we need dfd81e8 or not. I guess adding a non-standard flag to
pb/trace.protowould break some users of the tracers?
Yeah, let's not break compatibility with existing code, we don't need it.
Any progress other than that?
Any progress other than that?
I haven't implemented the second half. Does the first half looks good to you?
yes, looks reasonable so far.
@vyzo I finished everything already. Sorry for the delay. I was in vacation and was a bit busy lately. Thank you for your patience.
The message ids from IDONTWANT can be very large, so hashing it before keeping it in memory quite makes sense. The same thing is already implemented in nim-libp2p https://github.com/vacp2p/nim-libp2p/pull/1090
sure, but it needs to be cleared, otherwise its memroy leak and dos waiting to happen. Or at best the feature breaks itself for long running nodes.
I suggest for each IDW keep a counter of how many heartbeats it survived, and after 3 (to match the IHAVE ads) it should be cleared.
@vyzo It's all done. Sorry for the delay.
ok, can you rebase on master?
@vyzo wondering if this PR ready to be merged or are you waiting for anything to be fixed/addressed?
This seems to improve bandwidth usage and we want to take advantage of that.
I think in general it looks good, but i do want to do another pass.
In the meantime:
* can you bump the version to 1.2 * can you add a feature test to not send IDONTWANT if the other side doesnt support it?I will also invoke @Stebalien for review.
Haven't looked at the PR in detail, but will give it a try tomorrow.
seems like we have some flaky tests :(
@ppopth can you look at the go-check CI failure? Looks like the linter is complaining about some things.
@vyzo could you rerun the CI? thanks
done
@vyzo The CI error log is not informative. Do you have any idea?
...
--- PASS: TestTopicPublishWithKeyInvalidParameters (0.08s)
--- PASS: TestTopicPublishWithKeyInvalidParameters/nil_sign_private_key_should_error (0.00s)
--- PASS: TestTopicPublishWithKeyInvalidParameters/empty_peer_ID_should_error (0.00s)
=== RUN FuzzAppendOrMergeRPC
--- PASS: FuzzAppendOrMergeRPC (0.00s)
FAIL
FAIL github.com/libp2p/go-libp2p-pubsub 385.852s
-test.shuffle 1723651417916320350
=== RUN TestLastSeenCacheSlideForward
last_seen_cache_test.go:36: timing is too fine grained to run in CI
--- SKIP: TestLastSeenCacheSlideForward (0.00s)
=== RUN TestFirstSeenCacheNotFoundAfterExpire
--- PASS: TestFirstSeenCacheNotFoundAfterExpire (2.00s)
=== RUN TestFirstSeenCacheFound
--- PASS: TestFirstSeenCacheFound (0.00s)
=== RUN TestLastSeenCacheExpire
--- PASS: TestLastSeenCacheExpire (3.10s)
=== RUN TestFirstSeenCacheExpire
--- PASS: TestFirstSeenCacheExpire (3.00s)
=== RUN TestLastSeenCacheFound
--- PASS: TestLastSeenCacheFound (0.00s)
=== RUN TestLastSeenCacheNotFoundAfterExpire
--- PASS: TestLastSeenCacheNotFoundAfterExpire (2.00s)
PASS
ok github.com/libp2p/go-libp2p-pubsub/timecache 10.111s
FAIL
I think these are just flaky tests, i dont think your pr is breaking them.
Regarding the test failures:
- TestGossipsubConnTagMessageDeliveries this seems consistently flaky; let's add a t.Skip and open follow up issue to fix it.
- FuzzAppendOrMergeRPC @ppopth this might need to be looked upon; if it is a flake, we can skip as well and open follow up issue, but might be a problem (with the test perhaps).
@vyzo I didn't get any error with FuzzAppendOrMergeRPC, did you?
$ go test -v -run FuzzAppendOrMergeRPC
=== RUN FuzzAppendOrMergeRPC
--- PASS: FuzzAppendOrMergeRPC (0.00s)
PASS
ok github.com/libp2p/go-libp2p-pubsub 0.005s
thats a flake then. Ok, disable and open follow up issue!
done
CI failures - seems like a faulty commit.
@ppopth let's reenable the Fuzz test, it seems it is fine: see #574
@Stebalien @vyzo I think it's all done
running the CI, will merge once it greens.
should I squash the fix-up commits?
should I squash the fix-up commits?
no need, I will squash merge anyway