Dangling wantlists
We have a small cluster with approximately 30 machines, we notice bitswap stat reports non-zero wantlist length even though there is no task running. Something like this
Bitswap Stat
WantlistLength: 31
Peers:
BlocksReceived: 8178
DataReceived: 2088954888
DupBlksReceived: 40
DupDataReceived: 28360
MessagesReceived: 24028
BlocksSent: 0
DataSent: 0
After running ipfs log to investigate, seems like there's a race condition between CANCEL_WANT_BLOCK & WANT_HAVE after receiving valid block from any peer. Here's a relevant log for cid bafkreig4xh7u5iveim7sxhbngvr4daggxxtsdatl5wejk4rea6saeyx5ke
{"level":"debug","ts":1761898062.2018778,"logger":"bitswap/session","caller":"session/session.go:237","msg":"Bitswap <- HAVE","local":"12D3KooW9za2vSSD1M23FGoonoUigTXoAKCrpAvnU3oJMUPtUzhq","from":"12D3KooWMsPSSxRjTyMuBk3FSfdjhitAvYCuFpM8ZMStLQCPZCy3","cid":"bafkreig4xh7u5iveim7sxhbngvr4daggxxtsdatl5wejk4rea6saeyx5ke","session":758}
{"level":"debug","ts":1761898062.2019646,"logger":"bitswap/session","caller":"session/session.go:237","msg":"Bitswap <- HAVE","local":"12D3KooW9za2vSSD1M23FGoonoUigTXoAKCrpAvnU3oJMUPtUzhq","from":"12D3KooWSnJZfW3Be9jxXSQdrNiZNHvBfDn4XVUaEKsvyo1G2qm9","cid":"bafkreig4xh7u5iveim7sxhbngvr4daggxxtsdatl5wejk4rea6saeyx5ke","session":758}
{"level":"debug","ts":1761898062.2021472,"logger":"bitswap/client/msgq","caller":"messagequeue/messagequeue.go:735","msg":"sent message","type":"WANT_BLOCK","cid":"bafkreig4xh7u5iveim7sxhbngvr4daggxxtsdatl5wejk4rea6saeyx5ke","local":"12D3KooW9za2vSSD1M23FGoonoUigTXoAKCrpAvnU3oJMUPtUzhq","to":"12D3KooWCqktkwMQ8tH3mkVsnt5SNTtH1GqK37N6mMSxX6NCQszo"}
{"level":"debug","ts":1761898062.2020547,"logger":"bitswap/session","caller":"session/session.go:237","msg":"Bitswap <- HAVE","local":"12D3KooW9za2vSSD1M23FGoonoUigTXoAKCrpAvnU3oJMUPtUzhq","from":"12D3KooWQgvqtKy8g4kHcBz7ky7gcN6FDLUoHq6mPa9HYjYaVMc3","cid":"bafkreig4xh7u5iveim7sxhbngvr4daggxxtsdatl5wejk4rea6saeyx5ke","session":758}
{"level":"debug","ts":1761898062.202884,"logger":"bitswap/session","caller":"session/session.go:237","msg":"Bitswap <- HAVE","local":"12D3KooW9za2vSSD1M23FGoonoUigTXoAKCrpAvnU3oJMUPtUzhq","from":"12D3KooWJksuwoouCGnk21on4ffc13KrQSF92J1ruo37bCmxcS7c","cid":"bafkreig4xh7u5iveim7sxhbngvr4daggxxtsdatl5wejk4rea6saeyx5ke","session":758}
{"level":"debug","ts":1761898062.2080243,"logger":"bitswap/session","caller":"session/session.go:237","msg":"Bitswap <- HAVE","local":"12D3KooW9za2vSSD1M23FGoonoUigTXoAKCrpAvnU3oJMUPtUzhq","from":"12D3KooWCStHbtFFkWsZ6McYoeWJbeacT9thscVG2JAQe49sEqe4","cid":"bafkreig4xh7u5iveim7sxhbngvr4daggxxtsdatl5wejk4rea6saeyx5ke","session":758}
{"level":"debug","ts":1761898062.2114546,"logger":"bitswap/session","caller":"session/session.go:237","msg":"Bitswap <- HAVE","local":"12D3KooW9za2vSSD1M23FGoonoUigTXoAKCrpAvnU3oJMUPtUzhq","from":"12D3KooWPQDZ6FJNJiAdQizsoX6brop1HXifUoYnd6smPjnVBgqd","cid":"bafkreig4xh7u5iveim7sxhbngvr4daggxxtsdatl5wejk4rea6saeyx5ke","session":758}
{"level":"debug","ts":1761898062.2118468,"logger":"bitswap/client","caller":"client/client.go:586","msg":"[recv] block; cid=bafkreig4xh7u5iveim7sxhbngvr4daggxxtsdatl5wejk4rea6saeyx5ke, peer=12D3KooWCqktkwMQ8tH3mkVsnt5SNTtH1GqK37N6mMSxX6NCQszo"}
{"level":"debug","ts":1761898062.2118742,"logger":"bitswap/session","caller":"session/session.go:234","msg":"Bitswap <- block","local":"12D3KooW9za2vSSD1M23FGoonoUigTXoAKCrpAvnU3oJMUPtUzhq","from":"12D3KooWCqktkwMQ8tH3mkVsnt5SNTtH1GqK37N6mMSxX6NCQszo","cid":"bafkreig4xh7u5iveim7sxhbngvr4daggxxtsdatl5wejk4rea6saeyx5ke","session":758}
{"level":"debug","ts":1761898062.2118876,"logger":"bitswap/server/decision","caller":"decision/engine.go:936","msg":"Bitswap engine <- block","local":"12D3KooW9za2vSSD1M23FGoonoUigTXoAKCrpAvnU3oJMUPtUzhq","from":"12D3KooWCqktkwMQ8tH3mkVsnt5SNTtH1GqK37N6mMSxX6NCQszo","cid":"bafkreig4xh7u5iveim7sxhbngvr4daggxxtsdatl5wejk4rea6saeyx5ke","size":262144}
{"level":"debug","ts":1761898062.2137315,"logger":"bitswap/client/msgq","caller":"messagequeue/messagequeue.go:719","msg":"sent message","type":"CANCEL_WANT_BLOCK","cid":"bafkreig4xh7u5iveim7sxhbngvr4daggxxtsdatl5wejk4rea6saeyx5ke","local":"12D3KooW9za2vSSD1M23FGoonoUigTXoAKCrpAvnU3oJMUPtUzhq","to":"12D3KooWE1SnSRTx2aQxp6Fmz3xAnEHBX7T3Bu9k3eem3xKyvkqT"}
{"level":"debug","ts":1761898062.2141423,"logger":"bitswap/client/msgq","caller":"messagequeue/messagequeue.go:719","msg":"sent message","type":"CANCEL_WANT_BLOCK","cid":"bafkreig4xh7u5iveim7sxhbngvr4daggxxtsdatl5wejk4rea6saeyx5ke","local":"12D3KooW9za2vSSD1M23FGoonoUigTXoAKCrpAvnU3oJMUPtUzhq","to":"12D3KooWEQNG2vZBxhJjtQt9eAmYqL2UhiRU5iCpU2W5CPvEm4ut"}
{"level":"debug","ts":1761898062.214188,"logger":"bitswap/client/msgq","caller":"messagequeue/messagequeue.go:719","msg":"sent message","type":"CANCEL_WANT_BLOCK","cid":"bafkreig4xh7u5iveim7sxhbngvr4daggxxtsdatl5wejk4rea6saeyx5ke","local":"12D3KooW9za2vSSD1M23FGoonoUigTXoAKCrpAvnU3oJMUPtUzhq","to":"12D3KooWFUrnAeYWHMtrYBPUq666dpgWa9KGXwYKwZmkqVMWvcSH"}
{"level":"debug","ts":1761898062.2145784,"logger":"bitswap/client/msgq","caller":"messagequeue/messagequeue.go:719","msg":"sent message","type":"CANCEL_WANT_BLOCK","cid":"bafkreig4xh7u5iveim7sxhbngvr4daggxxtsdatl5wejk4rea6saeyx5ke","local":"12D3KooW9za2vSSD1M23FGoonoUigTXoAKCrpAvnU3oJMUPtUzhq","to":"12D3KooWJWeM6eA9hUtgw75aBmZKokRU6oinH5o2JPXZSkrAQuHi"}
{"level":"debug","ts":1761898062.2148564,"logger":"bitswap/client/msgq","caller":"messagequeue/messagequeue.go:719","msg":"sent message","type":"CANCEL_WANT_BLOCK","cid":"bafkreig4xh7u5iveim7sxhbngvr4daggxxtsdatl5wejk4rea6saeyx5ke","local":"12D3KooW9za2vSSD1M23FGoonoUigTXoAKCrpAvnU3oJMUPtUzhq","to":"12D3KooWP1af4GXUjTvFNGfXh8e1vyCLC4gJkoznkGGmmTeyxT5L"}
{"level":"debug","ts":1761898062.2149634,"logger":"bitswap/client/msgq","caller":"messagequeue/messagequeue.go:719","msg":"sent message","type":"CANCEL_WANT_BLOCK","cid":"bafkreig4xh7u5iveim7sxhbngvr4daggxxtsdatl5wejk4rea6saeyx5ke","local":"12D3KooW9za2vSSD1M23FGoonoUigTXoAKCrpAvnU3oJMUPtUzhq","to":"12D3KooWLtCDdQDMSqp5ggXd7DDdgCyV5Cuy6nQwSqn4S5k2ge6q"}
{"level":"debug","ts":1761898062.214888,"logger":"bitswap/client/msgq","caller":"messagequeue/messagequeue.go:719","msg":"sent message","type":"CANCEL_WANT_BLOCK","cid":"bafkreig4xh7u5iveim7sxhbngvr4daggxxtsdatl5wejk4rea6saeyx5ke","local":"12D3KooW9za2vSSD1M23FGoonoUigTXoAKCrpAvnU3oJMUPtUzhq","to":"12D3KooWRykdrWrTa5LayXuA4khk6EFbxPQGZyH7thh1XFoKnpJ1"}
{"level":"debug","ts":1761898062.215218,"logger":"bitswap/client/msgq","caller":"messagequeue/messagequeue.go:719","msg":"sent message","type":"CANCEL_WANT_BLOCK","cid":"bafkreig4xh7u5iveim7sxhbngvr4daggxxtsdatl5wejk4rea6saeyx5ke","local":"12D3KooW9za2vSSD1M23FGoonoUigTXoAKCrpAvnU3oJMUPtUzhq","to":"12D3KooWBGQWvUHrdh5BwAqbDE214toNf85mVhaPomyvgyLYeZPd"}
{"level":"debug","ts":1761898062.2172887,"logger":"bitswap/client/msgq","caller":"messagequeue/messagequeue.go:719","msg":"sent message","type":"CANCEL_WANT_BLOCK","cid":"bafkreig4xh7u5iveim7sxhbngvr4daggxxtsdatl5wejk4rea6saeyx5ke","local":"12D3KooW9za2vSSD1M23FGoonoUigTXoAKCrpAvnU3oJMUPtUzhq","to":"12D3KooWDtgqQnGyn8tK9nqUYVSm8EhV4AY1DUXpS6V9W3xnW4W4"}
{"level":"debug","ts":1761898062.217481,"logger":"bitswap/client/msgq","caller":"messagequeue/messagequeue.go:719","msg":"sent message","type":"CANCEL_WANT_BLOCK","cid":"bafkreig4xh7u5iveim7sxhbngvr4daggxxtsdatl5wejk4rea6saeyx5ke","local":"12D3KooW9za2vSSD1M23FGoonoUigTXoAKCrpAvnU3oJMUPtUzhq","to":"12D3KooWJLKnTVEKYEEX6vksGwVMbcxZhAYwcNLSkVgXWJghxxCV"}
{"level":"debug","ts":1761898062.2184074,"logger":"bitswap/client/msgq","caller":"messagequeue/messagequeue.go:719","msg":"sent message","type":"CANCEL_WANT_BLOCK","cid":"bafkreig4xh7u5iveim7sxhbngvr4daggxxtsdatl5wejk4rea6saeyx5ke","local":"12D3KooW9za2vSSD1M23FGoonoUigTXoAKCrpAvnU3oJMUPtUzhq","to":"12D3KooWACHwpXfeUyrzUbFoHoma4TrkuiobYnRXCVeLshxaaZzq"}
{"level":"debug","ts":1761898062.2201893,"logger":"bitswap/client/msgq","caller":"messagequeue/messagequeue.go:719","msg":"sent message","type":"CANCEL_WANT_BLOCK","cid":"bafkreig4xh7u5iveim7sxhbngvr4daggxxtsdatl5wejk4rea6saeyx5ke","local":"12D3KooW9za2vSSD1M23FGoonoUigTXoAKCrpAvnU3oJMUPtUzhq","to":"12D3KooWN1XZcjdHPh3C7uWFa7cquB1KYMr8xZGa9E1z5X9bSPAP"}
{"level":"debug","ts":1761898062.220416,"logger":"bitswap/client/msgq","caller":"messagequeue/messagequeue.go:719","msg":"sent message","type":"CANCEL_WANT_BLOCK","cid":"bafkreig4xh7u5iveim7sxhbngvr4daggxxtsdatl5wejk4rea6saeyx5ke","local":"12D3KooW9za2vSSD1M23FGoonoUigTXoAKCrpAvnU3oJMUPtUzhq","to":"12D3KooWK7onE4x7b1pnfYfQFWeGPG141Zc45H43ZbsNpJdy5FBT"}
{"level":"debug","ts":1761898062.2210016,"logger":"bitswap/client/msgq","caller":"messagequeue/messagequeue.go:719","msg":"sent message","type":"CANCEL_WANT_BLOCK","cid":"bafkreig4xh7u5iveim7sxhbngvr4daggxxtsdatl5wejk4rea6saeyx5ke","local":"12D3KooW9za2vSSD1M23FGoonoUigTXoAKCrpAvnU3oJMUPtUzhq","to":"12D3KooWQgvqtKy8g4kHcBz7ky7gcN6FDLUoHq6mPa9HYjYaVMc3"}
{"level":"debug","ts":1761898062.2212029,"logger":"bitswap/client/msgq","caller":"messagequeue/messagequeue.go:719","msg":"sent message","type":"CANCEL_WANT_BLOCK","cid":"bafkreig4xh7u5iveim7sxhbngvr4daggxxtsdatl5wejk4rea6saeyx5ke","local":"12D3KooW9za2vSSD1M23FGoonoUigTXoAKCrpAvnU3oJMUPtUzhq","to":"12D3KooWSnJZfW3Be9jxXSQdrNiZNHvBfDn4XVUaEKsvyo1G2qm9"}
{"level":"debug","ts":1761898062.2212284,"logger":"bitswap/client/msgq","caller":"messagequeue/messagequeue.go:719","msg":"sent message","type":"CANCEL_WANT_BLOCK","cid":"bafkreig4xh7u5iveim7sxhbngvr4daggxxtsdatl5wejk4rea6saeyx5ke","local":"12D3KooW9za2vSSD1M23FGoonoUigTXoAKCrpAvnU3oJMUPtUzhq","to":"12D3KooWMsPSSxRjTyMuBk3FSfdjhitAvYCuFpM8ZMStLQCPZCy3"}
{"level":"debug","ts":1761898062.2213898,"logger":"bitswap/client/msgq","caller":"messagequeue/messagequeue.go:728","msg":"sent message","type":"WANT_HAVE","cid":"bafkreig4xh7u5iveim7sxhbngvr4daggxxtsdatl5wejk4rea6saeyx5ke","local":"12D3KooW9za2vSSD1M23FGoonoUigTXoAKCrpAvnU3oJMUPtUzhq","to":"12D3KooWPQDZ6FJNJiAdQizsoX6brop1HXifUoYnd6smPjnVBgqd"}
Notice how bitswap still send WANT_HAVE to one of the peers even though it already has a valid response.
From what I understand, there might be two goroutines running receive message loop and send loop, a race condition might happen in the following scenario:
- sendloop is looping through all of the session's peers to send
WANT_HAVE - receiveloop received a valid block, it loop through all session's peers, check if peer has already sent
WANT_HAVEto sendCANCEL_WANT_BLOCK, let's say an arbitrary peer P is in a state of hasn't sendWANT_HAVE, so it'll be ignored. - sendloop is currently looping peer P, it notice that it hasn't send
WANT_HAVEto this peer so it'll send a request and record it in peermanager - receiveloop exit, session will cleanup all resources & existing wantlist, when peer manager receive
HAVEresponse from peer P, it cannot find interested session to process this event, thus this cid's wantlist will never be cleanup
I can help with code if anybody is kind enough to provide insight into the loop mechanism in ipfs.
@LeeTeng2001 Thank you for your investigation and calling out this issue. We do not think this is a critical issue that must be solved immediately, so we look into how best to address it after kubo v0.39.0 release.
@LeeTeng2001 Thank you for your investigation and calling out this issue. We do not think this is a critical issue that must be solved immediately, so we look into how best to address it after kubo v0.39.0 release.
ok thanks for the reply, we haven't observed any undesirable behaviour it's only an incorrect behaviour so I think it's a low priority issue