boxo icon indicating copy to clipboard operation
boxo copied to clipboard

Dangling wantlists

Open LeeTeng2001 opened this issue 3 months ago • 2 comments

We have a small cluster with approximately 30 machines, we notice bitswap stat reports non-zero wantlist length even though there is no task running. Something like this

Bitswap Stat
  WantlistLength: 31
  Peers:
  BlocksReceived: 8178
  DataReceived: 2088954888
  DupBlksReceived: 40
  DupDataReceived: 28360
  MessagesReceived: 24028
  BlocksSent: 0
  DataSent: 0

After running ipfs log to investigate, seems like there's a race condition between CANCEL_WANT_BLOCK & WANT_HAVE after receiving valid block from any peer. Here's a relevant log for cid bafkreig4xh7u5iveim7sxhbngvr4daggxxtsdatl5wejk4rea6saeyx5ke

{"level":"debug","ts":1761898062.2018778,"logger":"bitswap/session","caller":"session/session.go:237","msg":"Bitswap <- HAVE","local":"12D3KooW9za2vSSD1M23FGoonoUigTXoAKCrpAvnU3oJMUPtUzhq","from":"12D3KooWMsPSSxRjTyMuBk3FSfdjhitAvYCuFpM8ZMStLQCPZCy3","cid":"bafkreig4xh7u5iveim7sxhbngvr4daggxxtsdatl5wejk4rea6saeyx5ke","session":758}
{"level":"debug","ts":1761898062.2019646,"logger":"bitswap/session","caller":"session/session.go:237","msg":"Bitswap <- HAVE","local":"12D3KooW9za2vSSD1M23FGoonoUigTXoAKCrpAvnU3oJMUPtUzhq","from":"12D3KooWSnJZfW3Be9jxXSQdrNiZNHvBfDn4XVUaEKsvyo1G2qm9","cid":"bafkreig4xh7u5iveim7sxhbngvr4daggxxtsdatl5wejk4rea6saeyx5ke","session":758}
{"level":"debug","ts":1761898062.2021472,"logger":"bitswap/client/msgq","caller":"messagequeue/messagequeue.go:735","msg":"sent message","type":"WANT_BLOCK","cid":"bafkreig4xh7u5iveim7sxhbngvr4daggxxtsdatl5wejk4rea6saeyx5ke","local":"12D3KooW9za2vSSD1M23FGoonoUigTXoAKCrpAvnU3oJMUPtUzhq","to":"12D3KooWCqktkwMQ8tH3mkVsnt5SNTtH1GqK37N6mMSxX6NCQszo"}
{"level":"debug","ts":1761898062.2020547,"logger":"bitswap/session","caller":"session/session.go:237","msg":"Bitswap <- HAVE","local":"12D3KooW9za2vSSD1M23FGoonoUigTXoAKCrpAvnU3oJMUPtUzhq","from":"12D3KooWQgvqtKy8g4kHcBz7ky7gcN6FDLUoHq6mPa9HYjYaVMc3","cid":"bafkreig4xh7u5iveim7sxhbngvr4daggxxtsdatl5wejk4rea6saeyx5ke","session":758}
{"level":"debug","ts":1761898062.202884,"logger":"bitswap/session","caller":"session/session.go:237","msg":"Bitswap <- HAVE","local":"12D3KooW9za2vSSD1M23FGoonoUigTXoAKCrpAvnU3oJMUPtUzhq","from":"12D3KooWJksuwoouCGnk21on4ffc13KrQSF92J1ruo37bCmxcS7c","cid":"bafkreig4xh7u5iveim7sxhbngvr4daggxxtsdatl5wejk4rea6saeyx5ke","session":758}
{"level":"debug","ts":1761898062.2080243,"logger":"bitswap/session","caller":"session/session.go:237","msg":"Bitswap <- HAVE","local":"12D3KooW9za2vSSD1M23FGoonoUigTXoAKCrpAvnU3oJMUPtUzhq","from":"12D3KooWCStHbtFFkWsZ6McYoeWJbeacT9thscVG2JAQe49sEqe4","cid":"bafkreig4xh7u5iveim7sxhbngvr4daggxxtsdatl5wejk4rea6saeyx5ke","session":758}
{"level":"debug","ts":1761898062.2114546,"logger":"bitswap/session","caller":"session/session.go:237","msg":"Bitswap <- HAVE","local":"12D3KooW9za2vSSD1M23FGoonoUigTXoAKCrpAvnU3oJMUPtUzhq","from":"12D3KooWPQDZ6FJNJiAdQizsoX6brop1HXifUoYnd6smPjnVBgqd","cid":"bafkreig4xh7u5iveim7sxhbngvr4daggxxtsdatl5wejk4rea6saeyx5ke","session":758}
{"level":"debug","ts":1761898062.2118468,"logger":"bitswap/client","caller":"client/client.go:586","msg":"[recv] block; cid=bafkreig4xh7u5iveim7sxhbngvr4daggxxtsdatl5wejk4rea6saeyx5ke, peer=12D3KooWCqktkwMQ8tH3mkVsnt5SNTtH1GqK37N6mMSxX6NCQszo"}
{"level":"debug","ts":1761898062.2118742,"logger":"bitswap/session","caller":"session/session.go:234","msg":"Bitswap <- block","local":"12D3KooW9za2vSSD1M23FGoonoUigTXoAKCrpAvnU3oJMUPtUzhq","from":"12D3KooWCqktkwMQ8tH3mkVsnt5SNTtH1GqK37N6mMSxX6NCQszo","cid":"bafkreig4xh7u5iveim7sxhbngvr4daggxxtsdatl5wejk4rea6saeyx5ke","session":758}
{"level":"debug","ts":1761898062.2118876,"logger":"bitswap/server/decision","caller":"decision/engine.go:936","msg":"Bitswap engine <- block","local":"12D3KooW9za2vSSD1M23FGoonoUigTXoAKCrpAvnU3oJMUPtUzhq","from":"12D3KooWCqktkwMQ8tH3mkVsnt5SNTtH1GqK37N6mMSxX6NCQszo","cid":"bafkreig4xh7u5iveim7sxhbngvr4daggxxtsdatl5wejk4rea6saeyx5ke","size":262144}
{"level":"debug","ts":1761898062.2137315,"logger":"bitswap/client/msgq","caller":"messagequeue/messagequeue.go:719","msg":"sent message","type":"CANCEL_WANT_BLOCK","cid":"bafkreig4xh7u5iveim7sxhbngvr4daggxxtsdatl5wejk4rea6saeyx5ke","local":"12D3KooW9za2vSSD1M23FGoonoUigTXoAKCrpAvnU3oJMUPtUzhq","to":"12D3KooWE1SnSRTx2aQxp6Fmz3xAnEHBX7T3Bu9k3eem3xKyvkqT"}
{"level":"debug","ts":1761898062.2141423,"logger":"bitswap/client/msgq","caller":"messagequeue/messagequeue.go:719","msg":"sent message","type":"CANCEL_WANT_BLOCK","cid":"bafkreig4xh7u5iveim7sxhbngvr4daggxxtsdatl5wejk4rea6saeyx5ke","local":"12D3KooW9za2vSSD1M23FGoonoUigTXoAKCrpAvnU3oJMUPtUzhq","to":"12D3KooWEQNG2vZBxhJjtQt9eAmYqL2UhiRU5iCpU2W5CPvEm4ut"}
{"level":"debug","ts":1761898062.214188,"logger":"bitswap/client/msgq","caller":"messagequeue/messagequeue.go:719","msg":"sent message","type":"CANCEL_WANT_BLOCK","cid":"bafkreig4xh7u5iveim7sxhbngvr4daggxxtsdatl5wejk4rea6saeyx5ke","local":"12D3KooW9za2vSSD1M23FGoonoUigTXoAKCrpAvnU3oJMUPtUzhq","to":"12D3KooWFUrnAeYWHMtrYBPUq666dpgWa9KGXwYKwZmkqVMWvcSH"}
{"level":"debug","ts":1761898062.2145784,"logger":"bitswap/client/msgq","caller":"messagequeue/messagequeue.go:719","msg":"sent message","type":"CANCEL_WANT_BLOCK","cid":"bafkreig4xh7u5iveim7sxhbngvr4daggxxtsdatl5wejk4rea6saeyx5ke","local":"12D3KooW9za2vSSD1M23FGoonoUigTXoAKCrpAvnU3oJMUPtUzhq","to":"12D3KooWJWeM6eA9hUtgw75aBmZKokRU6oinH5o2JPXZSkrAQuHi"}
{"level":"debug","ts":1761898062.2148564,"logger":"bitswap/client/msgq","caller":"messagequeue/messagequeue.go:719","msg":"sent message","type":"CANCEL_WANT_BLOCK","cid":"bafkreig4xh7u5iveim7sxhbngvr4daggxxtsdatl5wejk4rea6saeyx5ke","local":"12D3KooW9za2vSSD1M23FGoonoUigTXoAKCrpAvnU3oJMUPtUzhq","to":"12D3KooWP1af4GXUjTvFNGfXh8e1vyCLC4gJkoznkGGmmTeyxT5L"}
{"level":"debug","ts":1761898062.2149634,"logger":"bitswap/client/msgq","caller":"messagequeue/messagequeue.go:719","msg":"sent message","type":"CANCEL_WANT_BLOCK","cid":"bafkreig4xh7u5iveim7sxhbngvr4daggxxtsdatl5wejk4rea6saeyx5ke","local":"12D3KooW9za2vSSD1M23FGoonoUigTXoAKCrpAvnU3oJMUPtUzhq","to":"12D3KooWLtCDdQDMSqp5ggXd7DDdgCyV5Cuy6nQwSqn4S5k2ge6q"}
{"level":"debug","ts":1761898062.214888,"logger":"bitswap/client/msgq","caller":"messagequeue/messagequeue.go:719","msg":"sent message","type":"CANCEL_WANT_BLOCK","cid":"bafkreig4xh7u5iveim7sxhbngvr4daggxxtsdatl5wejk4rea6saeyx5ke","local":"12D3KooW9za2vSSD1M23FGoonoUigTXoAKCrpAvnU3oJMUPtUzhq","to":"12D3KooWRykdrWrTa5LayXuA4khk6EFbxPQGZyH7thh1XFoKnpJ1"}
{"level":"debug","ts":1761898062.215218,"logger":"bitswap/client/msgq","caller":"messagequeue/messagequeue.go:719","msg":"sent message","type":"CANCEL_WANT_BLOCK","cid":"bafkreig4xh7u5iveim7sxhbngvr4daggxxtsdatl5wejk4rea6saeyx5ke","local":"12D3KooW9za2vSSD1M23FGoonoUigTXoAKCrpAvnU3oJMUPtUzhq","to":"12D3KooWBGQWvUHrdh5BwAqbDE214toNf85mVhaPomyvgyLYeZPd"}
{"level":"debug","ts":1761898062.2172887,"logger":"bitswap/client/msgq","caller":"messagequeue/messagequeue.go:719","msg":"sent message","type":"CANCEL_WANT_BLOCK","cid":"bafkreig4xh7u5iveim7sxhbngvr4daggxxtsdatl5wejk4rea6saeyx5ke","local":"12D3KooW9za2vSSD1M23FGoonoUigTXoAKCrpAvnU3oJMUPtUzhq","to":"12D3KooWDtgqQnGyn8tK9nqUYVSm8EhV4AY1DUXpS6V9W3xnW4W4"}
{"level":"debug","ts":1761898062.217481,"logger":"bitswap/client/msgq","caller":"messagequeue/messagequeue.go:719","msg":"sent message","type":"CANCEL_WANT_BLOCK","cid":"bafkreig4xh7u5iveim7sxhbngvr4daggxxtsdatl5wejk4rea6saeyx5ke","local":"12D3KooW9za2vSSD1M23FGoonoUigTXoAKCrpAvnU3oJMUPtUzhq","to":"12D3KooWJLKnTVEKYEEX6vksGwVMbcxZhAYwcNLSkVgXWJghxxCV"}
{"level":"debug","ts":1761898062.2184074,"logger":"bitswap/client/msgq","caller":"messagequeue/messagequeue.go:719","msg":"sent message","type":"CANCEL_WANT_BLOCK","cid":"bafkreig4xh7u5iveim7sxhbngvr4daggxxtsdatl5wejk4rea6saeyx5ke","local":"12D3KooW9za2vSSD1M23FGoonoUigTXoAKCrpAvnU3oJMUPtUzhq","to":"12D3KooWACHwpXfeUyrzUbFoHoma4TrkuiobYnRXCVeLshxaaZzq"}
{"level":"debug","ts":1761898062.2201893,"logger":"bitswap/client/msgq","caller":"messagequeue/messagequeue.go:719","msg":"sent message","type":"CANCEL_WANT_BLOCK","cid":"bafkreig4xh7u5iveim7sxhbngvr4daggxxtsdatl5wejk4rea6saeyx5ke","local":"12D3KooW9za2vSSD1M23FGoonoUigTXoAKCrpAvnU3oJMUPtUzhq","to":"12D3KooWN1XZcjdHPh3C7uWFa7cquB1KYMr8xZGa9E1z5X9bSPAP"}
{"level":"debug","ts":1761898062.220416,"logger":"bitswap/client/msgq","caller":"messagequeue/messagequeue.go:719","msg":"sent message","type":"CANCEL_WANT_BLOCK","cid":"bafkreig4xh7u5iveim7sxhbngvr4daggxxtsdatl5wejk4rea6saeyx5ke","local":"12D3KooW9za2vSSD1M23FGoonoUigTXoAKCrpAvnU3oJMUPtUzhq","to":"12D3KooWK7onE4x7b1pnfYfQFWeGPG141Zc45H43ZbsNpJdy5FBT"}
{"level":"debug","ts":1761898062.2210016,"logger":"bitswap/client/msgq","caller":"messagequeue/messagequeue.go:719","msg":"sent message","type":"CANCEL_WANT_BLOCK","cid":"bafkreig4xh7u5iveim7sxhbngvr4daggxxtsdatl5wejk4rea6saeyx5ke","local":"12D3KooW9za2vSSD1M23FGoonoUigTXoAKCrpAvnU3oJMUPtUzhq","to":"12D3KooWQgvqtKy8g4kHcBz7ky7gcN6FDLUoHq6mPa9HYjYaVMc3"}
{"level":"debug","ts":1761898062.2212029,"logger":"bitswap/client/msgq","caller":"messagequeue/messagequeue.go:719","msg":"sent message","type":"CANCEL_WANT_BLOCK","cid":"bafkreig4xh7u5iveim7sxhbngvr4daggxxtsdatl5wejk4rea6saeyx5ke","local":"12D3KooW9za2vSSD1M23FGoonoUigTXoAKCrpAvnU3oJMUPtUzhq","to":"12D3KooWSnJZfW3Be9jxXSQdrNiZNHvBfDn4XVUaEKsvyo1G2qm9"}
{"level":"debug","ts":1761898062.2212284,"logger":"bitswap/client/msgq","caller":"messagequeue/messagequeue.go:719","msg":"sent message","type":"CANCEL_WANT_BLOCK","cid":"bafkreig4xh7u5iveim7sxhbngvr4daggxxtsdatl5wejk4rea6saeyx5ke","local":"12D3KooW9za2vSSD1M23FGoonoUigTXoAKCrpAvnU3oJMUPtUzhq","to":"12D3KooWMsPSSxRjTyMuBk3FSfdjhitAvYCuFpM8ZMStLQCPZCy3"}
{"level":"debug","ts":1761898062.2213898,"logger":"bitswap/client/msgq","caller":"messagequeue/messagequeue.go:728","msg":"sent message","type":"WANT_HAVE","cid":"bafkreig4xh7u5iveim7sxhbngvr4daggxxtsdatl5wejk4rea6saeyx5ke","local":"12D3KooW9za2vSSD1M23FGoonoUigTXoAKCrpAvnU3oJMUPtUzhq","to":"12D3KooWPQDZ6FJNJiAdQizsoX6brop1HXifUoYnd6smPjnVBgqd"}

Notice how bitswap still send WANT_HAVE to one of the peers even though it already has a valid response.

From what I understand, there might be two goroutines running receive message loop and send loop, a race condition might happen in the following scenario:

  1. sendloop is looping through all of the session's peers to send WANT_HAVE
  2. receiveloop received a valid block, it loop through all session's peers, check if peer has already sent WANT_HAVE to send CANCEL_WANT_BLOCK, let's say an arbitrary peer P is in a state of hasn't send WANT_HAVE, so it'll be ignored.
  3. sendloop is currently looping peer P, it notice that it hasn't send WANT_HAVE to this peer so it'll send a request and record it in peermanager
  4. receiveloop exit, session will cleanup all resources & existing wantlist, when peer manager receive HAVE response from peer P, it cannot find interested session to process this event, thus this cid's wantlist will never be cleanup

I can help with code if anybody is kind enough to provide insight into the loop mechanism in ipfs.

LeeTeng2001 avatar Oct 31 '25 10:10 LeeTeng2001

@LeeTeng2001 Thank you for your investigation and calling out this issue. We do not think this is a critical issue that must be solved immediately, so we look into how best to address it after kubo v0.39.0 release.

gammazero avatar Nov 08 '25 05:11 gammazero

@LeeTeng2001 Thank you for your investigation and calling out this issue. We do not think this is a critical issue that must be solved immediately, so we look into how best to address it after kubo v0.39.0 release.

ok thanks for the reply, we haven't observed any undesirable behaviour it's only an incorrect behaviour so I think it's a low priority issue

LeeTeng2001 avatar Nov 14 '25 06:11 LeeTeng2001