boost
boost copied to clipboard
Stream reset in deal handling after 30 seconds
Checklist
- [X] This is not a question or a support request. If you have any boost related questions, please ask in the discussion forum.
- [X] This is not a new feature request. If it is, please file a feature request instead.
- [X] This is not an enhancement request. If it is, please file a improvement suggestion instead.
- [X] I have searched on the issue tracker and the discussion forum, and there is no existing related issue or discussion.
- [X] I am running the
Latest release
, or the most recent RC(release canadiate) for the upcoming release or the dev branch(master), or have an issue updating to any of these. - [ ] I did not make any code changes to boost.
Boost component
- [X] boost daemon - storage providers
- [ ] boost client
- [ ] boost UI
- [ ] boost data-transfer
- [ ] boost index-provider
- [ ] Other
Boost Version
boostd v2.1.1
Describe the Bug
When deals come in and they get handled, this can take > 30 seconds (see https://github.com/filecoin-project/boost/issues/1747)
When that happens, the underlaying stream connection gets killed by something. Not the deal handler itself.
Logging Information
no
Repo Steps
- Add a sleep of > 30 seconds in your deal filter handling.
- Make a deal
- See that you get a stream reset on the client
- See that once your provider succeeds, it shows an err: stream reset on closing it too (here)
Note that upgrading the timeouts here does not help. (I've changed them to 60, and it still happens)
Also pinging @ribasushi as Spade is very much affected by this bug when clients have those slow databases as described here (or if they have a slow deal filter, or something else)
I have been on this issue for a while. I don’t think it is Boost related. The stream vanishes because the client who is proposing the deal is waiting longer then 30 seconds. In this case it is the full node Boost is using to propose the deal running Lotus or Venus who is dropping off
I must say that I think that it a good idea to have some flexibility in the timeout setting, however when Boost “behaves” right we are talking milliseconds. Moving to 300 seconds or higher does have some risks with it.
@RobQuistNL I don't think this is boostd
. I would recommend checking your client.
2023-12-22T08:35:15.028Z DEBUG boost-net lp2pimpl/net.go:256 processed deal proposal accept {"reqlog-uuid": "1bad5c14-754e-4780-8093-3dcfe554c438", "client-peer": "12D3KooWLjV7gpsJY2wGHHWeJQHxWwdnwq2zRQCPTsgVNQfMSjXB", "id": "83839011-d81d-49ad-a27c-9a91538e7ab3"}
2023-12-22T08:35:15.028Z INFO boost-prop lp2pimpl/net.go:262 send deal proposal response {"id": "83839011-d81d-49ad-a27c-9a91538e7ab3", "accepted": true, "msg": "", "peer id": "12D3KooWLjV7gpsJY2wGHHWeJQHxWwdnwq2zRQCPTsgVNQfMSjXB", "client address": "t3shsquipzctlsbk6wp37aj6rtsefyi5a4k5tx2cy3h4fvzq4epgvy3t543wysde7nxr26anzywk77ks77v7eq", "provider address": "t01000", "piece cid": "baga6ea4seaqafukelbiytuppsptonwmgqyjknsvhx3mguazrxqz46d2k5og4qnq", "piece size": 8388608, "verified": false, "label": "bafykbzacecbu7tfbcouw3lumuol3lhanatlboe3rvcbof7yjh6figf3d3t342", "start epoch": "5861", "end epoch": "524261", "price per epoch": "156250000", "duration": "40.060123643s"}
Did your client receive that response? Also, what version are you running? Because I (we) think it might be a deeper level timeout set somewhere to 30s. The responses that take over 30 seconds never make it back to the requester. I also see "err: stream reset" in the debug logs, BUT I do see that send deal proposal response
info log.
Please note that I have seen this happen for 2 out of 2 clients so far; Spade and Julien (TwinQuasar);
{"level":"info","ts":"2023-12-20T06:44:22.650Z","logger":"boost-prop","caller":"lp2pimpl/net.go:262","msg":"send deal proposal response","id":"db31d7ea-e07f-47d8-ac33-a2f2c694432c","accepted":true,"msg":"","peer id":"12D3KooWSMTGAgmByESrZWa7X7fdnrENXSQajmeGbfcioPSyyLMu","client address":"f1zo3n62q6j5b3rfc4drj67d5e7hr5pqtygxy6s6i","provider address":"xxxx","piece cid":"baga6ea4seaqxxxx","piece size":34359738368,"verified":true,"label":"bafkreixxxx","start epoch":"xxx","end epoch":"xxx","price per epoch":"0","duration":"41.290329433s"}
{"level":"warn","ts":"2023-12-20T06:44:24.430Z","logger":"boost-net","caller":"lp2pimpl/net.go:287","msg":"writing deal response","reqlog-uuid":"330fad56-07a9-425e-b320-d7057f21ebab","client-peer":"12D3KooWSMTGAgmByESrZWa7X7fdnrENXSQajmeGbfcioPSyyLMu","id":"db31d7ea-e07f-47d8-ac33-a2f2c694432c","err":"stream reset"}
{"level":"info","ts":"2023-12-20T06:44:24.430Z","logger":"boost-net","caller":"lp2pimpl/net.go:231","msg":"closing stream","reqlog-uuid":"330fad56-07a9-425e-b320-d7057f21ebab","client-peer":"12D3KooWSMTGAgmByESrZWa7X7fdnrENXSQajmeGbfcioPSyyLMu","id":"db31d7ea-e07f-47d8-ac33-a2f2c694432c","err":"stream reset"}
{"level":"debug","ts":"2023-12-20T06:44:24.430Z","logger":"boost-net","caller":"lp2pimpl/net.go:233","msg":"handled deal proposal request","reqlog-uuid":"330fad56-07a9-425e-b320-d7057f21ebab","client-peer":"12D3KooWSMTGAgmByESrZWa7X7fdnrENXSQajmeGbfcioPSyyLMu","id":"db31d7ea-e07f-47d8-ac33-a2f2c694432c","duration":"43.070655464s"}
The code in net.go
clearly tries to use other timeouts, so if thats not working, there might be a bug in libp2p or some adapter / service module
In this case Riba also tested extensively and is 100% sure he does not disconnect the setup connection. I don't see it happening in the code on the surface on my end, so there's likely something deeper. Can you try again with 2 minutes or something? just to test?