fix(retrieval, pushsync): missing full close on streams

Open janos opened this issue 1 month ago • 0 comments

Checklist

  • [ ] I have read the coding guide.
  • [ ] My change requires a documentation update, and I have done it.
  • [ ] I have added tests to cover my changes.
  • [ ] I have filled out the description and linked the related issues.

Description

Unexpected stream resets were observed on testnet clusters:

"time"="2025-11-25 12:02:15.157880" "level"="debug" "logger"="node/retrieval" "msg"="failed to get chunk" "chunk_address"="11a5d1877aabf2136ec1e8caff88c58be3a2a62ccebaea335cd92581bec66435" "peer_address"="34f289a8f9ac96d725dd3753dce1bbe828dd14c90a5638277c0e444a58064918" "peer_proximity"=2 "error"="read delivery: stream reset (remote): code: 0x0: transport error: stream reset by remote, error code: 0 peer 34f289a8f9ac96d725dd3753dce1bbe828dd14c90a5638277c0e444a58064918"

This PR fixes the issue that causes such errors. The stream must be properly closed when the final message carrying the internal error is sent.

Regression tests are added that use the real libp2p server to reproduce this problem. They may lean too far toward integration testing, but they are needed to validate both the issue and the fix. If the FullClose is removed, the tests fail with the same log messages as on real infrastructure.

Open API Spec Version Changes (if applicable)

Motivation and Context (Optional)

Related Issue (Optional)

Screenshots (if appropriate):

janos, Dec 01 '25 16:12