
Observing high send packet rate

Open · kustomzone opened this issue 1 month ago • 6 comments

Release v0.1.37 is now connecting to the gateway and peers on both Windows and Ubuntu/WSL. Seeing a rapid stream of sent packets (as much as 4-5 Mbps) in the console output.

This may or may not be related: both versions appear able to write to the db, but nothing is actually being added. My last test version that correctly wrote to the db on Windows was from the beginning of August.


[screenshot]

[screenshot: shutdown]


Freenet-windows-log.zip

(note: WSL/Ubuntu)

Freenet-ubuntu-log.zip


kustomzone · Nov 16 '25 02:11

🤖 Auto-labeled

Applied labels:

  • T-bug (95% confidence)
  • P-medium (85% confidence)
  • A-networking (90% confidence)
  • S-needs-reproduction (80% confidence)

Reasoning: The report describes unexpected high outgoing packet/send rate (4-5 Mbps) when connecting to gateway/peers on Windows and WSL/Ubuntu, which indicates incorrect behavior in networking or peer interaction — this is a bug. It affects multiple platforms but does not appear to block release, so P-medium is appropriate. Diagnosing and fixing likely requires moderate investigation into networking code/pathways and logs, so E-medium. The issue is clearly networking-related (peer/gateway traffic), so A-networking applies. The reporter provided logs but clear reproduction steps and confirmation are still needed, so mark S-needs-reproduction.

Previous labels: none

If these labels are incorrect, please update them. This helps improve the auto-labeling system.

github-actions[bot] · Nov 16 '25 02:11

Ubuntu debug log

Freenet-ubuntu-debug-log.zip

kustomzone · Nov 16 '25 16:11

Investigation update: connection reservation underflow

What we tried

  • Added underflow guards in connection_manager::prune_in_transit_connection, plus unit tests, to ensure reserved_connections never goes negative (a sketch of this kind of guard follows this list).
  • Added outbound reservation tracking (reserve/release) in connect flow to keep reservation balance aligned with handshake outcomes.
  • Ran targeted test test_put_with_subscribe_flag repeatedly to validate fixes.
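
For reference, here is a minimal sketch of what such a guard and a paired reserve/release can look like; the struct, field, and method names below are illustrative placeholders, not the actual freenet-core API:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Illustrative connection-manager fragment; names are placeholders,
/// not the real freenet-core types.
struct ConnectionManager {
    reserved_connections: AtomicUsize,
}

impl ConnectionManager {
    /// Reserve a slot before dialing out, so the counter matches the
    /// number of in-flight handshakes.
    fn reserve_outbound(&self) {
        self.reserved_connections.fetch_add(1, Ordering::SeqCst);
    }

    /// Release a reservation when a handshake completes or fails.
    /// The compare-exchange loop guards against underflow when a
    /// release arrives without a matching reserve.
    fn release_reservation(&self) {
        let mut current = self.reserved_connections.load(Ordering::SeqCst);
        loop {
            if current == 0 {
                eprintln!("warning: reservation released without a matching reserve");
                return;
            }
            match self.reserved_connections.compare_exchange(
                current,
                current - 1,
                Ordering::SeqCst,
                Ordering::SeqCst,
            ) {
                Ok(_) => return,
                Err(actual) => current = actual,
            }
        }
    }
}
```

The point of the guard is that a stray release gets logged and dropped instead of wrapping the counter around to a huge value.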

Results

  • New guards prevent underflow locally, but test_put_with_subscribe_flag still times out waiting for the initial PUT response after ~30s.
  • Event logs show PutSuccess emitted around 25–30s, yet the client never receives the corresponding PutResponse on the websocket. Failures present as "Client 1: Did not receive PUT response within 30 seconds." Baseline main passes.
  • Repeated failures even after adding outbound reservation accounting; outbound connection failures with "connection already exists" still occur, followed by missing client response.

Takeaways

  • Underflow protection alone doesn’t fix the test regression; the missing PUT response suggests a deeper issue in the PUT response delivery path (session actor/web API) when connection reservation logic changes.
  • The reservation counter workarounds risk masking the real root cause; the fact that PutSuccess is logged but the websocket response is absent points to response routing/forwarding rather than counter math.

Next steps I recommend

  1. Instrument the session actor / websocket path to trace ContractResponse::PutResponse generation and delivery for this test, comparing with the passing baseline (a sketch of this kind of instrumentation follows this list).
  2. Audit connection reuse: repeated "connection already exists" errors might leave callbacks unregistered, so ensure pending callback/state cleanup matches connection lifecycle.
  3. Consider reverting to baseline reservation semantics and localizing underflow protection to decrement sites only, to confirm whether the new reservation bookkeeping is interfering with callback delivery.
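
On point 1, a rough sketch of the kind of span-based tracing that could be added around the response delivery step; all type and function names here are hypothetical stand-ins, not the real session-actor/websocket code:

```rust
use tracing::{info, instrument};

/// Hypothetical stand-ins for the real contract response and client handle.
struct PutResponse { contract_key: String }
struct ClientId(u64);

/// Hypothetical delivery point on the websocket path. The #[instrument]
/// attribute opens a span per call, so a logged PutSuccess can be
/// correlated with whether a PutResponse actually reached the client.
#[instrument(skip(resp), fields(contract_key = %resp.contract_key, client = client.0))]
fn deliver_put_response(client: &ClientId, resp: &PutResponse) {
    info!("forwarding PutResponse to websocket client");
    // ...the actual websocket send would happen here...
}
```

Comparing these spans between the failing branch and the passing baseline should show whether the response is never generated, or is generated but never forwarded to the client.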

Hope this saves the next pass from repeating these attempts.

sanity · Nov 17 '25 16:11

Update

  • Root cause: repeated overlapping gateway dials drove reserved connections 0→10 and spawned duplicate handshakes (seen as ~11 ConnectRequests/10s), leading to the high packet rate and a dropped PutResponse.
  • Fix: added a per-gateway pending dial guard and a duplicate-reservation short-circuit in the connection manager; the bootstrap loop now skips gateways that are already pending/reserved and waits for in-flight handshakes (a sketch of the guard follows this list). Also routed decoupled processing through report_result and decoupled PUT auto-subscribe child ops from parent tracking so PutResponse isn’t blocked.
  • Tests: cargo test -p freenet --test operations test_put_with_subscribe_flag --quiet (pass)
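
A minimal sketch of the pending-dial guard described above, assuming gateways are keyed by some hashable identifier; the names are illustrative, not the actual freenet-core bootstrap code:

```rust
use std::collections::HashSet;

/// Illustrative gateway key; the real code would use its own peer/gateway type.
type GatewayId = String;

/// Tracks gateways with an in-flight handshake so the bootstrap loop does
/// not dial them again and spawn duplicate ConnectRequests.
#[derive(Default)]
struct PendingDials {
    in_flight: HashSet<GatewayId>,
}

impl PendingDials {
    /// Returns true only if no dial to this gateway is already pending,
    /// i.e. the caller may start a new handshake.
    fn try_begin(&mut self, gw: &GatewayId) -> bool {
        self.in_flight.insert(gw.clone())
    }

    /// Called when the handshake resolves (success or failure), making the
    /// gateway dialable again.
    fn finish(&mut self, gw: &GatewayId) {
        self.in_flight.remove(gw);
    }
}

fn main() {
    let mut pending = PendingDials::default();
    let gw = GatewayId::from("gateway-1");
    assert!(pending.try_begin(&gw));   // first dial proceeds
    assert!(!pending.try_begin(&gw));  // overlapping dial is skipped
    pending.finish(&gw);
    assert!(pending.try_begin(&gw));   // dialable again once the handshake resolves
}
```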

Let me know if you want this branched/PR’d now.

sanity · Nov 17 '25 20:11

Root cause fixed: overlapping gateway dials created multiple reserved handshakes and duplicate ConnectRequests, causing high packet rate and a stuck PutResponse. Fixes: duplicate-peer/reservation guards, in-flight gateway tracking in join loop, decoupled processing routed via report_result + SessionActor, and PUT auto-subscribe no longer blocks parent (GET still tracks). Test: cargo test -p freenet --test operations test_put_with_subscribe_flag --quiet (pass).

sanity · Nov 17 '25 20:11

> Root cause fixed: overlapping gateway dials created multiple reserved handshakes and duplicate ConnectRequests, causing high packet rate and a stuck PutResponse. Fixes: duplicate-peer/reservation guards, in-flight gateway tracking in join loop, decoupled processing routed via report_result + SessionActor, and PUT auto-subscribe no longer blocks parent (GET still tracks). Test: cargo test -p freenet --test operations test_put_with_subscribe_flag --quiet (pass).

Sounds like another regression from the refactor of the handshake process. :(

iduartgomez · Nov 17 '25 22:11