
Implement negentropy optimization

linear[bot] opened this issue 1 month ago • 2 comments

linear[bot] avatar Oct 20 '25 17:10 linear[bot]

IOS-2122


Here are the local relay model review and negentropy planning findings, @danieldaquino:


Findings

  - damus/Core/Networking/NostrNetworkManager/SubscriptionManager.swift:173 –
    In advancedStream, .ndbFirst(optimizeNetworkFilter: false) still invokes
    startNetworkStreamTask() immediately. That means the “NDB-first” mode
    used when the experimental local relay flag is on will start live relay
    subscriptions before NDB finishes, defeating the main point of that
    mode and re‑introducing the parallel behaviour you were trying to move
    away from. Gate the eager start on the actual enum case instead of only
    optimizeNetworkFilter, so .ndbFirst always waits for the NDB EOSE before
    touching the network.
  - damus/Core/Networking/NostrNetworkManager/SubscriptionManager.swift:146
    and :186 – latestNoteTimestampSeen is read on the network task and written
    on the NDB task without any actor/locking protection. Those two Task { … }
    bodies run concurrently, so we now have a data race on an optional UInt32.
    Depending on where the tasks land, the since optimisation can observe
    torn or stale values, leading to missed or duplicated events. Move that
    state behind an actor or other synchronization primitive (e.g. wrap in
    ManagedAtomic or push the optimisation into an actor-isolated helper) so the
    tasks never touch the raw var concurrently.
  - damus/Core/Networking/NostrNetworkManager/SubscriptionManager.swift:142
    and :269 – networkStreamTask and the outer multiSessionStreamingTask aren’t
    registered with TaskManager. When handleAppBackgroundRequest triggers
    cancelAllTasks(), only the inner session tasks stop; the outer loops wake
    back up a second later and recreate fresh sessions, so backgrounding won’t
    actually quiesce subscriptions (and battery/network use continues). Either
    add those outer tasks to the task manager or flip the loops so cancellation
    of the inner task propagates up instead of auto-resubscribing.
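
  The first finding's fix can be sketched as a switch on the actual stream-mode case, so `.ndbFirst` never starts the network before the NDB EOSE regardless of the `optimizeNetworkFilter` flag. This is a minimal illustration under assumed names — `StreamMode`, `StreamCoordinator`, and the helpers are hypothetical stand-ins, not damus's real `SubscriptionManager` API:

```swift
// Hypothetical reduction of the advancedStream decision; the real
// StreamMode cases and task helpers in SubscriptionManager may differ.
enum StreamMode {
    case ndbFirst(optimizeNetworkFilter: Bool)
    case parallel
}

final class StreamCoordinator {
    private(set) var networkStarted = false

    func start(mode: StreamMode, startNetworkStream: @escaping () -> Void) {
        switch mode {
        case .ndbFirst:
            // Gate on the enum case itself: always wait for the NDB EOSE
            // before touching the network, whatever the flag says.
            runNdbStream(onEose: { [self] in
                networkStarted = true
                startNetworkStream()
            })
        case .parallel:
            // Legacy behaviour: network and NDB run side by side.
            networkStarted = true
            startNetworkStream()
            runNdbStream(onEose: {})
        }
    }

    private func runNdbStream(onEose: @escaping () -> Void) {
        // Placeholder for the local-relay (NDB) streaming work.
        onEose()
    }
}
```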
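
  For the second finding, one way to put `latestNoteTimestampSeen` behind actor isolation so the NDB and network tasks never touch the raw var concurrently — a sketch with illustrative names; the real code would likely fold this into an existing actor-isolated helper:

```swift
import Foundation

// Actor guarding the shared timestamp so concurrent Task bodies can't race.
actor StreamClock {
    private var latestNoteTimestampSeen: UInt32?

    func advance(to timestamp: UInt32) {
        // Only move forward, so out-of-order NDB events can't regress `since`.
        if let current = latestNoteTimestampSeen, current >= timestamp { return }
        latestNoteTimestampSeen = timestamp
    }

    /// The `since` value a network REQ should use, if any.
    func sinceFilter() -> UInt32? {
        latestNoteTimestampSeen
    }
}
```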
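
  And for the third finding, a sketch of registering the outer loop so background cancellation reaches it, plus re-checking cancellation before resubscribing. `TaskRegistry` stands in for damus's `TaskManager` and `runRelaySession` for the inner session work; both are hypothetical:

```swift
import Foundation

// Stand-in for the TaskManager that cancelAllTasks() drains.
final class TaskRegistry {
    private var tasks: [Task<Void, Never>] = []
    func register(_ task: Task<Void, Never>) { tasks.append(task) }
    func cancelAll() {
        tasks.forEach { $0.cancel() }
        tasks.removeAll()
    }
}

func runRelaySession() async {
    // Placeholder for one relay session's streaming work.
    try? await Task.sleep(nanoseconds: 10_000_000)
}

@discardableResult
func startMultiSessionStream(registry: TaskRegistry) -> Task<Void, Never> {
    let outer = Task {
        while !Task.isCancelled {
            await runRelaySession()
            // If the session ended because we were cancelled, stop here
            // instead of sleeping and recreating a fresh session.
            guard !Task.isCancelled else { break }
            try? await Task.sleep(nanoseconds: 1_000_000_000)
        }
    }
    registry.register(outer)  // so cancelAll() reaches the outer loop too
    return outer
}
```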

  Open Questions

  - Is the intent for .ndbFirst consumers to ever see network events directly,
    or should everything re-flow through NDB? Clarifying that will help decide
    whether latestNoteTimestampSeen should also advance on network events and
    how aggressive the network retry loop should be.

  NIP-77 / Issue 125 Notes

  - The draft spec defines a NEG-* message family that wraps the existing binary
    negentropy protocol in hex over the websocket, with clear sequencing
    (NEG-OPEN → alternating NEG-MSG → optional NEG-CLOSE) and error semantics
    (NEG-ERR using NIP-01 style codes). Relays need to keep per-subscription
    state, honour optional cache use, and ensure the “fingerprints to ID
    deltas” loop is correct.
  - Issue #125 highlights two key concerns worth baking into our design: (1)
    backwards compatibility/fallback to classic REQ/EVENT flows when either side
    lacks NIP-77 support, and (2) the complexity cost of managing the per-filter
    reconciliation state. It also notes existing relay/client implementations we
    can reference for expected behaviour.
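
  A rough Swift shape for the NEG-* messages described above, assuming the draft's JSON-array framing. The filter is left as a raw JSON string here — the real code would reuse damus's existing filter type — and the naive `quote` helper assumes no characters needing JSON escaping:

```swift
// Sketch of the NEG-* wire messages from the NIP-77 draft; illustrative only.
enum NegMessage {
    case open(subId: String, filterJSON: String, initialHex: String)
    case msg(subId: String, hex: String)
    case close(subId: String)
    case err(subId: String, reason: String)

    /// Encode as the JSON array the websocket expects.
    func encode() -> String {
        switch self {
        case .open(let subId, let filterJSON, let hex):
            return "[\"NEG-OPEN\",\(quote(subId)),\(filterJSON),\(quote(hex))]"
        case .msg(let subId, let hex):
            return "[\"NEG-MSG\",\(quote(subId)),\(quote(hex))]"
        case .close(let subId):
            return "[\"NEG-CLOSE\",\(quote(subId))]"
        case .err(let subId, let reason):
            return "[\"NEG-ERR\",\(quote(subId)),\(quote(reason))]"
        }
    }

    private func quote(_ s: String) -> String {
        "\"\(s)\""  // assumes no characters needing JSON escaping
    }
}
```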

  Negentropy Integration Plan

  1. Define the protocol surface (small commit): add Swift types for NEG
     messages (enums for NEG-OPEN/NEG-MSG/NEG-CLOSE/NEG-ERR, filter and hex
     payload encoding helpers, reason-code constants) and unit tests that
     round-trip encode/decode the payloads.
  2. Embed a negentropy session manager in RelayConnection/RelayPool (small
     commit): detect peer support (feature flag or supported_nips), maintain
     per-subscription reconciliation state, and surface a high-level session API
     that maps to the new message types while falling back to legacy REQ when
     negotiation fails.
  3. Teach SubscriptionManager to choose negentropy vs classic flow (focused
     commit): extend stream setup to request a negentropy session when
     filters/relays qualify, hook session progress callbacks into the existing
     NDB-first pipeline, and ensure the caller still receives the same
     NdbNoteLender yields.
  4. Handle ID deltas and event fetch/publish (small commit): once a session
     reports missing IDs, trigger targeted EVENT uploads or REQ downloads, and
     make sure local NDB ingests still dedupe correctly; add regression tests
     that reconcile a known dataset.
  5. Instrument and guard the path (small commit): add metrics/logs to observe
     handshake success, add a feature flag in settings, and update background
     cancellation so negentropy sessions are torn down cleanly alongside the
     rest of the relay tasks.
  6. Documentation & QA (final commit): document the feature flag and fallback
     behaviour in docs/ and add integration tests (or manual test plan) covering
     both negentropy-enabled and fallback scenarios.
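
  Step 2's negotiation fallback can be sketched as a simple capability check: prefer a negentropy session only when the feature flag is on and the relay advertises NIP-77 (e.g. via its NIP-11 `supported_nips` list), otherwise fall back to a classic REQ. `RelayInfo` and the function names here are illustrative stand-ins, not damus's actual types:

```swift
// Minimal shape of a NIP-11 relay information document for this sketch.
struct RelayInfo {
    var supportedNips: [Int]
}

enum SyncStrategy { case negentropy, classicReq }

func chooseStrategy(relay: RelayInfo, featureFlagEnabled: Bool) -> SyncStrategy {
    // Fall back to the legacy REQ/EVENT flow when either side lacks NIP-77.
    guard featureFlagEnabled, relay.supportedNips.contains(77) else {
        return .classicReq
    }
    return .negentropy
}
```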
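
  Step 4's delta handling can likewise be sketched: once a session reports ID deltas, fetch the IDs we lack with a targeted REQ filter and upload the ones the relay lacks. The types here are hypothetical illustrations of the shape, not the planned implementation:

```swift
// Illustrative result of one reconciliation round.
struct ReconcileDelta {
    var weNeed: [String]    // event IDs the relay has and we don't
    var theyNeed: [String]  // event IDs we have and the relay doesn't
}

/// Build a classic REQ filter covering just the gaps, or nil if in sync.
func reqFilterJSON(for delta: ReconcileDelta) -> String? {
    guard !delta.weNeed.isEmpty else { return nil }
    let ids = delta.weNeed.map { "\"\($0)\"" }.joined(separator: ",")
    return "{\"ids\":[\(ids)]}"
}
```

  The events behind `theyNeed` would go out as targeted EVENT uploads, and anything fetched for `weNeed` flows through the normal NDB ingest path so existing dedupe still applies.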

  Natural next steps after addressing the findings: fix the ndbFirst
  gating, wrap latestNoteTimestampSeen behind safe isolation, and make sure
  cancelAllTasks actually quiesces the long-lived tasks; then rerun the existing

alltheseas avatar Oct 22 '25 17:10 alltheseas