Fusion Gateway SSE subscriptions don't propagate cancellation to subgraphs

Open. Kriskit opened this issue 1 month ago • 1 comment

Is there an existing issue for this?

  • [X] I have searched the existing issues

Product

Hot Chocolate Fusion

Describe the bug

When a client disconnects from an SSE subscription via the Fusion Gateway, the cancellation token is not propagated to the subgraph. The subgraph subscription continues running indefinitely, causing resource leaks.

  • Direct connection to subgraph: Cancellation propagates correctly ✅
  • Via Fusion Gateway: Cancellation does NOT propagate ❌

Steps to reproduce

Minimal reproduction: https://github.com/Kriskit/hotchocolate-sse-disconnect-bug

# Terminal 1 - Start subgraph
cd Subgraph && dotnet run --urls "http://localhost:5001"

# Terminal 2 - Start gateway  
cd Gateway && dotnet run --urls "http://localhost:5000"

Test 1: Direct to Subgraph (WORKS)

Open http://localhost:5001/graphql and run:

subscription { onMessage { content timestamp } }

Stop the subscription - subgraph logs show disconnect.

Test 2: Via Gateway (BUG)

Open http://localhost:5000/graphql and run:

subscription { onMessage { content timestamp } }

Stop the subscription - subgraph never logs disconnect. Subscription runs forever.

Proof from real testing (2025-12-05):

[14:40:40] STREAM: Subscribe - Client connected    <- Via Gateway
[14:41:45] STREAM: Subscribe - Client connected    <- Via Gateway (second test)
[14:41:57] STREAM: Subscribe - Client connected    <- Direct to subgraph
[14:42:05] STREAM: CancellationToken triggered!    <- ONLY direct connection cancelled!
[14:42:05] STREAM: Unsubscribe - Client disconnected

After testing, the subgraph had 56 zombie connections from Gateway subscriptions that were never cancelled.

Root Cause

In DefaultHttpGraphQLSubscriptionClient.cs line 46, the await foreach loop is missing .WithCancellation(cancellationToken):

Current code (15.1.11):

await foreach (var result in response.ReadAsResultStreamAsync(cancellationToken).ConfigureAwait(false))

Should be:

await foreach (var result in response.ReadAsResultStreamAsync(cancellationToken).WithCancellation(cancellationToken).ConfigureAwait(false))

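The [EnumeratorCancellation] attribute marks the iterator parameter that receives the token a caller supplies through .WithCancellation() (or GetAsyncEnumerator). Without that call, the enumerator itself is given no token, so cancellation depends entirely on ReadAsResultStreamAsync observing the token passed as an argument, and it may not propagate correctly during iteration.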
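To illustrate the mechanism outside of Hot Chocolate, here is a minimal sketch of how a token supplied via .WithCancellation() reaches an async iterator. `CancellationDemo`, `CountAsync`, and `ConsumeAsync` are hypothetical names made up for this example; they are not part of the Fusion code base and only stand in for a result stream and its consumer.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Tasks;

public static class CancellationDemo
{
    // Hypothetical stand-in for a subgraph result stream (not Hot Chocolate code).
    // [EnumeratorCancellation] marks the parameter that receives the token
    // supplied later through WithCancellation / GetAsyncEnumerator.
    public static async IAsyncEnumerable<int> CountAsync(
        [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        for (var i = 0; ; i++)
        {
            // Task.Delay throws once the token is cancelled, ending the stream.
            await Task.Delay(10, cancellationToken).ConfigureAwait(false);
            yield return i;
        }
    }

    // Enumerates the source until the token is cancelled and returns
    // how many items were seen before that happened.
    public static async Task<int> ConsumeAsync(
        IAsyncEnumerable<int> source, CancellationToken cancellationToken)
    {
        var seen = 0;
        try
        {
            await foreach (var item in source
                .WithCancellation(cancellationToken)
                .ConfigureAwait(false))
            {
                seen++;
            }
        }
        catch (OperationCanceledException)
        {
            // Expected: the token was cancelled while the iterator was waiting.
        }

        return seen;
    }

    public static async Task Main()
    {
        // Cancel after roughly 200 ms to simulate a client disconnect.
        using var cts = new CancellationTokenSource(TimeSpan.FromMilliseconds(200));

        // CountAsync() is created without a token; WithCancellation supplies it.
        var seen = await ConsumeAsync(CountAsync(), cts.Token);
        Console.WriteLine($"Stopped after {seen} items.");
    }
}
```

In this sketch the iterator is created without any token, so the endless loop only stops because `ConsumeAsync` passes one through .WithCancellation(). Whether the same gap is what keeps the gateway's SSE stream alive depends on how ReadAsResultStreamAsync is implemented, which this example does not model.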

Note: This is already fixed in Fusion-vnext (line 183 of SourceSchemaHttpClient.cs), but that's v16 which isn't released yet.

Relevant log output

[14:40:40] STREAM: Subscribe - Client connected    <- Via Gateway, NEVER disconnects
[14:41:45] STREAM: Subscribe - Client connected    <- Via Gateway, NEVER disconnects  
[14:41:57] STREAM: Subscribe - Client connected    <- Direct to subgraph
[14:42:05] STREAM: CancellationToken triggered!    <- Only direct connection cancelled
[14:42:05] STREAM: Unsubscribe - Client disconnected

Additional Context

Impact in production:

  • Subgraph subscription resolvers don't see the cancellation
  • Resources (database connections, event subscriptions) are not released
  • Memory leaks from orphaned subscriptions
  • Server resource exhaustion over time

Compare to WebSocketGraphQLSubscriptionClient.cs which correctly uses:

await foreach (var operationResult in socketResult.ReadResultsAsync()
    .WithCancellation(ct).ConfigureAwait(false))

Version

HotChocolate.Fusion 15.1.11

Kriskit, Dec 05 '25 14:12

Yes, we are aware of this and will issue a fix with 15.1.12.

michaelstaib, Dec 07 '25 18:12