
I get error `unavailable (14): Transport became inactive` with a big number (>20) of unary requests

Open sergeiromanchuk opened this issue 9 months ago • 36 comments

Describe the bug

I have a table view in which each cell sends two gRPC queries using `async let`. If I scroll through the table slowly, I almost never encounter any problems. But if I start scrolling quickly, every query I send fails with the error unavailable (14): Transport became inactive.

This usually happens when the number of streams in use exceeds 20.

channel log [{Default}] connectionUtilizationChanged(id:streamsUsed:streamCapacity:)::ChannelProvider.swift::191:19: [API v2] → ChannelProvider::connectionUtilizationChanged(id:streamsUsed:streamCapacity:). Connection ID: ObjectIdentifier(0x000000011ee58900). Stream capacity: 2147483647. Streams used: 21

channel error [{Default}] connectionClosed(id:error:)::ChannelProvider.swift::205:19: [API v2] → ChannelProvider::connectionClosed(id:error:). Connection ID: ObjectIdentifier(0x000000011ee58900). Error: Optional(NIOCore.ChannelError.ioOnClosedChannel)

request error [{Default}] xxx()::XXXService.swift::67:23: [API] → XXXService::xxx(). XXX service completed with result: success(unavailable (14): Transport became inactive)

trace 2025-02-27T09:45:36+0100 trace [CallOptions] : call_state=closed grpc.conn.addr_local=10.179.0.18 grpc.conn.addr_remote=10.229.0.160 grpc_request_id=5BFD6881-842F-42AB-95F1-AC0E322817FE [GRPC] failing buffered writes

SPM: Protobuf 1.25.2 gRPC 1.65.1 gRPC-Swift 1.21.0

Platform: iOS

Expected behaviour

I expect the channel to handle this number of requests normally; as the logs show, it is only 20-30 requests.

Additional information

Options for each unary call:

static var unaryCallOptions: CallOptions {
    let timeLimit: TimeLimit = .timeout(.seconds(5))
    #if DEBUG
    var logger = Logger(label: "[\(Self.self)]")
    logger.logLevel = .trace
    return CallOptions(timeLimit: timeLimit, logger: logger)
    #else
    return CallOptions(timeLimit: timeLimit)
    #endif
}

Keepalive

let keepalive = ClientConnectionKeepalive(
    interval: .minutes(5),
    timeout: .seconds(30),
    permitWithoutCalls: true
)

GRPCChannelPool configuration:

let channel = try GRPCChannelPool.with(
    target: .hostAndPort(
        configuration.host,
        configuration.port
    ),
    transportSecurity: .tls(
        configuration.tlsConfiguration
    ),
    eventLoopGroup: configuration.eventLoopGroup,
    { [weak self] channelConfiguration in
        channelConfiguration.keepalive = configuration.keepalive
        channelConfiguration.delegate = self
        channelConfiguration.connectionBackoff = ConnectionBackoff(
            initialBackoff: 1,
            maximumBackoff: 15,
            multiplier: 1.5,
            minimumConnectionTimeout: 15
        )
        #if DEBUG
        var logger = Logger(label: "[Channel gRPC]")
        logger.logLevel = .trace
        channelConfiguration.backgroundActivityLogger = logger
        #endif
    }
)

Request example:

public func getData(
    for id: UUID,
    options: CallOptions
) async throws -> Data {
    let client = try dependency.getClient()
    async let task1 = client.getX1(
        X1Request(uids: [id]),
        callOptions: options
    )
    async let task2 = client.getX2(
        X2Request(uids: [id]),
        callOptions: options
    )
    let (response1, response2) = try await (task1, task2) // ! error !
    return ...
}

Generated code:

public func getX(
  _ request: Frontend_PpX_V1_XRequest,
  callOptions: CallOptions? = nil
) async throws -> Frontend_PpX_V1_XResponse {
  return try await self.performAsyncUnaryCall(
    path: Frontend_PpX_V1_XServiceClientMetadata.Methods.getX.path,
    request: request,
    callOptions: callOptions ?? self.defaultCallOptions,
    interceptors: self.interceptors?.makeXInterceptors() ?? []
  )
}

public func performAsyncUnaryCall<Request: Message & Sendable, Response: Message & Sendable>(
  path: String,
  request: Request,
  callOptions: CallOptions? = nil,
  interceptors: [ClientInterceptor<Request, Response>] = [],
  responseType: Response.Type = Response.self
) async throws -> Response {
  let call = self.channel.makeAsyncUnaryCall(
    path: path,
    request: request,
    callOptions: callOptions ?? self.defaultCallOptions,
    interceptors: interceptors
  )

  return try await withTaskCancellationHandler {
    try await call.response
  } onCancel: {
    call.cancel()
  }
}

sergeiromanchuk avatar Feb 27 '25 09:02 sergeiromanchuk

unavailable (14): Transport became inactive means that the connection was unexpectedly dropped (which is backed up by NIOCore.ChannelError.ioOnClosedChannel).

If the client closed the connection because of a local error condition we should see that. So the working hypothesis is that the server dropped the connection. It's unclear why that's the case.

Could you provide the complete trace logs for when this happens? It might shed some light on this.

glbrntt avatar Feb 27 '25 10:02 glbrntt

@glbrntt Are these logs valuable?

2025-02-27T09:45:36+0100 trace [CallOptions] : call_state=closed grpc.conn.addr_local=10.179.0.18 grpc.conn.addr_remote=10.229.0.160 grpc_request_id=5BFD6881-842F-42AB-95F1-AC0E322817FE [GRPC] failing buffered writes

sergeiromanchuk avatar Feb 27 '25 10:02 sergeiromanchuk

No, that will be a result of the connection being dropped, not the reason it drops.

glbrntt avatar Feb 27 '25 10:02 glbrntt

GRPCTraceLogs.txt

cc: @glbrntt

sergeiromanchuk avatar Feb 27 '25 10:02 sergeiromanchuk

In my code, I do the following

task?.cancel()
task = Task {
    let data = try await service.getData(...)
    // handle data
}

I noticed that the error appears when I cancel the task on which I execute the request. After removing the task cancellation, I found that the number of executed requests goes up to 1000+ with no problems. Why is this?

cc: @glbrntt

sergeiromanchuk avatar Feb 27 '25 11:02 sergeiromanchuk

I also found out that if a timeout is triggered in requests, my channel goes down as well.

LogsWithGRPCPool.txt.zip

sergeiromanchuk avatar Feb 27 '25 12:02 sergeiromanchuk

No such stream logs

When debugging, I also found that gRPC calls channelRead when receiving a response and tries to find the corresponding stream, which was previously closed and removed. Then, it sends the error NoSuchStream.



2025-02-27T14:45:58+0100 error [Channel gRPC] : error=NoSuchStream(streamID: HTTP2StreamID(233), location: "NIOHTTP2/ConnectionStreamsState.swift:305") grpc.conn.addr_local=10.179.0.62 grpc.conn.addr_remote=10.229.2.186 grpc_connection_id=7F6EA520-3D88-4156-9712-30DC60993782/2 [GRPC] grpc client error
2025-02-27T14:45:58+0100 error [Channel gRPC] : error=ioOnClosedChannel grpc.conn.addr_local=10.179.0.62 grpc.conn.addr_remote=10.229.2.186 grpc_connection_id=7F6EA520-3D88-4156-9712-30DC60993782/2 [GRPC] grpc client error
2025-02-27T14:45:58+0100 debug [Channel gRPC] : connectivity_state=ready grpc_connection_id=7F6EA520-3D88-4156-9712-30DC60993782/2 [GRPC] deactivating connection
2025-02-27T14:45:58+0100 debug [Channel gRPC] : delay=0 grpc_connection_id=7F6EA520-3D88-4156-9712-30DC60993782/2 [GRPC] scheduling connection attempt
2025-02-27T14:45:58+0100 debug [Channel gRPC] : grpc_connection_id=7F6EA520-3D88-4156-9712-30DC60993782/2 new_state=transientFailure old_state=ready [GRPC] connectivity state change
2025-02-27T14:45:58+0100 error [CallOptions] : error=unavailable (14): Transport became inactive grpc.conn.addr_local=10.179.0.62 grpc.conn.addr_remote=10.229.2.186 grpc_connection_id=7F6EA520-3D88-4156-9712-30DC60993782/2 grpc_request_id=BEE94883-7EFB-4287-8F56-60F6BB3EFF76 [GRPC] grpc client error
2025-02-27T14:45:58+0100 trace [CallOptions] : call_state=closed grpc.conn.addr_local=10.179.0.62 grpc.conn.addr_remote=10.229.2.186 grpc_connection_id=7F6EA520-3D88-4156-9712-30DC60993782/2 grpc_request_id=BEE94883-7EFB-4287-8F56-60F6BB3EFF76 [GRPC] failing buffered writes
2025-02-27T14:45:58+0100 error [CallOptions] : error=unavailable (14): Transport became inactive grpc.conn.addr_local=10.179.0.62 grpc.conn.addr_remote=10.229.2.186 grpc_connection_id=7F6EA520-3D88-4156-9712-30DC60993782/2 grpc_request_id=F0BE824A-04A4-4074-9A5B-43700DF739EB [GRPC] grpc client error
2025-02-27T14:45:58+0100 trace [CallOptions] : call_state=closed grpc.conn.addr_local=10.179.0.62 grpc.conn.addr_remote=10.229.2.186 grpc_connection_id=7F6EA520-3D88-4156-9712-30DC60993782/2 grpc_request_id=F0BE824A-04A4-4074-9A5B-43700DF739EB [GRPC] failing buffered writes
2025-02-27T14:45:58+0100 error [CallOptions] : error=unavailable (14): Transport became inactive grpc.conn.addr_local=10.179.0.62 grpc.conn.addr_remote=10.229.2.186 grpc_connection_id=7F6EA520-3D88-4156-9712-30DC60993782/2 grpc_request_id=95EFBB4C-09FF-44BF-8DB5-D8E11ADD7262 [GRPC] grpc client error
2025-02-27T14:45:58+0100 trace [CallOptions] : call_state=closed grpc.conn.addr_local=10.179.0.62 grpc.conn.addr_remote=10.229.2.186 grpc_connection_id=7F6EA520-3D88-4156-9712-30DC60993782/2 grpc_request_id=95EFBB4C-09FF-44BF-8DB5-D8E11ADD7262 [GRPC] failing buffered writes
2025-02-27T14:45:58+0100 error [CallOptions] : error=unavailable (14): Transport became inactive grpc.conn.addr_local=10.179.0.62 grpc.conn.addr_remote=10.229.2.186 grpc_connection_id=7F6EA520-3D88-4156-9712-30DC60993782/2 grpc_request_id=95F10C40-D09F-493F-87A0-2C322C25653E [GRPC] grpc client error
2025-02-27T14:45:58+0100 trace [CallOptions] : call_state=closed grpc.conn.addr_local=10.179.0.62 grpc.conn.addr_remote=10.229.2.186 grpc_connection_id=7F6EA520-3D88-4156-9712-30DC60993782/2 grpc_request_id=95F10C40-D09F-493F-87A0-2C322C25653E [GRPC] failing buffered writes
2025-02-27T14:45:58+0100 debug [Channel gRPC] : connectivity_state=transientFailure grpc_connection_id=7F6EA520-3D88-4156-9712-30DC60993782/3 [GRPC] vending multiplexer future

sergeiromanchuk avatar Feb 27 '25 13:02 sergeiromanchuk

https://stackoverflow.com/questions/69422236/grpc-swift-idle-stream-cause-next-rpc-call-to-fail-with-unavailable-14-due-to

sergeiromanchuk avatar Feb 27 '25 14:02 sergeiromanchuk

It looks like the main problem is that a particular stream is not found in the list of streams, after which the channel is closed. If I load data into a cell and cancel tasks that are no longer needed, I always hit the error on stream 29.

(lldb) p underlyingError
(NIOHTTP2.NIOHTTP2Errors.NoSuchStream) {
  streamID = (networkStreamID = 221)
  location = "NIOHTTP2/ConnectionStreamsState.swift:305"
}

Why is that happening?

cc: @glbrntt

sergeiromanchuk avatar Feb 27 '25 19:02 sergeiromanchuk

Found related issue: https://github.com/grpc/grpc-swift/issues/1421

sergeiromanchuk avatar Feb 27 '25 20:02 sergeiromanchuk

I tried creating one channel per request. It did help, but it's impossible to work with because the loading time becomes incredibly long. I really need a solution that lets me keep the channel open with a large number of requests. I believe this behavior is indeed a bug.

I also haven't found any way to optimize the channel's connection time to the server, so the question remains relevant.

I also can't use your new version of the library (2.0.0) because I see that it only supports iOS 18, and our project supports iOS 15.

cc: @glbrntt

sergeiromanchuk avatar Feb 27 '25 22:02 sergeiromanchuk

Basically, I understand it like this, and I'm almost sure it's true. When we cancel a stream, it gets added to recentlyResetStreams. This list has a limit of 32 streams; when it fills up, the oldest entry is evicted to make room for a new one.

If we scroll quickly, this reset list fills up very fast, and streams get removed from it just as quickly. Then, packets arrive from old streams/requests, which we try to process. We check if the corresponding stream exists in activeStreams—it doesn't. Then we check recentlyResetStreams, but the stream isn't there either because the list filled up with new entries and our stream got evicted before the packet arrived.

As a result, we return a noSuchStream error, and the channel closes. That's it.

Oh, and yeah—this limit of 32 is hardcoded, so we can't change it. The TODO for fixing it has been there since 2019, so I guess it's about time to do it. 😅
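The eviction behaviour described above can be sketched like this (a simplified model for illustration, not the actual NIOHTTP2 implementation; the type and method names here are made up):

```swift
// Simplified model of a fixed-capacity "recently reset streams" list.
// Adding a new entry beyond capacity evicts the oldest one, so a frame
// arriving for an evicted stream can no longer be matched.
struct RecentlyResetStreams {
    private var ids: [Int] = []
    private let capacity: Int

    // 32 is the hardcoded limit referenced in the screenshots above.
    init(capacity: Int = 32) { self.capacity = capacity }

    mutating func recordReset(_ id: Int) {
        if ids.count == capacity {
            ids.removeFirst() // the oldest reset stream is forgotten
        }
        ids.append(id)
    }

    // A frame for a stream that is neither active nor in this list
    // produces the NoSuchStream error that tears down the connection.
    func contains(_ id: Int) -> Bool { ids.contains(id) }
}
```

So if more than 32 streams are cancelled before their in-flight frames arrive, the earliest ones have already been evicted when their responses show up, which matches the NoSuchStream errors in the logs.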



sergeiromanchuk avatar Feb 28 '25 09:02 sergeiromanchuk

What do you think?

сс: @jackystd @glbrntt

sergeiromanchuk avatar Feb 28 '25 09:02 sergeiromanchuk

Hey @sergeiromanchuk -- thanks for providing all of this info and good job on diagnosing the issue!

I think allowing the max reset streams limit to be configurable and then exposing that config in gRPC is reasonable. I'll open a separate issue in https://github.com/apple/swift-nio-http2 to track doing that. We can track the gRPC changes in this issue.

In the meantime you can workaround this by not cancelling the streams.
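One way to apply that workaround (a sketch of an assumed pattern, not an official gRPC API): wrap the RPC in an unstructured Task so that cancelling the caller does not cancel, and therefore does not reset, the underlying HTTP/2 stream.

```swift
// Shield an RPC from the caller's cancellation. The unstructured Task
// does not inherit cancellation, so no RST_STREAM is sent; the caller
// still waits for the result, but cancellation no longer propagates
// to the stream.
func shieldedFromCancellation<T: Sendable>(
    _ rpc: @escaping @Sendable () async throws -> T
) async throws -> T {
    let task = Task { try await rpc() }
    return try await task.value
}

// Hypothetical usage with the service call from earlier in the thread:
// let data = try await shieldedFromCancellation {
//     try await service.getData(for: id, options: options)
// }
```

The trade-off is that shielded calls run to completion even when their results are no longer needed, so they still consume streams until they finish or time out.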

glbrntt avatar Feb 28 '25 09:02 glbrntt

Thanks for the reply @glbrntt. Unfortunately, not cancelling streams isn't enough; I'll also have to limit the number of calls, because the channel also drops with an error if a request completes with a timeout. I think that's a separate issue.

sergeiromanchuk avatar Feb 28 '25 09:02 sergeiromanchuk

Here's the http/2 issue: https://github.com/apple/swift-nio-http2/issues/493

glbrntt avatar Feb 28 '25 09:02 glbrntt

I'll also have to limit the number of calls because the channel also drops with an error if some request is completed with timeout.

Can you elaborate on this one? An RPC timing out won't close the connection, but an RPC may time out because of a connection timeout.

glbrntt avatar Feb 28 '25 09:02 glbrntt

When I sent a lot of tasks (~1000) without cancelling, the connection processed them all for about 3 minutes before they dropped with a timeout error. After all the streams were processed, the connection also closed with an ioOnClosedChannel error.

sergeiromanchuk avatar Feb 28 '25 10:02 sergeiromanchuk

@glbrntt I don't cancel unary calls, and I limit parallel execution to 4 tasks. However, the channel still closes with an error. It seems the problem is much deeper. Please examine the logs around the ioOnClosedChannel entries. (It's not the noSuchStream issue in this case.)

Logs1.txt Logs2.txt
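For reference, limiting parallel execution as described above can be done with a task group that keeps at most N calls in flight (a sketch; `fetch` stands in for any unary call, and the function name is made up):

```swift
// Run `fetch` for every ID with at most `maxConcurrent` calls in
// flight at once. Results are collected in completion order.
func fetchAll<ID: Sendable, Value: Sendable>(
    ids: [ID],
    maxConcurrent: Int = 4,
    fetch: @escaping @Sendable (ID) async throws -> Value
) async throws -> [Value] {
    try await withThrowingTaskGroup(of: Value.self, returning: [Value].self) { group in
        var results: [Value] = []
        var iterator = ids.makeIterator()

        // Start the first `maxConcurrent` calls.
        for _ in 0..<maxConcurrent {
            guard let id = iterator.next() else { break }
            group.addTask { try await fetch(id) }
        }
        // Each time one finishes, start the next.
        while let value = try await group.next() {
            results.append(value)
            if let id = iterator.next() {
                group.addTask { try await fetch(id) }
            }
        }
        return results
    }
}
```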

sergeiromanchuk avatar Feb 28 '25 14:02 sergeiromanchuk

I haven't gotten to the bottom of this yet but here's what I've found so far:

The first error=ioOnClosedChannel happens just before stream 45 is closed. Both events happen on connection ID FC01291E-440B-4EE1-89A7-F29256F6CA3C/0.

2025-02-28T14:59:57+0100 debug [Long Channel gRPC] : grpc.conn.addr_local=10.179.0.82 grpc.conn.addr_remote=10.229.2.186 grpc_connection_id=FC01291E-440B-4EE1-89A7-F29256F6CA3C/0 h2_active_streams=8 h2_stream_id=HTTP2StreamID(45) [GRPC] HTTP2 stream closed\
2025-02-28T14:59:57+0100 error [Long Channel gRPC] : error=ioOnClosedChannel grpc.conn.addr_local=10.179.0.82 grpc.conn.addr_remote=10.229.2.186 grpc_connection_id=FC01291E-440B-4EE1-89A7-F29256F6CA3C/0 [GRPC] grpc client error\

Then we see a flurry of request errors on the same connection, presumably one for each of the other open streams:

2025-02-28T14:59:57+0100 error [CallOptions] : error=unavailable (14): Transport became inactive grpc.conn.addr_local=10.179.0.82 grpc.conn.addr_remote=10.229.2.186 grpc_connection_id=FC01291E-440B-4EE1-89A7-F29256F6CA3C/0 grpc_request_id=4BF61B3D-A920-463F-B175-BB0020FA4086 [GRPC] grpc client error\
2025-02-28T14:59:57+0100 error [CallOptions] : error=unavailable (14): Transport became inactive grpc.conn.addr_local=10.179.0.82 grpc.conn.addr_remote=10.229.2.186 grpc_connection_id=FC01291E-440B-4EE1-89A7-F29256F6CA3C/0 grpc_request_id=49316470-3C16-45C2-A051-E4C4587DC0B2 [GRPC] grpc client error\
2025-02-28T14:59:57+0100 error [CallOptions] : error=unavailable (14): Transport became inactive grpc.conn.addr_local=10.179.0.82 grpc.conn.addr_remote=10.229.2.186 grpc_connection_id=FC01291E-440B-4EE1-89A7-F29256F6CA3C/0 grpc_request_id=4B3C1FAA-2C7D-4357-AA9A-DC5609A2432D [GRPC] grpc client error\
2025-02-28T14:59:57+0100 error [CallOptions] : error=unavailable (14): Transport became inactive grpc.conn.addr_local=10.179.0.82 grpc.conn.addr_remote=10.229.2.186 grpc_connection_id=FC01291E-440B-4EE1-89A7-F29256F6CA3C/0 grpc_request_id=4F0A125F-0C73-425D-AE57-D3E6A9D100EB [GRPC] grpc client error\
2025-02-28T14:59:57+0100 error [CallOptions] : error=unavailable (14): Transport became inactive grpc.conn.addr_local=10.179.0.82 grpc.conn.addr_remote=10.229.2.186 grpc_connection_id=FC01291E-440B-4EE1-89A7-F29256F6CA3C/0 grpc_request_id=B15437BD-9808-4E66-A576-43979A735F9D [GRPC] grpc client error\
2025-02-28T14:59:57+0100 error [CallOptions] : error=unavailable (14): Transport became inactive grpc.conn.addr_local=10.179.0.82 grpc.conn.addr_remote=10.229.2.186 grpc_connection_id=FC01291E-440B-4EE1-89A7-F29256F6CA3C/0 grpc_request_id=FA8E4D2B-A8EF-4805-8EB7-B45C5FFD7CD3 [GRPC] grpc client error\
2025-02-28T14:59:57+0100 error [CallOptions] : error=unavailable (14): Transport became inactive grpc.conn.addr_local=10.179.0.82 grpc.conn.addr_remote=10.229.2.186 grpc_connection_id=FC01291E-440B-4EE1-89A7-F29256F6CA3C/0 grpc_request_id=63F8A90C-0F76-4A16-9F97-A0E8E1202C8F [GRPC] grpc client error\
2025-02-28T14:59:57+0100 error [CallOptions] : error=unavailable (14): Transport became inactive grpc.conn.addr_local=10.179.0.82 grpc.conn.addr_remote=10.229.2.186 grpc_connection_id=FC01291E-440B-4EE1-89A7-F29256F6CA3C/0 grpc_request_id=A3BFF390-8382-4AFB-8AAE-CBC40AF22709 [GRPC] grpc client error\

That makes sense: an error just happened.

What's unclear is why we get ioOnClosedChannel, there's no obvious close event.

One thing I noticed in the logs is that streams on the same connection have much higher IDs so I looked for all logs for stream ID 45 on that connection:

2025-02-28T14:58:08+0100 debug [Long Channel gRPC] : grpc.conn.addr_local=10.179.0.82 grpc.conn.addr_remote=10.229.2.186 grpc_connection_id=FC01291E-440B-4EE1-89A7-F29256F6CA3C/0 h2_active_streams=1 h2_stream_id=HTTP2StreamID(45) [GRPC] HTTP2 stream created\
2025-02-28T14:59:57+0100 debug [Long Channel gRPC] : grpc.conn.addr_local=10.179.0.82 grpc.conn.addr_remote=10.229.2.186 grpc_connection_id=FC01291E-440B-4EE1-89A7-F29256F6CA3C/0 h2_active_streams=8 h2_stream_id=HTTP2StreamID(45) [GRPC] HTTP2 stream closed\

I looked at the surrounding logs from when the stream was created on that connection; the next related log is:

2025-02-28T14:58:10+0100 debug [CallOptions] : grpc.conn.addr_local=10.179.0.82 grpc.conn.addr_remote=10.229.2.186 grpc_connection_id=FC01291E-440B-4EE1-89A7-F29256F6CA3C/0 grpc_request_id=93506D75-DF27-4827-9B4A-EEB19E54F082 [GRPC] activated stream channel\

which tells us grpc_request_id=93506D75-DF27-4827-9B4A-EEB19E54F082 so we can find more logs:

2025-02-28T14:58:08+0100 debug [Long Channel gRPC] : grpc.conn.addr_local=10.179.0.82 grpc.conn.addr_remote=10.229.2.186 grpc_connection_id=FC01291E-440B-4EE1-89A7-F29256F6CA3C/0 h2_active_streams=1 h2_stream_id=HTTP2StreamID(45) [GRPC] HTTP2 stream created\
2025-02-28T14:58:10+0100 debug [CallOptions] : grpc.conn.addr_local=10.179.0.82 grpc.conn.addr_remote=10.229.2.186 grpc_connection_id=FC01291E-440B-4EE1-89A7-F29256F6CA3C/0 grpc_request_id=93506D75-DF27-4827-9B4A-EEB19E54F082 [GRPC] activated stream channel\
2025-02-28T14:58:10+0100 trace [CallOptions] : grpc_connection_id=FC01291E-440B-4EE1-89A7-F29256F6CA3C/0 grpc_request_id=93506D75-DF27-4827-9B4A-EEB19E54F082 h2_end_stream=false h2_headers=[([], ":method", "POST"), ([], ":path", "/frontend.pp_profile.v1.ProfileWatchService/GetProfiles"), ([], ":authority", "api-ios-dev.platocorp.com"), ([], ":scheme", "https"), ([], "content-type", "application/grpc"), ([], "te", "trailers"), ([], "grpc-timeout", "4999989u"), ([], "authorization", "Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJleHAiOjE3NDA3NTEyMDIsImlhdCI6MTc0MDc1MTA4MiwiaXNzIjoiZGV2Iiwic3ViIjoiMnBmZzJ2MWFqdjF5eS0yZXlkc2N5M2dsYTM1OjE2In0.E66X08Y7G9qSU1SzAChgQZEOKWQG7DgD2OgSD8g8ISo"), ([], "x-plato-build", "4.9.0+0"), ([], "x-plato-request-origin", "privateRoomCreate"), ([], "user-agent", "iOS, Apple, iPhone12-1, 18.3.0, 9B15CC9E-FA2F-402A-B4B9-86CE50CACBC8")] h2_payload=HEADERS [GRPC] writing HTTP2 frame\
2025-02-28T14:58:10+0100 trace [CallOptions] : grpc_connection_id=FC01291E-440B-4EE1-89A7-F29256F6CA3C/0 grpc_request_id=93506D75-DF27-4827-9B4A-EEB19E54F082 h2_data_bytes=25 h2_end_stream=false h2_payload=DATA [GRPC] writing HTTP2 frame\
2025-02-28T14:58:10+0100 trace [CallOptions] : grpc_connection_id=FC01291E-440B-4EE1-89A7-F29256F6CA3C/0 grpc_request_id=93506D75-DF27-4827-9B4A-EEB19E54F082 h2_data_bytes=0 h2_end_stream=true h2_payload=DATA [GRPC] writing HTTP2 frame\
2025-02-28T14:58:10+0100 trace [CallOptions] : grpc_connection_id=FC01291E-440B-4EE1-89A7-F29256F6CA3C/0 grpc_request_id=93506D75-DF27-4827-9B4A-EEB19E54F082 h2_end_stream=false h2_headers=[([], ":status", "200"), ([], "content-type", "application/grpc"), ([], "date", "Fri, 28 Feb 2025 13:58:09 GMT"), ([], "server", "envoy")] h2_payload=HEADERS [GRPC] received HTTP2 frame\
2025-02-28T14:58:10+0100 trace [CallOptions] : grpc_connection_id=FC01291E-440B-4EE1-89A7-F29256F6CA3C/0 grpc_request_id=93506D75-DF27-4827-9B4A-EEB19E54F082 h2_data_bytes=25 h2_end_stream=false h2_payload=DATA [GRPC] received HTTP2 frame\
2025-02-28T14:58:10+0100 trace [CallOptions] : grpc_connection_id=FC01291E-440B-4EE1-89A7-F29256F6CA3C/0 grpc_request_id=93506D75-DF27-4827-9B4A-EEB19E54F082 h2_end_stream=true h2_headers=[([], "grpc-status", "0"), ([], "grpc-message", "")] h2_payload=HEADERS [GRPC] received HTTP2 frame\
2025-02-28T14:59:57+0100 debug [Long Channel gRPC] : grpc.conn.addr_local=10.179.0.82 grpc.conn.addr_remote=10.229.2.186 grpc_connection_id=FC01291E-440B-4EE1-89A7-F29256F6CA3C/0 h2_active_streams=8 h2_stream_id=HTTP2StreamID(45) [GRPC] HTTP2 stream closed\

We create the stream at 14:58:08 and then activate it at 14:58:10. It's a little surprising that there's up to 2 seconds between the two events. Nonetheless we then see the request happen and the RPC completes at 14:58:10. What's odd is that it's not until 14:59:57 that we see the "HTTP2 stream closed" log. That should happen very shortly after the RPC completed at 14:58:10.

glbrntt avatar Mar 03 '25 17:03 glbrntt

@glbrntt Okay, do I understand correctly that we can't draw any conclusions from the logs yet?

I also want to send you additional information. Apparently the server didn't send me a response (or something else happened), and my request ended with a timeout. My channel also closed. Here are my logs.

CloseChannelOnFailed.txt

sergeiromanchuk avatar Mar 03 '25 18:03 sergeiromanchuk

@glbrntt Any updates?

sergeiromanchuk avatar Mar 07 '25 21:03 sergeiromanchuk

I get a similar situation: I start >20 unary requests when the app launches. Here is the log. Version: 1.7.3, using CocoaPods.

grpc_error_log.txt

dongfanger2018 avatar Mar 10 '25 06:03 dongfanger2018

Hi @sergeiromanchuk, are you talking about the request with stream ID 3? Looking at the logs that appears to be a server streaming request for handleWatchConfig. If it isn't, can you identify which RPC you were issuing that timed out?

@dongfanger2018 The CocoaPods have been unsupported for a long time; a number of bugs in 1.7.3 have been fixed since it was released. You'll need to update to something newer for us to be able to support you.

Lukasa avatar Mar 10 '25 08:03 Lukasa

@Lukasa This concerns all the errors, not only stream ID 3. I can't get around the fact that if I send more than 20 requests, the channel is closed. This is definitely a bug and is not related to the CocoaPods problem or some old version. All we could do was limit requests to a maximum of 12, which is very bad. This behavior seriously limits our use of gRPC, especially since I can't cancel tasks either, because cancelling tasks also closes the channel. I find myself between a rock and a hard place.

sergeiromanchuk avatar Apr 05 '25 13:04 sergeiromanchuk

@glbrntt @Lukasa Any updates on this?

I don't think it will be difficult for you to reproduce this bug on your machines if necessary.

sergeiromanchuk avatar Apr 05 '25 13:04 sergeiromanchuk

@Lukasa @glbrntt

Main Issue: Channel Closure

Sub-issues:

  1. Closing the channel when canceling unary requests
  2. Closing the channel when adding more than 20-30 requests.

sergeiromanchuk avatar Apr 05 '25 13:04 sergeiromanchuk

https://github.com/grpc/grpc-swift/issues/2202#issuecomment-2690212113

@glbrntt It looks like the issue has been fixed, have you already updated the version in gRPC?

sergeiromanchuk avatar Apr 05 '25 13:04 sergeiromanchuk

@glbrntt It looks like the issue has been fixed, have you already updated the version in gRPC?

There hasn't been a release of swift-nio-http2 containing the fix yet. Hopefully soon, which will then unblock picking it up here.

glbrntt avatar Apr 07 '25 09:04 glbrntt

Thank you @glbrntt, are there any updates on this bug?

sergeiromanchuk avatar Apr 07 '25 11:04 sergeiromanchuk