grpc-swift
Attempting streaming service with swift server and non-swift client does not work.
Describe the bug
I have attempted to run the RouteGuide sample, specifically with the listFeatures server side streaming service. Everything works fine when I run it using a grpc-swift implementation for both client and server. However, I want to be able to stream from a Vision Pro to an ubuntu machine. I have implemented the code successfully from Vision Pro (server) to a macOS 15 machine (client). When I try and create a client using the master branch of gRPC, the stream breaks after a few seconds. I have tried both C++ and Python to create the client. I have also tried running server+client pairs that are local and remote. I have tried all combinations of:
-swift (on Vision Pro) -> python
-swift (on macOS) -> python
-swift (on macOS) -> C++
-swift (on Vision Pro) -> swift (on macOS).
I am using grpc-swift v2.0.0-alpha.1. For Python I am using grpcio v1.66.2 (I have even tried 1.67.0rc1). For C++ I am using the master branch of gRPC built with Bazel.
To reproduce
Steps to reproduce the bug you've found:
- Create a RouteGuide service+server with grpc-swift v2.0.0-alpha.1 and implement the server side streaming function `listFeatures`
- Create a RouteGuide client for either C++ or Python
- Prolong the stream by adding a `try await Task.sleep(nanoseconds: 3_000_000_000)` in `listFeatures` in `RouteGuideService.swift` before the `try await writer.write(feature)`
- Run the server then the client.
Expected behaviour
I would expect the stream to complete, but it always crashes after only a handful of features listed.
Thanks for filing this and trying out a bunch of different things, I just want to make sure I understand exactly what you tried and what works:
| Server | Client | Works |
|---|---|---|
| Swift (main), Vision Pro | Swift (main), macOS 15 | Yes |
| Swift (main), Vision Pro | Python, macOS | No |
| Swift (main), macOS | Python, macOS | ??? |
| Swift (main), macOS | C++, macOS | ??? |
Could you let me know the two unknowns here and confirm the above?
> I would expect the stream to complete, but it always crashes after only a handful of features listed.
Can you provide more info here? How did it crash?
| Server | Client | Works | Notes |
|---|---|---|---|
| Swift (main) Vision Pro | Swift (main), macOS 15 | Yes | |
| Swift (main) macOS 15 | Swift (main), macOS 15 | Yes | Run locally (same machine) |
| Swift (main) macOS 15 | Python, macOS 15 | No | Run locally (same machine) |
| Swift (main) macOS 15 | Python, Ubuntu 22.04 | No | |
| Swift (main) Vision Pro | Python, macOS 15 | No | |
| Swift (main) Vision Pro | Python, Ubuntu 22.04 | No | |
| Swift (main) Vision Pro | Python, macOS 14.7 | No | |
| Swift (main) macOS 15 | C++, Ubuntu 22.04 | No | |
The crashes seen on the Python client have been inconsistent. Sometimes it just freezes. Other times, it will show an error on the client:
An error occurred: <_MultiThreadedRendezvous of RPC that terminated with:
status = StatusCode.UNAVAILABLE
details = "Socket closed"
debug_error_string = "UNKNOWN:Error received from peer ipv4:[REDACTED] {grpc_message:"Socket closed", grpc_status:14, created_time:"2024-10-15T09:40:28.555049-04:00"}"
For the C++ client, I get a printout that says `ListFeatures rpc failed.`
On the server, I have put a print before and after the writer.write and it will print out the pre-write message, but never get to the post-write print.
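When collecting reports like the client error above, it can help to pull the numeric status and message out of the `debug_error_string` so failures can be compared across runs. The following is a minimal sketch using only the standard library; the helper name `summarize_debug_error_string` and the partial status table are illustrative assumptions, and the `debug_error_string` layout is not a stable format, so treat the regexes as best-effort.

```python
import re

# Subset of the canonical gRPC status codes (numeric value -> name).
GRPC_STATUS_NAMES = {
    0: "OK",
    1: "CANCELLED",
    2: "UNKNOWN",
    4: "DEADLINE_EXCEEDED",
    13: "INTERNAL",
    14: "UNAVAILABLE",
}


def summarize_debug_error_string(text):
    """Best-effort extraction of grpc_status and grpc_message (hypothetical helper)."""
    status = re.search(r"grpc_status:(\d+)", text)
    message = re.search(r'grpc_message:"([^"]*)"', text)
    code = int(status.group(1)) if status else None
    return {
        "code": code,
        "name": GRPC_STATUS_NAMES.get(code, "UNRECOGNIZED"),
        "message": message.group(1) if message else None,
    }


# The string reported in this thread, with the peer address redacted.
sample = (
    'UNKNOWN:Error received from peer ipv4:[REDACTED] '
    '{grpc_message:"Socket closed", grpc_status:14, '
    'created_time:"2024-10-15T09:40:28.555049-04:00"}'
)
summary = summarize_debug_error_string(sample)
print(summary)
```

Status 14 (`UNAVAILABLE`) with "Socket closed" only says the connection went away from the client's point of view; it does not by itself say which side closed it or why.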
Thanks for that. Frustratingly I haven't been able to reproduce this (Swift server, Python client both on macOS 15).
Do you think you'd be able to provide repro code? I.e. a zip/repo somewhere I can checkout and build locally to make sure there's no setup differences between what we're running.
Failing that (or in addition to it), would you be able to provide a .pcap for this?
Sure thing!
I am sure you are aware of this, but real quick here is how to get this code running. RouteGuide.zip
- Extract `RouteGuide`
- `cd RouteGuide`
- `swift run RouteGuide --server` This will build the package and start the server.
- You can test the Swift implementation by opening another terminal and running `swift run RouteGuide`. Remember, this will take a little while since I put a 3 second sleep in the service loop.
For the python code RouteGuidePythonClient.zip
- Extract `RouteGuidePythonClient`
- `cd RouteGuidePythonClient`
- `python3 -m venv venv`
- `source venv/bin/activate`
- `pip install grpcio`
- `pip install grpcio-tools`
- `python3 client.py`
For me, the python client won't get past 4 features before it fails.
I will work on getting you the .pcap.
The pcap is an unsupported file type, so here is a gdrive link. Hopefully it will allow you to request access. https://drive.google.com/file/d/1Po3k6-sQLoUvvSN5xPhBpiIwN9CRn2Xi/view?usp=share_link
Thanks @ericdusel-tri - I'm currently away so won't be able to look at this for a little while.
@gjcairo would you be able to take a look if you get a chance?
As a test, I implemented a unary call for GetFeature inside an infinite loop, and this also fails after a few seconds.
Hey @ericdusel-tri - sorry for the delay in looking into this.
I believe the issue is that the request from the Python implementation has the END_STREAM flag set, while the request from the Swift client implementation doesn't set this flag. This needs some more looking into (hopefully we can get back to you with more answers in the next couple of days), but it's possible this is a bug in the Swift implementation.
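For anyone checking a capture for the same thing: END_STREAM is bit `0x1` in the flags byte of an HTTP/2 DATA or HEADERS frame, and it signals that the sender will send no more frames on that stream. The sketch below decodes the 9-byte HTTP/2 frame header by hand using only the standard library. The function names and the sample bytes are made up for illustration, not taken from the pcap in this thread.

```python
import struct

FRAME_TYPE_DATA = 0x0
FRAME_TYPE_HEADERS = 0x1
FLAG_END_STREAM = 0x1


def parse_frame_header(header):
    """Decode a 9-byte HTTP/2 frame header into (length, type, flags, stream_id)."""
    if len(header) != 9:
        raise ValueError("an HTTP/2 frame header is exactly 9 bytes")
    # Layout: 24-bit length, 8-bit type, 8-bit flags, 1 reserved bit + 31-bit stream id.
    length_hi, length_lo, frame_type, flags, stream_id = struct.unpack(">BHBBI", header)
    length = (length_hi << 16) | length_lo
    stream_id &= 0x7FFFFFFF  # drop the reserved high bit
    return length, frame_type, flags, stream_id


def ends_stream(header):
    """Return True if this frame header carries END_STREAM on a DATA or HEADERS frame."""
    _, frame_type, flags, _ = parse_frame_header(header)
    return frame_type in (FRAME_TYPE_DATA, FRAME_TYPE_HEADERS) and bool(flags & FLAG_END_STREAM)


# Illustrative headers (not from the capture): a 5-byte DATA frame on stream 1
# with END_STREAM set, and a HEADERS frame on stream 1 with only END_HEADERS (0x4).
data_with_end_stream = bytes.fromhex("000005000100000001")
headers_without_end_stream = bytes.fromhex("000010010400000001")

print(parse_frame_header(data_with_end_stream))
print(ends_stream(data_with_end_stream), ends_stream(headers_without_end_stream))
```

In practice Wireshark shows the same flag in its HTTP/2 dissector, so this is mainly useful for scripted checks over many frames.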
I am checking in to see whether there has been any development on this issue. We are working on an application using legacy code from v1, and we would like to upgrade to v2.
I have tried again on the recent beta release and it still fails in the same way.
I tried a new test case where I have a python server with a swift client, and that works fine. I also attempted a bidirectional stream in an attempt to trick the END_STREAM flag on the request, but the server still closed the connection after a few seconds.
@ericdusel-tri looks like the issue was unrelated to my previous comment: we had a couple of bugs in the keepalive logic.
This has now been fixed and merged. I've confirmed using your repro that the changes fix the issue - you can check yourself against main, and the changes will be included in the next release.
Thanks a lot for filing this!
Works great. Thanks!