router
router copied to clipboard
[multipart-subs] clarifications about `"payload": null`
Attempt at clarifying what the behaviour should be when a "payload": null
message is received. This PR models them as fatal errors because it is what Apollo Kotlin is currently expecting but anything else would work too as long as there is consensus on what the expected behaviour is.
Relates to https://github.com/apollographql/router/issues/4634
CI performance tests
- [ ] reload - Reload test over a long period of time at a constant rate of users
- [ ] events_big_cap_high_rate_callback - Stress test for events with a lot of users, deduplication enabled and high rate event with a big queue capacity using callback mode
- [ ] events_without_dedup_callback - Stress test for events with a lot of users and deduplication DISABLED using callback mode
- [ ] large-request - Stress test with a 1 MB request payload
- [x] const - Basic stress test that runs with a constant number of users
- [ ] no-graphos - Basic stress test, no GraphOS.
- [ ] step-jemalloc-tuning - Clone of the basic stress test for jemalloc tuning
- [ ] events - Stress test for events with a lot of users and deduplication ENABLED
- [ ] events_callback - Stress test for events with a lot of users and deduplication ENABLED in callback mode
- [ ] events_big_cap_high_rate - Stress test for events with a lot of users, deduplication enabled and high rate event with a big queue capacity
- [ ] events_without_dedup - Stress test for events with a lot of users and deduplication DISABLED
- [ ] xxlarge-request - Stress test with 100 MB request payload
- [ ] xlarge-request - Stress test with 10 MB request payload
- [x] step - Basic stress test that steps up the number of users over time
I'm going to mark this as a draft just for now, particularly since this is just an attempt at clarifying and we don't have an issue with any previous discussion on it.
For more context, this is the behaviour in Apollo iOS:
-
"payload": null
is effectively ignored and that alone will not throw an error - test - If
"errors" : []
is present an error will be thrown - test - We don't have a test that combines them but I put one together quickly and the behaviour is the same with
"payload": null
having no effect.
@alessbell has got a draft PR up at https://github.com/apollographql/apollo-client/pull/11594 which matches that behavior.
~Personally I think this spec change and ignoring "payload": null
is the right change to make. I don't see any need to take action on an empty payload. I believe the context is which it's sent is to end a subscription and we don't take any particular action for that elsewhere.~
Update for ~strikeout~: @martinbonnin may have won me over with his thoughts on not introducing another way to terminate the subscription when we already have one. Which makes me reconsider whether we should not be treating this as another heartbeat (ignoring it) either. 🤔
For more context, this is the behaviour in Apollo iOS:
"payload": null
is effectively ignored and that alone will not throw an error - test- If
"errors" : []
is present an error will be thrown - test- We don't have a test that combines them but I put one together quickly and the behaviour is the same with
"payload": null
having no effect.@alessbell has got a draft PR up at apollographql/apollo-client#11594 which matches that behavior.
~Personally I think this spec change and ignoring
"payload": null
is the right change to make. I don't see any need to take action on an empty payload. I believe the context is which it's sent is to end a subscription and we don't take any particular action for that elsewhere.~Update for ~strikeout~: @martinbonnin may have won me over with his thoughts on not introducing another way to terminate the subscription when we already have one. Which makes me reconsider whether we should not be treating this as another heartbeat (ignoring it) either. 🤔
Still what I found weird about this issue is that even if the fix is made on the client side (iOS,Kotlin,Javascript) the router is currently terminating the connection when receiving that kind of payload. Still think that the router shouldn't kill the connection.
When we were doing test on that feature, we saw that UI --> Subgraph everything was good, and we never got that payload mention in the issue.
When adding the router into the equation, this weird payload is now there and seems to be added by the router.
When you receive this payload from the router it often means that the stream between router - subgraph has been closed. If it's not expected then there might be another issue somewhere. And maybe related to the missing ping message sent from the router to your subgraph ? It's an issue that has been recently used. What do you use as subgraph ?
When you receive this payload from the router it often means that the stream between router - subgraph has been closed.
If that can help, we reproduced the issue while capturing the traffic between (1) the client and router and (2) the router and subgraph.
The session between the router and subgraph is closed gracefully, as noted by the last exchange ("complete") between the router and subgraph. The router should gracefully close the subscription on the client side, but Apollo Client panics when handling the last message from the router.
The router should gracefully close the subscription on the client side, but Apollo Client panics when handling the last message from the router.
Yes @anthonyvallee that's the reason why we want to improve our specs docs to explain that could happen and it's not an error so the client should not throw an error. That being said is it expected on your side from your subgraph that the connection is gracefully close ? Were you able to understand why ?
Is it expected on your side from your subgraph that the connection is gracefully close ? Were you able to understand why ?
Yes, we designed a Ping subscription for our tests, similar to the ping
system command. It's normal for the subscription to complete after N replies are sent to the client.
On more detail:
{"payload": null}
should not be considered as an error, it's not an error. It can happen and should just be ignored/skipped. The only thing you should check to know if it's a terminal error or not is the closing boundary --graphql--
. If it's still unclosed --graphl
it means the subscription is still alive and should not be closed. It's not a change in the specs, it's just a more detailed explanation