azure-relay
Scale
@dstuckims - Can you provide some bullet points as an overview of what should be covered?
Maybe this is more of a load balancing article... We should mention that you can have up to 25 listeners, and here is how load balancing works:
When a rendezvous needs to occur (a new client connection is created/opened):
1. Get a local copy of the list of all the known load-balanced listeners for the address requested by the sender (this comes from a cache which is updated every 500ms).
2. If the list of listeners is empty and it has not yet been force-refreshed, force a refresh of the list of known load-balanced listeners for the endpoint (this happens at most once).
3. If the list of listeners is empty, return an exception to the sender and stop.
4. Pick a random index into the list of potential listeners.
5. Try to rendezvous with the selected listener.
6. If that rendezvous succeeds, stop.
7. If the rendezvous attempt with the selected listener doesn’t succeed within 10 seconds, remove the selected listener from the list of listeners to try.
8. If more than 60 seconds have passed, return an exception to the sender.
9. Go to step 2.
Ultimately it will try the list of known listeners twice before giving up. The next listener to attempt is picked using a random index into the list.
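To make the flow above easier to reason about, here is a rough Python sketch of the loop as described. It is not the service implementation: `rendezvous`, `RendezvousError`, and the `get_listeners`, `force_refresh`, and `try_listener` callbacks are hypothetical names invented for illustration, and the sketch assumes `try_listener` returns `False` when the listener does not accept within the per-attempt timeout.

```python
import random
import time


class RendezvousError(Exception):
    """Hypothetical stand-in for the exception returned to the sender."""


def rendezvous(get_listeners, force_refresh, try_listener,
               attempt_timeout=10.0, overall_timeout=60.0,
               clock=time.monotonic, rng=random):
    """Pick listeners at random until one accepts or we give up.

    get_listeners  -- returns the cached list of known listeners (assumed)
    force_refresh  -- returns a freshly fetched list of listeners (assumed)
    try_listener   -- try_listener(listener, timeout) -> bool (assumed)
    """
    start = clock()
    candidates = list(get_listeners())  # local copy of the cached list
    refreshed = False
    while True:
        # An empty list triggers a forced refresh, but only once.
        if not candidates and not refreshed:
            candidates = list(force_refresh())
            refreshed = True
        # Still empty: nothing left to try, report failure to the sender.
        if not candidates:
            raise RendezvousError("no listener accepted the connection")
        # Pick a random index into the remaining candidates.
        index = rng.randrange(len(candidates))
        listener = candidates[index]
        if try_listener(listener, attempt_timeout):
            return listener
        # The attempt failed or timed out: drop this listener and retry.
        candidates.pop(index)
        if clock() - start > overall_timeout:
            raise RendezvousError("overall rendezvous timeout exceeded")
```

With three listeners that never accept, this sketch makes six attempts (the cached list once, then the refreshed list once) before raising, which matches the "tries the list twice" behavior described above.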
@sethmanheim - Do you think that this can be distilled into an article?
@jtaubensee Possibly, but this list is pretty high level. This kind of sounds like our internal implementation, is that really what we want to document? From a user perspective, what does this mean? Are there other load balancing considerations? Is it worth including sample code?
Also, when you say "this is how load balancing works," is this for hybrid connections, WCF Relay, or anything with relays?
What happens when it "gives up" (last paragraph)? :-) An exception?
https://azure.microsoft.com/en-us/blog/now-available-relay-load-balancing-for-windows-azure-service-bus/
The rendezvous algorithm is the same for WCF Relays and HybridConnections. The key takeaway is that each listener is picked randomly. This gives a fairly even distribution across all listeners.
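As a quick illustration of the "fairly even distribution" point, the following Python sketch simulates uniform random picks across a set of listeners and counts how many connections each one receives. The function name `simulate_picks` and its parameters are made up for this example, and it assumes every listener accepts on the first attempt, which the real service does not guarantee.

```python
import random
from collections import Counter


def simulate_picks(listener_count, connections, seed=0):
    """Count how many simulated connections land on each listener index.

    Assumes a uniform random pick per connection and that the chosen
    listener always accepts (a simplification for illustration).
    """
    rng = random.Random(seed)  # seeded so repeated runs are comparable
    return Counter(rng.randrange(listener_count) for _ in range(connections))


if __name__ == "__main__":
    counts = simulate_picks(listener_count=25, connections=100_000)
    print("min per listener:", min(counts.values()))
    print("max per listener:", max(counts.values()))
```

With 25 listeners and a large number of connections, each listener should end up close to 1/25 of the total, though any individual short run can be lumpy.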
When it "gives up" there are several different exceptions:
- If there is no persistent endpoint (WCF Relay or HybridConnection) and there are no dynamic WCF Relay listeners, then the following is thrown:
  EndpointNotFoundException: The endpoint was not found. Endpoint does not exist.
- If there are zero listeners but there is a persistent endpoint (WCF Relay or HybridConnection), then the following is thrown:
  EndpointNotFoundException: The endpoint was not found. There are no listeners connected for the endpoint.
- If there are 1 or more listeners and all of them failed to accept the client, then the following is thrown:
  EndpointNotFoundException: The endpoint was not found. None of the connected listeners accepted the connection within the allowed timeout.
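For anyone trying to tell these three cases apart, here is a small Python sketch of the decision as I read it from the list above. `give_up_message` and its parameters are hypothetical, and it assumes that `listener_count` includes any dynamic WCF Relay listeners; the message strings are copied from the cases above.

```python
PREFIX = "EndpointNotFoundException: The endpoint was not found. "


def give_up_message(persistent_endpoint_exists, listener_count):
    """Return the message a sender would see when rendezvous gives up.

    persistent_endpoint_exists -- True if a persistent WCF Relay or
        HybridConnection exists for the address (assumed input)
    listener_count -- number of connected listeners, assumed to include
        dynamic WCF Relay listeners
    """
    if listener_count == 0 and not persistent_endpoint_exists:
        return PREFIX + "Endpoint does not exist."
    if listener_count == 0:
        return PREFIX + "There are no listeners connected for the endpoint."
    # One or more listeners were connected, but none accepted the client.
    return PREFIX + ("None of the connected listeners accepted the "
                     "connection within the allowed timeout.")
```

From the sender's side, the first message points at a wrong or missing address, the second at listeners that are not running, and the third at listeners that are connected but slow or failing to accept.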
After a little more thought, I'm struggling to see the need for this article. We could even add the load balancing part as an FAQ item. Any objections to holding off on this one?
Note that we have this, too, but it's buried inside Messaging info: https://docs.microsoft.com/en-us/azure/service-bus-messaging/service-bus-architecture#processing-of-incoming-relay-requests.
This is an ask to add documentation around load balancing among listeners for Hybrid Connections and WCF Relay according to the description above from David. The only place known where we have something around this topic is at https://docs.microsoft.com/en-us/azure/service-bus-relay/relay-hybrid-connections-protocol#listen-message.