Orleans.Clustering.Kubernetes

Auto-expose Gateway port

galvesribeiro opened this issue 6 years ago • 9 comments

The provider should have an option to automagically expose the Gateway/Proxy port when a silo has the Gateway installed (i.e. when siloEntry.ProxyPort > 0). That would open the Orleans cluster to Orleans clients outside the Kubernetes cluster boundaries, allowing non-containerized apps to talk to the cluster (very useful in on-premises scenarios).

That would (optionally) remove the need for people to manually expose the Gateway port outside the Kubernetes cluster.

However, there is an issue to consider while doing that. If the port is fixed, each Kubernetes worker node would not be able to run more than one silo; otherwise, we would have port conflicts.

To work around that, we could (optionally) generate random port numbers for the Gateway and create respective NodePort objects, but that would make on-premises deployments a potential nightmare in terms of firewalls and routing, since the ports are unknown until the silo is initialized. In cloud services like AKS, the Azure firewall is integrated with the Kube API and it (optionally) opens ports automatically whenever a new service is exposed by a pod.
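Roughly, each silo would end up with something like this (just a sketch; the Service name and selector label are placeholders, not something the provider creates today):

```yaml
# Hypothetical per-silo Service; the provider does not create this today.
# Leaving nodePort unset makes Kubernetes pick a random port from its
# NodePort range (30000-32767 by default), which is exactly what makes
# on-premises firewalling/routing hard.
apiVersion: v1
kind: Service
metadata:
  name: silo-0-gateway          # placeholder name
spec:
  type: NodePort
  selector:
    orleans/silo: silo-0        # placeholder label identifying one silo pod
  ports:
    - name: gateway
      port: 30000               # Orleans gateway (proxy) port inside the pod
      targetPort: 30000
      # nodePort omitted -> randomly allocated by Kubernetes
```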

Need to investigate that and come up with a solution.

galvesribeiro avatar Feb 09 '18 04:02 galvesribeiro

Hey @galvesribeiro, I think there's no issue if I understand you correctly.

It should be enough to just create a single NodePort Service for the entire cluster regardless of how many nodes and silos we have, e.g.:

  • 2 Nodes, 20 Silos. Each silo binds to 1001 (silo port) and 2001 (gateway port). Ports are internal to each container, so no port collision so far.
  • Create a single NodePort Service that listens on 31500 (K8S requires node ports to be in some allowed range if I remember correctly). Kubeproxy will expose port 31500 on every node (once) and will load-balance traffic on that port between all silos on their port 2001 (a rough sketch of that Service is below).
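Something along these lines (the Service name and the app label are just placeholders):

```yaml
# Sketch of the single cluster-wide Service described above.
apiVersion: v1
kind: Service
metadata:
  name: orleans-gateway
spec:
  type: NodePort
  selector:
    app: orleans-silo          # assumed label shared by all silo pods
  ports:
    - name: gateway
      port: 2001               # gateway port each silo listens on in its container
      targetPort: 2001
      nodePort: 31500          # exposed once on every node by kube-proxy
```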

I believe that might also work with LoadBalancer services, which would make things even easier.

ilyalukyanov avatar Feb 09 '18 05:02 ilyalukyanov

Hey @ilyalukyanov That was the approach I was looking at before. If that worked, the Orleans Client would only need a single Gateway IP:port, which would be great. However, the problem is that Orleans requires the client to be able to address individual silos directly, and Orleans also has its own balancing algorithms, which are not as trivial as round-robin (the default for Kubeproxy-based services).

That was a long discussion I had with @ReubenBond while we were discussing this provider, and that is why I would like to see a better option for that.

In the membership object, ProxyPort can be whatever we want. So my option would be to somehow follow these steps when a silo with the gateway installed is starting up:

  1. Query the Kube API to check whether a (random) port is available on all nodes (need to check if that is possible)
  2. If the port is available, reserve it in KubeProxy and "attach it" to the container gateway port
  3. Write the KubeProxy-exposed port to the membership object.

With that, even if all the gateways are internally listening on the same port, each of them will have a different port in a service exposed by KubeProxy.
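Very roughly, step 2 would amount to creating something like this per silo (just a sketch; the per-pod label and how the free port is picked in step 1 are open questions):

```yaml
# Hypothetical per-silo Service created at silo startup (step 2 above).
apiVersion: v1
kind: Service
metadata:
  name: silo-abc123-gateway          # placeholder, e.g. derived from the pod name
spec:
  type: NodePort
  selector:
    orleans/silo-name: silo-abc123   # placeholder per-pod label
  ports:
    - name: gateway
      port: 2001                     # container gateway port
      targetPort: 2001
      nodePort: 31613                # the port found free in step 1; the same
                                     # number is written to ProxyPort in step 3
```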

It will probably work, but in the end it will bring the firewall nightmare pretty easily, not to mention that every time a silo is marked as dead, its port allocation needs to be released.

Again, we need to come up with something easier. As I was discussing with Reuben before, the way Orleans (and other distributed systems) deals with balancing, by requiring direct addressing of a particular node, is something that may cause a maintenance nightmare on modern infrastructure like containers.

Let's think more :)

galvesribeiro avatar Feb 09 '18 14:02 galvesribeiro

@galvesribeiro I see what you mean now. How about this approach?

  1. We provision silos as a StatefulSet so that if for example its name is ‘OrleansSilo’, all pods would be named ‘OrleansSilo-0’, ‘OrleansSilo-1’, etc. StatefulSet provides useful guarantees of pretty much static naming and host numbering.
  2. In silo configuration we add a property to specify an array of ports, which will be used for allocating node ports.
  3. Then we create a NodePort Service per silo, matching it by pod name (I believe this can be done) and exposing it on the port from the array (provided in the config above) at the index equal to the numeric suffix in the pod's name.

This way the same silo is always exposed on the same predictable node port.
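A minimal sketch of the per-silo Services under those assumptions (gateway listening on 2001 inside the container; the pod-name label is added automatically by the StatefulSet controller, and pod names have to be lowercase in Kubernetes, so 'OrleansSilo-0' becomes 'orleanssilo-0'):

```yaml
# One Service per StatefulSet ordinal; the nodePort comes from the
# configured port array at the index matching the ordinal.
apiVersion: v1
kind: Service
metadata:
  name: orleanssilo-0-gateway
spec:
  type: NodePort
  selector:
    statefulset.kubernetes.io/pod-name: orleanssilo-0   # label set by the StatefulSet controller
  ports:
    - name: gateway
      port: 2001
      targetPort: 2001
      nodePort: 31500        # ports[0] from the configured array
---
apiVersion: v1
kind: Service
metadata:
  name: orleanssilo-1-gateway
spec:
  type: NodePort
  selector:
    statefulset.kubernetes.io/pod-name: orleanssilo-1
  ports:
    - name: gateway
      port: 2001
      targetPort: 2001
      nodePort: 31501        # ports[1]
```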

ilyalukyanov avatar Feb 14 '18 19:02 ilyalukyanov

Interesting... Will read more about StatefulSet. I'm just concerned about the approach used in the configuration.

If we specify a fixed array of ports, that means we have to anticipate the number of silos we will have, and that is not a good thing in dynamic environments where silos come and go.

This dynamic nature inherent in cloud environments is what brings us elasticity, but it can be a problem in stateful scenarios like Orleans, where we have to individually address each gateway.

galvesribeiro avatar Feb 14 '18 19:02 galvesribeiro

This array would ultimately sit in the custom resource and be processed by its controller. Yes, it's another parameter to change with the number of replicas, but it's still easy enough.

What I had in mind putting it in the configuration was how the provider currently creates all K8S resources if they don’t exist.

We could be more creative and replace it with a port range (min/max), and consider moving it out to a ConfigMap which the custom resource controller would be monitoring. Not sure tho to what extent the latter is possible.
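Something as simple as this, with placeholder key names:

```yaml
# Hypothetical ConfigMap the controller could watch for the allowed
# node-port range; the name and keys are placeholders.
apiVersion: v1
kind: ConfigMap
metadata:
  name: orleans-gateway-ports
data:
  gatewayNodePortMin: "31500"
  gatewayNodePortMax: "31599"
```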

We could also specify a min port and take a subsequent port for each silo, but that would be less explicit and less predictable, as node ports are allocated randomly and a port in the middle of our range can always be taken by something else.

I also don’t see a lot of value in having more silos than nodes as:

  1. Going beyond the number of nodes decreases fault tolerance, as with a node going down, multiple silos would go down too, increasing the impact.
  2. It's preferable to have one bigger silo than multiple smaller ones on the same node.

Thus we can restrict the port range by the number of nodes I think.

ilyalukyanov avatar Feb 14 '18 19:02 ilyalukyanov

Kube deployments deal with desired state. So if you define 3 silos and you have 2 nodes, it will create 2 silos on one of the nodes. That can happen. Machines fail, and we can't require the user to reduce the number of silos to keep it even with the number of actual nodes.

I agree that having more than one silo on a node is not the best idea, but there are legitimate cases where process-level availability still has a point.

Maybe having an optional port range as part of the CRDs is a good idea... Let me think about it...

galvesribeiro avatar Feb 14 '18 19:02 galvesribeiro

Kube deployments deal with desired state. So if you define 3 silos and you have 2 nodes, it will create 2 silos on one of the nodes.

Yeah, I know. What I was saying is that it's not particularly useful to provision 3 replicas when you have 2 nodes. As far as I understand, in all cases it's better to have 2 replicas of (3/2)*x size rather than 3 replicas of size x when you only have 2 nodes, especially if we take into account the overhead brought by the container base image.

Nevertheless, I agree that if we have 3 nodes and 3 replicas and one of the nodes goes down, then 3 containers would be distributed over 2 nodes.

In that case, if we decide to introduce the restriction, I think it can be addressed by pod affinity. I.e. we can easily specify that no two replicas can reside on any one node.
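E.g. a fragment like this in the silo pod template (standard Kubernetes anti-affinity; the app label is assumed):

```yaml
# Pod template fragment: hard anti-affinity so no two silo pods
# (matched by the assumed "app: orleans-silo" label) share a node.
spec:
  template:
    metadata:
      labels:
        app: orleans-silo
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: orleans-silo
              topologyKey: kubernetes.io/hostname
```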

I agree that having more than one silo on a node is not the best idea, but there are legitimate cases where process-level availability still has a point.

Which cases are you thinking of for example?

One that comes to my mind is resource distribution, i.e. if one of two silos goes down, the remaining one would take double the load. It would be convenient in that case to have double the amount of capacity provisioned automatically in response to the failure or upfront (i.e. a second replica).

I don't think that particular example makes any difference tho. If we were to design a system that's supposed to tolerate a certain level of failure, we could (I'd even say should) address it by adding more nodes or allocating more capacity to containers.

Anyways, I agree with you that it would be more transparent and up to a developer if we don't make any assumptions and don't introduce such restrictions. Generally always a better practice. On the other hand the restriction is attractive to me as it simplifies a couple of things. Just another approach to consider, which might be good enough for the first production-ready release.

ilyalukyanov avatar Feb 14 '18 20:02 ilyalukyanov

Is there a way to manually expose the gateway port right now so that ClusterClient works outside of k8s? I have some old WinForms on-premises applications that connect to my Orleans cluster; I am currently using SQL for the membership table.
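To make the question concrete, something along these lines is roughly what I'm picturing (names, labels, and ports are placeholders), with the external client then pointed at any node IP plus the nodePort:

```yaml
# Rough sketch of manually exposing the gateway from outside the cluster.
# Whether the client is happy depends on the gateway address advertised in
# the membership table being reachable, as discussed above.
apiVersion: v1
kind: Service
metadata:
  name: orleans-gateway-external
spec:
  type: NodePort
  selector:
    app: orleans-silo        # placeholder label on the silo pods
  ports:
    - name: gateway
      port: 30000            # gateway port configured on the silo
      targetPort: 30000
      nodePort: 30500        # fixed, firewall-friendly node port
```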

Reading https://github.com/dotnet/orleans/pull/4301 now...

DarkCow avatar Nov 21 '18 15:11 DarkCow

Sorry to bump this... but this feature would be nice. I have been arguing with the Orleans devs that there should be the ability for a client to connect from outside of k8s. @galvesribeiro do you have any example or workaround for how I can make this work in k8s currently? I need to connect with a client from outside the Kubernetes cluster and I have tried a lot of things without success.

ScarletKuro avatar Feb 04 '21 07:02 ScarletKuro