Better support exposing the Alluxio service outside the K8s cluster
Is your feature request related to a problem? Please describe. It's becoming more common that users want to host the Alluxio service on K8s while some external applications need to access the Alluxio cluster from outside the K8s cluster.
In the current state, the users need to:
- Change the master K8s Services to be accessible from outside the K8s cluster. This typically means switching the Service type to NodePort or exposing it via an Ingress.
- Somehow expose the worker pods externally. This is much harder than the first step because worker pods are dynamic and do not have associated Services. One way is to use hostNetwork=true for all workers; clients will then talk to the worker nodes directly.
Describe the solution you'd like We need one solution for:
- Enabling master pods to be accessible from outside
- Enabling worker pods to be accessible from outside
- Ideally, use only one switch to control both
The biggest challenge is the worker pods. Using a combination of StatefulSet-deployed workers and a per-worker NodePort Service with externalTrafficPolicy: Local can be a solution. The Service maps to the worker pod by name, which becomes deterministic because the workers are now deployed with a StatefulSet.
apiVersion: v1
kind: Service
metadata:
  name: worker-0
spec:
  type: NodePort
  externalTrafficPolicy: Local
  selector:
    statefulset.kubernetes.io/pod-name: worker-0
  ports:
    - protocol: TCP
      port: 19998
      targetPort: 19998
The worker pods now need anti-affinity defined, so no two worker pods are scheduled on the same node, for example:
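A minimal sketch of such a rule in the worker StatefulSet pod template, assuming the worker pods carry an app: alluxio-worker label (the label is illustrative and should match whatever labels the chart actually sets):

affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      # keep at most one worker pod per node; the label below is illustrative
      - labelSelector:
          matchLabels:
            app: alluxio-worker
        topologyKey: kubernetes.io/hostname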
The master pods can be exposed similarly.
Describe alternatives you've considered
Use hostNetwork to deploy all master and worker pods and access the Alluxio pods by node IP. This is the cleanest way as of Alluxio v2.8. The challenge is that hostNetwork requires admin privileges and may even incur port collisions with other services.
Urgency MEDIUM. There are existing use cases for this setup.
Additional context
@ZhuTopher @ssz1997 for visibility
I agree with the proposal to switch Alluxio workers to use StatefulSet. It seems that they are not as stateless/idempotent as we'd thought. Furthermore, the solution we use to expose Masters can then be leveraged to expose Workers as well.
The main difficulty we currently have with exposing Alluxio to clients outside of k8s is that Workers register to the Master using their local hostname, which may not be resolvable to clients outside of k8s. I had previously proposed elsewhere that a possible solution would be through CoreDNS plugins:
- configurable via k8s ConfigMap
- Add the k8s_external plugin to allow specifying external IPs for k8s Services, and point clients to CoreDNS as the authoritative nameserver for that domain
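As a rough sketch of the second point, assuming CoreDNS is deployed with the usual coredns ConfigMap in kube-system and using a made-up external zone name, the Corefile could be extended like this:

apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
data:
  Corefile: |
    .:53 {
        errors
        health
        # standard in-cluster resolution
        kubernetes cluster.local in-addr.arpa ip6.arpa {
            pods insecure
            fallthrough in-addr.arpa ip6.arpa
        }
        # k8s_external: answer queries for Services that have external IPs
        # under this zone (alluxio.example.org is a placeholder)
        k8s_external alluxio.example.org
        forward . /etc/resolv.conf
        cache 30
    }

External clients would then use CoreDNS as the nameserver for alluxio.example.org and resolve names like <service>.<namespace>.alluxio.example.org to the Service's external IP.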
@ZhuTopher The worker has alluxio.worker.hostname and alluxio.worker.container.hostname just to pass both the pod and node IPs to the master (and then to the client). So the client will connect to the pod IP if there is one. I'm just trying to say that the worker reports those IPs to the master as much as it can, so you may change the client logic as you need. Just a thought before I look into your proposal; hope that helps.
However, if the worker pod is not visible via either the pod or host IP (no host port opened for it), then we need a new mechanism.
Oh that's right, I don't remember if this is already the case or not, but we'd want to set the following in our helm chart: alluxio.worker.hostname=status.hostIP and alluxio.worker.container.hostname=status.podIP
- K8s doc ref: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.19/#podstatus-v1-core
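A minimal sketch of how that could be wired in the worker pod template via the Downward API, assuming Alluxio picks up the ALLUXIO_WORKER_HOSTNAME and ALLUXIO_WORKER_CONTAINER_HOSTNAME environment variables for the two properties above:

env:
  # assumption: these env vars map to alluxio.worker.hostname and
  # alluxio.worker.container.hostname respectively
  - name: ALLUXIO_WORKER_HOSTNAME
    valueFrom:
      fieldRef:
        fieldPath: status.hostIP
  - name: ALLUXIO_WORKER_CONTAINER_HOSTNAME
    valueFrom:
      fieldRef:
        fieldPath: status.podIP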
That works for the worker addresses only if the worker Pod(s) use hostPort to bind to the same port on the nodes. Also, I don't recall if the master has that mechanism of supplying the "container" hostname?
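A minimal sketch of that hostPort binding on the worker container (29999 is the default Alluxio worker RPC port; adjust to your configuration):

containers:
  - name: alluxio-worker
    ports:
      # expose the worker RPC port on the node itself so the node IP
      # reported via status.hostIP is actually reachable from outside
      - name: rpc
        containerPort: 29999
        hostPort: 29999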
No, the masters don't have that equivalent because we use a Service to handle the name resolution. Clients talk to Services, so there is no need to know the pod names. But yeah, we currently don't have a unified definition of which hostnames map to which use cases (internal/external to the k8s cluster, etc.). The existing configs are more on-demand. If there's a chance to unify all those, I'm totally in :)
The solution should be independent of whether hostNetwork is enabled or not.
An init container for workers should collect metadata and use an init script to talk to the master and register themselves:
env:
  - name: POD_IP
    valueFrom:
      fieldRef:
        fieldPath: status.podIP
  - name: POD_NAMESPACE
    valueFrom:
      fieldRef:
        fieldPath: metadata.namespace
and
- register-worker
- --ip
- $(POD_IP)
- --k8s-namespace
- $(POD_NAMESPACE)
Something like above.
This will work regardless of whether hostNetwork is enabled or not (a combined sketch follows the snippet below).
hostNetwork: {{ $hostNetwork }}
hostPID: {{ $hostPID }}
dnsPolicy: {{ .Values.worker.dnsPolicy | default ($hostNetwork | ternary "ClusterFirstWithHostNet" "ClusterFirst") }}
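Putting the fragments above together, a hypothetical init container on the worker pod could look like the following; the image tag and the register-worker entrypoint are illustrative, as no such script exists today:

initContainers:
  - name: register-worker
    # illustrative image; reuse whatever Alluxio image the chart deploys
    image: alluxio/alluxio:2.8.0
    env:
      - name: POD_IP
        valueFrom:
          fieldRef:
            fieldPath: status.podIP
      - name: POD_NAMESPACE
        valueFrom:
          fieldRef:
            fieldPath: metadata.namespace
    args:
      # hypothetical registration script: reports the pod IP and namespace
      # to the master before the worker container starts
      - register-worker
      - --ip
      - $(POD_IP)
      - --k8s-namespace
      - $(POD_NAMESPACE)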
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in two weeks if no further activity occurs. Thank you for your contributions.