Pod stuck in "ContainerCreating" status when using Fluid+JuiceFS in a single-node k8s environment
What is your environment (Kubernetes version, Fluid version, etc.)
# helm list
NAME             NAMESPACE  REVISION  UPDATED                                  STATUS    CHART           APP VERSION
fluid            default    1         2024-03-14 11:18:34.324181662 +0800 CST  deployed  fluid-0.9.3     0.9.3-e0184cf
jfsdemo-dataset  default    1         2024-03-19 01:15:46.227497126 +0800 CST  deployed  juicefs-0.2.16  v1.0.0
I have been following this tutorial to use Fluid+JuiceFS in a single-node k8s environment: https://github.com/fluid-cloudnative/fluid/blob/master/docs/zh/samples/juicefs/juicefs_runtime.md
All of the earlier steps completed successfully, but in the final step the pod I created remains in the ContainerCreating state.
# kubectl get po -A
NAMESPACE  NAME      READY  STATUS             RESTARTS  AGE
default    demo-app  0/1    ContainerCreating  0         13m
# kubectl describe pod demo-app | tail -n 5
Events:
  Type     Reason       Age                From               Message
  ----     ------       ----               ----               -------
  Normal   Scheduled    47s                default-scheduler  Successfully assigned default/demo-app to 10.0.2.15
  Warning  FailedMount  15s (x7 over 47s)  kubelet            MountVolume.MountDevice failed for volume "default-jfsdemo-dataset" : rpc error: code = Unknown desc = NodeStageVolume: can't get node 10.0.2.15: Get "https://127.0.0.1:6443/api/v1/nodes/10.0.2.15": dial tcp 127.0.0.1:6443: connect: connection refused
# kubectl get dataset -A
NAMESPACE NAME UFS TOTAL SIZE CACHED CACHE CAPACITY CACHED PERCENTAGE PHASE AGE
default jfsdemo-dataset 4.00KiB 4.00GiB Bound 39h
Here is my pod.yaml:
# cat sample-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: demo-app
spec:
  containers:
    - name: demo
      image: nginx:latest
      imagePullPolicy: IfNotPresent
      volumeMounts:
        - mountPath: /data
          name: demo
  volumes:
    - name: demo
      persistentVolumeClaim:
        claimName: jfsdemo-dataset
k8s node information:
# kubectl get node -o wide
NAME       STATUS  ROLES   AGE   VERSION  INTERNAL-IP  EXTERNAL-IP  OS-IMAGE               KERNEL-VERSION               CONTAINER-RUNTIME
10.0.2.15  Ready   master  463d  v1.25.3  10.0.2.15    <none>       CentOS Linux 7 (Core)  5.4.228-1.el7.elrepo.x86_64  containerd://1.6.8
If I remove the volumes section (and the matching volumeMounts) from sample-pod.yaml, the pod is created normally. I am not sure whether this is related to Fluid. If there is a way to test this, please let me know.
I can confirm that the port number is correct as I can successfully access it using:
wget --header "Authorization: Bearer <token>" https://127.0.0.1:6443/api/v1/nodes/10.0.2.15
Any suggestions would be greatly appreciated. Thank you.
The error Get "https://127.0.0.1:6443/api/v1/nodes/10.0.2.15": dial tcp 127.0.0.1:6443: connect: connection refused typically occurs when the API server (port 6443) is unreachable at that address.
However, all commands in my environment are functioning properly, and I can create pods without datasets successfully.
@whygyc This looks like the same problem as #3417.
You can find more information and a solution in my comment on that issue: https://github.com/fluid-cloudnative/fluid/issues/3417#issuecomment-1691532950
Thank you for your response. After investigating over the past few days, I found that the issue lies with csi-nodeplugin-fluid (the same error appears in the output of kubectl logs -n fluid-system csi-nodeplugin-fluid-wr74l -c plugins). Inside the plugins container, 127.0.0.1 refers to the container's own network namespace rather than the host's, so the API server listening on the host's 127.0.0.1:6443 cannot be reached from there.
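One way to confirm this diagnosis is to attempt the same TCP connection from the host and from inside the container and compare the results. The following is only a minimal sketch: check_tcp is a hypothetical helper name (not part of Fluid or kubectl), and it assumes that bash and the coreutils timeout command are available wherever it runs, which I have not verified for the plugins container image.

```shell
# check_tcp HOST PORT
# Hypothetical helper: reports whether a TCP connection to HOST:PORT can be opened.
# It relies on bash's /dev/tcp pseudo-device, so it needs bash but no curl/wget.
check_tcp() {
  local host="$1" port="$2"
  # Give up after 3 seconds so a filtered port does not hang the check.
  if timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
    echo "reachable: ${host}:${port}"
  else
    echo "unreachable: ${host}:${port}"
    return 1
  fi
}

# Example (not executed here): on the host, where the API server listens
#   check_tcp 127.0.0.1 6443
```

If the same check succeeds on the host but fails inside the plugins container, that is consistent with the container having its own loopback interface.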
To resolve this, modify DaemonSet/csi-nodeplugin-fluid by adding hostNetwork: true and changing the ports of its two listeners (--pprof-addr and --metrics-addr) so that they do not collide with ports already in use on the host.
After these changes, the plugin can reach 127.0.0.1 on the host, and pod creation succeeds.
# kubectl edit DaemonSet/csi-nodeplugin-fluid -n fluid-system
......
- --pprof-addr=:6061
- --metrics-addr=:8081
......
hostNetwork: true
......
~~Regarding hostNetwork: true for DaemonSet/csi-nodeplugin-fluid, I noticed that it seems to be configured in the YAML. However, I am not familiar with Helm, so I am unsure how to modify this configuration during Helm installation.~~
https://github.com/fluid-cloudnative/fluid/blob/05635698c0a0f8c3381a284240b56bcf0694f9d9/charts/fluid/fluid/values.yaml#L35
Thank you for the reminder. I had not noticed the command helm upgrade fluid --set csi.config.hostNetwork=true fluid/fluid in the documentation. You are correct, but the documentation does not seem to cover changing the ports. In my environment, enabling hostNetwork on its own leads to port conflicts, which is a separate issue.
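For anyone else hitting the port conflict, it may help to check which candidate ports already have a listener on the host before enabling hostNetwork. This is a rough sketch under the same assumptions as any bash-based check (bash and timeout are present); find_busy_ports is a hypothetical helper name, and it only probes 127.0.0.1, so a listener bound to a single non-loopback address would be missed.

```shell
# find_busy_ports PORT...
# Hypothetical helper: prints every given port that already accepts TCP
# connections on 127.0.0.1, i.e. ports a hostNetwork pod could not reuse.
find_busy_ports() {
  local port
  for port in "$@"; do
    # A successful connect means some process is already listening there.
    if timeout 3 bash -c "exec 3<>/dev/tcp/127.0.0.1/${port}" 2>/dev/null; then
      echo "${port}"
    fi
  done
}

# Example (not executed here): check the two ports I chose above
#   find_busy_ports 6061 8081
```

An empty result suggests the ports are free on loopback; any port that is printed should be replaced with another value in --pprof-addr or --metrics-addr.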