Improve the performance of RDMA metrics

Open weizhoublue opened this issue 10 months ago • 0 comments

Spiderpool Version

1.0

Main CNI

No response

bug description

(1) The range of network namespaces to scan

https://github.com/spidernet-io/spiderpool/blob/55c226a0778936429a73a72c0792e38d24ad70db/pkg/rdmametrics/metrics.go#L293

Only one or two Pods with an RDMA device are usually running on a node at the same time. The current code loops over every net ns and makes an "rdma" CLI call in each one. If net ns without an RDMA device could be recognized and filtered out, the number of "rdma" CLI calls would drop and the sampling rate could increase.

~# time rdma statistic -j >/dev/null

real	0m0.011s
user	0m0.001s
sys	0m0.006s

The component could watch the local RDMA Pods, use cgroups to relate each Pod name to a PID, and from that PID find the Pod's network namespace.
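One way to go from a PID to its network namespace is to read the `/proc/<pid>/ns/net` symlink, whose target has the form `net:[<inode>]`; two processes are in the same net ns exactly when that inode number is equal. The following is a minimal sketch, assuming a Linux host with `/proc` mounted. The helper names `parseNSLink` and `netNSInode` are hypothetical and are not taken from the Spiderpool code.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseNSLink extracts the inode number from a namespace symlink target
// such as "net:[4026531840]".
func parseNSLink(link string) (uint64, error) {
	if !strings.HasPrefix(link, "net:[") || !strings.HasSuffix(link, "]") {
		return 0, fmt.Errorf("unexpected namespace link %q", link)
	}
	return strconv.ParseUint(link[len("net:["):len(link)-1], 10, 64)
}

// netNSInode returns the inode of the network namespace that the given PID
// belongs to, read from /proc/<pid>/ns/net.
func netNSInode(pid int) (uint64, error) {
	link, err := os.Readlink(fmt.Sprintf("/proc/%d/ns/net", pid))
	if err != nil {
		return 0, err
	}
	return parseNSLink(link)
}

func main() {
	ino, err := netNSInode(os.Getpid())
	if err != nil {
		fmt.Println("cannot resolve net ns:", err)
		return
	}
	fmt.Println("current process net ns inode:", ino)
}
```

Keying a cache on the inode would also let the sampler skip PIDs that share a namespace it has already visited.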

(2) How to get the net ns? Use the container ID or the Pod UUID:

container id = 4a4d1c825878264e18edf3104eab078b61bfb7c23edfbfb2eb9cabe43cf61284

pod uuid = 49737361_5190_4156_b90b_b0a9b3ea2bce


root@10-20-1-50:~# kubectl get pod -A | grep dns
kube-system       coredns-7db6d8ff4d-b48cr                                      1/1     Running            3 (59d ago)      177d
kube-system       coredns-7db6d8ff4d-lmhs7                                      1/1     Running            4 (59d ago)      177d
root@10-20-1-50:~# kubectl get pod -n kube-system       coredns-7db6d8ff4d-b48cr -o yaml
apiVersion: v1
kind: Pod
metadata:
  uid: 49737361-5190-4156-b90b-b0a9b3ea2bce
  annotations:
    cni.projectcalico.org/containerID: 4a4d1c825878264e18edf3104eab078b61bfb7c23edfbfb2eb9cabe43cf61284
    cni.projectcalico.org/podIP: 172.30.20.196/32
    cni.projectcalico.org/podIPs: 172.30.20.196/32
    k8s.v1.cni.cncf.io/network-status: |-



root@10-20-1-50:~# cat /proc/15360/cgroup
0::/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod49737361_5190_4156_b90b_b0a9b3ea2bce.slice/cri-containerd-4a4d1c825878264e18edf3104eab078b61bfb7c23edfbfb2eb9cabe43cf61284.scope
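The outputs above line up: the Pod `uid` appears in the pod slice name with "-" replaced by "_", and the container ID appears in the trailing scope name. A minimal sketch of that check follows, assuming the systemd cgroup layout shown in the sample line. The helpers `cgroupPodUID` and `belongsToPod` are hypothetical names used only for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// cgroupPodUID converts a Kubernetes pod UID to the form used in systemd
// cgroup slice names, where "-" is replaced by "_".
func cgroupPodUID(uid string) string {
	return strings.ReplaceAll(uid, "-", "_")
}

// belongsToPod reports whether a line from /proc/<pid>/cgroup refers to the
// given pod UID and container ID.
func belongsToPod(cgroupLine, podUID, containerID string) bool {
	return strings.Contains(cgroupLine, "pod"+cgroupPodUID(podUID)+".slice") &&
		strings.HasSuffix(cgroupLine, "-"+containerID+".scope")
}

func main() {
	// Sample line copied from the /proc/15360/cgroup output above.
	line := "0::/kubepods.slice/kubepods-burstable.slice/" +
		"kubepods-burstable-pod49737361_5190_4156_b90b_b0a9b3ea2bce.slice/" +
		"cri-containerd-4a4d1c825878264e18edf3104eab078b61bfb7c23edfbfb2eb9cabe43cf61284.scope"

	ok := belongsToPod(line,
		"49737361-5190-4156-b90b-b0a9b3ea2bce",
		"4a4d1c825878264e18edf3104eab078b61bfb7c23edfbfb2eb9cabe43cf61284")
	fmt.Println(ok) // prints true
}
```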



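// getPodAndContainerID extracts the Pod ID and Container ID from the given cgroup file.
//
// How it works:
// 1. Open and read the cgroup file.
// 2. Look for a line containing "kubepods".
// 3. Parse that line to extract the Pod ID and Container ID.
// 4. The Pod ID is usually in the fourth path segment and the Container ID in the fifth.
// 5. Use regular expressions so that different cgroup path formats can be matched.
// 6. Replace the underscores in the Pod ID with hyphens to match the Kubernetes UID format.
//
// Parameters:
//   - cgroupPath: path to the cgroup file, usually "/proc/<PID>/cgroup"
//
// Return values:
//   - string: the Pod ID (if found)
//   - string: the Container ID (if found)
//   - bool: whether the process is a host process
//   - If nothing is found, both strings are empty.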
func getPodAndContainerID(cgroupPath string) (string, string, bool) {
	file, err := os.Open(cgroupPath)
	if err != nil {
		fmt.Printf("failed to open cgroup file: %v\n", err)
		return "", "", false
	}
	defer file.Close()

	// Pod slice, e.g. "kubepods-burstable-pod<uid>.slice". The QoS part is
	// optional because Guaranteed pods use "kubepods-pod<uid>.slice".
	podRegex := regexp.MustCompile(`kubepods(?:-[^-]+)?-pod([^.]+)\.slice`)
	// Container scope, e.g. "cri-containerd-<id>.scope". Capturing only the
	// 64-character hex ID keeps runtime prefixes such as "containerd-" out of it.
	containerRegex := regexp.MustCompile(`-([0-9a-f]{64})\.scope$`)
	dockerContainerRegex := regexp.MustCompile(`docker-([0-9a-f]{64})\.scope$`)
	containerdContainerRegex := regexp.MustCompile(`containerd-([0-9a-f]{64})\.scope$`)
	crioContainerRegex := regexp.MustCompile(`crio-([0-9a-f]{64})\.scope$`)

	scanner := bufio.NewScanner(file)
	for scanner.Scan() {
		line := scanner.Text()
		if strings.Contains(line, "kubepods") {
			// Kubernetes pod: match against the whole line instead of a fixed
			// path segment, because the pod slice sits one level higher for
			// Guaranteed pods.
			podMatch := podRegex.FindStringSubmatch(line)
			containerMatch := containerRegex.FindStringSubmatch(line)
			if len(podMatch) == 2 && len(containerMatch) == 2 {
				podID := strings.ReplaceAll(podMatch[1], "_", "-")
				return podID, containerMatch[1], false
			}
		} else if dockerMatch := dockerContainerRegex.FindStringSubmatch(line); dockerMatch != nil {
			return "", dockerMatch[1], false
		} else if containerdMatch := containerdContainerRegex.FindStringSubmatch(line); containerdMatch != nil {
			return "", containerdMatch[1], false
		} else if crioMatch := crioContainerRegex.FindStringSubmatch(line); crioMatch != nil {
			return "", crioMatch[1], false
		} else if isHostProcess(line) {
			return "", "", true
		}
	}

	return "", "", false
}
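Tying this back to point (1): once each PID has been mapped to a Pod UID with `getPodAndContainerID`, the sampler only needs one PID per RDMA Pod and can skip every other process. The following is a rough sketch of that selection step, not the actual Spiderpool implementation. The type `procInfo`, the function `pickSamplePIDs`, and the UIDs in `main` are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// procInfo is a hypothetical record holding a PID and the pod UID parsed
// from its cgroup file (empty for host processes).
type procInfo struct {
	PID    int
	PodUID string
}

// pickSamplePIDs returns, for every pod UID present in rdmaPods, the lowest
// PID that belongs to that pod. Processes of other pods are ignored.
func pickSamplePIDs(procs []procInfo, rdmaPods map[string]bool) map[string]int {
	picked := make(map[string]int)
	for _, p := range procs {
		if !rdmaPods[p.PodUID] {
			continue
		}
		if cur, ok := picked[p.PodUID]; !ok || p.PID < cur {
			picked[p.PodUID] = p.PID
		}
	}
	return picked
}

func main() {
	procs := []procInfo{
		{PID: 1, PodUID: ""}, // host process
		{PID: 15360, PodUID: "49737361-5190-4156-b90b-b0a9b3ea2bce"},
		{PID: 15400, PodUID: "49737361-5190-4156-b90b-b0a9b3ea2bce"},
		{PID: 20000, PodUID: "made-up-uid-of-a-non-rdma-pod"},
	}
	rdmaPods := map[string]bool{"49737361-5190-4156-b90b-b0a9b3ea2bce": true}

	picked := pickSamplePIDs(procs, rdmaPods)
	uids := make([]string, 0, len(picked))
	for uid := range picked {
		uids = append(uids, uid)
	}
	sort.Strings(uids)
	for _, uid := range uids {
		fmt.Printf("pod %s -> sample via PID %d\n", uid, picked[uid])
	}
}
```

With this selection, the "rdma" CLI would run only inside the namespaces behind the picked PIDs (for example by entering `/proc/<pid>/ns/net`) rather than in every net ns on the node.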

What did you expect to happen?

No response

How to reproduce it (as minimally and precisely as possible)

No response

Additional Context

No response

weizhoublue avatar May 22 '25 03:05 weizhoublue