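Problem description: an alert fires at 8:00, so its fire_At time is 8:00, and it is stored in Doraemon's database. The alert keeps firing. At 8:05 the same alert is queried from Prometheus again, but its fire_At time no longer matches 8:00; it is now 8:05. This breaks Doraemon's handling logic. Doraemon first looks up the alert, and because fire_At does not match, the lookup finds nothing, so it inserts a new alert. The old alert then becomes stale (dirty) data that nobody maintains, and it keeps being sent.
Fix: before inserting an alert, the code now deletes the old alert.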
We want pod-0's primary on node1 and its secondary on node 02.
```yaml
record: node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate
expr: |
  sum by(cluster, namespace, pod, container) (
    rate(container_cpu_usage_seconds_total{container!="POD",image!="",job="kubelet",metrics_path="/metrics/cadvisor"}[5m])
  )
  * on(cluster, namespace, pod) group_left(node)
  topk by(cluster, namespace, pod) (
    1, max by(cluster, namespace, pod, node) (kube_pod_info{node!=""})
  )
```
This record rule...
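The right-hand side of this rule exists to attach a `node` label to each container's CPU rate. `kube_pod_info{node!=""}` carries the node. `max by(...)` collapses duplicate series for the same pod and node, and `topk by(cluster, namespace, pod) (1, ...)` keeps at most one node per pod. That leaves `group_left(node)` a single series per pod to match, so the many-to-one join is valid. The Go sketch below imitates those semantics on in-memory data. The types and the `joinNode` helper are hypothetical and simplified: there is only one left-hand sample per container, and ties in `topk` are resolved by keeping the first node seen, whereas PromQL does not document which tied series `topk` returns.

```go
package main

import "fmt"

// podKey mirrors the on(cluster, namespace, pod) matching labels.
type podKey struct{ Cluster, Namespace, Pod string }

// ContainerRate is one left-hand series after
// sum by(cluster, namespace, pod, container)(rate(...)).
type ContainerRate struct {
	Pod       podKey
	Container string
	Rate      float64
}

// PodInfo is one kube_pod_info series (value 1) with its node label.
type PodInfo struct {
	Pod  podKey
	Node string
}

// Result is a container series with the node label copied onto it.
type Result struct {
	ContainerRate
	Node string
}

func joinNode(rates []ContainerRate, infos []PodInfo) []Result {
	// Right-hand side: kube_pod_info{node!=""}, deduplicated, one node per pod
	// (imitating max by(...) followed by topk by(...) (1, ...)).
	nodeOf := map[podKey]string{}
	for _, pi := range infos {
		if pi.Node == "" { // node!="" filter
			continue
		}
		if _, seen := nodeOf[pi.Pod]; !seen {
			nodeOf[pi.Pod] = pi.Node // tie-break: first node seen wins
		}
	}
	// "* on(...) group_left(node)": many container series match one pod series.
	// The value is rate * 1, and unmatched left-hand series are dropped.
	var out []Result
	for _, r := range rates {
		node, ok := nodeOf[r.Pod]
		if !ok {
			continue
		}
		out = append(out, Result{ContainerRate: r, Node: node})
	}
	return out
}

func main() {
	p := podKey{"c1", "default", "web-0"}
	rates := []ContainerRate{{p, "app", 0.25}, {p, "sidecar", 0.05}}
	infos := []PodInfo{{p, "node1"}, {p, "node1"}} // duplicate series, e.g. from two exporters
	for _, r := range joinNode(rates, infos) {
		fmt.Printf("%s/%s on %s: %.2f\n", r.Pod.Pod, r.Container, r.Node, r.Rate)
	}
}
```

Without the deduplication step, the duplicated `kube_pod_info` series in `main` would make the real PromQL join fail with a many-to-many matching error. The sketch hides that failure by keeping the first node seen.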
Message: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"rootfs_linux.go:58: mounting \\\"/var/lib/lxcfs/proc/cpuinfo\\\" to rootfs \\\"/var/lib/docker/overlay2/c8de89d9c388253c25184e2dccc283ee76762db1f65ddef9ec2b2d702b5cb9f3/merged\\\" at \\\"/var/lib/docker/overlay2/c8de89d9c388253c25184e2dccc283ee76762db1f65ddef9ec2b2d702b5cb9f3/merged/proc/cpuinfo\\\" caused \\\"not a directory\\\"\"": unknown: Are you trying...