crash-diagnostics
etcd crash logs
There are two really useful text snippets we can collect about etcd over ssh.
Note: these have to be run on the CAPI masters.
1) results of etcd perf, making sure disk is fast
etcdctl=$(find / -name etcdctl -type f 2>/dev/null | head -n 1) # kinda hacky
"$etcdctl" --endpoints="https://localhost:2379" --cacert="/etc/kubernetes/pki/etcd/ca.crt" --cert="/etc/kubernetes/pki/etcd/server.crt" --key="/etc/kubernetes/pki/etcd/server.key" check perf
Result:
root [ /home/capv ]# etcdctl --endpoints="https://localhost:2379" --cacert="/etc/kubernetes/pki/etcd/ca.crt" --cert="/etc/kubernetes/pki/etcd/server.crt" --key="/etc/kubernetes/pki/etcd/server.key" check perf
60 / 60 Boooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo! 100.00% 1m0s
PASS: Throughput is 150 writes/s
PASS: Slowest request took 0.276541s
PASS: Stddev is 0.013858s
PASS
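If the perf output is saved to a file, a small helper can flag failing checks when the result gets attached to a diagnostics bundle. This is a minimal sketch: `perf_failures` is a hypothetical name, and it assumes that failing checks are printed on lines starting with `FAIL`, mirroring the `PASS` lines above.

```shell
# Hypothetical helper: reads saved `etcdctl check perf` output on stdin,
# prints any failing checks, and returns non-zero if there are any.
# Assumption: failing checks appear on lines that start with "FAIL".
perf_failures() {
  if grep '^FAIL'; then
    return 1
  fi
  return 0
}

# Usage (perf.txt is an assumed file name holding the saved output):
#   perf_failures < perf.txt && echo "disk looks fast enough"
```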
2) Getting WAL fsync durations:
curl localhost:2381/metrics | grep fsync
Result:
# TYPE etcd_disk_wal_fsync_duration_seconds histogram
etcd_disk_wal_fsync_duration_seconds_bucket{le="0.001"} 0
etcd_disk_wal_fsync_duration_seconds_bucket{le="0.002"} 0
etcd_disk_wal_fsync_duration_seconds_bucket{le="0.004"} 202
etcd_disk_wal_fsync_duration_seconds_bucket{le="0.008"} 1601
etcd_disk_wal_fsync_duration_seconds_bucket{le="0.016"} 2173
etcd_disk_wal_fsync_duration_seconds_bucket{le="0.032"} 2552
etcd_disk_wal_fsync_duration_seconds_bucket{le="0.064"} 2635
etcd_disk_wal_fsync_duration_seconds_bucket{le="0.128"} 2658
etcd_disk_wal_fsync_duration_seconds_bucket{le="0.256"} 2669
etcd_disk_wal_fsync_duration_seconds_bucket{le="0.512"} 2674 <-- 5 writes took between 0.256s and 0.512s
etcd_disk_wal_fsync_duration_seconds_bucket{le="1.024"} 2676
etcd_disk_wal_fsync_duration_seconds_bucket{le="2.048"} 2676
etcd_disk_wal_fsync_duration_seconds_bucket{le="4.096"} 2676
etcd_disk_wal_fsync_duration_seconds_bucket{le="8.192"} 2676
etcd_disk_wal_fsync_duration_seconds_bucket{le="+Inf"} 2676
etcd_disk_wal_fsync_duration_seconds_sum 33.182538455000035
etcd_disk_wal_fsync_duration_seconds_count 2676
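The buckets are cumulative, so reading them by eye takes some subtraction. The sketch below shows one way to summarize the histogram with awk. Both function names are hypothetical, and the input is assumed to be the plain metrics text shown above.

```shell
# Hypothetical helper: reads etcd metrics text on stdin and prints the mean
# WAL fsync duration in seconds (the _sum line divided by the _count line).
wal_fsync_mean() {
  awk '
    /^etcd_disk_wal_fsync_duration_seconds_sum /   { sum = $2 }
    /^etcd_disk_wal_fsync_duration_seconds_count / { count = $2 }
    END {
      if (count > 0) printf "%.4f\n", sum / count
      else { print "no samples"; exit 1 }
    }'
}

# Hypothetical helper: prints how many fsyncs took longer than the given
# bucket bound, e.g. `wal_fsync_slower_than 0.256`. The bound must match one
# of the le="..." labels exactly, because the buckets are cumulative counts.
wal_fsync_slower_than() {
  awk -v bound="$1" '
    index($0, "etcd_disk_wal_fsync_duration_seconds_bucket{le=\"" bound "\"}") == 1 { under = $2; found = 1 }
    /^etcd_disk_wal_fsync_duration_seconds_count / { count = $2 }
    END {
      if (!found) { print "bucket not found"; exit 1 }
      print count - under
    }'
}

# Usage:
#   curl -s localhost:2381/metrics | wal_fsync_mean
#   curl -s localhost:2381/metrics | wal_fsync_slower_than 0.256
```

For the sample above, the mean works out to roughly 12ms per fsync, and 7 of the 2676 fsyncs took longer than 0.256s.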
3) Getting Cloud init logs
root [ /home/capv ]# cat /var/log/cloud-init-output.log | grep -i timeout
[kubelet-check] Initial timeout of 40s passed.
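Since these snippets are meant to be collected over ssh, the metrics and cloud-init steps could be wrapped roughly as follows. This is only a sketch: `collect_node_diagnostics`, the `SSH` override, and the output file names are all hypothetical, and it assumes the remote user can read the cloud-init log (the session above ran as root).

```shell
# Hypothetical wrapper: runs the metrics and cloud-init commands from this
# issue on one master over ssh and saves each result to a local file.
# SSH can be overridden (for example to add options or for a dry run).
SSH="${SSH:-ssh}"

collect_node_diagnostics() {
  host="$1"   # e.g. capv@<master-ip>
  out="$2"    # local directory for the collected snippets
  mkdir -p "$out"
  # WAL fsync histogram from the etcd metrics endpoint
  $SSH "$host" 'curl -s localhost:2381/metrics | grep fsync' > "$out/wal-fsync.txt"
  # Timeouts reported during cloud-init (grep exits non-zero on no match)
  $SSH "$host" 'grep -i timeout /var/log/cloud-init-output.log' > "$out/cloud-init-timeouts.txt"
}

# Usage:
#   collect_node_diagnostics capv@<master-ip> ./diagnostics
```

The `etcdctl check perf` step is left out here because it needs the etcd client certificates and takes about a minute to run.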
@jayunit100 closing as this is Diagnostics file specific.