John Watson

10 comments by John Watson

Here's the VPA CRD that will validate with kubeconform v0.6.3+: https://github.com/kubernetes/autoscaler/blob/master/vertical-pod-autoscaler/deploy/vpa-v1-crd-gen.yaml
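
To check it locally, something like the following should work (a minimal sketch; the raw-URL form of the link above and the `-summary` flag are the only assumptions):

```
# fetch the generated CRD manifest and validate it (kubeconform >= v0.6.3)
curl -sLO https://raw.githubusercontent.com/kubernetes/autoscaler/master/vertical-pod-autoscaler/deploy/vpa-v1-crd-gen.yaml
kubeconform -summary vpa-v1-crd-gen.yaml
```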

`storage.vminsertConnsShutdownDuration` has no effect (when longer than 25s) when using the `OnDelete` strategy, because the operator has a hard-coded 30s timeout (https://github.com/VictoriaMetrics/operator/blob/e261c37dc973154ff71073d7213000421416bd4e/controllers/factory/k8stools/sts.go#L194). Instead, the deletion grace period should probably use `TerminationGracePeriodSeconds`...

To trigger this issue, the Pod needs to be allocated an IP that's _not_ on the Primary ENI. The SYN will come in on the secondary ENI, but the SYN-ACK will go...
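
A quick way to confirm the asymmetric path from the node (a diagnostic sketch; `eth0`/`eth1` as the primary/secondary ENI names are assumptions for a typical two-ENI setup):

```
# watch SYN-flagged packets (the filter matches both SYN and SYN-ACK) on each ENI
sudo tcpdump -ni eth1 'tcp[tcpflags] & tcp-syn != 0'   # SYN arriving on the secondary ENI
sudo tcpdump -ni eth0 'tcp[tcpflags] & tcp-syn != 0'   # SYN-ACK leaving via the primary ENI
```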

> Tho I am not sure if this is not working because of my cilium config or is this a cilium issue that needs to be fixed?

Masquerading needs to...

Lots of: `fatal error: traceback did not unwind completely`

But one time we did get more:

```
fatal error: slice bounds out of range
fatal error: index out of range
panic...
```

Per the request in https://github.com/golang/go/issues/64781, I added `GODEBUG="gccheckmark=1,gcshrinkstackoff=1,asyncpreemptoff=1"` and we have not had a panic in >24h. We used to see at least a couple per hour.

Ran each `GODEBUG` flag separately; it seems only `GODEBUG=gcshrinkstackoff=1` is needed to prevent the panics for now.
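
In other words, the workaround reduces to a single runtime setting (a sketch; `./your-binary` is a placeholder for whatever Go binary is panicking):

```
# disable goroutine stack shrinking; the other GODEBUG flags turned out to be unnecessary
GODEBUG=gcshrinkstackoff=1 ./your-binary
```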

```
receive
  --log.level=info
  --log.format=logfmt
  --grpc-address=0.0.0.0:10901
  --http-address=0.0.0.0:10902
  --remote-write.address=0.0.0.0:19291
  --objstore.config=$(OBJSTORE_CONFIG)
  --tsdb.path=/var/thanos/receive
  --label=thanos_receive_replica="$(NAME)"
  --label=receive="true"
  --tsdb.retention=26h
  --receive.local-endpoint=$(NAME).thanos-receive-headless.$(NAMESPACE).svc.cluster.local.:10901
  --grpc-server-tls-cert=/cert/tls.crt
  --grpc-server-tls-key=/cert/tls.key
  --grpc-server-tls-client-ca=/cert/ca.crt
  --label=metrics_namespace="global"
  --receive.tenant-label-name=cluster
  --receive.default-tenant-id=unknown
  --receive.hashrings-file-refresh-interval=1m
  --remote-write.server-tls-cert=/cert/tls.crt
  --remote-write.server-tls-client-ca=/cert/ca.crt
  --remote-write.server-tls-key=/cert/tls.key
  --tsdb.memory-snapshot-on-shutdown
  --tsdb.max-block-duration=1h
  --tsdb.min-block-duration=1h
  --writer.intern
```

We're running...

> Is the problematic one the router or the ingester?

Ingester.