Wojciech Tyczynski
/remove-lifecycle stale
I was wondering why the actual graphs in perf-dash are counter-intuitive, and I filed #2006 for that. This explains why the phases that involve schedule time seem much better...
The findings from the experiment of increased scheduling throughput:
- pretty much nothing changed in terms of metrics
- the reason seems to be: https://github.com/kubernetes/kubernetes/issues/108606
With https://github.com/kubernetes/kubernetes/pull/108648 merged, we have mitigated the problem described above. The outcome is that our pod-startup latency drastically dropped, as visible on the graph below.

Now we're...
> where can I find the scheduler logs?

@ahg-g - you click on "Master and node logs" towards the bottom of the page and then on the GCS link and...
I looked a bit more into it, and all long-starting pods I've seen were due to the fact that there were a bunch of pods starting on a node at the same...
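To illustrate that kind of analysis, here is a minimal sketch (assuming client-go and a reachable cluster; the one-minute bucketing and the threshold are arbitrary, and this is not the tooling actually used for these tests) that groups pod start times per node to spot such bursts:

```go
// Hypothetical analysis helper (not the tooling used in these tests): group
// pods by node and by the minute they started, to spot bursts of pods
// starting on the same node at roughly the same time.
package main

import (
	"context"
	"fmt"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(config)

	pods, err := client.CoreV1().Pods(metav1.NamespaceAll).List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}

	// Key: "<node>/<start minute>"; value: number of pods that started in that minute.
	bursts := map[string]int{}
	for _, pod := range pods.Items {
		if pod.Status.StartTime == nil || pod.Spec.NodeName == "" {
			continue
		}
		key := fmt.Sprintf("%s/%s", pod.Spec.NodeName, pod.Status.StartTime.Truncate(time.Minute).Format(time.RFC3339))
		bursts[key]++
	}
	for key, count := range bursts {
		if count >= 5 { // arbitrary threshold for "a bunch of pods"
			fmt.Printf("%s: %d pods started within the same minute\n", key, count)
		}
	}
}
```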
I played a bit more with this in https://github.com/kubernetes/kubernetes/pull/109067 - in particular, this run of the 5k presubmit: https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/pull/109067/pull-kubernetes-e2e-gce-scale-performance-manual/1517436913894559744

Let's focus purely on the `e2e-a6cf091dfb-bf35d-minion-group-1-75b5` node from that run. For completeness here...
OK - so the assumption above is incorrect - allocatable from the e2-medium node:
```
I0425 13:29:01.356114 14825 cluster.go:86] Name: e2e-2053-62db2-minion-group-5rzq, clusterIP: 10.40.0.24, externalIP: , isSchedulable: true, allocatable: v1.ResourceList{"attachable-volumes-gce-pd":resource.Quantity{i:resource.int64Amount{value:15, scale:0},...
```
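For reference, a minimal sketch (assuming client-go and access to the cluster; the node name is copied from the log line above purely as an example) that dumps the same allocatable resources via the API:

```go
// A minimal sketch (assuming client-go and a reachable cluster) that dumps a
// node's allocatable resources - the same data as the v1.ResourceList in the
// log line above; the node name is just the example from that log.
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(config)

	node, err := client.CoreV1().Nodes().Get(context.TODO(), "e2e-2053-62db2-minion-group-5rzq", metav1.GetOptions{})
	if err != nil {
		panic(err)
	}
	for name, quantity := range node.Status.Allocatable {
		fmt.Printf("%s: %s\n", name, quantity.String())
	}
}
```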
So continuing the above:
- the e2-medium machines we're using have, from a high-level perspective, 1 CPU and ~4GB of RAM
- the reason why there is less than 4GB of memory is this setting:...
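As a rough, hypothetical illustration of how allocatable memory ends up below the raw machine capacity (the reserved values below are made-up examples, not the actual settings used in these tests, and this is not the real kubelet code):

```go
// Back-of-the-envelope sketch - NOT the real kubelet code and NOT the actual
// reserved values from these tests (those are made up here) - showing how
// allocatable memory ends up below the raw machine capacity.
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/api/resource"
)

func main() {
	capacity := resource.MustParse("4Gi")            // rough e2-medium memory capacity
	kubeReserved := resource.MustParse("256Mi")      // example reservation for system/kube daemons
	evictionThreshold := resource.MustParse("100Mi") // example hard-eviction threshold

	allocatable := capacity.DeepCopy()
	allocatable.Sub(kubeReserved)
	allocatable.Sub(evictionThreshold)

	fmt.Printf("capacity=%s allocatable=%s\n", capacity.String(), allocatable.String())
}
```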
OK - so the above moved us a long way (in fact, we got down to 5s for the 100th percentile based on the last two runs of the 5k-node test). That said, we're...