perf-tests Expose more information in clusterloader2 logs

There are a couple of things that we definitely need:

[x] More state about pods owned by a given controlling object (number of pending and waiting pods, whether something was deleted, etc.). This mostly means copying this logic: https://github.com/kubernetes/kubernetes/blob/master/test/utils/runners.go#L803
[x] The pod-startup-time latency measurement should output something similar to what we currently print (for debugging purposes)
[x] show more clearly where a given test finished:

W1112 13:18:48.029] I1112 13:18:48.029322    9960 clusterloader.go:127] Test testing/density/config.yaml ran successfully!"

is not very visible in those logs

[x] We are currently printing information about nodes that is extremely helpful for debugging (this is currently part of density). It would be useful to add that too (it should probably be part of the initialization of cluster loader)
[ ] You need to audit logs in measurements: a bunch of glog calls should actually be real failures that fail the test at the end (though not immediately). I can imagine this as something like https://github.com/kubernetes/kubernetes/issues/66239#issuecomment-405255089, but also as a special measurement that collects errors internally (glogging them when they happen) and at the end fails the test if any were reported (this should be simpler than a separate flakes.txt file).
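
To make the last point more concrete, here is a minimal sketch of the "collect errors, log them as they happen, fail at the end" idea. It is not clusterloader2 code: `ErrorCollector`, `Report`, `Result`, `errExample` and the measurement name are hypothetical names made up for illustration, and the standard `log` package stands in for glog.

```go
package main

import (
	"errors"
	"fmt"
	"log"
	"strings"
	"sync"
)

// errExample is a made-up failure used only by the demo in main.
var errExample = errors.New("hypothetical measurement failure")

// ErrorCollector is a hypothetical helper (not part of clusterloader2).
// It records errors without stopping the test, so the run can continue
// and still be marked as failed once it finishes.
type ErrorCollector struct {
	mu   sync.Mutex
	errs []error
}

// Report logs the error right away and remembers it for the final verdict.
// A nil error is ignored.
func (c *ErrorCollector) Report(source string, err error) {
	if err == nil {
		return
	}
	log.Printf("error reported by %q: %v", source, err)
	c.mu.Lock()
	defer c.mu.Unlock()
	c.errs = append(c.errs, fmt.Errorf("%s: %w", source, err))
}

// Result returns nil if nothing was reported, or one aggregated error
// listing everything that was collected during the run.
func (c *ErrorCollector) Result() error {
	c.mu.Lock()
	defer c.mu.Unlock()
	if len(c.errs) == 0 {
		return nil
	}
	msgs := make([]string, 0, len(c.errs))
	for _, e := range c.errs {
		msgs = append(msgs, e.Error())
	}
	return fmt.Errorf("%d error(s) reported during the test: %s",
		len(c.errs), strings.Join(msgs, "; "))
}

func main() {
	c := &ErrorCollector{}

	// Somewhere in the middle of the test: log the problem and keep going.
	c.Report("example-measurement", errExample)

	// At the very end: decide whether the whole test passed.
	if err := c.Result(); err != nil {
		log.Printf("test failed: %v", err)
	}
}
```

The point of this shape is that `Report` never aborts the run; only the final `Result` check decides whether the test failed, which avoids maintaining a separate flakes.txt file.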

I guess there may be more, but let's start with those.

/assign @krzysied

Nov 12 '18 14:11 wojtek-t

@kubernetes/sig-scalability-bugs

Nov 12 '18 14:11 wojtek-t

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale

Feb 10 '19 15:02 fejta-bot

/remove-lifecycle stale

@krzysied - what's the status of this?

Feb 11 '19 07:02 wojtek-t

@wojtek-t The first 4 points are done. The last one is partially done: there are no errors that immediately fail the test, but there is no flakes.txt file.

Feb 11 '19 09:02 krzysied

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale

May 15 '19 20:05 fejta-bot

/remove-lifecycle stale /lifecycle frozen

May 16 '19 06:05 wojtek-t
