Visualization of performance tests results

Open nader-ziada opened this issue 3 years ago • 9 comments

/area test-and-release

Describe the feature

Background:

  • The old Knative performance testing framework was based on Mako, which does not accept data from non-Google developers, so community members could not run the benchmark tests.
  • It was replaced by some perf tests running on kperf, alongside the old Mako-based suite, which now uses a sidecar stub to collect the data instead of sending it to the Mako backend.
  • Both methods collect data in files but do not provide a graphical representation of the data.

The feature request now is to have a graphical representation of the continuously collected data:

  • [ ] Have a central location the prow test jobs can send the data to.
  • [ ] Run nightly jobs to collect the results.
  • [ ] Build a dashboard to show performance results over time.
  • [ ] Document the performance tests and what measurements are captured in each test

nader-ziada avatar Jun 07 '22 18:06 nader-ziada

/assign

nader-ziada avatar Jun 07 '22 18:06 nader-ziada

The approach I'm currently planning is to add the data points to an InfluxDB instance and then have a Grafana dashboard showing the results.

This is an example of what the dashboard could look like (I'm still working on a POC locally): example-dashboard

If folks have comments or feedback about the approach or the selected tools, please comment on the issue here
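As a rough illustration of the approach above, a test result could be turned into an InfluxDB line protocol record before being sent to the database. This is only a sketch: the measurement name, tag keys, and field name below are hypothetical and are not taken from the actual Knative perf tests.

```python
def _escape(text: str, chars: str) -> str:
    """Backslash-escape each character in `chars`, as line protocol requires."""
    for ch in chars:
        text = text.replace(ch, "\\" + ch)
    return text


def to_line_protocol(measurement: str, tags: dict, fields: dict, timestamp_ns: int) -> str:
    """Format one data point as: measurement,tag=value field=value timestamp."""
    # Tag keys/values must escape commas, spaces and equals signs.
    # Sorting tags by key is recommended by InfluxDB for write performance.
    tag_part = "".join(
        f",{_escape(key, ', =')}={_escape(str(value), ', =')}"
        for key, value in sorted(tags.items())
    )
    # All fields are written as floats here to keep the sketch simple.
    field_part = ",".join(
        f"{_escape(key, ', =')}={float(value)!r}" for key, value in fields.items()
    )
    # Measurement names only need commas and spaces escaped.
    return f"{_escape(measurement, ', ')}{tag_part} {field_part} {timestamp_ns}"


# Hypothetical example: one latency sample from a nightly run.
line = to_line_protocol(
    "deployment_latency",
    {"branch": "main", "k8s_version": "1.24"},
    {"p95_ms": 1250},
    1663269760000000000,
)
print(line)
# deployment_latency,branch=main,k8s_version=1.24 p95_ms=1250.0 1663269760000000000
```

Storing things like branch or Kubernetes version as tags, rather than fields, is what would later allow filtering on them in the dashboard.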

nader-ziada avatar Jun 22 '22 19:06 nader-ziada

Where's the db / dashboard going to live?

psschwei avatar Jun 24 '22 20:06 psschwei

> Where's the db / dashboard going to live?

Somewhere in the Knative GCP project, I guess. I need to figure this out with the Productivity WG.

nader-ziada avatar Jun 27 '22 13:06 nader-ziada

@nader-ziada how is it going? I think we could request a server in the CNCF cluster and host the performance dashboard there.

I also think the dashboard could be part of the kperf project. Maybe we can create an issue in kperf and track the progress there.

daisy-ycguo avatar Aug 11 '22 08:08 daisy-ycguo

There is a community cluster in the Knative project that is meant to host the InfluxDB instance and the dashboard. I'm currently working on it here: https://github.com/knative/serving/pull/13192

nader-ziada avatar Aug 11 '22 17:08 nader-ziada

http://34.170.87.98:443/d/HFw0qn74k/serving-performance-testing?from=1663269760987.5151&to=1663323727899.558&orgId=1

nader-ziada avatar Sep 19 '22 16:09 nader-ziada

asks for a username/password

psschwei avatar Sep 19 '22 19:09 psschwei

> asks for a username/password

knative/knative (username/password). Not sure how to make it public.

nader-ziada avatar Sep 19 '22 19:09 nader-ziada

You can set up a domain name, e.g. perf.knative.dev, and point it to your dashboard

https://github.com/knative/test-infra/blob/main/infra/gcp/dns/dns.tf

dprotaso avatar Oct 19 '22 15:10 dprotaso

> You can set up a domain name, e.g. perf.knative.dev, and point it to your dashboard
>
> https://github.com/knative/test-infra/blob/main/infra/gcp/dns/dns.tf

https://github.com/knative/test-infra/pull/3584

nader-ziada avatar Oct 20 '22 15:10 nader-ziada

Playing with the dashboard - it's a bit cumbersome.

Questions/thoughts

  1. If I run the performance test on a PR, how do I visualize the metrics for just that single run?
  2. We need to label the axes and have better names for the lines in the graph.
    • See e.g. https://mako.dev/benchmark?benchmark_key=5143375149793280
    • I'm assuming there's a way to configure the dashboard to relabel the series.
  3. Is there a way to toggle a specific series on/off? For example, in the Mako link above, show just deployment latency.
  4. Graphs load slowly. Not sure what to do here; maybe split up the dashboard so it's not loading data for four graphs.
  5. Is there a way to filter data based on tags in the Grafana UI? E.g. if we start running perf tests on release branches and different k8s versions, how would I filter by those?
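For questions 1 and 5, one possible direction is to tag every data point at write time (branch, Kubernetes version, PR or run ID) and filter on those tags in the query. The sketch below builds an InfluxQL query string with such a filter. It assumes the dashboard queries InfluxDB with InfluxQL rather than Flux, and the measurement, field, and tag names are hypothetical.

```python
def quote_identifier(name: str) -> str:
    """Double-quote an InfluxQL identifier (measurement, field or tag key)."""
    return '"' + name.replace('"', '\\"') + '"'


def quote_string(value: str) -> str:
    """Single-quote an InfluxQL string literal (tag value)."""
    return "'" + value.replace("'", "\\'") + "'"


def build_query(measurement: str, field: str, tag_filters: dict, since: str = "7d") -> str:
    """Build a SELECT that keeps only points matching every given tag."""
    conditions = [
        f"{quote_identifier(key)} = {quote_string(value)}"
        for key, value in sorted(tag_filters.items())
    ]
    # Restrict the time range so the query does not scan the whole series.
    conditions.append(f"time > now() - {since}")
    return (
        f"SELECT {quote_identifier(field)} FROM {quote_identifier(measurement)} "
        f"WHERE {' AND '.join(conditions)}"
    )


# Hypothetical example: only results from a release branch on one k8s version.
query = build_query(
    "deployment_latency",
    "p95_ms",
    {"branch": "release-1.8", "k8s_version": "1.24"},
)
print(query)
```

In Grafana the tag values would more likely come from dashboard variables than from hard-coded strings, but the filtering idea is the same.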

dprotaso avatar Oct 21 '22 16:10 dprotaso

Thanks for the feedback, I will investigate how to make these changes to the dashboard.

nader-ziada avatar Oct 21 '22 16:10 nader-ziada

Also wondering if there's a way to drop the interpolation between the successive runs - it seems a bit noisy

Image
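One way to think about dropping the interpolation: treat any large time gap between consecutive points as a boundary between runs, and draw each run as its own segment. The sketch below shows only that grouping idea in plain Python; the gap threshold is an arbitrary assumption, and this is not how Grafana itself is configured.

```python
def split_into_runs(points, max_gap_seconds):
    """Group (timestamp_seconds, value) points into runs.

    `points` must be sorted by timestamp. A new run starts whenever the gap
    to the previous point exceeds `max_gap_seconds`.
    """
    runs = []
    for timestamp, value in points:
        if runs and timestamp - runs[-1][-1][0] <= max_gap_seconds:
            runs[-1].append((timestamp, value))  # same run: extend it
        else:
            runs.append([(timestamp, value)])  # gap too large: start a new run
    return runs


# Two hypothetical runs roughly an hour apart, sampled every 10 seconds.
samples = [(0, 1.0), (10, 2.0), (20, 3.0), (3600, 4.0), (3610, 5.0)]
runs = split_into_runs(samples, max_gap_seconds=60)
print(len(runs))  # 2
```

Plotting each run separately, or as discrete points rather than a connected line, would avoid the misleading slope between the last sample of one run and the first sample of the next.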

dprotaso avatar Oct 21 '22 16:10 dprotaso

This issue is stale because it has been open for 90 days with no activity. It will automatically close after 30 more days of inactivity. Reopen the issue with /reopen. Mark the issue as fresh by adding the comment /remove-lifecycle stale.

github-actions[bot] avatar Jan 20 '23 01:01 github-actions[bot]