
Result Widget has confusing number of virtual users

Open mcawilcox opened this issue 1 year ago • 6 comments

Describe the bug

After a test run, the results include an image of the main test parameters from a CloudWatch widget (for me, relating to just region eu-west-2). In a calibration run with the concurrency set to 10, I expect the metrics to show a nice steady line at 10, preceded by a steady ramp. Instead I get a line that jumps around but is of the order of 200.

The logfile from Taurus consistently logs "10 vu" after the initial ramp up, but the logging interval varies from 5s down to 2s.

If I examine CloudWatch directly, I can reproduce the view presented in the results when the "Virtual Users" statistic is set to Sum. I get the correct graph by changing the statistic to "Average", "Minimum" or "Maximum".

"Sum" is the wrong statistic to use for VU, as there are multiple samples per minute. It is correct for the "Successes" and "Failures" counts.

However, once corrected, the "Virtual Users" line is at the wrong scale to be properly visible on the right-hand y-axis (its values are much smaller than "Successes"). I suggest splitting the result into two graphs, though I'm not sure whether CloudWatch can generate that as a single widget.

I suggest a fix around line 403 in results-parser/lib/parser/index.js from:

key !== "avgRt" && (metricOptions[key].stat = "Sum");

to:

key !== "avgRt" && key !== "numVu" && (metricOptions[key].stat = "Sum");

but I'm not able to test it, and I'm not sure of the impact on the other image, which brings me to the final point...

I am testing with a single region, and can only see the results image for that region. I can see that DLT has generated a "total" image as well, and I can see that the code changes some of the metric calculations ... but I can't get the DLT web GUI to display that "total" image.

To Reproduce

  1. Deploy DLT using CloudFormation
  2. Configure a test with a task count of 1, concurrency of 10, region eu-west-2, ramp-up of 5 minutes, hold of 90 minutes, and a task type of JMeter.
  3. Upload a JMeter script containing a single thread group with number of threads 1, ramp-up 1 and loop count 1. My test script happens to generate 23,000 requests over the 95 minutes (about 240 requests/min).
  4. Let test run
  5. Observe the Test Result page, especially the image on the lower-right of the panel.

Expected behavior

A steady line at 10 virtual users in the results widget (after the initial ramp), matching the "10 vu" that Taurus reports.

Please complete the following information about the solution:

  • [x] Version: v3.2.10
  • [x] Region: eu-west-2
  • [x] Was the solution modified from the version published on this repository? No
  • [x] If the answer to the previous question was yes, are the changes available on GitHub? n/a
  • [x] Have you checked your service quotas for the services this solution uses? No, but only running single calibration tests currently
  • [x] Were there any errors in the CloudWatch Logs? No

Screenshots

  1. Original widget image, with virtual users in blue 01-TestResult-BadVU
  2. The results-parser lambda logs a widget description into its logfiles, so I used this to create a CloudWatch widget. Nothing is visible because it sets the period to 10s 02-CW-Metrics-ReplicatedFromResultLambda
  3. So I changed the period to 1 minute. Note the statistic for "Virtual Users" is set to Sum, and this graph matches the original. 03-CW-Metrics-ReplicatedNowVisible
  4. Sum is bad to use when the samples aren't once per minute. This shows the number of samples 04-CW-Metrics-WithSampleCount
  5. Here, I fixed the statistic, but it is now hard to see as it is the wrong scale for the y-axis 05-CW-Metrics-FixedButHardToSee
  6. In this graph, I make the "Virtual Users" value more visible by multiplying by 10, but that value depends on the details of the test case. 06-CW-Metrics-FixedAndVisible
  7. Better would be to display the users on a third y-axis, or, as here, as a separate graph with the y-axis labelled for users (see the sketch after this list) 07-CW-Metrics-FixedGraph-Users
  8. and a graph with y-axis labelled for Requests per Minute 08-CW-Metrics-FixedGraph-Hits
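
To make that concrete, here is a rough sketch of the two widgets, written as JavaScript objects in the shape CloudWatch metric widgets accept. The namespace, metric and dimension names are placeholders rather than the solution's real ones; only the choice of statistics and the split into a Users graph and a Requests-per-Minute graph reflect the suggestion above.

const usersWidget = {
  title: "Virtual Users (eu-west-2)",
  view: "timeSeries",
  region: "eu-west-2",
  period: 60,
  metrics: [
    ["distributed-load-testing", "numVu", "testId", "example-test", { stat: "Average", label: "Virtual Users" }],
  ],
  yAxis: { left: { label: "Users", min: 0 } },
};

const requestsWidget = {
  title: "Requests per Minute (eu-west-2)",
  view: "timeSeries",
  region: "eu-west-2",
  period: 60,
  metrics: [
    ["distributed-load-testing", "numSucc", "testId", "example-test", { stat: "Sum", label: "Successes" }],
    ["distributed-load-testing", "numFail", "testId", "example-test", { stat: "Sum", label: "Failures" }],
  ],
  yAxis: { left: { label: "Requests / min", min: 0 } },
};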

Additional context

mcawilcox • Aug 18 '24 12:08

I've added a snippet from the Taurus logs: 09-TaurusLogSnippet

mcawilcox • Aug 18 '24 12:08

I found an example in the Taurus documentation of the two result graphs for Hits and Response Times being kept separate: Taurus Reporting Example

I've made some changes to my widgets to emulate these two graphs:

  1. I've used a stacked area graph for this one: Taurus-LoadGraph
  2. I've used a line graph for this: Taurus-ResponseTimeGraph

mcawilcox • Aug 18 '24 13:08

  1. Inspired by that page, can I suggest an enhancement? Sometimes showing the average latency is enough, but sometimes seeing p90 would be useful: Taurus-ResponseTimeWithp90
  2. Or p95: Taurus-ResponseTimeWithp95
  3. If the enhancement can't be made on the static Test Results page, could a customisable widget be left in CloudWatch (perhaps as part of the dashboard) that allows some of these extra lines to be graphed, so that we can generate our own images for test reports? (Rough sketch below.)
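
Concretely, assuming the response time is published as a normal CloudWatch metric with individual samples (an assumption - I have not verified how the solution publishes it), the extra lines would just be additional entries in a widget's metrics array with a percentile statistic; the names below are placeholders:

const responseTimeMetrics = [
  ["distributed-load-testing", "avgRt", "testId", "example-test", { stat: "Average", label: "Avg response time" }],
  ["distributed-load-testing", "avgRt", "testId", "example-test", { stat: "p90", label: "p90 response time" }],
  ["distributed-load-testing", "avgRt", "testId", "example-test", { stat: "p95", label: "p95 response time" }],
];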

mcawilcox • Aug 18 '24 13:08

Addition: I realised the CloudWatch live dashboard has the same underlying issue - it performs sum(@numVu) too - but this one mostly works because the full Logs Insights query uses "stat sum(@numVu) by bin(1s)", and most of the time the bin(1s) ensures only a single sample falls in each bin, so sum() == avg().

I do see occasional glitches where the graph doubles ... so sometimes there are 2 samples per second. 10-CW-Live-UsersGlitch

Again, using avg(), min() or max() works.
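
Put differently, the live-dashboard fix is a one-word change in that query fragment (quoted here exactly as above; the full query has more to it):

// Fragment as it stands today -- over-counts whenever two samples land in the same 1-second bin:
const currentFragment = "stat sum(@numVu) by bin(1s)";
// Using avg() (or min()/max()) avoids the doubling:
const fixedFragment = "stat avg(@numVu) by bin(1s)";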

mcawilcox • Aug 18 '24 14:08

This is very comprehensive. Thanks for the analysis. I will go over it and update you on this.

kamyarz-aws • Aug 19 '24 13:08

Addition: I did all my original analysis using a single load engine, which meant that use of the "Average" statistic worked out well. Since then, I started to scale my tests beyond a single load engine, and realised that the "Average" statistic no longer works - there needs to be something that knows how many engines are running in parallel.

As a quick hack in my own metrics, I added a line for "engines" as "TIME_SERIES(4)" when I have 4 tasks, and then defined the Virtual Users to be "AVG([numVu0]) * engines" (rough sketch below).
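
For reference, here is roughly what that looks like as a widget metrics array; the two expressions are exactly the ones I used, while the namespace, metric and dimension names below are placeholders:

const metrics = [
  // Constant series carrying the number of load engines (4 tasks in my case).
  [{ expression: "TIME_SERIES(4)", id: "engines", label: "Engines", visible: false }],
  // Per-engine average of the virtual-user samples, scaled back up by the engine count.
  [{ expression: "AVG([numVu0]) * engines", id: "totalVu", label: "Virtual Users" }],
  // Underlying metric, hidden and referenced by id above (placeholder names).
  ["distributed-load-testing", "numVu", "testId", "example-test", { id: "numVu0", stat: "Average", visible: false }],
];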

mcawilcox • Sep 18 '24 17:09

Hi @mcawilcox

Thank you for providing additional information for this issue. We are currently reviewing the metrics generated by the solution and have added the data you provided to our backlog for consideration.

gockle • Mar 19 '25 14:03

Hi @mcawilcox,

We wanted to update you regarding the metrics dashboard. We made a change to relabel 'Virtual Users' to 'Accumulated Virtual Users Activities' for improved clarity. We will be updating the implementation guide to include detailed explanations about this modification.

In parallel, our team is actively working on a comprehensive revision of the metrics dashboard.

While we are closing this support ticket, please be assured that we are internally tracking this enhancement until the revised dashboard is fully implemented and released in an upcoming solution update.

Thank you for your understanding.

gockle • May 05 '25 19:05