k6
statsd output shows vus > 1 after simulation exit
Brief summary
When running k6 with "--out statsd" and scraping statsd from Prometheus, the reported VUs typically stay > 0 after the simulation exits. Since Prometheus then keeps scraping the same (last) value, the Grafana dashboards also show vus > 0 continuously, even when no simulation is running at all. The last vus value reported by a k6 execution is sometimes 1, sometimes higher, depending on the stages and the number of users configured there. I tried various configurations, e.g. a longer stage with target 0 at the end. I also played around with different values for gracefulStop, but as I understand it, that only makes a difference if iterations don't finish by themselves in time. Even when k6 clearly states in the logs that the current number of users is zero at the end, the last pushes to statsd before exiting report vus > 0 in at least 9 out of 10 k6 executions.
This is an example what the k6 log shows at the end for a simple simulation:
running (05.1s), 0/2 VUs, 1 complete and 0 interrupted iterations
getEndpointList ✓ [======================================] 0/2 VUs  5s
followed by the statistics:
checks.........................: 100.00% ✓ 6 ✗ 0
data_received..................: 454 kB 89 kB/s
data_sent......................: 2.5 kB 491 B/s
http_req_blocked...............: avg=44.66ms min=0s med=1µs max=267.96ms p(90)=133.98ms p(95)=200.97ms
[...]
iterations.....................: 1 0.195813/s
vus............................: 1 min=0 max=1
vus_max........................: 2 min=2 max=2
I would expect that k6 reports vus=0 here before exiting.
k6 version
v0.38.1
OS
macOS 12.3.1
Docker version and image (if applicable)
No response
Steps to reproduce the problem
Run this sample script and check the k6_vus metric reported to statsd (I tested with a local statsd_exporter running in a podman container, which exposes the metrics at http://localhost:9102/metrics):
import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  scenarios: {
    testVusers: {
      executor: 'ramping-vus',
      startVUs: 0,
      stages: [
        { duration: '5s', target: 3 },
        { duration: '5s', target: 0 },
        { duration: '5s', target: 3 },
        { duration: '5s', target: 0 },
        { duration: '5s', target: 0 },
      ],
      gracefulRampDown: '0s',
      gracefulStop: '15s'
    }
  }
};

export default function testVusers () {
  const res = http.get('https://test-api.k6.io');
  check(res, { 'status was 200': (r) => r.status == 200 });
  sleep(1);
}
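For reference, a typical invocation against such a local setup (assuming the script is saved as script.js and the statsd_exporter listens on its default UDP port 9125; k6's statsd target is configured via the K6_STATSD_ADDR environment variable) looks like this:

K6_STATSD_ADDR=localhost:9125 k6 run --out statsd script.js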
Expected behaviour
k6_vus = 0 (in statsd) when k6 exits - as shown by the statsd metrics endpoint (e.g. http://localhost:9102/metrics):
...
# HELP k6_vus Metric autogenerated by statsd_exporter.
# TYPE k6_vus gauge
k6_vus 0
# HELP k6_vus_max Metric autogenerated by statsd_exporter.
# TYPE k6_vus_max gauge
k6_vus_max 3
...
Actual behaviour
k6_vus > 0 most of the time after k6 has exited:
...
# HELP k6_vus Metric autogenerated by statsd_exporter.
# TYPE k6_vus gauge
k6_vus 1
# HELP k6_vus_max Metric autogenerated by statsd_exporter.
# TYPE k6_vus_max gauge
k6_vus_max 3
...
Hi @abadelt, this seems to be because k6 is not guaranteed to emit vus with a value of zero. The vus metric is emitted once every second, so unless it just so happens that a sample is emitted after the execution has ended but before k6 exits, the last value pushed out will be greater than zero.
This likely can be fixed by just emitting one more time before exiting :shrug:, but I don't really think the core team will prioritize this highly. I don't remember anyone else reporting this, it seems to only matter in your case, and even then I don't think it's that big of an issue, or do you disagree?
You can potentially propose a PR, and we will look into it.
Some other related issues whose resolution might also fix this:
- https://github.com/grafana/k6/issues/1833
- https://github.com/grafana/k6/issues/1977
- https://github.com/grafana/k6/issues/1516 - this seems to be me reporting the same issue, but per scenario/executor, which has entirely different causes
I see that you tried to work around this with stages, but unfortunately it turns out there is an issue with that as well, as you can see in #2531. That issue also seems ... low priority to me, and it touches code that is very sensitive, so it's kind of high risk. I have made a small patch that seems to work.
I've also probably missed some strange corner case that might mean that the changes need to be way bigger and more complicated.
My only proposal for some kind of workaround is to use the --linger CLI flag and stop k6 manually ... :shrug: This is likely pretty bad. Maybe have a script that runs k6 stats every second and, when it reports that the test has stopped, waits 2-3s and kills it :shrug:.
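To illustrate that second idea, here is a minimal, untested sketch of such a watcher in Python. It assumes the REST API that k6 stats / k6 status talk to is reachable on its default localhost:6565 address and that the /v1/status response carries a data.attributes.running flag; the script name script.js and the 3s grace period are arbitrary:

#!/usr/bin/env python3
# Hypothetical watcher: start k6 with --linger, poll its REST API, and
# terminate the lingering process a few seconds after the run finishes.
import json
import signal
import subprocess
import time
import urllib.request

STATUS_URL = "http://localhost:6565/v1/status"  # assumed default k6 REST API address

proc = subprocess.Popen(["k6", "run", "--out", "statsd", "--linger", "script.js"])

try:
    while proc.poll() is None:
        time.sleep(1)
        try:
            with urllib.request.urlopen(STATUS_URL, timeout=2) as resp:
                attrs = json.load(resp)["data"]["attributes"]
        except OSError:
            continue  # API not reachable yet (or k6 already gone); try again
        if not attrs.get("running", True):
            # The test run has ended; give the output a moment to flush its
            # final samples, then stop the lingering k6 process.
            time.sleep(3)
            proc.send_signal(signal.SIGINT)
            break
finally:
    proc.wait()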
Hi @mstoykov, thank you for the fast and detailed reply - highly appreciated! I guess for what I'm currently trying to achieve, I might as well use the "externally-controlled" executor for ramping up and down during execution. That way the reported number of VUs is correct even at the end - I just tried it. Alternatively, I will look into your suggestion of using "k6 stats" (or rather "k6 status"?) and killing the instance once it has stopped.
Hi @abadelt, I was recently told by @bboreham about the statsd_exporter ttl configuration, which will likely help you.
You will need to come up with a number, but I would expect that in most cases 5s will be enough.
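As an example, a minimal statsd_exporter mapping config with such a TTL (passed to the exporter via --statsd.mapping-config; the file name and the 5s value are just placeholders) could look like this:

# statsd_mapping.yml, e.g. started with:
#   statsd_exporter --statsd.mapping-config=statsd_mapping.yml
defaults:
  ttl: 5s  # series that receive no updates for 5 seconds are dropped from /metrics

Once the TTL has passed without new pushes from k6, k6_vus should disappear from the /metrics page instead of staying frozen at the last reported value, so Prometheus stops scraping the stale number.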