
PI completed metric no longer updated

Open gabortega opened this issue 2 years ago • 7 comments

Hello,

Whilst trying to use the benchmarking tool, we noticed that the pi_completed metric was never updated. This results in no data being displayed on Grafana for both this metric and pi_cycletime.

Doing some debugging, I found the culprit to be this code section in JobWorker.java:

private void registerWorker(String jobType) {
    long fixedBackOffDelay = config.getFixedBackOffDelay();

    JobWorkerBuilderStep1.JobWorkerBuilderStep3 step3 = client.newWorker()
            .jobType(jobType)
            .handler(new SimpleDelayCompletionHandler(false));

    if (fixedBackOffDelay > 0) {
        step3.backoffSupplier(new FixedBackoffSupplier(fixedBackOffDelay));
    }

    step3.open();
}

where new SimpleDelayCompletionHandler(boolean) is always called with the value false, which means that even these workers never report any completed PIs:

        // worker marking completion of process instance via "task-type-completed"
        registerWorker(taskType + "-completed");

        // worker marking completion of process instance via "task-type-starterId-completed"
        registerWorker(taskType + "-" + config.getStarterId() + "-completed");

Is this intentional? If so, the Grafana dashboard may need to be updated as multiple graphs are showing up with no data.

gabortega avatar Sep 27 '22 09:09 gabortega

Any luck on this @gabortega? In the above code snippet I did in fact try passing true to the SimpleDelayCompletionHandler constructor and running the benchmark tool, but I don't see any change. The Grafana dashboard is still empty and the completed jobs and processes counters are still 0.

We are probably missing something very basic. I am surprised that the tool doesn't work out of the box. I have just cloned this repo and modified application.properties to connect to the local Zeebe cluster.
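
For reference, my connection settings in application.properties look roughly like this (the exact property names may differ depending on the spring-zeebe version bundled with the tool):

# local, unsecured Zeebe cluster (property names depend on the spring-zeebe version)
zeebe.client.broker.gateway-address=127.0.0.1:26500
zeebe.client.security.plaintext=true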

Thanks

shahamit avatar Nov 11 '22 13:11 shahamit

Hello @shahamit,

We eventually ended up making a fork of this tool and changed much of the code to fit our own needs.

I don't currently have access to the fix I made before our fork, so this is based on what I remember:

For the original fix, I changed the signature of registerWorker(String jobType) to registerWorker(String jobType, boolean flag) and set the flag to true for those two workers and false for all the others.
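
Roughly, the change looked like this (a sketch from memory, not the exact patch; the flag name is only illustrative):

private void registerWorker(String jobType, boolean markProcessInstanceCompleted) {
    long fixedBackOffDelay = config.getFixedBackOffDelay();

    JobWorkerBuilderStep1.JobWorkerBuilderStep3 step3 = client.newWorker()
            .jobType(jobType)
            // forward the flag instead of hard-coding false
            .handler(new SimpleDelayCompletionHandler(markProcessInstanceCompleted));

    if (fixedBackOffDelay > 0) {
        step3.backoffSupplier(new FixedBackoffSupplier(fixedBackOffDelay));
    }

    step3.open();
}

// ordinary task workers do not mark the process instance as completed
registerWorker(taskType, false);

// only the "-completed" workers report completion (this feeds pi_completed and pi_cycletime)
registerWorker(taskType + "-completed", true);
registerWorker(taskType + "-" + config.getStarterId() + "-completed", true);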

We also observed that only the default backpressure strategy produced the required metrics. Since we wanted to test Zeebe at a fixed throughput and still get these metrics, we set both rate modifiers (benchmark.startPiReduceFactor and benchmark.startPiIncreaseFactor) to 0; I believe we did not have to change benchmark.maxBackpressurePercentage. We could then roughly set our desired throughput via benchmark.startPiPerSecond, although the actual rate did not always match the value of the property exactly.
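
In application.properties that looked roughly like this (the rate is only an example value, not the one we actually used):

# target a roughly fixed start rate; the achieved rate may deviate a bit
benchmark.startPiPerSecond=50
# disable the backpressure-driven rate adjustments
benchmark.startPiReduceFactor=0
benchmark.startPiIncreaseFactor=0
# benchmark.maxBackpressurePercentage was left at its default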

gabortega avatar Nov 11 '22 13:11 gabortega

Thanks for your input @gabortega. While troubleshooting, I found that if I check out an old revision ('08dc3ba3') I do see the completed jobs counter increment, but there is still no data in Grafana. The library upgrades done by the bot probably broke the application.

shahamit avatar Nov 11 '22 13:11 shahamit

Sorry folks - no time to look into this right now - but happy to accept a PR if you find out the root cause. Happy to be pinged again next month; I hope to have more availability then :-|

berndruecker avatar Nov 14 '22 11:11 berndruecker

As a workaround you could look at the Zeebe Grafana Dashboard.

falko avatar Feb 27 '23 08:02 falko

@falko - the Zeebe Grafana dashboard has a limitation - it cannot report cycle time for process instances that run for more than 10 seconds. The benchmarking tool's dashboard can report it, but it probably isn't compatible with Kubernetes deployments when it comes to reporting metrics.

shahamit avatar Mar 24 '23 11:03 shahamit

I am adding this here because the observation fits into the picture. Using the latest image on Kubernetes, I also found that the metrics pi_cycletime and pi_completed were never updated. This didn't change after building the image myself. The following lines were also completely missing from the pod's log:

PI STARTED:     1022178 (+  1680) Last minute rate:  27.8
  Backpressure: 171815 (+   138) Last minute rate:   1.9. Percentage: 6.789 %
PI COMPLETED:   914193 (+  1150) Last minute rate:  20.0. Mean: 126,707. Percentile .95: 132,827. Percentile .99: 143,085

The reason is that the StatisticsCollector (and probably other classes as well) is not properly initialized. During startup, there are a lot of messages like this:

17:46:41.441 [main] INFO  i.c.z.s.c.a.MicrometerMetricsRecorder - Enabling Micrometer based metrics for spring-zeebe (available via Actuator)
17:46:41.441 [main] INFO  o.s.c.s.PostProcessorRegistrationDelegate$BeanPostProcessorChecker - Bean 'micrometerMetricsRecorder' of type [io.camunda.zeebe.spring.client.actuator.MicrometerMetricsRecorder] is not eligible for getting processed by all BeanPostProcessors (for example: not eligible for auto-proxying)

The StatisticsCollector was also among the beans flagged this way.

I understand that this is related to a cyclic dependency with @Autowired and bean initialization during application startup, but for someone who is not into all this Spring and Spring Boot stuff the interdependencies are completely opaque, and I have no clue how to fix it.
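
The only general pattern I came across for this kind of cycle is making one side of the dependency lazy, e.g. via ObjectProvider. A rough sketch is below, with placeholder wiring only - I have not verified whether it applies to the actual benchmark classes:

import io.micrometer.core.instrument.MeterRegistry;
import org.springframework.beans.factory.ObjectProvider;
import org.springframework.stereotype.Component;

// Generic Spring sketch, not the actual benchmark code: the MeterRegistry is
// resolved lazily on first use instead of during startup, which is one common
// way to keep a bean out of an @Autowired initialization cycle.
@Component
public class StatisticsCollector {

    private final ObjectProvider<MeterRegistry> meterRegistryProvider;

    public StatisticsCollector(ObjectProvider<MeterRegistry> meterRegistryProvider) {
        this.meterRegistryProvider = meterRegistryProvider;
    }

    public void countCompletedProcessInstance() {
        // metric name taken from the dashboard; the registry is only resolved here
        meterRegistryProvider.getObject().counter("pi_completed").increment();
    }
}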

I eventually got the benchmark running by checking out a commit from 31 Mar 2022 (before all the Spring-related updates) and building the image from there. I'd really appreciate a fix for the latest version :-)

befer avatar Apr 08 '23 12:04 befer