
Add a benchmark to verify performance

Open mariomastrodicasa opened this issue 3 years ago • 8 comments

Is your feature request related to a problem? Please describe. The project needs a benchmark to verify its performance.

Describe the solution you'd like Add some benchmark associated to the produce/consume API calls to verify performance and identify possible bottlenecks.

Describe alternatives you've considered Introduce some counter or other mechanism to measure time like Stopwatch.

Additional context N/A

mariomastrodicasa avatar Apr 08 '22 15:04 mariomastrodicasa

The benchmark cannot be built with an absolute vision. What can be done is to have functions that execute the same workload but use a different underlying mechanism: e.g. use KNet in one function and Confluent.Kafka in another. Pairing these two executions makes it possible to compare both implementations. In any case, it is mandatory to use configurations which, more or less, create similar environments: if the configuration in one case, or in the other, is optimized, the comparison has no real meaning. This consideration comes from the number of options available when a consumer/producer is allocated, and from how the underlying mechanism manages message batching both in receive mode (consume) and send mode (produce).
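As an illustration, the pairing could look like the following hypothetical harness (a sketch: the class name, the empty workload bodies, and the iteration count are assumptions, not actual KNet or Confluent.Kafka APIs). Each candidate implementation runs the same loop, and only the paired timings are compared:

```java
import java.util.function.IntConsumer;

public class PairedBenchmark {
    // Times a workload over a fixed number of iterations so that the two
    // implementations under test (e.g. a KNet produce loop vs a
    // Confluent.Kafka produce loop) are measured on identical work.
    static long timeNanos(IntConsumer workload, int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) workload.accept(i);
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int iterations = 100_000;
        // The bodies below are placeholders for the real produce calls.
        long a = timeNanos(i -> { /* e.g. KNet producer send */ }, iterations);
        long b = timeNanos(i -> { /* e.g. Confluent.Kafka produce */ }, iterations);
        System.out.printf("A: %d ns, B: %d ns, ratio: %.2f%n", a, b, (double) a / b);
    }
}
```

The same configuration (batching, linger, acks, etc.) would have to be applied to both workloads for the ratio to be meaningful.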

masesdevelopers avatar Apr 13 '22 16:04 masesdevelopers

Maybe a new option shall be added, or a value shall be calculated, for the average, stdev and CV without the maximum and minimum values of the series. The max and min of a series can be affected by spurious conditions and can skew the other measurements: try to remove them and recalculate the values.
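A minimal sketch of such a trimmed calculation (class and method names are assumptions; the real logic would live in the benchmark reporting code):

```java
import java.util.Arrays;

public class TrimmedStats {
    // Computes average, standard deviation and coefficient of variation
    // after dropping the single minimum and maximum of the series, so that
    // spurious outliers do not skew the remaining measurements.
    static double[] trimmedStats(double[] samples) {
        if (samples.length < 3)
            throw new IllegalArgumentException("need at least 3 samples to trim min and max");
        double[] sorted = samples.clone();
        Arrays.sort(sorted);
        // Drop the first (min) and last (max) elements.
        double[] trimmed = Arrays.copyOfRange(sorted, 1, sorted.length - 1);
        double mean = Arrays.stream(trimmed).average().orElse(0);
        double variance = Arrays.stream(trimmed)
                .map(x -> (x - mean) * (x - mean)).sum() / trimmed.length;
        double stdev = Math.sqrt(variance);
        double cv = mean == 0 ? 0 : stdev / mean;
        return new double[] { mean, stdev, cv };
    }
}
```

For example, the series {1, 2, 3, 100} is trimmed to {2, 3}, giving an average of 2.5 instead of the outlier-dominated 26.5.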

mariomastrodicasa avatar May 02 '22 16:05 mariomastrodicasa

Try to integrate BenchmarkDotNet (at https://github.com/dotnet/BenchmarkDotNet)

mariomastrodicasa avatar May 13 '22 13:05 mariomastrodicasa

@masesdevelopers: Add a new benchmark which measures the time elapsed from the moment a record is sent by the producer to the reception of the same record by a consumer: a roundtrip. I think this benchmark can measure both producer and consumer performance; maybe it is possible to use https://github.com/masesgroup/KNet/issues/53#issuecomment-1126036921
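One way to sketch the roundtrip measurement is to embed the send timestamp in the record payload and compute the elapsed time on reception. In the snippet below an in-memory queue stands in for the broker, so all names are illustrative assumptions; in the real benchmark the payload would travel through a producer send and a consumer poll:

```java
import java.nio.ByteBuffer;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class RoundTripSketch {
    // Encodes the send timestamp into the record payload.
    static byte[] makePayload() {
        return ByteBuffer.allocate(Long.BYTES).putLong(System.nanoTime()).array();
    }

    // On reception, the roundtrip latency is "now" minus the embedded timestamp.
    static long latencyNanos(byte[] payload) {
        return System.nanoTime() - ByteBuffer.wrap(payload).getLong();
    }

    public static void main(String[] args) throws InterruptedException {
        // A blocking queue stands in for the Kafka topic in this sketch.
        BlockingQueue<byte[]> topic = new ArrayBlockingQueue<>(1);
        topic.put(makePayload());          // "produce"
        byte[] received = topic.take();    // "consume"
        System.out.println("roundtrip ns: " + latencyNanos(received));
    }
}
```

Note that this scheme requires producer and consumer to share a clock, which holds when both run in the same benchmark process.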

mariomastrodicasa avatar Mar 13 '23 17:03 mariomastrodicasa

Reopen to update statistics with latest versions of KNet and Confluent.Kafka

masesdevelopers avatar Jan 26 '24 00:01 masesdevelopers

Many KNet operations may, or may not, be impacted by JNI operations, because the data may, or may not, be a copy of the data available in the JVM.

As stated in https://docs.oracle.com/javase/8/docs/technotes/guides/jni/spec/functions.html, the JVM can decide to copy, or pin, data depending on its internal implementation.

The data transfer used in KNet is based on JNI, and the results of some benchmarks seem consistent with the following hypotheses:

  • the JVM makes a copy of the data before sending it to JNI and replaces the object data when JNI returns: so there are multiple copies of the data during each exchange;
  • the JVM Garbage Collector: some JNI methods can impact GC operations since, depending on the JVM implementation, the GC can pin or copy arrays of primitive types.

Current benchmarks are based on byte arrays to reduce exogenous interference (like the implementation of the serializers); in general the KNet-specific implementations (KNetConsumer, KNetProducer, KNet Streams SDK, etc.) use byte arrays as well.

Maybe a better data exchange can be obtained by reducing the number of array copies made during execution. An issue will be opened to investigate this possible evolution; meanwhile this issue is reopened.

masesdevelopers avatar Mar 05 '24 17:03 masesdevelopers

The information traverses the CLR-JVM boundary many times, reducing speed.

The local serializer stubs invoke remote serializers: each time there is a conversion, the JVM is involved, as in https://github.com/masesgroup/KNet/blob/721fe074212d4e221630853368beaab6a11bf120/src/net/KNet/Specific/Serialization/KNetSerialization.cs#L240 or https://github.com/masesgroup/KNet/blob/721fe074212d4e221630853368beaab6a11bf120/src/net/KNet/Specific/Serialization/KNetSerialization.cs#L347

Each conversion moves data between the CLR and the JVM:

  • the data (bool, int, double, etc.) is sent to the JVM
  • the JVM returns a converted byte array
  • the byte array is returned to the caller, which inserts it within e.g. a ProducerRecord
  • the ProducerRecord sends the array back to the JVM again
The same happens in reverse with a ConsumerRecord:

  • the JVM receives a ConsumerRecord
  • if a piece of data is needed, e.g. the key, the byte array is requested from the JVM, which sends it back to the CLR
  • then the CLR sends the byte array back to the JVM again
  • finally the JVM returns the converted type (bool, int, double, etc.)
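The byte-array hops above can be illustrated with a small sketch that mirrors what an integer serializer/deserializer pair does (plain ByteBuffer code, an assumption for illustration, not the actual KNet or Kafka classes); in KNet, each of these calls would imply a CLR-JVM boundary crossing, and the resulting byte array then crosses again inside the ProducerRecord or ConsumerRecord:

```java
import java.nio.ByteBuffer;

public class ConversionHops {
    // Producer side: the int crosses the boundary to be converted, and the
    // resulting byte[] crosses back before being wrapped in a ProducerRecord,
    // which sends it to the JVM once more.
    static byte[] serializeInt(int value) {
        return ByteBuffer.allocate(Integer.BYTES).putInt(value).array();
    }

    // Consumer side: the byte[] is fetched from the ConsumerRecord in the JVM,
    // then sent back across the boundary to be converted to the final type.
    static int deserializeInt(byte[] data) {
        return ByteBuffer.wrap(data).getInt();
    }
}
```

Counting the hops, a single int key costs at least four boundary crossings per produce/consume pair, which is the overhead the comment above describes.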

masesdevelopers avatar Mar 07 '24 23:03 masesdevelopers

Add a specific workflow to execute benchmarks and report statistics on the repo

masesdevelopers avatar Jun 25 '24 15:06 masesdevelopers