bookkeeper icon indicating copy to clipboard operation
bookkeeper copied to clipboard

ByteBuf returned by BookKeeper triggers CRC Checksum calculation when calling "readBytes"

Open eolivelli opened this issue 1 year ago • 6 comments

BUG REPORT

Describe the bug I have developting a Pulsar BrokerInterceptor. The BrokerInterceptor is able to process the data that Pulsar read from BookKeeper, without memory copies. While analysing a flamegraph I have seen that the ByteBuf returned by BookKeeper shows this weird behaviour and uses lot of CPU.

This is a flame graph. The version of BookKeeper is based on latest 4.16.x

image

To Reproduce

See the flamegraph

Expected behavior

readBytes has very little overhead

eolivelli avatar May 17 '24 07:05 eolivelli

Interceptor code is here: https://github.com/datastax/pulsar-jms/blob/master/pulsar-jms-filters/src/main/java/com/datastax/oss/pulsar/jms/selectors/JMSPublishFilters.java#L277

eolivelli avatar May 17 '24 07:05 eolivelli

that seems like a strange flamegraph. I don't see how readBytes could trigger CRC calculation.

ByteBufVisitor is used in checksum calculations: https://github.com/apache/bookkeeper/blob/f8eb7a06a683e690e8a70b5bd72927e4df70aa51/bookkeeper-server/src/main/java/org/apache/bookkeeper/proto/checksum/DigestManager.java#L69-L71 PR was #4196

lhotari avatar May 17 '24 07:05 lhotari

The ByteBuf was not coming from a read from the BookKeeper client, because the ByteBug is coming from the network (it is the Pulsar producer that is sending a message and the interceptor processes it)

But maybe it is a ByteBuf recycled ?

eolivelli avatar May 17 '24 08:05 eolivelli

The ByteBuf was not coming from a read from the BookKeeper client, because the ByteBug is coming from the network (it is the Pulsar producer that is sending a message and the interceptor processes it)

But maybe it is a ByteBuf recycled ?

It's hard to see how it could result in the stacktrace even if there was a recycling bug. GetBytesCallbackByteBuf instance is not stored as a reference anywhere and gets passed as a parameter here: https://github.com/apache/bookkeeper/blob/f8eb7a06a683e690e8a70b5bd72927e4df70aa51/bookkeeper-server/src/main/java/org/apache/bookkeeper/util/ByteBufVisitor.java#L138 .

Perhaps it's a profiler issue.

Please add -XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints to JVM options to prevent any issues in this area:

  • When agent is not loaded at JVM startup (by using -agentpath option) it is highly recommended to use -XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints JVM flags. Without those flags the profiler will still work correctly but results might be less accurate. For example, without -XX:+DebugNonSafepoints there is a high chance that simple inlined methods will not appear in the profile. When the agent is attached at runtime, CompiledMethodLoad JVMTI event enables debug info, but only for methods compiled after attaching.

It might also be useful to compare Async Profiler 2.9 and 3.0 results. Just to be sure that the new stacktrace solution in 3.0 isn't causing the problem.

lhotari avatar May 17 '24 08:05 lhotari

I also found the checksum cost a lot of CPU image

hangc0276 avatar May 29 '24 02:05 hangc0276

I also found the checksum cost a lot of CPU image

@hangc0276 I guess it is expected to consume a lot of CPU? In Enrico's case, the flamegraph doesn't seem to be valid and my assumption was that adding -XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints to JVM options would fix it since it's recommended to use these JVM options to get proper results while profiling.

lhotari avatar May 29 '24 06:05 lhotari