
[Improve][Client] Group writing into the channel in PerChannelBookieClient

Open · hangc0276 opened this issue · 1 comment

Motivation

When the BookKeeper client writes an entry to the BookKeeper server, it goes through the following steps (a sketch of this per-entry path follows the list):

  • Step 1: Create a PendingAddOp object.
  • Step 2: For each replica, select a bookie client channel according to the ledgerId.
  • Step 3: Write the entry to the bookie client channel and flush it.
  • Step 4: The entry is added to Netty's pending queue and processed by the configured Netty pipeline handlers, such as bookieProtoEncoder, lengthbasedframedecoder, and consolidation.
  • Step 5: Wait for the write response.
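
A minimal sketch of the per-entry behaviour in Steps 3 and 4 (the class and method names are illustrative, not the actual PerChannelBookieClient internals): every add is written and flushed on its own, so each entry makes a full pass through the Netty pipeline and typically triggers its own flush.

```java
import io.netty.buffer.ByteBuf;
import io.netty.channel.Channel;
import io.netty.channel.ChannelFuture;

// Illustrative only: models the current per-entry write/flush pattern, not the real client.
final class PerEntryWriter {
    private final Channel channel; // bookie client channel selected by ledgerId (Step 2)

    PerEntryWriter(Channel channel) {
        this.channel = channel;
    }

    ChannelFuture addEntry(ByteBuf encodedAddRequest) {
        // Steps 3-4: each entry is written and flushed individually, so the
        // pipeline handlers (bookieProtoEncoder, etc.) run once per entry and
        // the Netty thread pays the flush cost for every small entry.
        return channel.writeAndFlush(encodedAddRequest);
    }
}
```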

If the bookie client writes small entries at a high operation rate, Netty's pending queue fills up and the Netty thread stays busy processing entries and flushing them into the socket channel. The CPU switches between user mode and kernel mode at high frequency.

#3383 introduced Netty channel flush consolidation to mitigate syscall overhead, but it cannot reduce the overhead on the Netty threads.
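
For reference, flush consolidation in Netty is provided by FlushConsolidationHandler. A rough sketch of how it sits in a pipeline is below; the threshold value is an example, not necessarily what #3383 configured. It coalesces repeated flush() calls into fewer syscalls, but the per-entry write work on the Netty thread remains.

```java
import io.netty.channel.ChannelInitializer;
import io.netty.channel.socket.SocketChannel;
import io.netty.handler.flush.FlushConsolidationHandler;

// Example pipeline setup: consolidate up to 1024 explicit flushes into one.
final class ConsolidatingChannelInitializer extends ChannelInitializer<SocketChannel> {
    @Override
    protected void initChannel(SocketChannel ch) {
        ch.pipeline().addFirst(
            new FlushConsolidationHandler(1024, /* consolidateWhenNoReadInProgress */ true));
        // ... protocol encoder/decoder handlers follow here ...
    }
}
```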

We can optimize Step 3 by grouping small entries into one ByteBuf and flushing it into the Netty pending queue when certain conditions are met.

Design

When a new entry arrives at the bookie client channel, we append it to a ByteBuf and check whether the ByteBuf exceeds the maximum size threshold (default: 1 MB); if it does, we flush it to the channel.
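
A hedged sketch of this size-triggered batching (the 1 MB default comes from this issue; the class itself is illustrative and not the actual PerChannelBookieClient change):

```java
import io.netty.buffer.ByteBuf;
import io.netty.buffer.CompositeByteBuf;
import io.netty.buffer.Unpooled;
import io.netty.channel.Channel;

// Illustrative batching writer: accumulate encoded add requests and flush
// once the batch reaches the size threshold.
final class BatchingWriter {
    private static final int MAX_BATCH_BYTES = 1024 * 1024; // 1 MB default threshold

    private final Channel channel;
    private CompositeByteBuf pending = Unpooled.compositeBuffer();

    BatchingWriter(Channel channel) {
        this.channel = channel;
    }

    synchronized void addEntry(ByteBuf encodedAddRequest) {
        // Append the encoded request to the pending batch (true = advance the writer index).
        pending.addComponent(true, encodedAddRequest);

        // Flush to the Netty channel once the batch crosses the threshold.
        if (pending.readableBytes() >= MAX_BATCH_BYTES) {
            flushPending();
        }
    }

    synchronized void flushPending() {
        if (pending.isReadable()) {
            channel.writeAndFlush(pending);
            pending = Unpooled.compositeBuffer();
        }
    }
}
```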

To avoid entries sitting in the bookie client channel's ByteBuf for too long and causing high write latency, we schedule a timer task to flush the ByteBuf every 1 ms.
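
Continuing the BatchingWriter sketch above, the 1 ms periodic flush could be scheduled on the channel's event loop; that placement is an assumption of this sketch, not necessarily what the actual patch does.

```java
import java.util.concurrent.TimeUnit;

import io.netty.channel.Channel;

final class BatchFlushScheduler {
    // Flush the pending batch every 1 ms so no entry waits long in the
    // client-side buffer, bounding the added write latency.
    static void schedule(Channel channel, BatchingWriter writer) {
        channel.eventLoop().scheduleAtFixedRate(
                writer::flushPending, 1, 1, TimeUnit.MILLISECONDS);
    }
}
```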

Performance

We tested the write performance on my laptop with the following command:

bin/benchmark writes -ensemble 1 -quorum 1 -ackQuorum 1 -ledgers 100 -throttle 300000 -entrysize 60 -useV2 -warmupMessages 1000000

The performance results:

| Writer ledgers | Batched write (ops/s) | Non-batched write (ops/s) | Improvement |
|---|---|---|---|
| 1 | 333238 | 335970 | 0% |
| 50 | 261605 | 153011 | 71% |
| 100 | 260650 | 126331 | 100% |
| 500 | 265628 | 164393 | 62% |

hangc0276 · Mar 26 '23, 04:03

I think this should be configurable. It improves throughput for some workloads, but more latency-sensitive workloads may want to disable it.

dlg99 · Mar 29 '23, 22:03