document high-volume producer workload tunings in driver-redpanda
https://redpandadata.slack.com/archives/C01ND4SVB6Z/p1694729911166379
Need to document some additional OMB workload configuration detail for high volume testing.
There are a few items in this thread about how to get the producer to keep up with the expected rates:
- Possible quirks with the key distributor not keeping up with the expected rate. Random Nano seems to act weird in high-rate setups across many producers and partition spreads, while NoKey and Round Robin keep up. This could be related to the next issue.
- For high-volume produce rates (>1 million msgs/s) across many producers (tens) going to many partitions (thousands), the Java client may also need buffer.memory significantly increased to handle the amount of data being accumulated into batches.
In the example test, @travisdowns calculated that for 1.8M messages/sec across 10 topics with thousands of partitions each, coming from ~100 producers, the buffer size needed was likely 3-4x larger than what we were setting in the test (around 32-33 MB):
2300 partitions per topic × 32,000-byte batch size = 73.6 MB
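The arithmetic above can be sketched as a quick sanity check. This is a back-of-the-envelope sizing, not a measured result: it assumes the worst case of one allocated batch buffer per partition, and uses the Kafka Java client's documented default buffer.memory of 32 MiB (33,554,432 bytes) for comparison.

```java
public class BufferSizing {
    public static void main(String[] args) {
        // Figures from the thread:
        long partitionsPerTopic = 2300;        // partitions a producer may write to
        long batchSizeBytes = 32_000;          // producer batch.size
        long defaultBufferMemory = 33_554_432; // Kafka Java client default buffer.memory (32 MiB)

        // Worst case: one batch buffer of batch.size allocated per partition.
        long worstCaseBytes = partitionsPerTopic * batchSizeBytes;

        System.out.printf("worst-case batch memory: %.1f MB%n", worstCaseBytes / 1e6);
        System.out.printf("ratio vs default buffer.memory: %.1fx%n",
                (double) worstCaseBytes / defaultBufferMemory);
    }
}
```

Running this prints 73.6 MB, roughly 2.2x the default buffer, before accounting for any additional headroom the producer needs for records waiting to be batched.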
According to the Java client docs, when using larger batch sizes:

> A very large batch size may use memory a bit more wastefully as we will always allocate a buffer of the specified batch size in anticipation of additional records.
So in the original tests we may not have been able to fill batches because of the buffer limit.
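For the documentation, the tuning might look like the following fragment of an OMB driver YAML. The specific values here are illustrative assumptions sized from the arithmetic above (128 MiB gives headroom over the ~73.6 MB worst case), not tested recommendations:

```yaml
# driver-redpanda driver file (excerpt) - illustrative values only
producerConfig: |
  batch.size=32000
  # Default is 33554432 (32 MiB); for thousands of partitions per producer,
  # size buffer.memory well above partitions-per-topic * batch.size.
  buffer.memory=134217728
```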