Support - Batching when producing

Open blankensteiner opened this issue 5 years ago • 7 comments

As described here:

  • https://pulsar.apache.org/docs/en/concepts-messaging/#batching
  • https://pulsar.apache.org/docs/en/develop-binary-protocol/#batch-messages
  • https://pulsar.apache.org/docs/en/client-libraries-java/#configuring-producers

blankensteiner avatar Oct 01 '19 12:10 blankensteiner

I was a bit surprised when I realized how batching is implemented. When sending a batch to the server, the command is relatively straightforward:

  • Frame size
  • Command size
  • Command (Send)
  • Magic number
  • Checksum
  • MessageMetadata size
  • MessageMetadata
  • Payload is a sequence of:
    • SingleMessageMetadata size
    • SingleMessageMetadata
    • Payload
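The nested part of this layout (the payload as a sequence of size, SingleMessageMetadata and message payload) can be sketched roughly as below. This is only an illustration under some assumptions: `BatchPayloadSketch` and `Build` are hypothetical names that are not part of DotPulsar, each SingleMessageMetadata is assumed to be already serialized to bytes, and the size prefix is assumed to be a 4-byte big-endian integer as in the rest of the binary protocol. The payload length itself is not written separately, since the reader is expected to take it from the metadata.

```csharp
using System;
using System.Buffers.Binary;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper, not part of DotPulsar.
public static class BatchPayloadSketch
{
    // Each entry is (already serialized SingleMessageMetadata, raw message payload).
    public static byte[] Build(IEnumerable<(byte[] Metadata, byte[] Payload)> messages)
    {
        var stream = new MemoryStream();
        var size = new byte[4];

        foreach (var (metadata, payload) in messages)
        {
            // SingleMessageMetadata size (assumed: 4 bytes, big-endian)
            BinaryPrimitives.WriteUInt32BigEndian(size, (uint) metadata.Length);
            stream.Write(size, 0, size.Length);

            // SingleMessageMetadata
            stream.Write(metadata, 0, metadata.Length);

            // Payload of this single message
            stream.Write(payload, 0, payload.Length);
        }

        return stream.ToArray();
    }
}
```

The resulting byte array would take the place of the payload in the Send command's frame, after the outer MessageMetadata.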

I was expecting the server to unwrap the command and store each message by itself in BookKeeper, but they are actually stored as one and therefore also delivered to the reader/consumer as one message. This means that:

  • All readers and consumers MUST be able to read batched messages.
  • The consumer's cursor is only moved forward when the consumer has acknowledged all the messages in a batch.
  • Messages with delayed delivery (DeliverAt) cannot be batched.
  • Since PartitionKey and OrderingKey can be set per SingleMessageMetadata, a single batch can contain a mix of these. We need to examine how this is handled.
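To illustrate the second point, here is a minimal sketch of how a client could track acknowledgements within one batch and only report the batch as done once every message in it has been acknowledged. `BatchAckTracker` is a hypothetical type for illustration, not DotPulsar's actual implementation, and it is not thread-safe.

```csharp
using System;
using System.Collections;

// Hypothetical helper, not part of DotPulsar. Not thread-safe.
public sealed class BatchAckTracker
{
    private readonly BitArray _pending;
    private int _remaining;

    public BatchAckTracker(int messagesInBatch)
    {
        if (messagesInBatch <= 0)
            throw new ArgumentOutOfRangeException(nameof(messagesInBatch));

        // Every message in the batch starts out as unacknowledged.
        _pending = new BitArray(messagesInBatch, true);
        _remaining = messagesInBatch;
    }

    // Marks one message as acknowledged. Returns true only when the whole
    // batch is acknowledged, which is the point where the cursor can move.
    public bool Acknowledge(int batchIndex)
    {
        if (_pending[batchIndex])
        {
            _pending[batchIndex] = false;
            --_remaining;
        }

        return _remaining == 0;
    }
}
```

Acknowledging the same index twice has no further effect, so a redelivered message does not complete the batch early.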

We have implemented support for reading and consuming batched messages (from version 0.6.0), but producing batched messages is currently on hold.

blankensteiner avatar Oct 14 '19 17:10 blankensteiner

Hey @blankensteiner happy new year to you! Any ETA on this?

eaba avatar Jan 10 '20 10:01 eaba

Hi @eaba Thanks and a happy new year to you too! :-D Currently, all my time is spent on implementing OpenShift and after that, we will be looking into implementing Pulsar here at Danske Commodities. This will mean more development time and a lot of developers using DotPulsar and being able to contribute (we have close to 30 developers here). So, the status right now is that it is not being worked on and we have no ETA. I doubt that this is a feature that will be requested within Danske Commodities, and therefore I hope someone in the community will consider implementing it.

blankensteiner avatar Jan 10 '20 13:01 blankensteiner

Thanks for the response - just seeing this after posting a new issue!

eaba avatar Jan 10 '20 16:01 eaba

@RobertIndie We need to add these methods to IProducerBuilder

/// <summary>
/// Set the maximum number of messages permitted in a batch. The default is 1000.
/// </summary>
IProducerBuilder BatchingMaxMessagesPerBatch(int maxMessagesPerBatch);

/// <summary>
/// Set the time period within which the messages sent will be batched. The default is 1 ms.
/// </summary>
IProducerBuilder BatchingMaxPublishDelay(TimeSpan maxPublishDelay);

/// <summary>
/// Control whether automatic batching of messages is enabled for the producer. The default is 'false'.
/// </summary>
IProducerBuilder BatchingEnabled(bool batchingEnabled);

This will require us to add these properties to ProducerOptions

/// <summary>
/// Set the maximum number of messages permitted in a batch. The default is 1000.
/// </summary>
public int BatchingMaxMessagesPerBatch { get; set; }

/// <summary>
/// Set the time period within which the messages sent will be batched. The default is 1 ms.
/// </summary>
public TimeSpan BatchingMaxPublishDelay { get; set; }

/// <summary>
/// Control whether automatic batching of messages is enabled for the producer. The default is 'false'.
/// </summary>
public bool BatchingEnabled { get; set; }
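As a rough sketch of how the two limits could interact when batching is enabled: a batch is sent either when it reaches the maximum number of messages or when the oldest message in it has waited for the maximum publish delay. `BatchAccumulator` and its members are hypothetical names used only for illustration. The time is passed in by the caller to keep the example deterministic, and the sketch is not thread-safe.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of DotPulsar. Not thread-safe.
public sealed class BatchAccumulator
{
    private readonly int _maxMessagesPerBatch;
    private readonly TimeSpan _maxPublishDelay;
    private readonly List<byte[]> _messages = new List<byte[]>();
    private DateTime _firstAddedUtc;

    public BatchAccumulator(int maxMessagesPerBatch, TimeSpan maxPublishDelay)
    {
        _maxMessagesPerBatch = maxMessagesPerBatch;
        _maxPublishDelay = maxPublishDelay;
    }

    // Adds a message. Returns true when the batch is full and should be sent.
    public bool Add(byte[] message, DateTime utcNow)
    {
        if (_messages.Count == 0)
            _firstAddedUtc = utcNow; // the delay is measured from the first message

        _messages.Add(message);
        return _messages.Count >= _maxMessagesPerBatch;
    }

    // True when a non-empty batch has waited at least the max publish delay.
    public bool IsDue(DateTime utcNow)
        => _messages.Count > 0 && utcNow - _firstAddedUtc >= _maxPublishDelay;

    // Hands over the current batch and starts a new one.
    public byte[][] Drain()
    {
        var batch = _messages.ToArray();
        _messages.Clear();
        return batch;
    }
}
```

With the defaults mentioned above this would be constructed as `new BatchAccumulator(1000, TimeSpan.FromMilliseconds(1))`. When BatchingEnabled is 'false', the producer would bypass the accumulator and send each message on its own.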

blankensteiner avatar Aug 05 '20 10:08 blankensteiner

@RobertIndie We need to get 'max_message_size' from 'CommandConnected' to ensure we don't create batches that are too big. The field is optional, so I guess we need to have a fallback to 5,242,880 bytes (5 MiB)?
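A minimal sketch of that fallback, assuming the optional field is surfaced to us as a nullable int (how the generated protobuf code actually exposes it is not shown here). `MaxMessageSize`, `Resolve` and `Fits` are hypothetical names:

```csharp
using System;

// Hypothetical helper, not part of DotPulsar.
public static class MaxMessageSize
{
    // Fallback when the broker does not send 'max_message_size'.
    public const int Fallback = 5 * 1024 * 1024; // 5,242,880 bytes

    // Assumption: a missing optional field is represented as null.
    public static int Resolve(int? maxMessageSizeFromBroker)
        => maxMessageSizeFromBroker ?? Fallback;

    // True when adding the next message keeps the batch within the limit.
    // The sum is computed as a long to avoid integer overflow.
    public static bool Fits(int currentBatchSize, int nextMessageSize, int maxMessageSize)
        => (long) currentBatchSize + nextMessageSize <= maxMessageSize;
}
```

A batch would then be closed and sent as soon as `Fits` returns false for the next message.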

blankensteiner avatar Aug 05 '20 10:08 blankensteiner

@RobertIndie We need to get 'max_message_size' from 'CommandConnected' to ensure we don't create batches that are too big. The field is optional, so I guess we need to have a fallback to 5,242,880 bytes (5 MiB)?

Ok, I agree.

RobertIndie avatar Aug 05 '20 11:08 RobertIndie