dque icon indicating copy to clipboard operation
dque copied to clipboard

Batch peek and dequeue operations

Open neilisaac opened this issue 5 years ago • 6 comments

For the sake of discussion, I'm considering adding batch uploads from a queue consumer, but would require a batch read API.

Suggested API:

// BatchPeek returns a slice of up to 1-n objects without dequeueing them.
// Fewer than n items may be returned, depending on the remaining objects in the first segment.
// ErrEmpty is returned if the queue is empty.
BatchPeek(n int) ([]interface{}, error)

// BatchDequeue dequeues and returns a slice of up to 1-n objects.
// Fewer than n items may be returned, depending on the remaining objects in the first segment.
// ErrEmpty is returned if the queue is empty.
BatchDequeue(n int) ([]interface{}, error)

neilisaac avatar Dec 31 '19 16:12 neilisaac

That makes sense. My first thought was, "why not also BatchEnqueue"? But maybe there isn't as good of a use-case for that.

Batch uploads must be part of your use-case after you grab a few things from the queue.

joncrlsn avatar Jan 02 '20 02:01 joncrlsn

Hi Neil, Peek has always been a bit questionable in my mind for use in a multi-threaded environment where another thread could come and grab the thing you just peeked at. It feels to me like we would need a way to lock the queue while the code calling Peek decides whether or not to use the object.

I probably should not have included Peek until I had a good use-case for it, although the use-case in my mind was for when you only have one thread doing the dequeues.

joncrlsn avatar Jan 06 '20 21:01 joncrlsn

I'm using Peek in an application with one consumer thread per queue (and potentially many writers via http handlers) for at-least-once delivery, so it's a solid use case. I'm considering using batching in the future to upload multiple records at once.

For multi-threaded use cases, it's probably most convenient to provide either a transactional dequeue api or Next/Ack/Nack methods (similar to RabbitMQ for example https://godoc.org/github.com/streadway/amqp#Delivery). Both of these approaches would be difficult to implement without relaxing safety guarantees or changing the file format (append 0 on dequeue.)

For the sake of simplicity, users of dque can either use a single consumer thread with Peek and Dequeue, multiple threads with Dequeue only, or BatchPeek and BatchDequeue with multi-threaded scatter-gather in between.

neilisaac avatar Jan 07 '20 05:01 neilisaac

Are you up for tackling this issue, Neil?

joncrlsn avatar Feb 05 '20 16:02 joncrlsn

@joncrlsn I'll take a stab at it

neilisaac avatar Feb 10 '20 16:02 neilisaac

I am also interested in having disk queue with BatchPeek and BatchDequeue. Would be nice to have a guarantee that if BatchPeek(n) returns k elements (0 < k <= n), then subsequent BatchDequeue(k) will return exactly k elements.

My use case is a local buffer before sending data to Kafka (using sarama go library), but to aid with performance I need to do batching, if possible during period of high traffic or backpressure. Peeking one element, sending to Kafka, waiting for ack, then dequeueing, is way too slow in my use case. Using batch peek and kafka send will also help with kafka compression. BatchPeek(100) should help a lot in my app.

baryluk avatar Jul 19 '22 12:07 baryluk