dque
Batch peek and dequeue operations
For the sake of discussion, I'm considering adding batch uploads from a queue consumer, but that would require a batch read API.
Suggested API:
// BatchPeek returns a slice of up to n objects without dequeueing them.
// Fewer than n items may be returned, depending on how many objects remain in the first segment.
// ErrEmpty is returned if the queue is empty.
BatchPeek(n int) ([]interface{}, error)
// BatchDequeue dequeues and returns a slice of up to n objects.
// Fewer than n items may be returned, depending on how many objects remain in the first segment.
// ErrEmpty is returned if the queue is empty.
BatchDequeue(n int) ([]interface{}, error)
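One way to sketch the proposed semantics is to build BatchDequeue as a loop over the existing single-item Dequeue. The queue type below is an in-memory stand-in, not dque's on-disk implementation, and the ErrEmpty sentinel mirrors the library's; the key behavior shown is that a short batch is not an error, and ErrEmpty is returned only when nothing at all was available:

```go
package main

import (
	"errors"
	"fmt"
)

// ErrEmpty mirrors dque's sentinel for an empty queue (assumption: the
// real library exposes an equivalent error value).
var ErrEmpty = errors.New("dque is empty")

// queue is a tiny in-memory stand-in for dque, used only to illustrate
// the proposed batch semantics.
type queue struct {
	items []interface{}
}

func (q *queue) Dequeue() (interface{}, error) {
	if len(q.items) == 0 {
		return nil, ErrEmpty
	}
	obj := q.items[0]
	q.items = q.items[1:]
	return obj, nil
}

// BatchDequeue dequeues up to n objects. It returns ErrEmpty only when
// no objects at all were available; a short batch is not an error.
func (q *queue) BatchDequeue(n int) ([]interface{}, error) {
	batch := make([]interface{}, 0, n)
	for len(batch) < n {
		obj, err := q.Dequeue()
		if err == ErrEmpty {
			break
		}
		if err != nil {
			return batch, err
		}
		batch = append(batch, obj)
	}
	if len(batch) == 0 {
		return nil, ErrEmpty
	}
	return batch, nil
}

func main() {
	q := &queue{items: []interface{}{1, 2, 3}}
	batch, err := q.BatchDequeue(2)
	fmt.Println(len(batch), err) // 2 <nil>
	batch, err = q.BatchDequeue(5)
	fmt.Println(len(batch), err) // 1 <nil> (short batch, not an error)
	_, err = q.BatchDequeue(1)
	fmt.Println(err == ErrEmpty) // true
}
```

A real implementation inside dque could be more efficient by reading a whole segment at once rather than looping over Dequeue, which is presumably why the doc comments tie the batch size to the first segment.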
That makes sense. My first thought was, "why not also BatchEnqueue?" But maybe there isn't as good a use-case for that.
Batch uploads must be part of your use-case after you grab a few things from the queue.
Hi Neil, Peek has always been a bit questionable in my mind for use in a multi-threaded environment where another thread could come and grab the thing you just peeked at. It feels to me like we would need a way to lock the queue while the code calling Peek decides whether or not to use the object.
I probably should not have included Peek until I had a good use-case for it, although the use-case in my mind was for when you only have one thread doing the dequeues.
I'm using Peek in an application with one consumer thread per queue (and potentially many writers via http handlers) for at-least-once delivery, so it's a solid use case. I'm considering using batching in the future to upload multiple records at once.
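The single-consumer at-least-once pattern described here has a common shape: peek at the head, attempt the upload, and dequeue only after the upload succeeds, so a crash mid-upload re-delivers the item on restart. A minimal sketch, again using an in-memory stand-in for dque and a hypothetical process callback for the upload step:

```go
package main

import (
	"errors"
	"fmt"
)

var ErrEmpty = errors.New("dque is empty")

// queue is an in-memory stand-in for dque with the Peek/Dequeue pair
// used by a single consumer thread (producers would enqueue elsewhere).
type queue struct{ items []interface{} }

func (q *queue) Peek() (interface{}, error) {
	if len(q.items) == 0 {
		return nil, ErrEmpty
	}
	return q.items[0], nil
}

func (q *queue) Dequeue() (interface{}, error) {
	obj, err := q.Peek()
	if err != nil {
		return nil, err
	}
	q.items = q.items[1:]
	return obj, nil
}

// consume drains the queue with at-least-once semantics: an item is
// dequeued only after process reports success, so a failure or crash
// between Peek and Dequeue leaves the item at the head for retry.
func consume(q *queue, process func(interface{}) error) (int, error) {
	sent := 0
	for {
		obj, err := q.Peek()
		if err == ErrEmpty {
			return sent, nil
		}
		if err != nil {
			return sent, err
		}
		if err := process(obj); err != nil {
			return sent, err // item stays at the head
		}
		if _, err := q.Dequeue(); err != nil {
			return sent, err
		}
		sent++
	}
}

func main() {
	q := &queue{items: []interface{}{"a", "b"}}
	n, err := consume(q, func(obj interface{}) error { return nil })
	fmt.Println(n, err) // 2 <nil>
}
```

Note this only gives at-least-once (not exactly-once) delivery: if the process crashes after the upload but before the Dequeue, the same item is delivered again.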
For multi-threaded use cases, it's probably most convenient to provide either a transactional dequeue API or Next/Ack/Nack methods (similar to RabbitMQ, for example https://godoc.org/github.com/streadway/amqp#Delivery). Both of these approaches would be difficult to implement without relaxing safety guarantees or changing the file format (e.g., appending a 0 on dequeue).
For the sake of simplicity, users of dque can either use a single consumer thread with Peek and Dequeue, multiple threads with Dequeue only, or BatchPeek and BatchDequeue with multi-threaded scatter-gather in between.
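The scatter-gather variant mentioned here would hand a batch from BatchDequeue to a pool of worker goroutines and collect one result per item. This is a hypothetical usage sketch, not part of dque's API; the batch is just the slice a BatchDequeue call would have returned:

```go
package main

import (
	"fmt"
	"sync"
)

// scatterGather fans a dequeued batch out to `workers` goroutines and
// collects one result per input item, preserving order. Each worker
// writes only to its own result indexes, so no extra locking is needed.
func scatterGather(batch []interface{}, workers int, handle func(interface{}) interface{}) []interface{} {
	jobs := make(chan int)
	results := make([]interface{}, len(batch))
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				results[i] = handle(batch[i])
			}
		}()
	}
	for i := range batch {
		jobs <- i
	}
	close(jobs)
	wg.Wait()
	return results
}

func main() {
	// batch stands in for the slice returned by a BatchDequeue call.
	batch := []interface{}{1, 2, 3, 4}
	out := scatterGather(batch, 2, func(v interface{}) interface{} {
		return v.(int) * 10
	})
	fmt.Println(out) // [10 20 30 40]
}
```

Because the items were already removed from the queue by BatchDequeue, a crash mid-processing loses the batch; that trade-off is what pushes the at-least-once use cases toward BatchPeek followed by BatchDequeue instead.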
Are you up for tackling this issue, Neil?
@joncrlsn I'll take a stab at it
I am also interested in having a disk queue with BatchPeek and BatchDequeue. It would be nice to have a guarantee that if BatchPeek(n) returns k elements (0 < k <= n), then a subsequent BatchDequeue(k) will return exactly those k elements.
My use case is a local buffer in front of Kafka (using the sarama Go library). To get reasonable performance I need batching, especially during periods of high traffic or backpressure. Peeking one element, sending it to Kafka, waiting for the ack, then dequeueing is far too slow for my use case. Batching the peek and the Kafka send will also help with Kafka compression. BatchPeek(100) should help a lot in my app.
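That peek-a-batch, send, then dequeue-the-same-count loop relies on exactly the guarantee requested above. A minimal sketch, with an in-memory stand-in for dque and a generic send callback in place of a real producer such as sarama's SendMessages (names and structure are assumptions, not the library's API):

```go
package main

import (
	"errors"
	"fmt"
)

var ErrEmpty = errors.New("dque is empty")

// queue is an in-memory stand-in for dque, just enough to exercise the
// proposed BatchPeek/BatchDequeue contract.
type queue struct{ items []interface{} }

func (q *queue) BatchPeek(n int) ([]interface{}, error) {
	if len(q.items) == 0 {
		return nil, ErrEmpty
	}
	if n > len(q.items) {
		n = len(q.items)
	}
	return q.items[:n], nil
}

func (q *queue) BatchDequeue(n int) ([]interface{}, error) {
	batch, err := q.BatchPeek(n)
	if err != nil {
		return nil, err
	}
	q.items = q.items[len(batch):]
	return batch, nil
}

// drain repeatedly peeks a batch, sends it, and dequeues only after the
// send succeeds, so an unacknowledged batch stays queued for retry.
// With a single consumer, BatchDequeue(len(batch)) removes exactly the
// items that were just sent.
func drain(q *queue, batchSize int, send func([]interface{}) error) (int, error) {
	total := 0
	for {
		batch, err := q.BatchPeek(batchSize)
		if err == ErrEmpty {
			return total, nil
		}
		if err != nil {
			return total, err
		}
		if err := send(batch); err != nil {
			return total, err // batch remains at the head for retry
		}
		if _, err := q.BatchDequeue(len(batch)); err != nil {
			return total, err
		}
		total += len(batch)
	}
}

func main() {
	q := &queue{items: []interface{}{1, 2, 3, 4, 5}}
	n, err := drain(q, 2, func(batch []interface{}) error { return nil })
	fmt.Println(n, err) // 5 <nil>
}
```

As with Peek, the guarantee only holds when a single consumer thread owns the queue; another consumer could dequeue items between the BatchPeek and the BatchDequeue.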