
Check docs to make sure that these disk assisted queues details are present

Open deoren opened this issue 6 years ago • 0 comments

I saw a thread on the mailing list which appeared relevant to some issues that I encountered in the past. This explanation of how disk assisted queues work was very interesting and seemed worth recording (at least a variation of it) in the docs:

@davidelang:

when there is a backlog of messages in a disk queue, and there are new messages being processed from the memory queue, my understanding is that the system has to be idle enough, then it will pull a batch of messages from disk and put them in the memory queue and send them.

@rgerhards:

it's actually much easier:

A DA queue is two queues in one: 1) a regular memory queue, 2) a regular disk queue.

Both are totally independent, except that the memory queue pushes data to the disk queue when it runs out of space.

When there is an outage, this disk queue is created and data is pushed to it. When the system comes up again, both the in-memory and the disk queue process messages. They do that in parallel, with no synchronization between the two, except of course the regular sync when the same action/ruleset/whatever is getting data concurrently.
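The structure described above can be sketched as a toy model. This is not rsyslog code: the class `ToyDAQueue`, its method names, and the capacity of 4 are all made up for illustration, and the real memory queue's spill behavior is reduced here to "push to disk when the memory queue is full".

```python
from collections import deque


class ToyDAQueue:
    """Toy model of a disk-assisted queue: a bounded memory queue plus a disk queue."""

    def __init__(self, mem_capacity):
        self.mem_capacity = mem_capacity  # hypothetical size limit of the memory queue
        self.mem = deque()   # stands in for the regular in-memory queue
        self.disk = deque()  # stands in for the regular disk queue

    def enqueue(self, msg):
        # The memory queue takes the message unless it is out of space,
        # in which case the message is pushed to the disk queue instead.
        if len(self.mem) < self.mem_capacity:
            self.mem.append(msg)
        else:
            self.disk.append(msg)

    def dequeue_batch(self, source, batch_size):
        # Each queue is drained on its own; neither waits for the other.
        queue = self.mem if source == "mem" else self.disk
        return [queue.popleft() for _ in range(min(batch_size, len(queue)))]


q = ToyDAQueue(mem_capacity=4)
for i in range(10):
    q.enqueue(i)

mem_batch = q.dequeue_batch("mem", 2)    # taken from the memory queue
disk_batch = q.dequeue_batch("disk", 2)  # taken from the disk queue, independently
print(mem_batch, disk_batch)
```

In this model the backlog on disk is drained whether or not the memory queue still holds messages, which is the point made in the explanation: the disk queue does not wait for the system to go idle.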

Disk queue always has exactly one worker thread, because we use sequential files. The worker works at "full speed", whatever that means. So data in the in-memory queue does not per se block the disk queue. HOWEVER, if the mem queue sends data very quickly, the target may push back. In that case, the mem queue gets a bigger share than the disk queue, simply because it can emit more messages in the same time frame. All high-level queue operations are type-agnostic: it doesn't make a difference if they use a mem or disk queue (well, DIRECT mode is different, but that's not of importance here).

HOWEVER, I did a quick code review for this thread, and it looks like the DeqBatchSize parameter is not forwarded to the disk queue, which would actually mean batch sizes of 8. I am not sure if it is intentional or not (large batch sizes may not play well with the queue i/o structure).
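To get a feel for what a batch size of 8 would mean when draining a backlog, here is a back-of-the-envelope calculation. The backlog of 1,000,000 messages and the comparison batch size of 128 are hypothetical numbers chosen for illustration; only the value 8 comes from the code review mentioned above.

```python
import math


def dequeue_rounds(backlog, batch_size):
    """Number of dequeue batches needed to drain `backlog` messages."""
    return math.ceil(backlog / batch_size)


backlog = 1_000_000  # hypothetical number of messages sitting in the disk queue

rounds_at_8 = dequeue_rounds(backlog, 8)      # batch size reported in the code review
rounds_at_128 = dequeue_rounds(backlog, 128)  # hypothetical larger batch size
print(rounds_at_8, rounds_at_128)  # prints "125000 7813"
```

Whether the extra rounds matter in practice depends on the per-batch overhead of the action and of the queue I/O, which this arithmetic does not capture.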

Looking at the disk queue system from a high level, I have wanted to totally rewrite it for several years. It's the oldest part of rsyslog and very I/O intensive. E.g. it needs to read all data twice because it was originally not spec'ed to survive a hard rsyslog abort. When this requirement came up, I needed to introduce two reader functions, as everything else was out of scope for the existing code. Rewriting is a major effort (3 to 6 months?) for which I have no time, and there is also no sponsor.

Nevertheless, the disk queue should keep the system busy while it is running (but look at i/o and CPU). Almost no system usage and slow queue sounds indeed wrong. It is definitely interesting to see what causes this.

deoren avatar Nov 13 '18 15:11 deoren