logstash-input-file

File Input should have option to spawn multiple reader threads

Open ghost opened this issue 10 years ago • 10 comments

Scenario: I have a lot of different files that Logstash monitors (one logfile per webserver). If Logstash crashes for whatever reason, or I need to reboot, it can take a very long time (hours or more) for Logstash to catch up again, because each input reads one file at a time.

The current solution seems to be to create multiple file inputs, which does work. However, this creates management overhead (many file input entries). If I use any kind of 'elastic' capability to spin servers up and down on demand, managing (and restarting) the Logstash configuration becomes particularly difficult.

Suggested solution: Provide a 'readers' option that lets me specify the number of readers I'd like logstash to spin up.

(originally logged as /logstash/issues/2930)

ghost avatar Apr 01 '15 05:04 ghost

+1. The worker model for the file input is currently 1 thread only. We should improve this (with some kind of configurable worker pool).

jordansissel avatar Jul 23 '15 21:07 jordansissel

+1 !! :) This is particularly painful in a bulk import. Is there any suggested workaround? Maybe monitoring the folder for new files, spawning one logstash-forwarder per file, and changing the Logstash input from file to tcp? It's ugly, I know.

splashx avatar Aug 09 '15 12:08 splashx

Up :+1:

splashx avatar Sep 10 '15 16:09 splashx

By the way, what's the suggested workaround here? logstash-forwarder doesn't also multithread as expected. Even if it does, after forwarding it to log stash the input-lumberjack plugin doesn't support multithreading. Are users using file input doomed to one thread?

splashx avatar Sep 12 '15 18:09 splashx

The current work-around is to have multiple file inputs.

For example:

input {
  file { }
  file { }
}

with each file input having a different path pattern.
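
A minimal sketch of such a split, assuming a hypothetical layout in which the web server logs are grouped into two directories (the directory names are illustrative and do not come from this thread):

```
input {
  # Hypothetical: logs from the first group of web servers
  file {
    path => "/var/log/web/group-a/*.log"
  }
  # Hypothetical: logs from the second group of web servers
  file {
    path => "/var/log/web/group-b/*.log"
  }
}
```

Each file block runs as its own input, so the two groups are read concurrently rather than one file at a time.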

The problem is trying to balance this. With too many file inputs, your memory usage goes up and threads sit idle. With too few file inputs, some threads catch up before others and then sit idle while the others are busy.

The other problem is that I need to manage these patterns: if I have a file pattern per server, then as I add or remove servers, I need to update the patterns.

ghost avatar Sep 13 '15 01:09 ghost

> logstash-forwarder doesn't also multithread as expected.

lsf uses goroutines (green threads, basically). If you need to use more CPU cores, I believe you can set the GOMAXPROCS environment variable to achieve that, for example GOMAXPROCS=4 (where 4 is the number of CPU cores you want to use). I don't know what performance changes would result from it, though.

> Even if it does, after forwarding it to log stash the input-lumberjack plugin doesn't support multithreading.

The lumberjack input is multithreaded. Can you clarify what you mean?

jordansissel avatar Sep 13 '15 05:09 jordansissel

> By the way, what's the suggested workaround here?

Workaround is to split up your file inputs (https://github.com/logstash-plugins/logstash-input-file/issues/22#issuecomment-139838320)

jordansissel avatar Sep 13 '15 05:09 jordansissel

> The current work-around is to have multiple file inputs.

I've tried that; it didn't help much. Then I tried running one Logstash process (instance) per log file, each with a single file {} block (so I had 10 Logstash instances on one box reading 10 different log files). This turned out to perform much better.
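
A sketch of what one such per-instance config might look like, assuming a hypothetical log file name and sincedb location (neither comes from this thread). The explicit `sincedb_path` is an assumption, included so that each instance keeps its read position in its own file; filter and output sections are omitted:

```
# web01.conf: config for one Logstash instance (hypothetical file name)
input {
  file {
    # Hypothetical: the single log file this instance is responsible for
    path => "/var/log/web/web01.log"
    # Assumption: a dedicated position-tracking file per instance
    sincedb_path => "/var/lib/logstash/sincedb-web01"
  }
}
```

Each instance would be started with its own config file via the `-f` flag, one process per log file.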

> the lumberjack input is multithreaded. Can you clarify what you mean?

Is it? On the official page I couldn't find multithreading options like the ones in the rabbitmq input plugin. If I can have the lumberjack input multithreaded, then I can tune logstash-forwarder to ship everything to Logstash.

splashx avatar Sep 13 '15 19:09 splashx

@splashx - Can you take discussions on how you'd like to performance-tune other aspects of the system elsewhere? This issue is a request to add multithreading to the file input.

ghost avatar Sep 13 '15 20:09 ghost

@willhughes as I've seen in other issues and requests, these discussions about performance tuning are valid, because if one can achieve the goal some other way, the feature request may be irrelevant. Besides, until the feature is implemented, others can use this thread as a reference for workarounds. The workaround you proposed doesn't really perform as one would expect. If someone is looking for multithreaded file input, they obviously have a performance concern (and here comes the tuning). So for others ending up here: I recommend running multiple instances of Logstash (ugly) to achieve high throughput while reading several files, or else wait for the feature to come around or code it yourself ;)

splashx avatar Sep 13 '15 21:09 splashx