logstash-input-file
Allow recording the file byte offset into a field value
Originally from: https://github.com/elasticsearch/logstash/issues/1641
Similar report with specific multiline concerns in https://logstash.jira.com/browse/LOGSTASH-1044
+1
This is exactly what I am looking for.
+1
+1
+1
@jordansissel For us, it is a matter of wanting to be able to read out lines in the exact order they were ingested. Ordering on @timestamp won't be enough, since some log events are emitted at a pretty high rate and share the same timestamp. Thus, any form of incrementing number (global, local or otherwise) in addition to the timestamp would be valuable.
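To illustrate the ordering concern, here is a minimal sketch of how a per-file byte offset could act as a tie-breaker when several events share one timestamp. The "offset" field name and the sample events are made up for the example, and the offset is only meaningful for events that come from the same file.

```ruby
# Illustrative data only: three events with identical timestamps,
# distinguishable solely by a hypothetical "offset" field.
events = [
  { "@timestamp" => "2016-11-01T10:00:00.000Z", "offset" => 120, "message" => "c" },
  { "@timestamp" => "2016-11-01T10:00:00.000Z", "offset" => 0,   "message" => "a" },
  { "@timestamp" => "2016-11-01T10:00:00.000Z", "offset" => 60,  "message" => "b" }
]

# Sort on the timestamp first and fall back to the offset for ties.
ordered = events.sort_by { |e| [e["@timestamp"], e["offset"]] }
ordered_messages = ordered.map { |e| e["message"] }
```

Sorting on @timestamp alone would leave the relative order of these three events undefined.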
Sequence counter plugin, not exactly byte offset but better than nothing: https://github.com/leeeena/logstash-filter-seq
+1
+1
I'm very new to Ruby but this patch appears to accomplish the goal of this issue.
diff -r logstash-5.0.0/vendor/bundle/jruby/1.9/gems/filewatch-0.9.0/lib/filewatch/observing_tail.rb logstash-5.0.0.eric/vendor/bundle/jruby/1.9/gems/filewatch-0.9.0/lib/filewatch/observing_tail.rb
10c10
< def accept(line) end
---
> def accept(line, offset) end
79c79
< listener.accept(line)
---
> listener.accept(line, @sincedb[watched_file.inode])
diff -r logstash-5.0.0/vendor/bundle/jruby/1.9/gems/logstash-input-file-4.0.0/lib/logstash/inputs/file.rb logstash-5.0.0.eric/vendor/bundle/jruby/1.9/gems/logstash-input-file-4.0.0/lib/logstash/inputs/file.rb
177a178
> @offset = 0
254c255
< attr_reader :input, :path, :data
---
> attr_reader :input, :path, :data, :offset
266c267
< def accept(data)
---
> def accept(data, offset)
269c270
< input.codec.accept(dup_adding_state(data))
---
> input.codec.accept(dup_adding_state(data, offset))
274a276
> event.set("offset", offset)
278c280
< def add_state(data)
---
> def add_state(data, offset)
279a282
> @offset = offset
286,287c289,290
< def dup_adding_state(line)
< self.class.new(path, input).add_state(line)
---
> def dup_adding_state(line, offset)
> self.class.new(path, input).add_state(line, offset)
This is a feature that is on the radar for any future development of this plugin.
There are things to consider though, both in general and about the patch above.
- We cannot keep adding another argument to all the methods in the call chain for each extra piece of information.
- This extra information is generalised as context or provenance, i.e. stuff that describes where the data came from. Allowing for the capture of context is a feature that will eventually become available on all Logstash Inputs (or data sources) as well as Beats.
- The proposal to get the offset from @sincedb[watched_file.inode] is problematic because at the time of calling @sincedb[watched_file.inode] it has the offset of the previous line.
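As a rough sketch of the first two points, a single context hash can travel with each line so that new provenance fields do not require new method arguments. Every name here (SketchListener, each_line_with_offset) is hypothetical and not part of filewatch or Logstash, and the sketch assumes the wanted value is the byte position at which each line starts.

```ruby
# Hypothetical sketch; none of these names are filewatch or Logstash APIs.
class SketchListener
  attr_reader :events

  def initialize(path)
    @path = path
    @events = []
  end

  # context is an open-ended hash of provenance data, e.g. { "offset" => 0 },
  # so further fields can be added without changing this signature.
  def accept(line, context)
    @events << { "message" => line, "path" => @path }.merge(context)
  end
end

# Walks a buffer line by line and yields each line together with the byte
# offset at which it starts. The offset is computed next to the line itself
# instead of being read from a store that is updated at a different time.
def each_line_with_offset(buffer, start_offset = 0)
  offset = start_offset
  buffer.each_line do |raw|
    yield raw.chomp, offset
    offset += raw.bytesize # advance only after the line has been handed over
  end
  offset
end

listener = SketchListener.new("/var/log/app.log")
final_offset = each_line_with_offset("first\nsecond\nthird\n") do |line, offset|
  listener.accept(line, { "offset" => offset })
end
```

Because the offset is yielded together with the line, there is no window in which the position and the line can disagree.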
It is planned that all input plugins will read and send a chunk of data to the codec. This eliminates the problem of mismatching a codec that expects chunks (or lines) with an input that is providing lines (or chunks). Logstash cannot have multiple codecs associated with an input at the moment but there are clear cases where this is needed.
The recording of progress positional data in the sincedb is done after the event is assumed to be created and put in the queue - in general terms this can be classed as acknowledgement. At the moment acknowledgement is done in an arbitrary way by each input. For example the JDBC input records the ID of the last-read-record so that, on restart, it will not reread the previous records. These "acks" are inferred from the return of the method call that adds the event to the queue.
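The "ack inferred from the return of the enqueue call" idea can be sketched roughly as follows. PositionStore, ClosedQueue and ingest are invented for the illustration and do not mirror the actual sincedb or queue code.

```ruby
# Hypothetical sketch of "acknowledge after enqueue"; not Logstash's implementation.
class PositionStore
  attr_reader :position

  def initialize
    @position = 0
  end

  def ack(position)
    @position = position
  end
end

# Stand-in for a queue that refuses new events.
class ClosedQueue
  def <<(_event)
    raise "queue closed"
  end
end

# Each entry pairs a line with the byte position reached after reading it.
def ingest(lines_with_end_positions, queue, store)
  lines_with_end_positions.each do |line, end_position|
    queue << { "message" => line } # returns only once the event is enqueued
    store.ack(end_position)        # progress is recorded after that return
  end
end

queue = []
store = PositionStore.new
ingest([["a", 2], ["b", 4]], queue, store)

# If the enqueue raises, the recorded position is left where it was.
failed_store = PositionStore.new
begin
  ingest([["a", 2]], ClosedQueue.new, failed_store)
rescue RuntimeError
  # the event was never queued, so nothing was acknowledged
end
```

On restart, reading would resume from the stored position, so a line whose enqueue failed would be read again rather than lost.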
If we move the "extract lines from chunks" from filewatch to the input then we will need a callback in filewatch to accept the position information to write to the sincedb.
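One possible shape for such a callback, as a sketch only: the source hands raw chunks to the input, the input extracts the lines, and the input reports back how many bytes it has consumed. ChunkSource, LineExtractingInput and record_position are hypothetical names, not existing filewatch methods.

```ruby
# Hypothetical names throughout; this is not the filewatch API.
class ChunkSource
  attr_reader :sincedb

  def initialize(chunks)
    @chunks = chunks
    @sincedb = {}
  end

  # Callback the input invokes once it has consumed bytes up to `position`.
  def record_position(key, position)
    @sincedb[key] = position
  end

  def each_chunk(key)
    @chunks.each { |chunk| yield key, chunk }
  end
end

class LineExtractingInput
  attr_reader :events

  def initialize(source)
    @source = source
    @buffer = String.new
    @consumed = 0
    @events = []
  end

  def run(key)
    @source.each_chunk(key) do |chunk_key, chunk|
      @buffer << chunk
      # Emit every complete line; a trailing partial line stays buffered.
      while (newline_index = @buffer.index("\n"))
        line = @buffer.slice!(0..newline_index)
        @events << { "message" => line.chomp, "offset" => @consumed }
        @consumed += line.bytesize
        @source.record_position(chunk_key, @consumed)
      end
    end
  end
end

source = ChunkSource.new(["ab\ncd", "e\nfg"])
input = LineExtractingInput.new(source)
input.run("inode-1")
```

In this sketch the trailing partial line ("fg") is never reported as consumed, so the stored position only ever covers complete lines.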
+1
Has this been done? I do not see it in the latest Logstash 7.6 release.
Why is this useful feature not prioritized? It has been open for 5 years. Is it coming out anytime soon?