logstash-input-file
Options to disable adding the 'host' and/or 'path' fields.
Previous discussion: https://discuss.elastic.co/t/not-to-add-host-path-field/49889/2
Hi.
With this configuration,
input {
  file {
    codec => json
    path => ...
  }
}
output {
  elasticsearch {
    ...
  }
}
Logstash adds the fields 'host' and 'path' to the original document:
"_source": {
  ...
  "path": "/home1/.logstash/input/default_agg_daily/0505.json",
  "host": "xyzzzz"
},
I want to remove path and host with
mutate { remove_field => [...] }
But in some cases the original document may itself contain a 'path' and/or 'host' field, so I cannot simply use remove_field.
How about providing an option to disable adding the host and path fields?
We could do this. It should be possible to store both path and host in @metadata
always, and allow hiding them if they're not populated.
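If host and path lived only under @metadata, users who still want them would have to opt in, because fields under [@metadata] are not sent to outputs. A minimal sketch of that opt-in, assuming the file input populates [@metadata][path]; the target field name "source_file" is hypothetical and chosen only for illustration:

```
filter {
  # [@metadata] fields are not emitted by outputs, so copy the recorded
  # file path into a regular field only when it is actually wanted.
  # "source_file" is a hypothetical field name used for this example.
  mutate {
    copy => { "[@metadata][path]" => "source_file" }
  }
}
```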
Any news about this feature?
No news about this feature. If there is news, it will be posted in this issue.
It should be possible to store both path and host in @metadata always, and allow hiding them if they're not populated.
As far as I can see on tag 4.0.3, both the host and path fields are already written into '@metadata'. There is also already an if check before host/path is written into the event. This feature could be implemented by adding a config flag to those if checks. Or am I missing something?
Any update on @sevdog's question?
The reality is that most users have migrated away from using the file input plugin in Logstash in favor of filebeat, and filebeat includes even more metadata, and puts things into ECS (Elastic Common Schema).
The best way to eliminate host and path in either case, whether with filebeat or with Logstash's file input plugin, would be to simply remove the undesired field(s) using mutate.
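A minimal sketch of that suggestion, assuming the plain file input from the configuration at the top of this issue. It removes both fields unconditionally, which is exactly what breaks when the source JSON carries its own host or path:

```
filter {
  # Unconditionally drop both fields. This also drops "host" and "path"
  # when they were part of the original JSON document.
  mutate {
    remove_field => ["host", "path"]
  }
}
```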
Forcing users to use mutate to remove those fields may cause problems when using the JSON codec with a schema in which those fields may be present.
For example, given these input lines:
{"var1": "foo", "var2": "bar", "host": "example.com", "path": "/some/path"}
{"var1": "foo2", "var2": "bar2"}
Inside Logstash, these records become:
{"var1": "foo", "var2": "bar", "host": "example.com", "path": "/some/path"}
{"var1": "foo2", "var2": "bar2", "host": "logstash", "path": "/path/to/log.json"}
To keep metadata from polluting our records we would need two if conditionals and two mutate filters, because neither field is mandatory in our schema.
This causes even more trouble if our pipeline processes several file inputs, since we would have to add more clauses to those if statements.
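The two conditionals could look roughly like the sketch below. It assumes, as noted for tag 4.0.3, that the file input records its own values in [@metadata][host] and [@metadata][path] and only sets the top-level fields when they are absent. A field is treated as plugin-added when it equals the recorded value; a source document whose own host or path happens to match that value would lose it too.

```
filter {
  # Treat "host" as plugin-added when it equals the value the file input
  # stored in [@metadata][host], and remove it in that case.
  if [host] == [@metadata][host] {
    mutate { remove_field => ["host"] }
  }
  # Same comparison for "path" against [@metadata][path].
  if [path] == [@metadata][path] {
    mutate { remove_field => ["path"] }
  }
}
```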
Why not use filebeat then, since it stores its metadata inside a namespace (beats) that is unlikely to collide? Simply because if we are collecting files from the same host on which Logstash is installed, using filebeat would be overkill (and a waste of resources).
IMHO there is no reason not to have an option, disabled by default, to prevent metadata from polluting real records.
@untergeek I will check out filebeat.
I agree with @sevdog's assessment that there should be an option to control this behavior, disabled by default.