logstash-input-file

Options to disable 'host' and/or 'path' field adding.

Open jeesim2 opened this issue 8 years ago • 8 comments

Previous discussion: https://discuss.elastic.co/t/not-to-add-host-path-field/49889/2

Hi.

With this configuration,

input {
    file {
        codec => json
        path  => ...
    }
}
output {
    elasticsearch {
        ...
    }
}

Logstash adds the fields 'host' and 'path' to the original document:

  "_source": {
   ...
    "path": "/home1/.logstash/input/default_agg_daily/0505.json",
    "host": "xyzzzz"
  },

I would like to remove path and host with

mutate { remove_field => [...] }

But in some cases the original document may itself contain a 'path' and/or 'host' field, so I cannot simply use remove_field.

How about providing an option to disable adding the host and path fields?

jeesim2 · May 13 '16 02:05

We could do this. It should be possible to store both path and host in @metadata always, and allow hiding them if they're not populated.

untergeek · May 17 '16 16:05

Any news about this feature?

sevdog · Dec 14 '17 16:12

No news about this feature. If news occurs, it will be documented in this issue.

jordansissel · Dec 14 '17 16:12

It should be possible to store both path and host in @metadata always, and allow hiding them if they're not populated.

As far as I can see, at tag 4.0.3 both the host and path fields are already written into '@metadata'.
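Because @metadata is not sent to outputs, one way to confirm what the plugin stores there is to print events with the rubydebug codec's `metadata` option turned on. This is a minimal sketch of the output section only, meant to be combined with the file input shown earlier. The exact keys you see depend on the plugin version.

```
output {
  # Print each event to the console, including its normally hidden
  # @metadata, so the [@metadata][host] and [@metadata][path] values
  # written by the file input can be inspected.
  stdout {
    codec => rubydebug { metadata => true }
  }
}
```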

Also, there is already an if check before host/path is written into the event.

This feature could be implemented by adding a config flag to those if checks. Or am I missing something?

sevdog · Feb 26 '18 11:02

Any update on @sevdog's question?

indusbull · Oct 21 '19 19:10

The reality is that most users have migrated away from using the file input plugin in Logstash in favor of filebeat, and filebeat includes even more metadata, and puts things into ECS (Elastic Common Schema).

The best way to eliminate host and path in either case—filebeat, or Logstash's file input plugin—would be to simply remove the undesired field(s) using mutate.
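The following is a minimal sketch of the unconditional removal suggested here. It assumes the fields are named host and path, as in the plugin versions discussed in this thread.

```
filter {
  # Drop both fields from every event. This also removes any "host" or
  # "path" that the original JSON document carried itself.
  mutate {
    remove_field => ["host", "path"]
  }
}
```

Because it runs on every event, this filter cannot tell the plugin's values apart from fields that were already in the source document. That limitation is what the next comment raises.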

untergeek · Oct 21 '19 21:10

Forcing users to use mutate to remove those fields may cause problems when using the JSON codec with a schema in which those fields may be present.

For example, given these input lines:

{"var1": "foo", "var2": "bar", "host": "example.com", "path": "/some/path"}
{"var1": "foo2", "var2": "bar2"}

Inside Logstash, these records become:

{"var1": "foo", "var2": "bar", "host": "example.com", "path": "/some/path"}
{"var1": "foo2", "var2": "bar2", "host": "logstash", "path": "/path/to/log.json"}

To keep this metadata from polluting our records, we would need two if conditions and two mutate filters, because neither field is mandatory in our schema.
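Below is a sketch of the two if conditions and two mutate filters described above. It assumes the plugin behaves as reported earlier in the thread for tag 4.0.3: the values it adds are also copied to [@metadata][path] and [@metadata][host], and the top-level fields are only set when the event does not already have them. Under that assumption, comparing each field with its @metadata copy removes only the values the input itself added.

```
filter {
  # Remove "path" only if it equals the value the file input recorded
  # in @metadata, i.e. it was added by the input, not the source document.
  if [path] == [@metadata][path] {
    mutate { remove_field => ["path"] }
  }
  # Same check for "host".
  if [host] == [@metadata][host] {
    mutate { remove_field => ["host"] }
  }
}
```

A source document whose own path or host happens to equal the value Logstash would add cannot be distinguished this way and would also lose that field. Field names may differ in later releases of the plugin.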

This gets more troublesome if more file inputs are processed in our pipeline, since we would have to add more clauses to the if statements.

Why not use filebeat, then, since it stores its metadata inside a namespace (beats) that is unlikely to collide? Simply because if you are collecting files from the host on which Logstash is installed, filebeat would be overkill (and a waste of resources).

IMHO there is no reason not to have an option, disabled by default, that prevents metadata from polluting real records.

sevdog · Oct 22 '19 07:10

@untergeek I will check out filebeat.

I agree with @sevdog's assessment that there should be an option to control this behavior, disabled by default.

indusbull · Oct 28 '19 13:10