
ECS log schema support

Open raghu999 opened this issue 5 years ago • 4 comments

Hi Vector team, general question: how can we apply the Elastic Common Schema (ECS) to Vector data before writing it to Elasticsearch?

raghu999 avatar Apr 23 '20 17:04 raghu999

@raghu999 great question! Vector's schema assumptions are currently very simple. Common field names can be controlled via the global log_schema options. Beyond that, your best bet is to use the rename_fields transform to rename your data's fields to match ECS.

But I really like the idea of Vector defining a more explicit schema around all fields, specifically the fields added by transforms like ec2_metadata and geoip. All of that should be customizable globally.

binarylogic avatar Apr 23 '20 18:04 binarylogic

Our current pipeline also tries to comply with ECS before writing data to Elasticsearch.

Take the following log message as an example of what our pipeline handles:

2020-13-10T10:01:23Z - 12345 - INFO - My.Namespace.Component || My log message

A first regex_parser stage extracts the individual parts of the log message as raw strings. After parsing, the LogEvent looks like this:

| Field | Value |
| --- | --- |
| log_timestamp | 2020-13-10T10:01:23Z |
| log_thread_id | 12345 |
| log_level | INFO |
| log_logger | My.Namespace.Component |
| log_message | My log message |
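
To illustrate the parsing step outside of Vector, here is a minimal Python sketch. It is not the actual regex_parser configuration from this pipeline: the pattern and the `parse_line` helper are assumptions written for this example, and only the field names come from the table above.

```python
import re

# Hypothetical pattern; the named groups mirror the field names listed above.
LOG_PATTERN = re.compile(
    r"^(?P<log_timestamp>\S+) - (?P<log_thread_id>\d+) - (?P<log_level>\w+)"
    r" - (?P<log_logger>\S+) \|\| (?P<log_message>.*)$"
)


def parse_line(line: str) -> dict:
    """Split one log line into raw string fields, or raise if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        raise ValueError(f"unrecognized log line: {line!r}")
    return match.groupdict()


raw = "2020-13-10T10:01:23Z - 12345 - INFO - My.Namespace.Component || My log message"
fields = parse_line(raw)
```

Every value is still a string at this point, which is why a later stage is needed to parse the thread id and timestamp.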

We then use a combination of rename_fields transforms, to rename the fields according to ECS, and lua transforms, to parse the thread id and timestamp.

Our final LogEvent will look like this:

| Field | Value |
| --- | --- |
| @timestamp | 2020-13-10T10:01:23Z |
| process.thread.id | 12345 |
| log.level | INFO |
| log.logger | My.Namespace.Component |
| message | My log message |
| host.name | node01 |
| log.original | 2020-13-10T10:01:23Z - 12345 - INFO - My.Namespace.Component |
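
The renaming step can be sketched in Python as well. This is only an illustration of the mapping, not the rename_fields or lua configuration used in the pipeline: `ECS_FIELD_MAP` and `to_ecs` are hypothetical names, and passing `host_name` and `original` in as arguments is an assumption, since the thread does not say where those values come from. The timestamp is left as a string here.

```python
# Hypothetical mapping from the parsed field names to their ECS names.
ECS_FIELD_MAP = {
    "log_timestamp": "@timestamp",
    "log_thread_id": "process.thread.id",
    "log_level": "log.level",
    "log_logger": "log.logger",
    "log_message": "message",
}


def to_ecs(parsed: dict, host_name: str, original: str) -> dict:
    """Rename parsed fields to ECS names and add host.name and log.original."""
    event = {ECS_FIELD_MAP.get(key, key): value for key, value in parsed.items()}
    if "process.thread.id" in event:
        # Convert the thread id from a raw string to a number.
        event["process.thread.id"] = int(event["process.thread.id"])
    event["host.name"] = host_name
    event["log.original"] = original
    return event


parsed = {
    "log_timestamp": "2020-13-10T10:01:23Z",
    "log_thread_id": "12345",
    "log_level": "INFO",
    "log_logger": "My.Namespace.Component",
    "log_message": "My log message",
}
event = to_ecs(
    parsed,
    host_name="node01",
    # Value copied as shown in the table above.
    original="2020-13-10T10:01:23Z - 12345 - INFO - My.Namespace.Component",
)
```

Fields without an entry in the mapping keep their original names, so unexpected fields are passed through rather than dropped.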

Hope that helps

oktal avatar Oct 13 '20 11:10 oktal

Thanks, @oktal, that's helpful. We are actively outlining first-class support for schemas like ECS. We hope to get the initial versions out this quarter (#3910). It'll likely start with more control over field mapping at the source and sink level and then progress into formal support for the schemas.

binarylogic avatar Oct 13 '20 13:10 binarylogic

https://github.com/ypid/event-processing-framework (modular config for Vector) has extensive support for ECS. Sources like syslog in particular should have good coverage.

ypid avatar Jan 03 '23 22:01 ypid