ECS log schema support
Hi Vector team, a general question: how can we apply the Elastic Common Schema (ECS) to Vector data before writing it to Elasticsearch?
@raghu999 great question! Vector's schema assumptions are currently very simple. Common field names can be controlled via the global log_schema options. Outside of that, your best bet is to use the rename_fields transform to make your data match that schema.
But I really like the idea of Vector defining a more explicit schema around all fields, specifically the fields added by transforms like ec2_metadata and geoip. All of that should be customizable globally.
Our current pipeline also tries to comply with ECS before writing data to Elasticsearch.
Considering the following log message, our pipeline looks like this:
2020-13-10T10:01:23Z - 12345 - INFO - My.Namespace.Component || My log message
A first regex_parser stage extracts the individual (raw) parts of the log message. After parsing, the LogEvent looks like this:
| Field | Value |
|---|---|
| log_timestamp | 2020-13-10T10:01:23Z |
| log_thread_id | 12345 |
| log_level | INFO |
| log_logger | My.Namespace.Component |
| log_message | My log message |
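To make the parsing step concrete, here is a minimal Python sketch of the same extraction. It is not Vector configuration; the pattern and the `parse_line` helper are hypothetical, written only to reproduce the fields in the table above from the sample line.

```python
import re

# Hypothetical pattern for the sample line; the named groups mirror
# the field names shown in the table above.
LOG_PATTERN = re.compile(
    r"^(?P<log_timestamp>\S+) - (?P<log_thread_id>\d+) - (?P<log_level>\w+)"
    r" - (?P<log_logger>\S+) \|\| (?P<log_message>.*)$"
)


def parse_line(line: str) -> dict:
    """Split one raw log line into its parts; every value stays a string."""
    match = LOG_PATTERN.match(line)
    if match is None:
        raise ValueError(f"line does not match the expected format: {line!r}")
    return match.groupdict()


raw = "2020-13-10T10:01:23Z - 12345 - INFO - My.Namespace.Component || My log message"
parsed = parse_line(raw)
```

At this stage every value is still a raw string, which is why a later step is needed to convert the thread id and timestamp.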
We then use a combination of rename_fields and lua transforms (to parse the thread id and timestamp) to rename the fields according to ECS.
Our final LogEvent will look like this:
| Field | Value |
|---|---|
| @timestamp | 2020-13-10T10:01:23Z |
| process.thread.id | 12345 |
| log.level | INFO |
| log.logger | My.Namespace.Component |
| message | My log message |
| host.name | node01 |
| log.original | 2020-13-10T10:01:23Z - 12345 - INFO - My.Namespace.Component \|\| My log message |
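The renaming step can be sketched in Python as a plain mapping from the parsed field names to their ECS names. This is only an illustration, not our actual rename_fields or lua configuration: `ECS_RENAMES` and `to_ecs` are hypothetical names, the host name is passed in by hand, the timestamp is left as a string, and log.original is assumed to hold the full raw line.

```python
# Hypothetical mapping from the parsed field names to ECS field names.
ECS_RENAMES = {
    "log_timestamp": "@timestamp",
    "log_thread_id": "process.thread.id",
    "log_level": "log.level",
    "log_logger": "log.logger",
    "log_message": "message",
}


def to_ecs(parsed: dict, original: str, host_name: str) -> dict:
    """Rename parsed fields to ECS names and add host and original-line fields."""
    event = {ECS_RENAMES[key]: value for key, value in parsed.items() if key in ECS_RENAMES}
    # The thread id arrives as a string from the regex stage; convert it to an integer.
    event["process.thread.id"] = int(event["process.thread.id"])
    event["host.name"] = host_name
    event["log.original"] = original
    return event


raw_line = "2020-13-10T10:01:23Z - 12345 - INFO - My.Namespace.Component || My log message"
parsed_fields = {
    "log_timestamp": "2020-13-10T10:01:23Z",
    "log_thread_id": "12345",
    "log_level": "INFO",
    "log_logger": "My.Namespace.Component",
    "log_message": "My log message",
}
ecs_event = to_ecs(parsed_fields, raw_line, "node01")
```

The keys are kept as flat dotted strings to match the table; whether a sink treats them as nested objects depends on the tool.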
Hope that helps
Thanks, @oktal, that's helpful. We are actively outlining first-class support for schemas like ECS. We hope to get the initial versions out this quarter (#3910). It'll likely start with more control over field mapping at the source and sink level and then progress into formal support for the schemas.
https://github.com/ypid/event-processing-framework (modular config for Vector) has extensive support for ECS. Especially things like syslog should have good coverage.