Logstash knowledge notes
- [x] The logstash filter configuration currently used at our company
- [ ] Logstash best practices -- needs a careful read
The logstash filter configuration currently used at our company
The company's 10-json-filter.conf.j2 configuration file uses the following three filter plugins (for the full plugin list, see https://www.elastic.co/guide/en/logstash/current/filter-plugins.html):
- json - Parses JSON events
- mutate - Performs mutations on fields
- date - Parses dates from fields to use as the Logstash timestamp for an event
json
Description
This is a JSON parsing filter. It takes an existing field which contains JSON and expands it into an actual data structure within the Logstash event.
By default it will place the parsed JSON in the root (top level) of the Logstash event, but this filter can be configured to place the JSON into any arbitrary event field, using the `target` configuration.
This plugin has a few fallback scenarios for when something bad happens during the parsing of the event. If the JSON parsing fails on the data, the event will be untouched and it will be tagged with a `_jsonparsefailure`; you can then use conditionals to clean the data. You can configure this tag with the `tag_on_failure` option.
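A minimal sketch of this failure handling, assuming the JSON arrives in a `message` field (the custom tag name and the `drop` handling are illustrative choices, not taken from the company config):

```
filter {
  json {
    source         => "message"
    # replace the default "_jsonparsefailure" tag with a custom one
    tag_on_failure => ["_my_jsonparsefailure"]
  }
  # on failure the event itself is left untouched, so route it with a conditional
  if "_my_jsonparsefailure" in [tags] {
    drop { }  # or ship it elsewhere for inspection instead of dropping
  }
}
```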
If the parsed data contains a `@timestamp` field, we will try to use it for the event's `@timestamp`; if the parsing fails, the field will be renamed to `_@timestamp` and the event will be tagged with a `_timestampparsefailure`. (Note: `_@timestamp` is the fallback name used when parsing the embedded `@timestamp` fails, not when it is absent.)
Json Filter configuration options
There are four configuration options, combined in the sketch after the list:
- skip_on_invalid_json, boolean, optional
- source, string, required
- tag_on_failure, array, optional
- target, string, optional
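A sketch combining all four options in one filter block (the field names are illustrative, not from the company config):

```
filter {
  json {
    source               => "message"   # required: the field holding the JSON string
    target               => "parsed"    # optional: where to place the parsed structure
    tag_on_failure       => ["_jsonparsefailure"]  # optional: tag(s) added when parsing fails
    skip_on_invalid_json => true        # optional: skip silently when source is not valid JSON
  }
}
```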
source
- This is a required setting.
- Value type is string
- There is no default value for this setting.
The configuration for the JSON filter:

```
source => source_field
```
For example, if you have JSON data in the `message` field:
```
filter {
  json {
    source => "message"
  }
}
```
The above would parse the JSON from the `message` field.
target
- Value type is string
- There is no default value for this setting.
Define the `target` field for placing the parsed data. If this setting is omitted, the JSON data will be stored at the root (top level) of the event.
For example, if you want the data to be put in the `doc` field:
```
filter {
  json {
    target => "doc"
  }
}
```
JSON in the value of the `source` field will be expanded into a data structure in the `target` field. (Note that `source` is a required setting with no default, so a real configuration would still specify it alongside `target`.)
NOTE: if the target field already exists, it will be overwritten!
A real-world example
Using

```
json {
  source => "message"
  target => "http"
}
```

the content of `message` is parsed as JSON and attached under the `http` field:
```
{
  "_index": "logstash-2019.02.25",
  "_type": "doc",
  "_id": "TRR-I2kBpsd9WC6Gp4sb",
  "_version": 1,
  "_score": null,
  "_source": {
    "message": "{\"clientip\":\"-\",\"upstream_addr\":\"-\",\"ident\":\"-\",\"auth\":\"-\",\"_timestamp\":\"1551078694.532\",\"host\":\"172.31.5.173\",\"verb\":\"GET\",\"request\":\"/\",\"httpversion\":\"HTTP/1.1\",\"response\":\"404\",\"bytes\":\"48\",\"referrer\":\"-\",\"agent\":\"ELB-HealthChecker/2.0\",\"req_time\":0.0,\"upstream_resp_time\":0.0,\"proxy_time\":0.0,\"upstream_login\":\"-\",\"req_path\":\"/\",\"upstream_ip\":\"-\",\"upstream_port\":\"-\",\"_hostname\":\"ip-172-31-5-173\",\"_source\":\"nginx.no_upstream.access\",\"_level\":\"info\"}",
    "http": {
      "auth": "-",
      "upstream_addr": "-",
      "referrer": "-",
      "_timestamp": "1551078694.532",
      "host": "172.31.5.173",
      "upstream_port": "-",
      "upstream_resp_time": 0,
      "agent": "ELB-HealthChecker/2.0",
      "clientip": "-",
      "response": "404",
      "ident": "-",
      "_level": "info",
      "upstream_login": "-",
      "request": "/",
      "_hostname": "ip-172-31-5-173",
      "upstream_ip": "-",
      "proxy_time": 0,
      "_source": "nginx.no_upstream.access",
      "verb": "GET",
      "bytes": "48",
      "req_time": 0,
      "req_path": "/",
      "httpversion": "HTTP/1.1"
    },
    "@timestamp": "2019-02-25T07:11:34.532Z",
    "@version": "1"
  },
  "fields": {
    "@timestamp": [
      "2019-02-25T07:11:34.532Z"
    ]
  },
  "sort": [
    1551078694532
  ]
}
```
mutate
Description
The `mutate` filter allows you to perform general mutations on fields. You can rename, remove, replace, and modify fields in your events (a brief sketch of these follows).
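A minimal sketch of a few such mutations (the field names and values are illustrative):

```
filter {
  mutate {
    rename       => { "host" => "hostname" }      # rename a field
    replace      => { "message" => "redacted" }   # replace a field's value
    convert      => { "bytes" => "integer" }      # modify a field's type
    remove_field => [ "tmp_field" ]               # remove a field
  }
}
```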
All configuration options supported by the Mutate Filter
See https://www.elastic.co/guide/en/logstash/current/plugins-filters-mutate.html#plugins-filters-mutate-options
mutate processing order
Mutations in a config file are executed in this order:
- coerce
- rename
- update
- replace
- convert
- gsub
- uppercase
- capitalize
- lowercase
- strip
- remove
- split
- join
- merge
- copy
You can control the order by using separate `mutate` blocks; this is the recommended practice.
For example: split first, then rename
```
filter {
  mutate {
    split => { "hostname" => "." }
    add_field => { "shortHostname" => "%{[hostname][0]}" }
  }
  mutate {
    rename => { "shortHostname" => "hostname" }
  }
}
```
- Split the contents of hostname on ".", turning it into an array that is still named hostname
- Add a new field named shortHostname, whose value is the first element, hostname[0]
- Rename shortHostname back to hostname
add_field
- Value type is hash
- Default value is `{}`
If this filter is successful, add any arbitrary fields to this event. Field names can be dynamic and include parts of the event using the `%{field}` syntax.
Example:
```
filter {
  mutate {
    add_field => { "foo_%{somefield}" => "Hello world, from %{host}" }
  }
}

# You can also add multiple fields at once:
filter {
  mutate {
    add_field => {
      "foo_%{somefield}" => "Hello world, from %{host}"
      "new_field" => "new_static_value"
    }
  }
}
```
If the event has field "somefield" == "hello", this filter, on success, would add the field `foo_hello`, with the value above and the `%{host}` piece replaced with the corresponding value from the event. The second example additionally adds a hardcoded field, showing how to add multiple fields at once.
remove_field
- Value type is array
- Default value is `[]`
If this filter is successful, remove arbitrary fields from this event. Example:
```
filter {
  mutate {
    remove_field => [ "foo_%{somefield}" ]
  }
}

# You can also remove multiple fields at once:
filter {
  mutate {
    remove_field => [ "foo_%{somefield}", "my_extraneous_field" ]
  }
}
```
If the event has field "somefield" == "hello", this filter, on success, would remove the field named `foo_hello` if it is present. The second example would remove an additional, non-dynamic field.
A real-world example
Using

```
# extract common field
mutate {
  add_field => {
    "timestamp" => "%{[http][_timestamp]}"
  }
}

# update timestamp
date {
  # try to update @timestamp, which elasticsearch uses to sort data by time
  # 1468403039, a unix timestamp
  match => ["timestamp", "UNIX"]
}

# remove tmp timestamp
mutate {
  remove_field => [ "timestamp" ]
}
```
a `timestamp` field is first added at the root of the event with the value `%{[http][_timestamp]}`, i.e. the value of `http._timestamp`; it is then processed accordingly (here by the `date` filter); finally, the `timestamp` field is removed. As a result, you will not see a field named `timestamp` at the root of the event in the final data.
date
Description
The `date` filter is used for parsing dates from fields, and then using that date or timestamp as the logstash timestamp for the event.
For example, `syslog` events usually have timestamps like this:

```
"Apr 17 09:32:01"
```

You would use the date format `MMM dd HH:mm:ss` to parse this.
The `date` filter is especially important for sorting events and for backfilling old data. If you don't get the date correct in your event, then searching for them later will likely sort out of order.
In the absence of this filter, logstash will choose a timestamp based on the first time it sees the event (at input time), if the timestamp is not already set in the event. For example, with file input, the timestamp is set to the time of each read.
Date Filter configuration options
- locale, string, optional
- match, array, optional
- tag_on_failure, array, optional
- target, string, optional
- timezone, string, optional
match
- Value type is array
- Default value is `[]`
An array with field name first, and format patterns following, [ field, formats... ]
If your time field has multiple possible formats, you can do this:
match => [ "logdate",
"MMM dd yyyy HH:mm:ss",
"MMM d yyyy HH:mm:ss",
"ISO8601" ]
The above will match a `syslog` (rfc3164) or `iso8601` timestamp.
There are a few special exceptions. The following format literals exist to help you save time and ensure correctness of date parsing.
- `ISO8601` - should parse any valid ISO8601 timestamp, such as `2011-04-19T03:44:01.103Z`
- `UNIX` - will parse float or int value expressing unix time in seconds since epoch like `1326149001.132` as well as `1326149001`
- `UNIX_MS` - will parse int value expressing unix time in milliseconds since epoch like `1366125117000`
- `TAI64N` - will parse tai64n time values
For example, if you have a field `logdate`, with a value that looks like `Aug 13 2010 00:03:44`, you would use this configuration:
```
filter {
  date {
    match => [ "logdate", "MMM dd yyyy HH:mm:ss" ]
  }
}
```
If your field is nested in your structure, you can use the nested syntax `[foo][bar]` to match its value. For more information, please refer to Field References.
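For instance, a sketch that parses the nested `[http][_timestamp]` field from the earlier real-world example directly, without the temporary copy (assuming it holds a unix timestamp):

```
filter {
  date {
    # parse the nested http._timestamp field as a unix timestamp
    match => [ "[http][_timestamp]", "UNIX" ]
  }
}
```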
More details on the syntax
(omitted)
target
- Value type is string
- Default value is `"@timestamp"`
Store the matching timestamp into the given `target` field. If not provided, default to updating the `@timestamp` field of the event.
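A sketch that leaves `@timestamp` untouched and stores the parsed time in a separate field (the `logdate_parsed` name is an illustrative choice):

```
filter {
  date {
    match  => [ "logdate", "MMM dd yyyy HH:mm:ss" ]
    # write the result here instead of overwriting @timestamp
    target => "logdate_parsed"
  }
}
```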
A real-world example
Using

```
# extract common field
mutate {
  add_field => {
    "timestamp" => "%{[http][_timestamp]}"
  }
}

# update timestamp
date {
  # try to update @timestamp, which elasticsearch uses to sort data by time
  # 1468403039, a unix timestamp
  match => ["timestamp", "UNIX"]
}

# remove tmp timestamp
mutate {
  remove_field => [ "timestamp" ]
}
```
the value of the matched `timestamp` field is parsed as UNIX time and stored into the `@timestamp` field.