telegraf
telegraf copied to clipboard
tail plugin issue
Relevant telegraf.conf
[global_tags]
job = "yibo-in-out"
[[inputs.tail]]
files = ["D:\\LPN5.07\\SystemLog\\*.log"]
# Read file from beginning.
from_beginning = false
data_format = "grok"
grok_patterns = ['^.*\[INFO\].*上报请求信息%{GREEDYDATA:message}$']
[[inputs.cpu]]
## Whether to report per-cpu stats or not
percpu = true
## Whether to report total system cpu stats or not
totalcpu = true
## If true, collect raw CPU time metrics
collect_cpu_time = false
## If true, compute and report the sum of all non-idle CPU states
report_active = false
## If true and the info is available then add core_id and physical_id tags
core_tags = false
[agent]
## Default data collection interval for all inputs
interval = "10s"
## Rounds collection interval to 'interval'
## ie, if interval="10s" then always collect on :00, :10, :20, etc.
round_interval = true
debug = true
logtarget = "file"
logfile = "C:\\Program Files\\telegraf\\logs.log"
logfile_rotation_max_size = "10MB"
logfile_rotation_max_archives = 7
log_with_timezone = "Asia/Shanghai"
## Telegraf will send metrics to outputs in batches of at most
## metric_batch_size metrics.
## This controls the size of writes that Telegraf sends to output plugins.
metric_batch_size = 10
[[processors.rename]]
## Specify one sub-table per rename operation.
[[processors.rename.replace]]
measurement = "tail"
dest = "parking-in-out"
[[processors.parser]]
parse_fields = ["message"]
drop_original = true
json_string_fields = ["VehicleNo","ChannelName"]
json_time_key = "EventTime"
json_time_format = "2006-01-02 15:04:05"
data_format = "json"
json_timezone = "Asia/Shanghai"
[[outputs.influxdb_v2]]
urls = ["http://10.200.101.17:8086"]
## Token for authentication.
token = "xxxxxxx"
organization = "icep"
bucket = "yiboinfo"
Logs from Telegraf
2023-07-07T18:03:01+08:00 D! [parsers.grok::tail] Grok no match found for: "[2023:07:07 18:02:58][DEBUG]出口通道号16,发送系统时间2023年07月07日"
2023-07-07T18:03:01+08:00 D! [parsers.grok::tail] Grok no match found for: "[2023:07:07 18:02:58][ERROR]通道号[2],IP[172.16.60.11]登陆成功后不能通讯"
2023-07-07T18:03:06+08:00 D! [outputs.influxdb_v2] Wrote batch of 4 metrics in 1.0002ms
2023-07-07T18:03:06+08:00 D! [outputs.influxdb_v2] Buffer fullness: 0 / 10000 metrics
2023-07-07T18:03:16+08:00 D! [outputs.influxdb_v2] Buffer fullness: 0 / 10000 metrics
2023-07-07T18:03:26+08:00 D! [outputs.influxdb_v2] Buffer fullness: 0 / 10000 metrics
2023-07-07T18:03:36+08:00 D! [outputs.influxdb_v2] Buffer fullness: 0 / 10000 metrics
2023-07-07T18:03:46+08:00 D! [outputs.influxdb_v2] Buffer fullness: 0 / 10000 metrics
2023-07-07T18:03:56+08:00 D! [outputs.influxdb_v2] Buffer fullness: 0 / 10000 metrics
System info
telegraf 1.27, Windows server 2012 R2 Datacenter
Docker
No response
Steps to reproduce
- set telegraf.conf file, and start. telegraf grok record and send out to influxdb.
- in the beginning, working file, telegraf can get record, but hours, telegraf can not get any record. although new message coming into logs file.
Expected behavior
telegraf can analysis new message in logs file continually,
Actual behavior
telegraf is running ( from telegraf log, seemed telegraf working fine, no any issue report), but telegraf can not get new message and grok.
new message still coming from log file in D:\\LPN5.07\\SystemLog
2023:07:07 18:02:58][DEBUG]出口通道号16,发送系统时间2023年07月07日
[2023:07:07 18:02:58][INFO]出场场上报请求信息{"ParkCode":"510105037","VehicleNo":"GJ9L91","EventTime":"2023-07-07 18:02:20","StartTime":"2023-07-07 12:59:39","PaymentMoney":14.0,"TotalMoney":14.0,"DeductionMoney":0.0,"IsAutoPay":false,"CouponName":"","CouponMoney":0.0,"OperateCount":1738,"Qn":"2307071802200000008","BookingCode":"","ConsumeType":1,"PayType":2,"ChannelNo":8,"ChannelName":"3号楼出口2","CarImageID":"2307071802200000008","ISETCPay":false,"ETCPaymentMoney":0.0}
[2023:07:07 18:02:58][ERROR]通道号[2],IP[172.16.60.11]登陆成功后不能通讯
[2023:07:07 18:02:58][ERROR]通道号[4],IP[172.16.60.13]登陆成功后不能通讯
[2023:07:07 18:02:58][ERROR]通道号[1],IP[172.16.60.10]一体机重连不能通讯
[2023:07:07 18:02:58][ERROR]通道号[13],IP[10.18.212.12]一体机重连不能通讯
[2023:07:07 18:02:58][ERROR]通道号[11],IP[10.18.212.10]一体机重连不能通讯
[2023:07:07 18:02:58][ERROR]通道号[3],IP[172.16.60.12]一体机重连不能通讯
[2023:07:07 18:03:00][ERROR]通道号[2],IP[172.16.60.11]登陆成功后不能通讯
Additional info
No response
@tbs575,
It is not immediately obvious what might be going on here. Some questions:
- How large is this log file?
- How long was telegraf working for before it stopped working?
- Are you still getting debug messages about no match found for other lines that are coming in?
- Was the file rotated?
- If you restart telegraf, are the new lines parsed and sent correctly?
Thanks!
@powersj
- application write message into log file continually, application logs rotates every day. logs file size 40+M ~ 30+M.
- checked telegraf log, found 3 times for stopping work, 2 times for 4 hours( that mean telegraf woking fine for 4 hours), and one time for 13 hours
- Still get debug message, like
2023-07-11T09:31:57+08:00 D! [outputs.influxdb_v2] Buffer fullness: 0 / 10000 metrics
. when telegraf stop working, can not get anyno match
message - Yes, application logs was rotated every day. every 00:00, application will create new logs. from telegraf debug message, can saw telegraf found new logs files
2023-07-11T00:00:09+08:00 D! [inputs.tail] Tail added for "D:\\LPN5.07\\SystemLog\\2023-07-11.info.log"
- Yes, restart telegraf, will saw 'no match` message again, grok working back again.
Follow-up questions:
- When telegraf stops working do you continue to see metrics from the CPU plugin? Or do all metrics stop gathering?
- Do you see any "Wrote batch of 4 metrics in 1.0002ms" like messages?
- Can you please provide the full logs? I'd like to see the full flow of what happens.
- all metrics stop
- after stop working, can not saw 'Wrote batch ...", just only 'Buffer fullness: 0 / 10000 metrics'
- please check logs.log
more information,
-
I setup another telegraf on
windows server 2012 R2
and similar application logs, that working fine, no any issue -
application logs status can not be updated ( maybe windows OS issue, not sure), although new message into application log file, but just refresh windows explorer, application log file size is same, no any increase. I just open log file by text editor, without changing, exist directly, found file size increase. this case make telegraf can not grok. but I test filebeat, filbeat did not be impact, filebeat can grok
@powersj looking at the logs you can see that a lot of files are open (523 are closed at the end) because the "old" log-files are never closed and are kept open... Is it possible that we run into some filehandle limit or similar?
@tbs575 do you see anything in the Windows even-log around the time when Telegraf stops?
@srebhan
telegraf stop grok new message at 2023-07-11T14:34:28+08:00
in telegraf log, and at 2023-07-12T00:00:09+08:00
found new application's log file.
check windows event-log at 2023-07-11T14:34:28+08:00, there is nothing that can be noticed.
I don't know if it's released or not, but at least it worth mentioning that I see something similar w/ the KNX plugin. Issue #14695