fluent-plugin-cloudwatch-logs
fluent-plugin-cloudwatch-logs copied to clipboard
When invoked from the fluent-plugin-cloudwatch-logs, the AWS SDK logs every AWS API call it makes, no matter what global logging level is set for fluentd
Problem
When invoked from the fluent-plugin-cloudwatch-logs plugin, the AWS SDK logs every AWS API call it makes, no matter what global logging level is set for fluentd. The end result is that fluentd log grows proportionately to the "payload" log traffic that is forwarded to CloudWatch.
I believe the root cause is here: https://github.com/aws/aws-sdk-ruby/blob/version-3/gems/aws-sdk-core/lib/aws-sdk-core/plugins/logging.rb#L53, because this line
config.logger.send(config.log_level, format(config, response))
always logs at the current log level.
However, an issue in this repo provides the necessary application-level context for what is arguably a defect in the AWS SDK.
Steps to replicate
Example fluentd config
fluentd.conf:
<source>
@type forward
@id forward_input
</source>
<match test.**>
@type stdout
@id stdout_output
</match>
<match cw.**>
@type cloudwatch_logs
auto_create_stream true
region <...>
log_group_name cw_group_name
log_stream_name cw_stream_name
concurrency 2
aws_key_id <...>
aws_sec_key <...>
include_time_key true
log_rejected_request true
</match>
Fluentd logs
In response to
$ echo '{"foo":"bar"}' | fluent-cat cw
the following is seen in fluentd logs. Note that the log messages from Aws::CloudWatchLogs::Client are issued at every log level:
$ fluentd -c ./fluentd.conf --no-supervisor // info log level
2021-08-04 10:16:55 -0400 [info]: [Aws::CloudWatchLogs::Client 200 0.2607 0 retries] describe_log_groups()
2021-08-04 10:16:55 -0400 [info]: [Aws::CloudWatchLogs::Client 200 0.068632 0 retries] create_log_group(log_group_name:"cw_group_name",tags:nil)
2021-08-04 10:16:55 -0400 [info]: [Aws::CloudWatchLogs::Client 200 0.054434 0 retries] describe_log_streams(log_group_name:"cw_group_name",log_stream_name_prefix:"cw_stream_name",next_token:nil)
2021-08-04 10:16:56 -0400 [info]: [Aws::CloudWatchLogs::Client 200 0.072863 0 retries] create_log_stream(log_group_name:"cw_group_name",log_stream_name:"cw_stream_name")
2021-08-04 10:16:56 -0400 [info]: [Aws::CloudWatchLogs::Client 200 0.090928 0 retries] put_log_events(log_events:[{timestamp:1628086554740,message:"{\"foo\":\"bar\",\"time\":\"2021-08-04T10:15:54-04:00\"}"}],log_group_name:"cw_group_name",log_stream_name:"cw_stream_name")
===============
$ fluentd -c ./fluentd.conf --no-supervisor -v // debug log level
2021-08-04 10:22:02 -0400 [debug]: fluent/log.rb:302:debug: [Aws::CloudWatchLogs::Client 200 0.243379 0 retries] describe_log_groups()
2021-08-04 10:22:02 -0400 [debug]: fluent/log.rb:302:debug: [Aws::CloudWatchLogs::Client 200 0.055964 0 retries] create_log_group(log_group_name:"cw_group_name",tags:nil)
2021-08-04 10:22:02 -0400 [debug]: fluent/log.rb:302:debug: [Aws::CloudWatchLogs::Client 200 0.048201 0 retries] describe_log_streams(log_group_name:"cw_group_name",log_stream_name_prefix:"cw_stream_name",next_token:nil)
2021-08-04 10:22:02 -0400 [debug]: fluent/log.rb:302:debug: [Aws::CloudWatchLogs::Client 200 0.055562 0 retries] create_log_stream(log_group_name:"cw_group_name",log_stream_name:"cw_stream_name")
2021-08-04 10:22:02 -0400 [debug]: fluent/log.rb:302:debug: [Aws::CloudWatchLogs::Client 200 0.083676 0 retries] put_log_events(log_events:[{timestamp:1628086859696,message:"{\"foo\":\"bar\",\"time\":\"2021-08-04T10:20:59-04:00\"}"}],log_group_name:"cw_group_name",log_stream_name:"cw_stream_name")
==============
$ fluentd -c ./fluentd.conf --no-supervisor -vv // trace log level
021-08-04 10:30:38 -0400 [trace]: fluent/log.rb:281:trace: [Aws::CloudWatchLogs::Client 200 0.285799 0 retries] describe_log_groups()
2021-08-04 10:30:38 -0400 [trace]: fluent/log.rb:281:trace: [Aws::CloudWatchLogs::Client 200 0.06596 0 retries] create_log_group(log_group_name:"cw_group_name",tags:nil)
2021-08-04 10:30:38 -0400 [trace]: fluent/log.rb:281:trace: [Aws::CloudWatchLogs::Client 200 0.056249 0 retries] describe_log_streams(log_group_name:"cw_group_name",log_stream_name_prefix:"cw_stream_name",next_token:nil)
2021-08-04 10:30:39 -0400 [trace]: fluent/log.rb:281:trace: [Aws::CloudWatchLogs::Client 200 0.08038 0 retries] create_log_stream(log_group_name:"cw_group_name",log_stream_name:"cw_stream_name")
2021-08-04 10:30:39 -0400 [trace]: fluent/log.rb:281:trace: [Aws::CloudWatchLogs::Client 200 0.097014 0 retries] put_log_events(log_events:[{timestamp:1628087373685,message:"{\"foo\":\"bar\",\"time\":\"2021-08-04T10:29:33-04:00\"}"}],log_group_name:"cw_group_name",log_stream_name:"cw_stream_name")
==============
$ fluentd -c ./fluentd.conf --no-supervisor -q // warn log level
2021-08-04 10:38:07 -0400 [warn]: [Aws::CloudWatchLogs::Client 200 0.244758 0 retries] describe_log_groups()
2021-08-04 10:38:07 -0400 [warn]: [Aws::CloudWatchLogs::Client 200 0.058111 0 retries] create_log_group(log_group_name:"cw_group_name",tags:nil)
2021-08-04 10:38:07 -0400 [warn]: [Aws::CloudWatchLogs::Client 200 0.047988 0 retries] describe_log_streams(log_group_name:"cw_group_name",log_stream_name_prefix:"cw_stream_name",next_token:nil)
2021-08-04 10:38:07 -0400 [warn]: [Aws::CloudWatchLogs::Client 200 0.060144 0 retries] create_log_stream(log_group_name:"cw_group_name",log_stream_name:"cw_stream_name")
2021-08-04 10:38:07 -0400 [warn]: [Aws::CloudWatchLogs::Client 200 0.073899 0 retries] put_log_events(log_events:[{timestamp:1628087823673,message:"{\"foo\":\"bar\",\"time\":\"2021-08-04T10:37:03-04:00\"}"}],log_group_name:"cw_group_name",log_stream_name:"cw_stream_name")
==============
$ fluentd -c ./fluentd.conf --no-supervisor -qq // error log level
2021-08-04 10:43:16 -0400 [error]: [Aws::CloudWatchLogs::Client 200 0.241862 0 retries] describe_log_groups()
2021-08-04 10:43:16 -0400 [error]: [Aws::CloudWatchLogs::Client 200 0.064994 0 retries] create_log_group(log_group_name:"cw_group_name",tags:nil)
2021-08-04 10:43:16 -0400 [error]: [Aws::CloudWatchLogs::Client 200 0.047917 0 retries] describe_log_streams(log_group_name:"cw_group_name",log_stream_name_prefix:"cw_stream_name",next_token:nil)
2021-08-04 10:43:17 -0400 [error]: [Aws::CloudWatchLogs::Client 200 0.05636 0 retries] create_log_stream(log_group_name:"cw_group_name",log_stream_name:"cw_stream_name")
2021-08-04 10:43:17 -0400 [error]: [Aws::CloudWatchLogs::Client 200 0.081287 0 retries] put_log_events(log_events:[{timestamp:1628088133366,message:"{\"foo\":\"bar\",\"time\":\"2021-08-04T10:42:13-04:00\"}"}],log_group_name:"cw_group_name",log_stream_name:"cw_stream_name")
Expected Behavior
I would expect that AWS API calls are logged at a certain log level (debug) and above, but not below. For example, with the debug log level hardcoded in the AWS SDK code line mentioned above:
$ fluentd -c ./fluentd.conf --no-supervisor -vv // trace
2021-08-04 11:56:57 -0400 [debug]: fluent/log.rb:302:debug: [Aws::CloudWatchLogs::Client 200 0.320894 0 retries] describe_log_groups()
2021-08-04 11:56:57 -0400 [debug]: fluent/log.rb:302:debug: [Aws::CloudWatchLogs::Client 200 0.063318 0 retries] create_log_group(log_group_name:"cw_group_name",tags:nil)
2021-08-04 11:56:57 -0400 [debug]: fluent/log.rb:302:debug: [Aws::CloudWatchLogs::Client 200 0.047943 0 retries] describe_log_streams(log_group_name:"cw_group_name",log_stream_name_prefix:"cw_stream_name",next_token:nil)
2021-08-04 11:56:57 -0400 [debug]: fluent/log.rb:302:debug: [Aws::CloudWatchLogs::Client 200 0.060117 0 retries] create_log_stream(log_group_name:"cw_group_name",log_stream_name:"cw_stream_name")
2021-08-04 11:56:57 -0400 [debug]: fluent/log.rb:302:debug: [Aws::CloudWatchLogs::Client 200 0.088652 0 retries] put_log_events(log_events:[{timestamp:1628092555600,message:"{\"foo\":\"bar\",\"time\":\"2021-08-04T11:55:55-04:00\"}"}],log_group_name:"cw_group_name",log_stream_name:"cw_stream_name")
========
$ fluentd -c ./fluentd.conf --no-supervisor -v // debug
# hardcoded :debug
2021-08-04 11:41:22 -0400 [debug]: fluent/log.rb:302:debug: [Aws::CloudWatchLogs::Client 200 0.237807 0 retries] describe_log_groups()
2021-08-04 11:41:22 -0400 [debug]: fluent/log.rb:302:debug: [Aws::CloudWatchLogs::Client 200 0.05453 0 retries] create_log_group(log_group_name:"cw_group_name",tags:nil)
2021-08-04 11:41:23 -0400 [debug]: fluent/log.rb:302:debug: [Aws::CloudWatchLogs::Client 200 0.04345 0 retries] describe_log_streams(log_group_name:"cw_group_name",log_stream_name_prefix:"cw_stream_name",next_token:nil)
2021-08-04 11:41:23 -0400 [debug]: fluent/log.rb:302:debug: [Aws::CloudWatchLogs::Client 200 0.059298 0 retries] create_log_stream(log_group_name:"cw_group_name",log_stream_name:"cw_stream_name")
2021-08-04 11:41:23 -0400 [debug]: fluent/log.rb:302:debug: [Aws::CloudWatchLogs::Client 200 0.06734 0 retries] put_log_events(log_events:[{timestamp:1628091621300,message:"{\"foo\":\"bar\",\"time\":\"2021-08-04T11:40:21-04:00\"}"}],log_group_name:"cw_group_name",log_stream_name:"cw_stream_name")
2021-08-04 11:41:23 -0400 [debug]: fluent/log.rb:302:debug: Called PutLogEvents API group="cw_group_name" stream="cw_stream_name" events_count=1 events_bytesize=74 sequence_token=nil thread=11609020 request_sec=0.067749457
========
$ fluentd -c ./fluentd.conf --no-supervisor // info
no output
==========
$ fluentd -c ./fluentd.conf --no-supervisor -q // warn
no output
==========
Fluentd and CloudWatchLogs plugin versions
- OS version: Fedora 34
- Bare Metal or within Docker or Kubernetes or others? : Bare metal
- Fluentd v0.12 or v0.14/v1.0
- paste result of
fluentd --version
ortd-agent --version
- paste result of
$ fluentd --version
fluentd 1.7.4
$
- Dependent gem versions
- paste boot log of fluentd or td-agent
$ fluentd -c ./fluentd.conf
2021-08-11 16:46:39 -0400 [info]: parsing config file is succeeded path="./fluentd.conf"
2021-08-11 16:46:40 -0400 [info]: using configuration file: <ROOT>
<source>
@type forward
@id forward_input
</source>
<match test.**>
@type stdout
@id stdout_output
</match>
<match cw.**>
@type cloudwatch_logs
auto_create_stream true
region "us-east-2"
log_group_name "cw_group_name"
log_stream_name "cw_stream_name"
concurrency 2
aws_key_id xxxxxx
aws_sec_key xxxxxx
include_time_key true
log_rejected_request true
<inject>
time_key time
time_type string
</inject>
</match>
</ROOT>
2021-08-11 16:46:40 -0400 [info]: starting fluentd-1.7.4 pid=18992 ruby="2.5.5"
2021-08-11 16:46:40 -0400 [info]: spawn command to main: cmdline=["/home/syedriko/.rbenv/versions/2.5.5/bin/ruby", "-Eascii-8bit:ascii-8bit", "/home/syedriko/.rbenv/versions/2.5.5/bin/fluentd", "-c", "./fluentd.conf", "--under-supervisor"]
2021-08-11 16:46:40 -0400 [info]: gem 'fluent-plugin-cloudwatch-logs' version '0.7.6'
2021-08-11 16:46:40 -0400 [info]: gem 'fluentd' version '1.7.4'
2021-08-11 16:46:40 -0400 [info]: adding match pattern="test.**" type="stdout"
2021-08-11 16:46:40 -0400 [info]: adding match pattern="cw.**" type="cloudwatch_logs"
2021-08-11 16:46:40 -0400 [info]: adding source type="forward"
2021-08-11 16:46:40 -0400 [info]: #0 starting fluentd worker pid=19024 ppid=18992 worker=0
2021-08-11 16:46:40 -0400 [info]: #0 [forward_input] listening port port=24224 bind="0.0.0.0"
2021-08-11 16:46:40 -0400 [info]: #0 fluentd worker is now running worker=0
- paste result of
fluent-gem list
,td-agent-gem list
or your Gemfile.lock
$ fluent-gem list
*** LOCAL GEMS ***
aws-eventstream (1.1.0)
aws-partitions (1.396.0)
aws-sdk-cloudwatchlogs (1.38.0)
aws-sdk-core (3.109.3)
aws-sigv4 (1.2.2)
bigdecimal (default: 1.3.4)
cmath (default: 1.0.0)
concurrent-ruby (1.1.9)
cool.io (1.7.1)
csv (default: 1.0.0)
date (default: 1.0.0)
did_you_mean (1.2.0)
dig_rb (1.0.1)
etc (default: 1.0.0)
fcntl (default: 1.0.0)
fiddle (default: 1.0.0)
fileutils (default: 1.0.2)
fluent-plugin-cloudwatch-logs (0.7.6)
fluentd (1.7.4)
http_parser.rb (0.6.0)
io-console (default: 0.4.6)
ipaddr (default: 1.2.0)
jmespath (1.4.0)
json (default: 2.1.0)
minitest (5.10.3)
msgpack (1.4.2)
net-telnet (0.1.1)
openssl (default: 2.1.2)
power_assert (1.1.1)
psych (default: 3.0.2)
rake (12.3.0)
rdoc (default: 6.0.1)
scanf (default: 1.0.0)
sdbm (default: 1.0.0)
serverengine (2.2.4)
sigdump (0.2.4)
stringio (default: 0.0.1)
strptime (0.2.5)
strscan (default: 1.0.0)
test-unit (3.2.7)
tzinfo (2.0.4)
tzinfo-data (1.2021.1)
webrick (default: 1.4.2)
xmlrpc (0.3.0)
yajl-ruby (1.4.1)
zlib (default: 1.0.0)
$
On a second thought, I'm not sure this is an AWS SDK issue necessarily. Consider this test snippet, which is supposed to fail a create_log_group() call at different log levels:
require 'aws-sdk-cloudwatchlogs'
require 'time'
require 'logger'
logger = Logger.new(STDOUT)
logger.level = 'UNKNOWN'
cwl_client = Aws::CloudWatchLogs::Client.new(
region: 'us-east-2',
access_key_id: <...>,
secret_access_key: <...>,
logger: logger
)
resp = cwl_client.delete_log_group(
log_group_name: 'foo'
)
resp = cwl_client.create_log_group(
log_group_name: 'foo'
)
if !resp.successful?
puts 'create_log_group failed' + resp
exit(1)
end
['DEBUG', 'INFO', 'WARN', 'ERROR', 'FATAL', 'UNKNOWN'].each do |level|
begin
puts "logger.level: " + level
logger.level = level
resp = cwl_client.create_log_group(
log_group_name: 'foo'
)
rescue
end
end
The output this produces is
$ ruby ./test.rb
logger.level: DEBUG
D, [2021-08-11T23:21:09.354722 #28770] DEBUG -- : [Aws::CloudWatchLogs::Client 400 0.054844 0 retries] create_log_group(log_group_name:"foo") Aws::CloudWatchLogs::Errors::ResourceAlreadyExistsException The specified log group already exists
logger.level: INFO
logger.level: WARN
logger.level: ERROR
logger.level: FATAL
logger.level: UNKNOWN
$
that is, AWS API calls are only logged at DEBUG level.