elastic-ci-stack-for-aws

CreateLogGroup service limits

Open sj26 opened this issue 4 years ago • 3 comments

The elastic stack exports logs to CloudWatch Logs. The official AWS CloudWatch Logs exporter seems to call CreateLogGroup for each exported log on each host as it boots. For some customers this hits service limits and causes creation of elastic stacks to fail.

It looks like we're using the awslogs CloudWatch Logs agent:

https://github.com/buildkite/elastic-ci-stack-for-aws/blob/v4-development/packer/scripts/install-awslogs.sh
https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/UsePreviousCloudWatchLogsAgent.html

and configuring it to pump files up to log groups in a pretty conventional way:

https://github.com/buildkite/elastic-ci-stack-for-aws/blob/v4-development/packer/conf/awslogs/awslogs.conf

We're exporting groups including:

  • /buildkite/buildkite-agent
  • /buildkite/cfn-init
  • /buildkite/cloud-init
  • /buildkite/cloud-init/output
  • /buildkite/docker-daemon
  • /buildkite/elastic-stack
  • /buildkite/elastic-stack-init
  • /buildkite/lifecycled
  • /buildkite/system
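As a rough illustration of how the call volume adds up, the sketch below multiplies the groups listed above by a fleet size. The host count is a hypothetical figure chosen for illustration, not a number from this issue, and the stack may export more groups than the ones listed.

```python
# Log groups named in this issue (the list may not be exhaustive).
LOG_GROUPS = [
    "/buildkite/buildkite-agent",
    "/buildkite/cfn-init",
    "/buildkite/cloud-init",
    "/buildkite/cloud-init/output",
    "/buildkite/docker-daemon",
    "/buildkite/elastic-stack",
    "/buildkite/elastic-stack-init",
    "/buildkite/lifecycled",
    "/buildkite/system",
]


def create_calls_on_boot(host_count, groups=LOG_GROUPS):
    """Estimate CreateLogGroup calls if every host calls it once per group at boot."""
    return host_count * len(groups)


# Hypothetical fleet size: 100 hosts booting with 9 groups each.
print(create_calls_on_boot(100))  # 900
```

If many of those hosts boot within a short window, for example during a scale-out, the calls would arrive in a burst rather than being spread out.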

The reference docs suggest that this will only create the log group if it doesn't exist: https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/AgentReference.html

The implementation, which has helpfully been uploaded here, seems to always attempt the create and swallow any errors: https://github.com/jinty/awscli-cwlogs-debian/blob/c8e4a1d5a0d9ec771581967e4de63407b8d0e9ac/cwlogs/push.py#L1314-L1324

But the call is still made every time, so it counts against the service limits even when the log group already exists.
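To make the difference concrete, here is a minimal sketch of the two patterns. `FakeLogsClient`, `ResourceAlreadyExists`, and the helper names are hypothetical in-memory stand-ins written for this illustration. They are not the agent's code or the AWS SDK, and the sketch only counts calls.

```python
class ResourceAlreadyExists(Exception):
    """Stand-in for the service's 'log group already exists' error."""


class FakeLogsClient:
    """Hypothetical in-memory stand-in for a CloudWatch Logs client."""

    def __init__(self):
        self.groups = set()
        self.create_calls = 0
        self.lookup_calls = 0

    def create_log_group(self, name):
        self.create_calls += 1  # counted even when the call fails
        if name in self.groups:
            raise ResourceAlreadyExists(name)
        self.groups.add(name)

    def log_group_exists(self, name):
        self.lookup_calls += 1
        return name in self.groups


def ensure_always_create(client, name):
    """Always attempt the create and swallow the 'already exists' error."""
    try:
        client.create_log_group(name)
    except ResourceAlreadyExists:
        pass


def ensure_check_first(client, name):
    """Look the group up first and create it only when it is missing."""
    if not client.log_group_exists(name):
        client.create_log_group(name)


def simulate(ensure, boots, name="/buildkite/system"):
    """Run `ensure` once per simulated host boot against one shared account."""
    client = FakeLogsClient()
    for _ in range(boots):
        ensure(client, name)
    return client.create_calls, client.lookup_calls


print(simulate(ensure_always_create, 50))  # (50, 0)
print(simulate(ensure_check_first, 50))    # (1, 50)
```

Note that checking first trades create calls for lookup calls, and the lookup may be subject to its own limits, so this is not necessarily the right fix. It only shows why swallowing the error does not reduce the number of create calls.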

sj26 avatar Sep 17 '20 09:09 sj26

Relevant?

https://aws.amazon.com/about-aws/whats-new/2020/09/amazon-cloudwatch-agent-now-open-source-and-included-with-amazon-linux-2/

sj26 avatar Sep 22 '20 00:09 sj26

fwiw, I just ran into this which may be the same issue?

[Screenshot: Screen Shot 2021-03-05 at 4 31 36 PM]

albertywu avatar Mar 06 '21 00:03 albertywu

@albertywu We've just merged #811, which we believe might help, though we haven't confirmed that directly.

Buildkite runs the latest master branch of the elastic stack, so we'll dogfood the change. However, we don't run enough agents to hit the CreateLogGroup quota. I wonder if you have a way to test the new agent before we release a new version of the stack (probably 5.3.0), so we can confirm the issue is resolved?

yob avatar Mar 14 '21 22:03 yob