cdk icon indicating copy to clipboard operation
cdk copied to clipboard

feat(GuEc2App): Remove `enabledDetailedInstanceMonitoring` optional property

Open akash1810 opened this issue 8 months ago • 3 comments

What does this change?

Removes the enabledDetailedInstanceMonitoring property from the GuEc2App pattern. Monitoring is now always enabled, providing EC2 metrics at a higher granularity of one minute (default is five minutes). This should allow for earlier horizontal scaling and provide more detail during incident triage.

[!NOTE]

  • This change will cost roughly $3 per instance per month. We might want to consider disabling it for non-production environments.
  • These are not the same as metrics from the CloudWatch Agent, which services might also be using.

This behaviour can be changed via an escape hatch:

import type { CfnLaunchTemplate } from "aws-cdk-lib/aws-ec2";

declare const myApp: GuEc2App;

const launchTemplate = myApp.autoScalingGroup.instanceLaunchTemplate.node.defaultChild as CfnLaunchTemplate;

// Disable enhanced monitoring
launchTemplate.addPropertyOverride("LaunchTemplateData.Monitoring.Enabled", false);

// OR remove the property entirely
launchTemplate.addPropertyDeletionOverride("LaunchTemplateData.Monitoring.Enabled");

See also:

  • https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/manage-detailed-monitoring.html
  • https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/viewing_metrics_with_cloudwatch.html

How to test

See updated tests.

How can we measure success?

Our services can horizontally scale earlier.

Have we considered potential risks?

This has a direct cost implication. At the time of writing, we have ~630 running instances across all accounts. At ~$3 per instance per month, this'll be a total of ~$23,000[^1]. If we only enable this on production instances, this reduces to ~$15k.

Checklist

  • [ ] I have listed any breaking changes, along with a migration path [^2]
  • [ ] I have updated the documentation as required for the described changes [^3]

[^1]: 630 * 3 * 12 [^2]: Consider whether this is something that will mean changes to projects that have already been migrated, or to the CDK CLI tool. If changes are required, consider adding a checklist here and/or linking to related PRs. [^3]: If you are adding a new construct or pattern, has new documentation been added? If you are amending defaults or changing behaviour, are the existing docs still valid?

akash1810 avatar Apr 30 '25 08:04 akash1810

🦋 Changeset detected

Latest commit: d1ee03ac0da3551638a3037b23b391f9cf877291

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package
Name Type
@guardian/cdk Minor

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

changeset-bot[bot] avatar Apr 30 '25 08:04 changeset-bot[bot]

I am supportive of the additional cost for PROD instance, and so would like to be added as default on PROD if that possible.

mchv avatar May 02 '25 10:05 mchv

@akash1810 - I took this out of review. Best of luck with the deployment.

markjamesbutler avatar May 08 '25 14:05 markjamesbutler

I am supportive of the additional cost for PROD instance, and so would like to be added as default on PROD if that possible.

GuCDK is intentionally stage agnostic (ADR here), therefore we cannot implement the "default on PROD" behaviour unfortunately. In an attempt to provide an accessible API and DX, I've created a new mandatory property. Usage could look something like this:

const stage = props.stage;
const instanceMetricGranularity = stage === "PROD" ? "1Minute" ? "5Minute";

const myApp = new GuEc2App(this, "my-app", {
  instanceMetricGranularity,
});

akash1810 avatar May 13 '25 15:05 akash1810

NOTE: An alternative API would to make this property mandatory if the ASG has a scaling policy attached to it. To do this, we'd need to perform a check similar to this - https://github.com/guardian/cdk/blob/de6d9a9f59ba333271fa4174a1c1afc9be748e0f/src/experimental/patterns/ec2-app.ts#L138-L152.

akash1810 avatar May 23 '25 11:05 akash1810