Refactor helper binaries to save 161MB of disk space when the agent is installed and reduce RPM by 48MB
Revisions:
- Address comments including a helper to discover the main executable
- Add a clearer error message when the agent binary is not found
Description of the issue
The cloudwatch-agent has 3 helper binaries that use an excessive amount of disk space:
- amazon-cloudwatch-agent-config-wizard (14MB)
- config-downloader (37MB)
- config-translator (117MB)
The reason is that they pull in dependencies from cloudwatch-agent, both directly and indirectly. To solve this problem, I updated config-downloader, config-translator and the wizard to be shims that just redirect to the amazon-cloudwatch-agent binary.
This works fine. The main risk is that these binaries are no longer "portable": they depend on finding the path to amazon-cloudwatch-agent at runtime. I am using the same method as start-amazon-cloudwatch-agent for finding the path.
Description of changes
At a high level, the approach is to keep the same argument interface for the 3 existing commands and seamlessly move the logic into the main binary. To do this, the old commands keep the same args, but we prefix the args with the command name when we call CWA so that there are no duplicate args.
The general approach I took was to create a new cmdwrapper package which offers two methods:
- AddFlag - this method takes in a map of flags and a prefix. The prefix is blank for the binary being replaced and is the command name when called by amazon-cloudwatch-agent. It hooks directly into the flag API for pulling command line args. I considered using subcommands but things got too complicated and ugly; prefixing was simpler.
- ExecuteAgentCommand - this takes in a set of flags, finds amazon-cloudwatch-agent and calls it with the new flags. It remaps stdin/stdout/stderr so it appears seamless. Things like the wizard, which rely on stdin, still work.
I moved the flags into their own separate flags packages which have NO dependencies (keeping the binaries small). The old binaries and amazon-cloudwatch-agent then pull in the flags and the commands and link everything together. For the wizard I had to move a few other common constants, such as those in linuxMigration.go and windows_migration.go.
I added and updated unit tests wherever possible. The old translator_test.go was moved into translatorutil_test.go, as that is primarily what those tests were testing.
Note: the inline diff is VERY hard to follow because of all the moved code. I recommend the split diff, or we can do a code walk-through.
License
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.
Tests
Integ test run: https://github.com/aws/amazon-cloudwatch-agent/actions/runs/15216630285
Manual testing to confirm the CloudWatch agent still starts/loads.
There were no prior tests that actually tested the config-translator binary. The old tests basically just tested cmdutil, so I moved those into the cmdutil tests. We could write more tests for the shim and for translate.go, but they are tricky to write due to the OS coupling.
Before:
total 297M
-rwxr-xr-x 1 patchad amazon 129M Dec 2 22:49 amazon-cloudwatch-agent
-rwxr-xr-x 1 patchad amazon 14M Dec 2 22:49 amazon-cloudwatch-agent-config-wizard
-rwxr-xr-x 1 patchad amazon 37M Dec 2 22:44 config-downloader
-rwxr-xr-x 1 patchad amazon 117M Dec 2 22:47 config-translator
-rwxr-xr-x 1 patchad amazon 2.1M Dec 2 22:49 start-amazon-cloudwatch-agent
➜ amazon-cloudwatch-agent git:(main) ✗ ls -lh ~/Downloads/amazon-cloudwatch-agent.rpm
Permissions Size User Date Modified Name
.rw-r--r--@ 113M patchad 2 Dec 15:36 /Users/patchad/Downloads/amazon-cloudwatch-agent.rpm
After:
total 136M
-rwxr-xr-x 1 patchad amazon 129M Dec 16 20:34 amazon-cloudwatch-agent
-rwxr-xr-x 1 patchad amazon 1.7M Dec 16 20:34 amazon-cloudwatch-agent-config-wizard
-rwxr-xr-x 1 patchad amazon 1.7M Dec 16 20:34 config-downloader
-rwxr-xr-x 1 patchad amazon 1.7M Dec 16 20:34 config-translator
-rwxr-xr-x 1 patchad amazon 2.1M Dec 16 20:34 start-amazon-cloudwatch-agent
-rw-r--r-- 1 patchad amazon 65M Dec 16 20:35 /local/home/patchad/workplace/cwa/amazon-cloudwatch-agent/build/bin/linux/amd64/amazon-cloudwatch-agent.rpm
Wizard still works:
2024/12/17 21:54:39 Starting config-wizard, this will map back to a call to amazon-cloudwatch-agent
2024/12/17 21:54:39 Executing /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent with arguments: [-config-wizard -config-wizard-is-non-interactive-windows-migration false -config-wizard-use-parameter-store false -config-wizard-is-non-interactive-linux-migration false -config-wizard-traces-only false -config-wizard-non-interactive-xray-migration false]
================================================================
= Welcome to the Amazon CloudWatch Agent Configuration Manager =
= =
= CloudWatch Agent allows you to collect metrics and logs from =
= your host and send them to CloudWatch. Additional CloudWatch =
= charges may apply. =
================================================================
On which OS are you planning to use the agent?
1. linux
2. windows
3. darwin
default choice: [1]:
1
Trying to fetch the default region based on ec2 metadata...
I! imds retry client will retry 1 times
D! should retry false for imds error : RequestCanceled: EC2 IMDS access disabled via AWS_EC2_METADATA_DISABLED env var
W! could not get region from ec2 metadata... EC2MetadataRequestError: failed to get EC2 instance identity document
caused by: RequestCanceled: EC2 IMDS access disabled via AWS_EC2_METADATA_DISABLED env var
Are you using EC2 or On-Premises hosts?
1. EC2
2. On-Premises
default choice: [2]:
Downloader/Translator still work:
➜ amazon-cloudwatch-agent git:(patchad-config-translate-refactor) ✗ sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -s -c file:/opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json
****** processing amazon-cloudwatch-agent ******
2024/12/17 21:56:21 Starting config-downloader, this will map back to a call to amazon-cloudwatch-agent
2024/12/17 21:56:21 Executing /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent with arguments: [-config-downloader -config-downloader-config /opt/aws/amazon-cloudwatch-agent/etc/common-config.toml -config-downloader-multi-config default -config-downloader-mode ec2 -config-downloader-download-source file:/opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json -config-downloader-output-dir /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d]
I! Trying to detect region from ec2
D! [EC2] Found active network interface
I! imds retry client will retry 1 times
Start configuration validation...
2024/12/17 21:56:21 Starting config-translator, this will map back to a call to amazon-cloudwatch-agent
2024/12/17 21:56:21 Executing /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent with arguments: [-config-translator -config-translator-multi-config default -config-translator-input /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json -config-translator-input-dir /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d -config-translator-output /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.toml -config-translator-mode ec2 -config-translator-config /opt/aws/amazon-cloudwatch-agent/etc/common-config.toml]
2024-12-17T21:56:21Z I! Reading json config file path: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d/file_amazon-cloudwatch-agent.json.tmp ...
2024-12-17T21:56:21Z I! Valid Json input schema.
2024-12-17T21:56:21Z I! Configuration validation first phase succeeded
I! Trying to detect region from ec2
D! [EC2] Found active network interface
I! imds retry client will retry 1 times
/opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent -schematest -config /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.toml
Configuration validation second phase succeeded
Configuration validation succeeded
Rough Edges
This is the flow if config-translator fails. I tried to unify the code, so the behavior is slightly different. We could change this if we needed to.
New:
sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -s -c file:/opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json
****** processing amazon-cloudwatch-agent ******
2024/12/17 21:57:13 Starting config-downloader, this will map back to a call to amazon-cloudwatch-agent
2024/12/17 21:57:13 Executing /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent with arguments: [-config-downloader -config-downloader-mode ec2 -config-downloader-download-source file:/opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json -config-downloader-output-dir /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d -config-downloader-config /opt/aws/amazon-cloudwatch-agent/etc/common-config.toml -config-downloader-multi-config default]
2024-12-17T21:57:13Z E! Failed to initialize config downloader: fail to fetch/remove json config: open /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json: no such file or directory
2024/12/17 21:57:13 E! Translation process exited with non-zero status: 1, err: exit status 1
panic: E! Translation process exited with non-zero status: 1, err: exit status 1
goroutine 1 [running]:
log.Panicf({0x4e8cc3?, 0xc0000c8020?}, {0xc0000c3de0?, 0xc000098038?, 0x3?})
log/log.go:439 +0x65
github.com/aws/amazon-cloudwatch-agent/tool/cmdwrapper.ExecuteAgentCommand({0x4e165b, 0x11}, 0xc0000a4120)
github.com/aws/amazon-cloudwatch-agent/tool/cmdwrapper/cmdwrapper.go:59 +0x5df
main.main()
github.com/aws/amazon-cloudwatch-agent/cmd/config-downloader/downloader.go:20 +0xc8
Old:
➜ amazon-cloudwatch-agent git:(patchad-config-translate-refactor) ✗ sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -s -c file:/opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json
****** processing amazon-cloudwatch-agent ******
2024/12/17 21:58:07 E! Fail to fetch/remove json config: open /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json: no such file or directory
Requirements
Before committing the code, please do the following steps.
- Run make fmt and make fmt-sh
- Run make lint
This PR was marked stale due to lack of activity.
Why are we deleting cmd/amazon-cloudwatch-agent-config-wizard/wizard_test.go?
It's just being moved to tool/wizard/wizard_test.go.