I wish fetch-config would not delete the .json config file

Open jedwards1211 opened this issue 1 year ago • 14 comments

It's counterproductive that running fetch-config deletes the input .json config file.
When I'm debugging issues I want to just edit the file and rerun the fetch-config command. The fact that fetch-config deletes the file makes this more of a hassle.

To me the configuration also seems overcomplicated: it gets converted to a different .toml format, and there's a .yaml file there too for some reason. It would be far more straightforward if we just specified a .json file, SSM parameter, or whatever as the configuration source, and the CloudWatch agent left that as the source of truth, i.e. always read from that file or SSM parameter on startup instead of fetching it and storing it in some other format.

jedwards1211 avatar Jun 18 '24 16:06 jedwards1211

Hi

Using the fetch-config command should not delete the config file. Could you provide some logs and output demonstrating this issue?

Thank you!

okankoAMZ avatar Jul 18 '24 17:07 okankoAMZ

In the /opt/aws/amazon-cloudwatch-agent/etc directory:

[ec2-user@ip-172-31-44-197 etc]$ sudo cp amazon-cloudwatch-agent.json.bak amazon-cloudwatch-agent.json
[ec2-user@ip-172-31-44-197 etc]$ ls
amazon-cloudwatch-agent.d     amazon-cloudwatch-agent.json.bak  amazon-cloudwatch-agent.yaml  env-config.json
amazon-cloudwatch-agent.json  amazon-cloudwatch-agent.toml      common-config.toml            log-config.json
[ec2-user@ip-172-31-44-197 etc]$ sudo amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -c file:/opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json -s
****** processing amazon-cloudwatch-agent ******
2024/07/18 18:31:17 I! imds retry client will retry 1 times
I! Trying to detect region from ec2 D! [EC2] Found active network interface Successfully fetched the config and saved in /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d/file_amazon-cloudwatch-agent.json.tmp
Start configuration validation...
2024/07/18 18:31:17 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d/file_amazon-cloudwatch-agent.json.tmp ...
2024/07/18 18:31:17 I! Valid Json input schema.
2024/07/18 18:31:17 I! imds retry client will retry 1 times
2024/07/18 18:31:17 D! ec2tagger processor required because append_dimensions is set
2024/07/18 18:31:17 D! pipeline hostDeltaMetrics has no receivers
2024/07/18 18:31:17 Configuration validation first phase succeeded
I! Detecting run_as_user...
I! Trying to detect region from ec2
D! [EC2] Found active network interface
/opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent -schematest -config /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.toml
Configuration validation second phase succeeded
Configuration validation succeeded
[ec2-user@ip-172-31-44-197 etc]$ ls
amazon-cloudwatch-agent.d         amazon-cloudwatch-agent.toml  common-config.toml  log-config.json
amazon-cloudwatch-agent.json.bak  amazon-cloudwatch-agent.yaml  env-config.json

You can see in the output of the final ls that amazon-cloudwatch-agent.json is gone.
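
The before-and-after ls comparison above can be automated. This is only a minimal sketch: `removed_files` is a hypothetical helper name, not part of the agent tooling. It snapshots a directory listing, runs whatever command it is given, and prints the entries that disappeared.

```shell
#!/usr/bin/env bash
# removed_files DIR CMD [ARGS...]
# Hypothetical helper (not part of amazon-cloudwatch-agent): prints the names
# of entries in DIR that no longer exist after CMD has run.
removed_files() {
  local dir="$1"; shift
  local before after
  before="$(mktemp)" || return 1
  after="$(mktemp)" || return 1
  ls -1A "$dir" | sort > "$before"   # listing before the command
  "$@" > /dev/null 2>&1              # run the command; its output is discarded
  ls -1A "$dir" | sort > "$after"    # listing after the command
  comm -23 "$before" "$after"        # lines present only in the "before" listing
  rm -f "$before" "$after"
}
```

Passing the etc directory and the fetch-config command from the session above (prefixed with sudo) as arguments would, given the behavior reported here, be expected to list amazon-cloudwatch-agent.json.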

amazon-cloudwatch-agent.log:

2024-07-18T18:31:17Z I! Profiler is stopped during shutdown
2024-07-18T18:31:17.681Z        info    otelcol/collector.go:227        Received signal from OS {"signal": "terminated"}
2024-07-18T18:31:17.682Z        info    service/service.go:157  Starting shutdown...
2024-07-18T18:31:17.692Z        info    extensions/extensions.go:44     Stopping extensions...
2024-07-18T18:31:17.693Z        info    service/service.go:171  Shutdown complete.
2024/07/18 18:31:19 I! Config has been translated into TOML /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.toml 
2024/07/18 18:31:19 D! config [agent]
  collection_jitter = "0s"
  debug = false
  flush_interval = "1s"
  flush_jitter = "0s"
  hostname = ""
  interval = "300s"
  logfile = "/opt/aws/amazon-cloudwatch-agent/logs/amazon-cloudwatch-agent.log"
  logtarget = "lumberjack"
  metric_batch_size = 1000
  metric_buffer_limit = 10000
  omit_hostname = false
  precision = ""
  quiet = false
  round_interval = false

[inputs]

  [[inputs.disk]]
    fieldpass = ["used_percent"]
    mount_points = ["/mnt/data-01"]
    tagexclude = ["mode"]

  [[inputs.logfile]]
    destination = "cloudwatchlogs"
    file_state_folder = "/opt/aws/amazon-cloudwatch-agent/logs/state"

    [[inputs.logfile.file_config]]
      file_path = "/var/log/cloud-init-output.log"
      from_beginning = true
      log_group_name = "clarity-2-db-syslog-r02"
      log_stream_name = "/var/log/cloud-init-output.log"
      pipe = false
      retention_in_days = 7

    [[inputs.logfile.file_config]]
      file_path = "/var/log/cfn-init.log"
      from_beginning = true
      log_group_name = "clarity-2-db-syslog-r02"
      log_stream_name = "/var/log/cfn-init.log"
      pipe = false
      retention_in_days = 7

    [[inputs.logfile.file_config]]
      file_path = "/var/log/cfn-init-cmd.log"
      from_beginning = true
      log_group_name = "clarity-2-db-syslog-r02"
      log_stream_name = "/var/log/cfn-init-cmd.log"
      pipe = false
      retention_in_days = 7

    [[inputs.logfile.file_config]]
      file_path = "/var/log/cfn-hup.log"
      from_beginning = true
      log_group_name = "clarity-2-db-syslog-r02"
      log_stream_name = "/var/log/cfn-hup.log"
      pipe = false
      retention_in_days = 7

    [[inputs.logfile.file_config]]
      file_path = "/var/log/amazon/amazon-cloudwatch-agent/amazon-cloudwatch-agent.log"
      from_beginning = true
      log_group_name = "clarity-2-db-syslog-r02"
      log_stream_name = "/var/log/amazon/amazon-cloudwatch-agent/amazon-cloudwatch-agent.log"
      pipe = false
      retention_in_days = 7

    [[inputs.logfile.file_config]]
      file_path = "/var/log/manage-db-reconf.log"
      from_beginning = true
      log_group_name = "clarity-2-db-syslog-r02"
      log_stream_name = "/var/log/manage-db-reconf.log"
      pipe = false
      retention_in_days = 7

  [[inputs.mem]]
    fieldpass = ["used_percent"]

[outputs]

  [[outputs.cloudwatch]]

  [[outputs.cloudwatchlogs]]
    force_flush_interval = "5s"
    log_stream_name = "i-07597f6c4d5733042"
    region = "us-west-2"
2024/07/18 18:31:19 I! Config has been translated into YAML /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.yaml 
2024/07/18 18:31:19 D! config connectors: {}
exporters:
    awscloudwatch:
        force_flush_interval: 1s
        max_datums_per_call: 1000
        max_values_per_datum: 150
        namespace: CWAgent
        region: us-west-2
        resource_to_telemetry_conversion:
            enabled: true
        rollup_dimensions:
            - - InstanceId
              - path
extensions: {}
processors:
    ec2tagger:
        ec2_instance_tag_keys: []
        ec2_metadata_tags:
            - InstanceId
        imds_retries: 1
        refresh_interval_seconds: 0s
receivers:
    telegraf_disk:
        collection_interval: 5m0s
        initial_delay: 1s
    telegraf_mem:
        collection_interval: 5m0s
        initial_delay: 1s
service:
    extensions: []
    pipelines:
        metrics/host:
            exporters:
                - awscloudwatch
            processors:
                - ec2tagger
            receivers:
                - telegraf_disk
                - telegraf_mem
    telemetry:
        logs:
            development: false
            disable_caller: false
            disable_stacktrace: false
            encoding: console
            error_output_paths: []
            initial_fields: {}
            level: info
            output_paths:
                - /opt/aws/amazon-cloudwatch-agent/logs/amazon-cloudwatch-agent.log
            sampling:
                initial: 2
                thereafter: 500
        metrics:
            address: ""
            level: None
            metric_readers: []
        resource: {}
        traces:
            propagators: []
2024/07/18 18:31:19 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json ...
2024/07/18 18:31:19 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d/file_amazon-cloudwatch-agent.json ...
2024/07/18 18:31:19 I! Valid Json input schema.
2024/07/18 18:31:19 I! Detected runAsUser: root
2024/07/18 18:31:19 I! Changing ownership of [/opt/aws/amazon-cloudwatch-agent/logs /opt/aws/amazon-cloudwatch-agent/etc /opt/aws/amazon-cloudwatch-agent/var] to 0:0
2024-07-18T18:31:19Z I! Starting AmazonCloudWatchAgent CWAgent/1.300028.1 (go1.20.8; linux; amd64)
2024-07-18T18:31:19Z I! AWS SDK log level not set
2024-07-18T18:31:19Z I! creating new logs agent
2024-07-18T18:31:19Z I! [logagent] starting
2024-07-18T18:31:19Z I! [logagent] found plugin cloudwatchlogs is a log backend
2024-07-18T18:31:19Z I! [logagent] found plugin logfile is a log collection
2024-07-18T18:31:19Z I! [logagent] start logs plugin file paths [/var/log/cloud-init-output.log /var/log/cfn-init.log /var/log/cfn-init-cmd.log /var/log/cfn-hup.log /var/log/amazon/amazon-cloudwatch-agent/amazon-cloudwatch-agent.log /var/log/manage-db-reconf.log]
2024-07-18T18:31:19Z I! [inputs.logfile] turned on logs plugin
2024-07-18T18:31:19.552Z        info    service/telemetry.go:96 Skipping telemetry setup.       {"address": "", "level": "None"}
2024-07-18T18:31:19Z I! imds retry client will retry 1 times
2024-07-18T18:31:19.559Z        info    service/service.go:131  Starting CWAgent...     {"Version": "1.300028.1", "NumCPU": 2}
2024-07-18T18:31:19.559Z        info    extensions/extensions.go:30     Starting extensions...
2024-07-18T18:31:19Z I! cloudwatch: get unique roll up list [[InstanceId path]]
2024-07-18T18:31:19.572Z        info    ec2tagger/ec2tagger.go:435      ec2tagger: Check EC2 Metadata.  {"kind": "processor", "name": "ec2tagger", "pipeline": "metrics/host"}
2024-07-18T18:31:19Z I! cloudwatch: publish with ForceFlushInterval: 1s, Publish Jitter: 35.296087ms
2024-07-18T18:31:19.575Z        info    ec2tagger/ec2tagger.go:411      ec2tagger: EC2 tagger has started, finished initial retrieval of tags and Volumes      {"kind": "processor", "name": "ec2tagger", "pipeline": "metrics/host"}
2024-07-18T18:31:19.575Z        info    service/service.go:148  Everything is ready. Begin running and processing data.
2024-07-18T18:31:20Z I! [inputs.logfile] Reading from offset 51573 in /var/log/cloud-init-output.log
2024-07-18T18:31:20Z I! [inputs.logfile] Reading from offset 365 in /var/log/cfn-init.log
2024-07-18T18:31:20Z I! [inputs.logfile] Reading from offset 30110 in /var/log/cfn-hup.log
2024-07-18T18:31:20Z I! [inputs.logfile] Reading from offset 25243 in /var/log/amazon/amazon-cloudwatch-agent/amazon-cloudwatch-agent.log
2024-07-18T18:31:20Z I! First time setting retention for log group clarity-2-db-syslog-r02, update map to avoid setting twice
2024-07-18T18:31:20Z I! [logagent] piping log from clarity-2-db-syslog-r02//var/log/cloud-init-output.log(/var/log/cloud-init-output.log) to cloudwatchlogs with retention 7
2024-07-18T18:31:20Z I! [logagent] piping log from clarity-2-db-syslog-r02//var/log/cfn-init.log(/var/log/cfn-init.log) to cloudwatchlogs with retention -1
2024-07-18T18:31:20Z I! [logagent] piping log from clarity-2-db-syslog-r02//var/log/cfn-init-cmd.log(/var/log/cfn-init-cmd.log) to cloudwatchlogs with retention -1
2024-07-18T18:31:20Z I! [logagent] piping log from clarity-2-db-syslog-r02//var/log/cfn-hup.log(/var/log/cfn-hup.log) to cloudwatchlogs with retention -1
2024-07-18T18:31:20Z I! [logagent] piping log from clarity-2-db-syslog-r02//var/log/amazon/amazon-cloudwatch-agent/amazon-cloudwatch-agent.log(/var/log/amazon/amazon-cloudwatch-agent/amazon-cloudwatch-agent.log) to cloudwatchlogs with retention -1
2024-07-18T18:31:20Z I! [logagent] piping log from clarity-2-db-syslog-r02//var/log/manage-db-reconf.log(/var/log/manage-db-reconf.log) to cloudwatchlogs with retention -1

jedwards1211 avatar Jul 18 '24 18:07 jedwards1211

Hi! Thank you for providing the logs. By design, fetch-config shouldn't delete the JSON file. I will try to reproduce this issue and get back to you as soon as possible.

okankoAMZ avatar Jul 19 '24 14:07 okankoAMZ

Any updates on this issue? We are experiencing the same thing. Or is it expected that the config.json file gets translated into the .toml file and the .json file is removed because it is no longer needed?

platymatt avatar Aug 23 '24 14:08 platymatt

@okankoAMZ I just noticed this in the journal after re-fetching the config... the main PID is logging:

/opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json does not exist or cannot read. Skipping it.

I want to emphasize again, the number of different files and formats CWAgent seems to shuffle the config through doesn't inspire confidence. It seems like asking for bugs.

● amazon-cloudwatch-agent.service - Amazon CloudWatch Agent
     Loaded: loaded (/etc/systemd/system/amazon-cloudwatch-agent.service; enabled; preset: disabled)
     Active: active (running) since Tue 2024-09-17 00:48:53 UTC; 5s ago
   Main PID: 435744 (amazon-cloudwat)
      Tasks: 8 (limit: 2257)
     Memory: 105.1M
        CPU: 888ms
     CGroup: /system.slice/amazon-cloudwatch-agent.service
             └─435744 /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent -config /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.toml -envconfig /opt/aws/amazon-cloudwatch-agent/etc/env-config.json -otelconfig /opt/aws/amazon-cloud>

Sep 17 00:48:54 ip-172-31-32-255.us-west-2.compute.internal start-amazon-cloudwatch-agent[435749]: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json does not exist or cannot read. Skipping it.
Sep 17 00:48:54 ip-172-31-32-255.us-west-2.compute.internal start-amazon-cloudwatch-agent[435749]: 2024/09/17 00:48:54 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d/file_amazon-cloudwatch-agent.json ...
Sep 17 00:48:54 ip-172-31-32-255.us-west-2.compute.internal start-amazon-cloudwatch-agent[435749]: 2024/09/17 00:48:54 I! Valid Json input schema.
Sep 17 00:48:54 ip-172-31-32-255.us-west-2.compute.internal start-amazon-cloudwatch-agent[435749]: I! Detecting run_as_user...
Sep 17 00:48:54 ip-172-31-32-255.us-west-2.compute.internal start-amazon-cloudwatch-agent[435749]: I! Trying to detect region from ec2
Sep 17 00:48:54 ip-172-31-32-255.us-west-2.compute.internal start-amazon-cloudwatch-agent[435749]: 2024/09/17 00:48:54 D! ec2tagger processor required because append_dimensions is set
Sep 17 00:48:54 ip-172-31-32-255.us-west-2.compute.internal start-amazon-cloudwatch-agent[435749]: 2024/09/17 00:48:54 D! pipeline hostDeltaMetrics has no receivers
Sep 17 00:48:54 ip-172-31-32-255.us-west-2.compute.internal start-amazon-cloudwatch-agent[435749]: 2024/09/17 00:48:54 Configuration validation first phase succeeded
Sep 17 00:48:54 ip-172-31-32-255.us-west-2.compute.internal start-amazon-cloudwatch-agent[435744]: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json does not exist or cannot read. Skipping it.
Sep 17 00:48:54 ip-172-31-32-255.us-west-2.compute.internal start-amazon-cloudwatch-agent[435744]: I! Detecting run_as_user...

jedwards1211 avatar Sep 17 '24 00:09 jedwards1211

Hi, any update on the file getting deleted?

solomongit3 avatar Sep 27 '24 17:09 solomongit3

This occurs on EC2 (Linux 2023) as well by default.

Riskcomplexx avatar Oct 12 '24 15:10 Riskcomplexx

still happens on 24.04 build, wasted 1 hour

umutsesen avatar Dec 16 '24 12:12 umutsesen

This is happening to us too. As a first-time user of the CloudWatch agent, I found this incredibly confusing.

Here's our status so you can see the version number. Running on Ubuntu v22 LTS on EC2.

{
  "status": "running",
  "starttime": "2024-12-19T07:20:14+00:00",
  "configstatus": "configured",
  "version": "1.300049.1b929"
}

massimocode avatar Dec 19 '24 07:12 massimocode

I ran the following:

ubuntu@REDACTED:/tmp$ sudo chattr +i /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json
ubuntu@REDACTED:/tmp$ sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -s -m ec2 -c file:/opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json
****** processing amazon-cloudwatch-agent ******
I! Trying to detect region from ec2 D! [EC2] Found active network interface I! imds retry client will retry 1 timesSuccessfully fetched the config and saved in /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d/file_amazon-cloudwatch-agent.json.tmp
Start configuration validation...
2024/12/19 07:34:40 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d/file_amazon-cloudwatch-agent.json.tmp ...
2024/12/19 07:34:40 I! Valid Json input schema.
2024/12/19 07:34:40 D! ec2tagger processor required because append_dimensions is set
2024/12/19 07:34:40 Configuration validation first phase succeeded
I! Detecting run_as_user...
I! Trying to detect region from ec2
D! [EC2] Found active network interface
I! imds retry client will retry 1 times
/opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent -schematest -config /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.toml
Configuration validation second phase succeeded
Configuration validation succeeded
rm: cannot remove '/opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json': Operation not permitted

**Check out the last line: the fetch-config command is definitely trying to delete the config!**

massimocode avatar Dec 19 '24 07:12 massimocode

Aha: https://github.com/aws/amazon-cloudwatch-agent/blob/43d475d73656230eecafab991dc16a05d0736de8/packaging/dependencies/amazon-cloudwatch-agent-ctl#L306

jedwards1211 avatar Dec 20 '24 03:12 jedwards1211

The problem seems to be putting my config in exactly this file before running the import command:

readonly JSON="${CONFDIR}/amazon-cloudwatch-agent.json"

amazon-cloudwatch-agent-ctl deletes this file, so maybe it uses it for some other purpose in some cases?

In any case, using some other file path should work around this.
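
A minimal sketch of that workaround, under stated assumptions: `fetch_config_cmd` is a hypothetical helper name, and the source path you pass it is your own choice. The fetch-config flags are the ones used earlier in this thread. The helper only prints the command so it can be reviewed before running, and it refuses the one path the ctl script was seen removing.

```shell
#!/usr/bin/env bash
# Path that amazon-cloudwatch-agent-ctl was observed removing in this thread.
RESERVED_JSON="/opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json"

# fetch_config_cmd SOURCE_JSON
# Hypothetical helper: refuses the reserved path, otherwise prints the
# fetch-config command (same flags as used earlier in this thread).
fetch_config_cmd() {
  local src="$1"
  if [ "$src" = "$RESERVED_JSON" ]; then
    echo "refusing: $src is the path the ctl script removes" >&2
    return 1
  fi
  printf 'sudo amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -c file:%s -s\n' "$src"
}
```

For example, `fetch_config_cmd "$HOME/my-agent-config.json"` (a hypothetical file name) prints a command that reads the config from outside the reserved location, so the file you edit should survive a rerun.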

Once again, this process of importing the config in one file format and outputting it in another seems like a mess. It would be much cleaner if the main CloudWatch agent process just supported loading its config on startup directly from a toml, yaml, or json file, so we didn't need to run this import step at all.

jedwards1211 avatar Dec 20 '24 04:12 jedwards1211

I can confirm this happens in version 1.300053.0b1046.

DaveQB avatar May 14 '25 08:05 DaveQB

Same issue with version 1.300055.0b1095.

adile-cyclope avatar May 14 '25 16:05 adile-cyclope