aws-otel-collector icon indicating copy to clipboard operation
aws-otel-collector copied to clipboard

Add new ECS error reporting feature in AOC

Open skyduo opened this issue 3 years ago • 10 comments

Description: This is the new error reporting feature for ECS. Except sending all error logs to the original destinations, we need to also send error or higher level log messages to another single error log file (the file path configurable later) and keep it less than 1 KB, so that ECS agent could mount this path and read the error log file. When the error log file exceed the max size (1KB), we want to delete the oldest log message to release the space. Also we want to rotate the logs by repeatedly delete the expired logs (the expired time is also configurable later).

Testing: make build and run the executable file with a config file with some errors, and succeed with 2 config errors write into the error log file.

Previous Pull Request: https://github.com/aws-observability/aws-otel-collector/pull/503

skyduo avatar Jun 09 '21 21:06 skyduo

Codecov Report

Merging #538 (d6c0433) into main (e50d672) will decrease coverage by 7.65%. The diff coverage is 42.74%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #538      +/-   ##
==========================================
- Coverage   61.45%   53.79%   -7.66%     
==========================================
  Files           8        9       +1     
  Lines         179      303     +124     
==========================================
+ Hits          110      163      +53     
- Misses         54      111      +57     
- Partials       15       29      +14     
Impacted Files Coverage Δ
pkg/logger/ecs_error_log.go 42.74% <42.74%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update e50d672...d6c0433. Read the comment docs.

codecov-commenter avatar Jun 13 '21 01:06 codecov-commenter

ping @mxiamxia

pingleig avatar Jun 13 '21 04:06 pingleig

Hi @skyduo , since we use main branch for prod release. These changes are on critical path of running ADOT collector. Do you think we can move this changes in a feature branch and we can merge it after enough testing later?

mxiamxia avatar Jul 30 '21 22:07 mxiamxia

hey @mxiamxia ! sure, we can do that

skyduo avatar Aug 05 '21 04:08 skyduo

@skyduo I think you have another PR in upstream so this one can be closed?

pingleig avatar Oct 25 '21 23:10 pingleig

hey @pingleig! that is a new health check feature in the upstream repo which is different from this error reporting feature, also @mxiamxia can we merge this pr?

skyduo avatar Nov 01 '21 20:11 skyduo

@skyduo I think you need to sync w/ @jhnlsn or @alolita on merging this. We (me and Min) no longer actively work on ADOT.

btw: you need to rebase and fix the conflict before merging

pingleig avatar Nov 02 '21 23:11 pingleig

This PR is stale because it has been open 30 days with no activity.

github-actions[bot] avatar Jan 02 '22 20:01 github-actions[bot]

This PR is stale because it has been open 30 days with no activity.

github-actions[bot] avatar Feb 06 '22 20:02 github-actions[bot]

This PR is stale because it has been open 30 days with no activity.

github-actions[bot] avatar Apr 17 '22 20:04 github-actions[bot]