loki
feat: parameterise the MaximumEventAgeInSeconds, LogGroupName, and IAMRoleName for lambda-promtail CloudFormation template
What this PR does / why we need it:
Which issue(s) this PR fixes:
It adds a parameter for specifying MaximumEventAgeInSeconds
on the EventInvokeConfig resource in the lambda-promtail CloudFormation template.
Why do we need this?
Without specifying this, the default value is 21600 seconds, which is 6 hours. We have been facing a problem where our lambda-promtail gets throttled and cannot process CloudWatch logs fast enough.
The more it is throttled, the longer events wait before being processed, and eventually the oldest messages become too old for Loki to accept, which causes even more failures in the Lambda processing:
server returned HTTP status 400 Bad Request (400): entry with timestamp 2024-04-22 08:10:31.966 +0000 UTC ignored, reason: 'entry too far behind, oldest acceptable timestamp is: 2024-04-22T08:21:38Z',: errorString
Once we get to this point, the only way to mitigate it is to discard the old events, and the only way to do that is to change MaximumEventAgeInSeconds.
As the current template does not allow this, we are opening this PR so we can benefit from an upstream fix.
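For illustration, a minimal sketch of how such a parameter could be wired into a CloudFormation template. This is not the actual content of lambda-promtail.yaml: the logical IDs `LambdaPromtailEventInvokeConfig` and `LambdaPromtailFunction` are hypothetical, and the function resource is assumed to be defined elsewhere in the same template. The 60 to 21600 bounds are the range AWS accepts for this property.

```yaml
# Hypothetical excerpt, not the real lambda-promtail.yaml.
Parameters:
  MaximumEventAgeInSeconds:
    Type: Number
    Default: 21600        # AWS default: 6 hours
    MinValue: 60          # lowest value AWS accepts
    MaxValue: 21600       # highest value AWS accepts
    Description: Maximum age of an asynchronous event before Lambda discards it.

Resources:
  LambdaPromtailEventInvokeConfig:              # hypothetical logical ID
    Type: AWS::Lambda::EventInvokeConfig
    Properties:
      FunctionName: !Ref LambdaPromtailFunction # hypothetical; assumed to be defined elsewhere
      Qualifier: "$LATEST"
      MaximumEventAgeInSeconds: !Ref MaximumEventAgeInSeconds
```

Keeping the default at 21600 would leave existing deployments unchanged, while a lower value lets Lambda drop events before they fall behind the oldest timestamp Loki accepts.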
Test
Deployed this version of lambda-promtail.yaml
with MaximumEventAgeInSeconds set to 100 seconds.
Checking the configuration of the new version shows that the value was updated as expected.
Special notes for your reviewer:
Checklist
- [ ] Reviewed the `CONTRIBUTING.md` guide (required)
- [ ] Documentation added
- [ ] Tests updated
- [x] Title matches the required conventional commits format, see here
- [ ] Changes that require user attention or interaction to upgrade are documented in `docs/sources/setup/upgrade/_index.md`
- [ ] For Helm chart changes bump the Helm chart version in `production/helm/loki/Chart.yaml` and update `production/helm/loki/CHANGELOG.md` and `production/helm/loki/README.md`. Example PR
- [ ] If the change is deprecating or removing a configuration option, update the `deprecated-config.yaml` and `deleted-config.yaml` files respectively in the `tools/deprecated-config-checker` directory. Example PR
I'll try to use this locally and share the results in the description once I have it.
This seems like a reasonable argument to the template, please ping me after you've tried it out :+1:
@cstyan I attached the result in the description. It seems to work. However, during testing I also found that the log group name is not a variable, which causes the creation to fail. So I have another change, if it makes sense: https://github.com/grafana/loki/pull/12750
Let me know what you think. Thank you!
hey @InsomniaCoder lambda-promtail itself is best effort maintained currently, and the terraform and cloudformation files were always meant to be examples more than official "use exactly this to deploy" files, but I think both of your changes are still simple enough that we can merge them
@cstyan noted! With these two PRs, and probably a parameterized IAM role name, I should be able to use the template directly.
But anyway, as you see fit. If you think they are worth merging, let me know if you need anything else 😄
@InsomniaCoder can you please include all the portions you want to parameterize in this PR :+1:
@cstyan combined all the changes.
thank you
@InsomniaCoder thanks for your patience 👍
thanks!!