Add Audit Log entry for the Instance when breaching 75% capacity
Description
How do we prevent this looping as an alert? If they're consistently running at that level
Epic/Story
No response
Have you provided an initial effort estimate for this issue?
I have provided an initial effort estimate
Hey Joe.
Following the title of this issue, "Add Audit Log entry for the Instance when breaching 75% capacity", and my initial assessment that there was not one: I was incorrect. My dev env did not have the necessary settings in FORGE_CPU_LIMIT and FORGE_MEMORY_LIMIT, so log entries were not generated. I have now applied a setting in my env vars and noted the log entry. It appears in the instance audit logs:
and, due to recent work on audit logs, it also appears in the parent log (team log).
Regarding the comment in the description, "How do we prevent this looping as an alert? If they're consistently running at that level":
- When the alert occurs, it is interlocked. The interlock is not reset until the average value has been below 75% for 30 samples (5 mins). Note that, because the averaging buffer itself holds 30 samples, the reset effectively takes longer than 5 mins.
However, this means that if the average stays mostly >= 75%, the breach is never re-reported, i.e. it may have happened weeks ago and been forgotten or missed.
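To make the interlock behaviour concrete, here is a minimal sketch in Python. It is not the actual implementation: the class and method names are hypothetical, and it assumes the average is taken over whatever samples are in the buffer so far. Only the 75% threshold and the two 30-sample figures come from the description above (30 samples in 5 mins implies one sample every 10 seconds).

```python
from collections import deque


class CapacityAlert:
    """Hypothetical sketch of an interlocked threshold alert."""

    def __init__(self, threshold=75.0, window=30, reset_samples=30):
        self.threshold = threshold
        self.buffer = deque(maxlen=window)   # averaging buffer (30 samples)
        self.reset_samples = reset_samples   # below-threshold samples needed to reset
        self.interlocked = False
        self.below_count = 0

    def add_sample(self, value):
        """Record a sample; return True if an audit log entry should be written."""
        self.buffer.append(value)
        average = sum(self.buffer) / len(self.buffer)

        if average >= self.threshold:
            # Any sample with a high average restarts the reset countdown.
            self.below_count = 0
            if not self.interlocked:
                self.interlocked = True
                return True   # first breach: log once, then stay quiet
            return False

        if self.interlocked:
            # Average is below the threshold: count towards releasing the interlock.
            self.below_count += 1
            if self.below_count >= self.reset_samples:
                self.interlocked = False
                self.below_count = 0
        return False
```

In this sketch, an instance that sits at 90% logs exactly once. If usage then drops to 0%, the rolling average needs a few samples to fall below 75% before the 30-sample countdown even starts, which is why the reset takes longer than 5 mins.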
Possible approaches we might want to consider include:
- Log again after every 24h? Week?
- Log again every 5% increase?
- Set the status pill to show in orange or have a high thermometer icon?
- like this mock up
(only better)
- like this mock up
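The first two approaches in the list above (re-logging after a fixed period, or after a 5% increase) could be combined in a single check that runs while the interlock is held. This is only a sketch of the idea: the function name, its parameters, and the choice of 24h as the default period are all assumptions for illustration, not existing code.

```python
from datetime import datetime, timedelta


def should_relog(now, last_logged_at, last_logged_level, current_level,
                 interval=timedelta(hours=24), step=5.0):
    """Hypothetical check: should a still-active breach be logged again?

    Intended to be called only while the alert is interlocked.
    """
    # Time-based reminder: the breach has been ongoing for a full interval.
    if now - last_logged_at >= interval:
        return True
    # Step-based reminder: usage has risen by at least `step` percentage
    # points since the last log entry.
    return current_level - last_logged_level >= step
```

For example, with a last entry logged at 76%, a reading of 81% one hour later would re-log because of the 5% step, while a steady 76% would only re-log once 24h had passed.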
Moving this to backlog, as most items in the parent were covered, and this was already a feature we had. Improvement was proposed, but not as urgent as the core feature itself.