[Monitoring] APM Server/Integration Stack Monitoring lags behind in terms of metrics visualized
Currently, the APM Server Stack Monitoring page lags behind in the metrics it can visualize and report to the user.
It would be nice to ship dashboards with the APM Server/Integration that replicate the Stack Monitoring page and work for any kind of deployment (on-premise APM Server, on-premise APM Integration, Integration Server on ECH).
The dashboard can be shipped by the input-only APM Integration in Fleet. It should cover the following metrics:
- Intake
  - IntakeV2 traces, metrics, logs
  - OTLP (gRPC) traces, metrics, logs
  - OTLP (HTTP) traces, metrics, logs
  - Jaeger
  - Central Config requests
- Intake Request outcome
  - Success / Failure
  - Failure reason
  - Status codes
- TBS (tail-based sampling)
  - Disk limit
  - Disk limit reached
  - Current disk usage
- Output Request outcome
  - Success / Failure
  - Failure reason
  - Status codes
  - Buffered events
- Infra
  - CPU
  - Memory usage
  - Throttling
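
To make it easier to track which panels a dashboard already covers, the list above could be kept as a small checklist. This is only a rough sketch: the grouping is my reading of the list, and `PANELS` and `missing_panels` are hypothetical names, not part of any APM Server or Kibana API.

```python
# Hypothetical panel layout mirroring the metric list in this issue.
# The grouping into sections is an assumption, not an existing schema.
PANELS = {
    "Intake": [
        "IntakeV2 traces, metrics, logs",
        "OTLP (gRPC) traces, metrics, logs",
        "OTLP (HTTP) traces, metrics, logs",
        "Jaeger",
        "Central Config requests",
    ],
    "Intake Request outcome": ["Success / Failure", "Failure reason", "Status codes"],
    "TBS": ["Disk limit", "Disk limit reached", "Current disk usage"],
    "Output Request outcome": [
        "Success / Failure",
        "Failure reason",
        "Status codes",
        "Buffered events",
    ],
    "Infra": ["CPU", "Memory usage", "Throttling"],
}


def missing_panels(layout, implemented_groups):
    """Return 'group / panel' labels for every group not yet implemented."""
    return [
        f"{group} / {panel}"
        for group, panels in layout.items()
        if group not in implemented_groups
        for panel in panels
    ]


if __name__ == "__main__":
    # Example: only the first three groups are done so far.
    done = {"Intake", "Intake Request outcome", "TBS"}
    for label in missing_panels(PANELS, done):
        print("TODO:", label)
```

Running it with the first three groups marked as done lists the output and infra panels as remaining work.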
That would fit perfectly with the Sunsetting Stack Monitoring roadmap 👍
I'm actually making a prototype 😅 I'll share it here.
It works on ECH with metric shipping enabled. We would also need to validate it with Metricbeat (MB) monitoring of APM Server and APM Integration.
It's still missing the CPU/memory stats and the output stats.
cc @mlunadia and @raultorrecilla for awareness and prioritization