Create "Monitoring best practices"

Open ramonsmits opened this issue 1 year ago • 0 comments

Currently our monitoring docs page focuses on performance metrics, which are stored and visualized in our platform via ServicePulse and ServiceControl.

However, "monitoring" covers more angles than that, as the term does not specify what needs to be monitored:

Server, hosting, resource metrics/monitoring
Application metrics/monitoring
Tracing
Logging
Auditing

The following information would be very useful to our customers: a single page that lists all the monitoring-related aspects. I think the best place for it is currently our Monitoring page.

Server / Hosting / Resources metrics/monitoring

There are server/resource metrics and application metrics. On top of that, we have auditing, which captures the history of messages processed in a system so that issues can be diagnosed.

Server resource metrics like CPU, network IO, storage IO, free RAM, database utilization, database latency, storage IOPS, etc. are outside of our platform. Cloud providers very often already have many of these metrics in place, or they can be enabled with a few simple scripts/clicks.

Infrastructure monitoring

Our platform does not have any alerting capabilities. Customers need to set up their own monitoring of the messaging infrastructure and set up alerting thresholds on:

Number of messages in all queues
Number of messages in a single queue
Storage quota used for all queues
Storage quota free space remaining for all queues
Dead-letter queues (not supported by ServiceControl)

Customers need to define thresholds for these metrics and then create alerts that are triggered when the thresholds are exceeded. The alerts can be integrated with monitoring suites like SolarWinds, New Relic, SCOM, Azure Monitor / Application Insights, or similar tools.
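
To make the threshold idea concrete for the page, here is a minimal sketch of evaluating the metrics listed above against limits. It is only an illustration: the `Thresholds` type, the `evaluate` function, the default limits, and the input shapes are all hypothetical, and collecting the actual numbers from a broker or forwarding the alerts to a monitoring suite is deliberately left out because that differs per transport and tool.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Thresholds:
    # Hypothetical default limits; every system needs its own values.
    max_total_messages: int = 10_000
    max_messages_per_queue: int = 1_000
    max_storage_used_ratio: float = 0.80
    max_dead_letter_messages: int = 0


def evaluate(queue_depths: Dict[str, int],
             dead_letter_depths: Dict[str, int],
             storage_used_bytes: int,
             storage_quota_bytes: int,
             limits: Thresholds = Thresholds()) -> List[str]:
    """Return one alert message per exceeded threshold."""
    alerts: List[str] = []

    # Number of messages in all queues.
    total = sum(queue_depths.values())
    if total > limits.max_total_messages:
        alerts.append(f"all queues: {total} messages "
                      f"(limit {limits.max_total_messages})")

    # Number of messages in a single queue.
    for name, depth in sorted(queue_depths.items()):
        if depth > limits.max_messages_per_queue:
            alerts.append(f"queue {name}: {depth} messages "
                          f"(limit {limits.max_messages_per_queue})")

    # Storage quota used and free space remaining for all queues.
    if storage_quota_bytes > 0:
        used_ratio = storage_used_bytes / storage_quota_bytes
        if used_ratio > limits.max_storage_used_ratio:
            free = storage_quota_bytes - storage_used_bytes
            alerts.append(f"storage: {used_ratio:.0%} of quota used, "
                          f"{free} bytes free")

    # Dead-letter queues.
    for name, depth in sorted(dead_letter_depths.items()):
        if depth > limits.max_dead_letter_messages:
            alerts.append(f"dead-letter queue {name}: {depth} messages")

    return alerts
```

A scheduled job could call `evaluate` with freshly sampled values and pass any returned messages to whichever alerting tool is already in use.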

Application metrics / monitoring

NServiceBus endpoints also expose application metrics related to message throughput and processing performance.

https://docs.particular.net/monitoring/
https://docs.particular.net/monitoring/metrics/
https://docs.particular.net/monitoring/metrics/install-plugin
https://docs.particular.net/monitoring/metrics/in-servicepulse

Our platform only keeps a non-durable, in-memory history of 1 hour. If you need more history than that, you need to expose these metrics to a metrics solution like:

https://docs.particular.net/samples/logging/

Tracing

We also support tracing via OpenTelemetry. This can give you very detailed insights into your application performance.

https://docs.particular.net/nservicebus/operations/opentelemetry
https://docs.particular.net/samples/open-telemetry/

Auditing

Auditing needs to be enabled in NServiceBus endpoints.

https://docs.particular.net/nservicebus/operations/auditing

Logging

For logging, we recommend using Microsoft.Extensions.Logging with a logging target that suits your needs.

https://docs.particular.net/nservicebus/logging/

Jan 30 '24 12:01 ramonsmits