
Prometheus exporter for monitoring an SF cluster

Open • naveenkumarsp opened this issue 3 years ago • 5 comments

Is there any exporter developed by the community through which we can monitor cluster health and the states of Service Fabric applications and services?

I am still not clear on how we can monitor the cluster and its underlying infrastructure. I was considering using windows_exporter for node metrics and writing an exporter for cluster-side metrics based on the API.

Any suggestions and recommendations will be appreciated.

naveenkumarsp, Mar 19 '21

@naveenkumarsp Have you read https://docs.microsoft.com/en-us/azure/service-fabric/service-fabric-best-practices-monitoring? In that doc we offer guidance on how to monitor the infrastructure, the cluster, and applications on both Windows and Linux.

  • How to set up the Windows Azure Diagnostics (WAD) agent: https://docs.microsoft.com/en-us/azure/service-fabric/service-fabric-diagnostics-event-aggregation-wad
  • How to consume events/data from logs and set up monitors: https://docs.microsoft.com/en-us/azure/service-fabric/service-fabric-diagnostics-oms-setup

Please read the above doc and let us know if the information you are looking for is not covered there.

athinanthny, Apr 01 '21

Thanks for replying.

As an ops guy, I have been looking for a way to monitor the applications and services hosted on the SF cluster. I want to be notified when any issue or error occurs in the cluster.

Would WAD and the diagnostics agent help me achieve that?

naveenkumarsp, Apr 07 '21

You should consider a watchdog service, as mentioned in the documentation above. FabricObserver (FO) generates health warnings at the service level (as ApplicationHealthReports) and at the node (VM) level (as NodeHealthReports) when things get into a bad state, where you define what the things are and what "bad" means. Today, out of the box, these things are machine resource metrics at the process and VM level. Unlike monitoring services, FO provides port-use information, which is critical if you run services that eat TCP ports as part of their daily diet.
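To illustrate the "you define what the things are and what bad means" idea, here is a hypothetical sketch of the threshold logic a watchdog applies: each metric gets a warning threshold and an error threshold, and an observed value maps to Ok, Warning, or Error. The `Threshold` class, the metric names, and the numbers are illustrative assumptions. This is not FabricObserver's configuration format or code, and FO itself reports its results to Service Fabric as health reports rather than returning a dict.

```python
from dataclasses import dataclass


@dataclass
class Threshold:
    """Hypothetical warning/error limits for one metric (not FO's config schema)."""
    metric: str
    warning: float
    error: float


def evaluate(value: float, threshold: Threshold) -> str:
    """Map an observed value to a health state; Error takes precedence over Warning."""
    if value >= threshold.error:
        return "Error"
    if value >= threshold.warning:
        return "Warning"
    return "Ok"


def evaluate_all(observed: dict, thresholds: list) -> dict:
    """Evaluate every metric that has a threshold; metrics with no observation are skipped."""
    return {
        t.metric: evaluate(observed[t.metric], t)
        for t in thresholds
        if t.metric in observed
    }
```

For example, with a CPU pair of 80/95 and a TCP-port pair of 5000/10000, an observation of 85% CPU and 1,200 active ports evaluates to `{"cpu_percent": "Warning", "active_tcp_ports": "Ok"}`.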

https://aka.ms/sf/FabricObserver

GitTorre, Apr 09 '21

Any updates here, guys?

ASalihov, Oct 02 '23

I built a proof of concept of a Prometheus exporter in Python that fetched metrics over the API and exposed them in Prometheus format. If enough people are interested, we could develop a full exporter.
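An exporter like this needs two pieces: fetching data over the API and serving it on an HTTP endpoint that Prometheus scrapes. The sketch below is not the PoC mentioned in this comment. It uses only the Python standard library and shows the serving side under stated assumptions: an unsecured cluster with the HTTP gateway at `http://localhost:19080`, a hypothetical exporter port `9877`, and hypothetical metric names. It fetches cluster health on every scrape and emits an `sf_exporter_up` gauge, so a failed call to the cluster shows up as a metric rather than as an HTTP error.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

GATEWAY = "http://localhost:19080"  # assumed unsecured SF HTTP gateway
STATE = {"Ok": 0, "Warning": 1, "Error": 2}  # hypothetical gauge encoding


def fetch_cluster_health() -> dict:
    url = f"{GATEWAY}/$/GetClusterHealth?api-version=6.0"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def make_handler(fetch):
    """Build a request handler that calls `fetch` on every scrape."""

    class MetricsHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/metrics":
                self.send_error(404)
                return
            lines = ["# TYPE sf_exporter_up gauge"]
            try:
                health = fetch()
                state = STATE.get(health.get("AggregatedHealthState"), -1)
                lines += [
                    "sf_exporter_up 1",
                    "# TYPE sf_cluster_health_state gauge",
                    f"sf_cluster_health_state {state}",
                ]
            except Exception:
                # Report the failed fetch as data instead of returning a 500,
                # so Prometheus can alert on sf_exporter_up == 0.
                lines.append("sf_exporter_up 0")
            body = ("\n".join(lines) + "\n").encode()
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, *args):
            pass  # keep the console quiet on every scrape

    return MetricsHandler


def serve(port: int = 9877, fetch=fetch_cluster_health) -> ThreadingHTTPServer:
    """Start the exporter in a daemon thread and return the server object."""
    server = ThreadingHTTPServer(("", port), make_handler(fetch))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Because the server runs in a daemon thread, a standalone script would keep the main thread alive after calling `serve()`, for example with `threading.Event().wait()`. A Prometheus scrape job pointed at `http://<host>:9877/metrics` would then collect the gauges, and alert rules on `sf_cluster_health_state > 0` or `sf_exporter_up == 0` would cover the notify-on-error need raised earlier in the thread.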

naveenkumarsp, Oct 06 '23