Dashboard should detect and highlight communication issues with components like the statestore

Open hanzvanaardt opened this issue 4 years ago • 3 comments

I recently spent hours trying to figure out why my Dapr deployment was not functioning. The issue ended up being simply that the statestore and pub-sub components were not functioning at all due to misconfiguration. I had to dig through the logs on the Dapr sidecars to get this information. It would have been incredibly valuable if the dashboard had detected the issue with the components and highlighted it.

hanzvanaardt avatar Aug 26 '20 00:08 hanzvanaardt

Hey @hanzvanaardt, the dashboard lets you filter the logs for a Dapr sidecar to display only errors. Would you like something more proactive than that?

yaron2 avatar Sep 30 '20 00:09 yaron2

This is an interesting issue, although it would be hard for the dashboard to identify a component that prevents daprd from starting if daprd is not running. In K8s, it might be easier if we inspect the daprd logs.

To fix this, I propose the following:

  1. Change daprd to start up instead of crashing when a required component fails to start, but without serving any of the APIs.
  2. The metadata API should still be accessible and surface a number of different health statuses and metrics, including components that failed to load.
  3. Daprd reports unhealthy when required components don't load, so K8s can restart the container as designed.
  4. Change the dashboard to probe each running sidecar's metadata API to consolidate the health signals.
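
A minimal Python sketch of what item 4 could look like on the dashboard side, assuming each sidecar's HTTP address is already known. The `/v1.0/metadata` path is the existing Dapr metadata endpoint. The `failedComponents` field is hypothetical and only stands in for the failure information that item 2 proposes to expose. The helper names `fetch_metadata` and `consolidate` are also made up for illustration.

```python
import json
import urllib.request
from typing import Optional


def fetch_metadata(base_url: str, timeout: float = 2.0) -> Optional[dict]:
    """Return the sidecar's metadata payload, or None if it cannot be reached."""
    try:
        with urllib.request.urlopen(f"{base_url}/v1.0/metadata", timeout=timeout) as resp:
            return json.load(resp)
    except (OSError, ValueError):
        # Connection errors, HTTP errors, timeouts, or a body that is not JSON.
        return None


def consolidate(payloads: dict) -> dict:
    """Map each sidecar name to a list of problems; an empty list means healthy."""
    report = {}
    for name, payload in payloads.items():
        if payload is None:
            report[name] = ["sidecar unreachable"]
            continue
        # ASSUMPTION: "failedComponents" is a hypothetical field, not part of
        # the metadata API as described in this thread.
        failed = payload.get("failedComponents", [])
        report[name] = [
            f"component failed to load: {c['name']} ({c['type']})" for c in failed
        ]
    return report
```

The dashboard would call `fetch_metadata` once per sidecar (for example against `http://localhost:3500` in a local setup) and pass the results to `consolidate`. Keeping the network call separate from the aggregation makes the aggregation easy to test without a running sidecar.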

artursouza avatar Mar 22 '21 23:03 artursouza

In the past we let daprd ignore failing components by default and continue to init, and we got a lot of feedback that the sidecar should crash instead.

That is why we added the ignoreErrors field on a Component, so that users can explicitly let the sidecar continue if a component fails to init.

yaron2 avatar Mar 22 '21 23:03 yaron2