
Visualization for recruited roles in a cluster

Open alexmiller-apple opened this issue 4 years ago • 2 comments

When digging into unhealthy clusters, I seem to spend a lot of time trying to figure out which pieces of the system are communicating with which other pieces. Figuring out which transaction log is the preferred location for which tag, which log router is pulling from which TLog, or what generation a log router belongs to all slows down the debugging process. It'd be great to have a quick way to get the IDs of recruited roles, with arrows showing how they're connected.

It looks like if we had the raw information available, Graphviz would let us output a declarative description of the cluster and handle the rendering for us. Subgraph clusters allow one to put a labelled box around a collection of nodes. HTML-like labels allow attaching tables of information to a process.

For example:

digraph FDB {
  subgraph cluster_generation_1 {
    label = "Generation 1";
    subgraph cluster_DC1 {
      label = "DC1"
      Proxies [shape=plaintext
               label=<<table>
                        <tr><td>Proxy0:</td><td>af9380</td></tr>
                        <tr><td>Proxy1:</td><td>dc8947</td></tr>
                        <tr><td>Proxy2:</td><td>eb8127</td></tr>
                       </table>>]
      Log0 [ shape=plaintext
             label=<<table><tr><td>ID:</td><td>aa827</td></tr></table>>]
    }
    subgraph cluster_DC2 {
      label = "DC2"
      subgraph cluster_SatelliteLogs {
        label = "Satellite Logs"
        SLog0 [shape=plaintext
               label=<<table><tr><td>ID:</td><td>192347</td></tr></table>>]
      }
    }
    subgraph cluster_DC3 {
      label = "DC3"
      subgraph cluster_LogRouter {
        label = "LogRouters"
        LR0 [shape=plaintext
             label=<<table><tr><td>ID:</td><td>deof75</td></tr></table>>]
      }
    }
    Proxies -> SLog0;
    Proxies -> Log0;
    SLog0 -> LR0;
  }
}

becomes

[image: Graphviz rendering of the graph described above]

However, the raw information doesn't seem to be readily available. Things related to transaction logs could probably be read from the coordinated state, but other things, like the IDs of recruited log router instances and what generation they belong to, don't exist in an easy-to-access place right now.
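
If that information were exposed somewhere, emitting the DOT could be fairly mechanical. Here is a minimal sketch in Python, assuming a made-up input shape (a dict of datacenters to nodes, plus a list of edges). None of these names come from FoundationDB; `render_cluster` and `html_table` are hypothetical helpers for illustration only.

```python
from html import escape


def html_table(rows):
    """Build a Graphviz HTML-like label from (key, value) pairs."""
    cells = "".join(
        f"<tr><td>{escape(key)}:</td><td>{escape(value)}</td></tr>"
        for key, value in rows
    )
    return f"<<table>{cells}</table>>"


def render_cluster(generation, datacenters, edges):
    """Return DOT text for a single generation.

    Hypothetical input shape (not an FDB API):
      datacenters: {dc_name: {node_name: [(key, value), ...]}}
      edges:       [(src_node, dst_node), ...]
    dc_name and node_name are assumed to be plain alphanumeric DOT IDs.
    """
    lines = [
        "digraph FDB {",
        f"  subgraph cluster_generation_{generation} {{",
        f'    label = "Generation {generation}";',
    ]
    for dc_name, nodes in datacenters.items():
        # One labelled box per datacenter.
        lines.append(f"    subgraph cluster_{dc_name} {{")
        lines.append(f'      label = "{dc_name}";')
        for node_name, rows in nodes.items():
            label = html_table(rows)
            lines.append(f"      {node_name} [shape=plaintext label={label}];")
        lines.append("    }")
    for src, dst in edges:
        lines.append(f"    {src} -> {dst};")
    lines += ["  }", "}"]
    return "\n".join(lines)


if __name__ == "__main__":
    # IDs reused from the hand-written example above.
    print(render_cluster(
        1,
        {"DC1": {
            "Proxies": [("Proxy0", "af9380"), ("Proxy1", "dc8947")],
            "Log0": [("ID", "aa827")],
        }},
        [("Proxies", "Log0")],
    ))
```

The output can be piped to `dot -Tsvg` to render it. Nested boxes such as "Satellite Logs" are left out to keep the sketch short; they would need one more level of subgraph.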

alexmiller-apple, May 02 '20 21:05