ceph-nvmeof
Gateway statistics collection
This feature covers the statistics collected from gateways in general, which may include stats from the discovery service. Some of these may be for performance monitoring (I/Os or bytes per second, per host and per namespace), some may be for behavior monitoring (which hosts are connected to which gateways over which transports, and which namespaces they are attached to on each port), and some may be for fault identification (host disconnect/reconnect rate, source IP and/or host ID of failed connect/auth attempts).
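To make the per-host, per-namespace idea concrete, here is a minimal sketch of how such a counter could be rendered in the Prometheus text exposition format using only the Python standard library. The metric name `nvmeof_gateway_host_bytes_total`, the label names `host_nqn` and `nsid`, and the example NQN are all hypothetical; nothing here reflects names the gateway actually exports.

```python
from collections import Counter


def escape_label(value: str) -> str:
    """Escape a label value as required by the Prometheus text format."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_metric(name, help_text, metric_type, samples):
    """Render one metric family.

    `samples` maps a tuple of (label_name, label_value) pairs to a number.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} {metric_type}"]
    for labels, value in sorted(samples.items()):
        label_str = ",".join(f'{k}="{escape_label(v)}"' for k, v in labels)
        lines.append(f"{name}{{{label_str}}} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical accounting: bytes transferred, keyed by host NQN and namespace ID.
io_bytes = Counter()
key = (("host_nqn", "nqn.2016-06.io.example:host1"), ("nsid", "1"))
io_bytes[key] += 4096
io_bytes[key] += 8192

text = render_metric(
    "nvmeof_gateway_host_bytes_total",  # hypothetical metric name
    "Bytes transferred per host and namespace",
    "counter",
    io_bytes,
)
print(text)
```

Behavior and fault stats (connected hosts per transport, failed connect/auth attempts per source IP) could follow the same shape with different label sets, as gauges and counters respectively.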
We may be able to avoid an active monitor for the gateway processes (something that makes NVMe-oF connections to all of them and verifies their behavior) if the stats exposed by each gateway collectively measure everything an active monitor would.
See https://github.com/ceph/ceph-nvmeof/issues/63#issuecomment-1421280054
Stats from the SPDK NVMe-oF targets seem to be covered by #37, but that might not include: the discovery services; whatever glue logic is required to let the cluster's central stats mechanism discover and poll all these Prometheus metric servers; and the logic (probably a Grafana dashboard) to present the aggregate stats from all the gateway daemons.
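As one possible shape for the discovery glue, the sketch below generates a target list in the JSON layout that Prometheus file-based service discovery (`file_sd_configs`) reads: a list of groups, each with `targets` and `labels`. Whether the cluster's stats mechanism would use file-based discovery at all is an assumption, and the gateway names, addresses, ports, and label names are hypothetical.

```python
import json


def build_file_sd(gateways):
    """Return a JSON document with one target group per metrics endpoint.

    Each entry in `gateways` is a dict with hypothetical keys:
    name, address, metrics_port, role.
    """
    groups = []
    for gw in gateways:
        groups.append({
            "targets": [f'{gw["address"]}:{gw["metrics_port"]}'],
            "labels": {"gateway": gw["name"], "role": gw["role"]},
        })
    return json.dumps(groups, indent=2)


# Hypothetical inventory; addresses are from the documentation range 192.0.2.0/24.
gateways = [
    {"name": "gw-a", "address": "192.0.2.10", "metrics_port": 9100, "role": "gateway"},
    {"name": "gw-b", "address": "192.0.2.11", "metrics_port": 9100, "role": "gateway"},
    {"name": "disc-a", "address": "192.0.2.10", "metrics_port": 9101, "role": "discovery"},
]

sd_json = build_file_sd(gateways)
print(sd_json)
```

Labelling each target with its role would let a dashboard aggregate gateway daemons separately from discovery services.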
Together, these pieces should satisfy the requirement in #116.