
Gateway statistics collection

Open sdpeters opened this issue 2 years ago • 2 comments

This feature covers statistics collected from gateways in general, which may include stats from the discovery service. Some of these may be for performance monitoring (I/Os or bytes per second, per host and per namespace), some for behavior monitoring (which hosts are connected to which gateways over which transports, and attached to which namespaces on each port), and some for fault identification (host disconnect/reconnect rate, source IP and/or host ID of failed connect/auth attempts).
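To make the three categories concrete, here is a minimal sketch of how one gateway might keep such counters and render them in the Prometheus text exposition format. Everything named here is an assumption for illustration only: the `GatewayStats` class, its methods, and the `nvmeof_gw_*` metric names are hypothetical and not taken from ceph-nvmeof or SPDK.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class GatewayStats:
    """Hypothetical per-gateway counters, one group per category above."""
    io_ops: Counter = field(default_factory=Counter)           # performance
    io_bytes: Counter = field(default_factory=Counter)         # performance
    connections: Counter = field(default_factory=Counter)      # behavior
    failed_connects: Counter = field(default_factory=Counter)  # faults

    def record_io(self, host, nsid, nbytes):
        # One completed I/O of nbytes from `host` to namespace `nsid`.
        self.io_ops[(host, nsid)] += 1
        self.io_bytes[(host, nsid)] += nbytes

    def record_connect(self, host, transport):
        # One more open connection from `host` over `transport`.
        self.connections[(host, transport)] += 1

    def record_failed_connect(self, source_ip):
        # One failed connect/auth attempt from `source_ip`.
        self.failed_connects[(source_ip,)] += 1


def _labels(names, values):
    # Render {name="value",...}; assumes values need no escaping.
    pairs = ",".join(f'{n}="{v}"' for n, v in zip(names, values))
    return "{" + pairs + "}"


def render(stats):
    """Return all samples as Prometheus text exposition lines."""
    groups = [
        ("nvmeof_gw_io_ops_total", "counter", ("host", "nsid"), stats.io_ops),
        ("nvmeof_gw_io_bytes_total", "counter", ("host", "nsid"), stats.io_bytes),
        ("nvmeof_gw_connections", "gauge", ("host", "transport"), stats.connections),
        ("nvmeof_gw_failed_connects_total", "counter", ("source_ip",),
         stats.failed_connects),
    ]
    lines = []
    for name, kind, label_names, samples in groups:
        lines.append(f"# TYPE {name} {kind}")
        for key in sorted(samples):
            lines.append(f"{name}{_labels(label_names, key)} {samples[key]}")
    return "\n".join(lines) + "\n"
```

Rates such as I/Os per second or reconnects per minute would then be derived by the stats system from these cumulative counters rather than computed in the gateway.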

We may be able to avoid an active monitor for the gateway processes (something that makes NVMe-oF connections to all of them and verifies their behavior) if the stats exposed by each gateway collectively measure everything an active monitor would.

sdpeters, Feb 07 '23 18:02

See https://github.com/ceph/ceph-nvmeof/issues/63#issuecomment-1421280054

sdpeters, Feb 07 '23 18:02

Stats from the SPDK NVMe-oF targets seem to be covered by #37, but that might not include stats from the discovery services, whatever glue logic is required for the cluster's central stats mechanism to discover and poll all these Prometheus metric servers, or the logic (probably a Grafana dashboard) to present the aggregate stats from all the gateway daemons.

Together all these things should satisfy the requirement in #116.
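As a rough illustration of the "aggregate stats from all the gateway daemons" part, the sketch below sums per-gateway samples into per-host totals. The function name, the input shape (gateway name mapped to `(host, nsid)` samples), and the example values are all assumptions; in practice this kind of aggregation would more likely be a query in the stats system or dashboard than gateway code.

```python
from collections import Counter


def aggregate_by_host(per_gateway):
    """Sum (host, nsid) samples from every gateway into per-host totals.

    per_gateway: hypothetical mapping of gateway name -> {(host, nsid): value}.
    """
    totals = Counter()
    for samples in per_gateway.values():
        for (host, _nsid), value in samples.items():
            totals[host] += value
    return totals


# Made-up sample values for two gateways serving overlapping hosts.
example = {
    "gw-1": {("nqn.host-a", 1): 100, ("nqn.host-b", 1): 5},
    "gw-2": {("nqn.host-a", 2): 40},
}
host_totals = aggregate_by_host(example)
```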

sdpeters, Apr 28 '23 21:04