NVMe-oF multipathing: allow to set ANA

Open PepperJo opened this issue 1 year ago • 5 comments

Add a gRPC call that sets the ANA state of a path. SPDK already supports setting the ANA state through the nvmf_subsystem_listener_set_ana_state RPC. The gRPC handler should save the ANA state in the OMAP config (together with the listener config) and call the SPDK RPC.
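
A minimal sketch of what such a handler might do, assuming a plain dict stands in for the OMAP config and that the SPDK JSON-RPC takes `nqn`, `listen_address` and `ana_state` parameters with lowercase state names (check this against your SPDK version). The `listener_key` layout and the `set_ana_state` helper are hypothetical, not the gateway's actual code.

```python
import json

# Hypothetical stand-in for the OMAP-backed gateway config: a plain dict.
omap_config = {}

# ANA state names as assumed for the SPDK JSON-RPC (verify for your SPDK version).
VALID_ANA_STATES = {"optimized", "non_optimized", "inaccessible"}


def listener_key(nqn, trtype, traddr, trsvcid):
    """Hypothetical key layout for a listener entry in the config."""
    return f"listener_{nqn}_{trtype}_{traddr}_{trsvcid}"


def set_ana_state(nqn, trtype, adrfam, traddr, trsvcid, ana_state):
    """Store the ANA state with the listener config and build the SPDK request."""
    if ana_state not in VALID_ANA_STATES:
        raise ValueError(f"unknown ANA state: {ana_state}")

    # Save the state together with the listener config.
    key = listener_key(nqn, trtype, traddr, trsvcid)
    entry = json.loads(omap_config.get(key, "{}"))
    entry.update({"nqn": nqn, "trtype": trtype, "adrfam": adrfam,
                  "traddr": traddr, "trsvcid": trsvcid,
                  "ana_state": ana_state})
    omap_config[key] = json.dumps(entry)

    # JSON-RPC request that would be sent to SPDK (not sent in this sketch).
    return {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "nvmf_subsystem_listener_set_ana_state",
        "params": {
            "nqn": nqn,
            "listen_address": {"trtype": trtype, "adrfam": adrfam,
                               "traddr": traddr, "trsvcid": trsvcid},
            "ana_state": ana_state,
        },
    }


request = set_ana_state("nqn.2016-06.io.spdk:cnode1", "TCP", "ipv4",
                        "192.168.0.10", "4420", "inaccessible")
print(json.dumps(request, indent=2))
```

Keeping the state in the same entry as the listener means a restarted gateway could replay both from one record.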

PepperJo avatar May 08 '23 15:05 PepperJo

I have started working on this task. To allow ANA configuration for SPDK, some changes to the existing CLI commands are needed:

  1. create_subsystem CLI: add a parameter called ana_reporting <boolean true/false>
  2. add_namespace CLI: add a parameter called anagrpid <integer 1-32>
  3. set_listener_ana_state: a new CLI command with the following arguments:
       argument("-n", "--subnqn", help="Subsystem NQN", required=True),
       argument("-g", "--gateway-name", help="Gateway name", required=True),
       argument("-t", "--trtype", help="Transport type", default="TCP"),
       argument("-f", "--adrfam", help="Address family", default="ipv4"),
       argument("-a", "--traddr", help="NVMe host IP", required=True),
       argument("-s", "--trsvcid", help="Port number", required=True),
       argument("-i", "--anagrpid", help="ANA group ID", type=int),
       argument("-p", "--ana_state", help="ANA state", required=True)
     where ana_state is one of "Inaccessible", "Optimized" or "NonOptimized"
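
The argument list for set_listener_ana_state can be sketched with plain `argparse`. This is only an illustration: the real cli.py may wire arguments differently, and restricting `--ana_state` with `choices` is an assumption added here, not something stated above.

```python
import argparse

# The three state names listed for ana_state above.
ANA_STATES = ["Inaccessible", "Optimized", "NonOptimized"]


def build_parser():
    """Build a standalone parser mirroring the proposed arguments."""
    parser = argparse.ArgumentParser(prog="set_listener_ana_state")
    parser.add_argument("-n", "--subnqn", help="Subsystem NQN", required=True)
    parser.add_argument("-g", "--gateway-name", help="Gateway name", required=True)
    parser.add_argument("-t", "--trtype", help="Transport type", default="TCP")
    parser.add_argument("-f", "--adrfam", help="Address family", default="ipv4")
    parser.add_argument("-a", "--traddr", help="NVMe host IP", required=True)
    parser.add_argument("-s", "--trsvcid", help="Port number", required=True)
    parser.add_argument("-i", "--anagrpid", help="ANA group ID", type=int)
    # choices= is an assumption: it rejects misspelled states early.
    parser.add_argument("-p", "--ana_state", help="ANA state",
                        required=True, choices=ANA_STATES)
    return parser


args = build_parser().parse_args([
    "-n", "nqn.2016-06.io.spdk:cnode1", "-g", "gw1",
    "-a", "192.168.0.10", "-s", "4420", "-i", "2", "-p", "Optimized",
])
print(args.subnqn, args.gateway_name, args.trtype, args.anagrpid, args.ana_state)
```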

leonidc avatar Sep 12 '23 11:09 leonidc

I think the command name "set_listener_ana_state" is misleading. Maybe it should be called "set_listener_ana_state_for_grp_id", because it really sets the state for a particular group ID. I also think we might consider making this command per GW rather than per IP, i.e. we can look up all listeners defined on this GW and change the state for all of them. It doesn't make sense to change it for one listener only.

caroav avatar Sep 12 '23 13:09 caroav

The development task for this issue depends heavily on the approach chosen for https://github.com/ceph/ceph-nvmeof/issues/193. The concept discussed and approved is to use separate ANA states per namespace as one of the instruments that guarantee a host does not write to several gateways at the same time.
As a consequence, all created namespaces would be divided into several groups (max 32), and each gateway would configure only one group as Optimized; the other groups would be set to the Inaccessible state.
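
The grouping described above can be sketched as a small planning function. The helper names and the gateway names in the usage are hypothetical; the only facts taken from the discussion are the limit of 32 groups and the rule that a gateway reports one group as Optimized and the rest as Inaccessible.

```python
MAX_ANA_GROUPS = 32


def ana_states_for_gateway(optimized_group, num_groups=MAX_ANA_GROUPS):
    """Return {group_id: state} for one gateway: one Optimized, rest Inaccessible."""
    if not 1 <= num_groups <= MAX_ANA_GROUPS:
        raise ValueError("num_groups must be between 1 and 32")
    if not 1 <= optimized_group <= num_groups:
        raise ValueError("optimized_group is out of range")
    return {
        grp: "Optimized" if grp == optimized_group else "Inaccessible"
        for grp in range(1, num_groups + 1)
    }


def conflicting_groups(gateway_groups):
    """Return group IDs claimed as Optimized by more than one gateway."""
    owners = {}
    for gateway, grp in gateway_groups.items():
        owners.setdefault(grp, []).append(gateway)
    return {grp: gws for grp, gws in owners.items() if len(gws) > 1}


# Two hypothetical gateways, four ANA groups in use.
plan = {gw: ana_states_for_gateway(grp, num_groups=4)
        for gw, grp in {"gw1": 1, "gw2": 2}.items()}
print(plan["gw1"])
print(conflicting_groups({"gw1": 1, "gw2": 2}))
```

If two gateways claimed the same group, `conflicting_groups` would report it, which is the situation the scheme is meant to rule out.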

So, given the above approach (assuming it is correct and will be implemented), and looking at the new CLI set_listener_ana_state:

  1. It seems a better implementation of this CLI is for the Python code to go over all created listeners and set the same ANA state on each of them.
  2. When a new listener is created and ANA reporting is enabled, SPDK sets the default ANA state of all its ANA groups to Optimized. This is not good; it would be better to change the default to Inaccessible.

CLI add_namespace: when the optional parameter anagrpid is not set, SPDK by default sets anagrpid = nsid. When ANA reporting is enabled in the subsystem, anagrpid should be a required parameter for added namespaces. This is a change in cli.py.
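
The proposed rule could be expressed as a small validation step. `resolve_anagrpid` is a hypothetical helper, not a function in cli.py, and the 1-32 range check simply reuses the limit mentioned earlier in this thread.

```python
def resolve_anagrpid(nsid, anagrpid=None, ana_reporting=False):
    """Return the ANA group ID to use for a namespace, or raise if it is missing."""
    if anagrpid is None:
        if ana_reporting:
            # Proposed change: anagrpid is mandatory when ANA reporting is on.
            raise ValueError("anagrpid is required when ANA reporting is enabled")
        # Without ANA reporting, fall back to the SPDK default: anagrpid = nsid.
        return nsid
    if not 1 <= anagrpid <= 32:
        raise ValueError("anagrpid must be between 1 and 32")
    return anagrpid


print(resolve_anagrpid(nsid=7))
print(resolve_anagrpid(nsid=7, anagrpid=3, ana_reporting=True))
```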

leonidc avatar Sep 19 '23 09:09 leonidc

A new Automatic_failover mode would be added, selected when a new subsystem is configured. In this mode the following behavior is supported: the Optimized ANA group ID per GW is read directly from the configuration file. During its startup sequence, the GW registers with this parameter against the new NvmefGWMonitor in the context of the ceph_mon. Later, NvmefGWMonitor enables IO traffic per GW: it sends a message back to the GW with a request to configure the Optimized ANA group for all of the subsystem's listeners. On receiving the message, the GW directly invokes the RPC nvmf_subsystem_listener_set_ana_state. So the CLI set_listener_ana_state isn't actually needed.
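
The message flow above can be illustrated with a toy simulation. The `Gateway` and `Monitor` classes and their methods are hypothetical and only record which RPC would be invoked; they do not reflect the actual NvmefGWMonitor interface or message format.

```python
class Gateway:
    """Toy gateway: knows its Optimized group and its listeners."""

    def __init__(self, name, optimized_group, listeners):
        self.name = name
        self.optimized_group = optimized_group  # read from the config file
        self.listeners = listeners
        self.rpc_calls = []  # records the RPCs that would go to SPDK

    def register(self, monitor):
        # Startup: register with the monitor, passing the Optimized group ID.
        monitor.register(self.name, self.optimized_group, self)

    def on_enable(self, anagrpid):
        # On the monitor's message, set the group Optimized on every listener.
        for listener in self.listeners:
            self.rpc_calls.append(
                ("nvmf_subsystem_listener_set_ana_state", listener, anagrpid, "optimized")
            )


class Monitor:
    """Toy monitor: tracks registered gateways and enables IO per gateway."""

    def __init__(self):
        self.registered = {}

    def register(self, name, optimized_group, gateway):
        self.registered[name] = (optimized_group, gateway)

    def enable_io(self, name):
        optimized_group, gateway = self.registered[name]
        gateway.on_enable(optimized_group)


monitor = Monitor()
gw = Gateway("gw1", optimized_group=1,
             listeners=["192.168.0.10:4420", "192.168.0.11:4420"])
gw.register(monitor)
monitor.enable_io("gw1")
for call in gw.rpc_calls:
    print(call)
```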

But one additional thing needs to be addressed in the Automatic_failover mode. When a new listener is created in SPDK in a subsystem with ANA reporting enabled, by default this listener reports ANA state Optimized to the host for all ANA groups (all namespaces would be reported to the host as Optimized). This means the host may immediately start IO traffic to all configured namespaces over all of the multiple paths.

leonidc avatar Oct 10 '23 13:10 leonidc

@leonidc I think the "ana_state" parameter should be renamed to "ana-state" (with a hyphen) to be consistent with other parameter names.

gbregman avatar Oct 10 '23 14:10 gbregman