ceph-nvmeof icon indicating copy to clipboard operation
ceph-nvmeof copied to clipboard

Async change notifications from discovery controller

Open sdpeters opened this issue 1 year ago • 1 comments

As mentioned in https://github.com/ceph/ceph-nvmeof/issues/63#issuecomment-1414349733, it's desirable for the gateway discovery service to support async change notifications. NVMe calls these async events.

That enables hosts that maintain persistent connections to the discovery controller to get prompt notification when a gateway config change alters the list of ports that the discovery controller returns to the host.

I'm creating a separate issue for this feature of the gateway discovery service so we can schedule that feature separately (e.g. in the first release with discovery, or some later update).

It appears there's some investigation needed to find a host that can use these (and which we can use to test this). Specifically it's not clear that Linux uses this capability. The "nvme connect-all" command will perform discovery once, and connect to all those ports. The expectation seems to be that hosts will do this once when they boot. One old Mellanox support article describes setting up a systemd to do this on boot repeat it periodically (poll the discovery controller). Polling would work, but isn't ideal.

Both the "nvme connect-all" and "nvme discover" commands have a "--persistent" option to retain the discovery controller. A persistent connection is required for a host to receive NVMe async events. The --persistent option seems to imply a local /dev/nvme device for that controller will continue to exist. It's not clear that will set up async event notifications to the host, or act on async notifications (e.g. trigger a udev action to repeat the "connect-all" process) in the host if they arrive.

sdpeters avatar Mar 02 '23 19:03 sdpeters

I found this kernel patch from early 2021 that adds a new "nvme monitor" command to do just what we want here (monitor discovery controllers for changes, and promptly act on them). IDK if this was merged, and haven't dredged up the lkmk conversation surrounding this patch request.

sdpeters avatar Mar 02 '23 19:03 sdpeters