
Removing the nvmeof service doesn't delete OMAP entries

sunilkumarn417 opened this issue on Jan 17 '24 • 4 comments

Noticed that OMAP entries for GW entities, from subsystems to namespaces, still exist even after removing the entire service from the cluster.

Steps to follow:

  • Deploy the nvmeof service with a single pool.
  • Add all required entities to the GW using nvmeof-cli, from subsystem to namespaces (a sketch of these commands follows this list).
  • Now observe the nvmeof GW entities in the OMAP object nvmeof.None.state.
  • Delete the nvmeof service.
  • The GW entities are still visible in the OMAP.
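
For reference, steps 1, 2 and 4 might look like the sketch below. The pool name and subsystem NQN match the OMAP listing further down, but the placement, gateway address, RBD image name, and the exact nvmeof-cli flags are illustrative and differ between ceph-nvmeof releases:

# step 1: deploy the nvmeof service against a single pool (placement illustrative)
ceph orch apply nvmeof rbd --placement="<node5> <node6>"

# step 2: populate the gateway (flag names vary by nvmeof-cli version)
GW_IP=<gateway-ip>   # placeholder
cli() { nvmeof-cli --server-address "$GW_IP" --server-port 5500 "$@"; }
cli subsystem add --subsystem nqn.2016-06.io.spdk:test_cli
cli namespace add --subsystem nqn.2016-06.io.spdk:test_cli --rbd-pool rbd --rbd-image <image>
cli listener add --subsystem nqn.2016-06.io.spdk:test_cli --traddr "$GW_IP" --trsvcid 4420   # plus a gateway/host name flag on some versions
cli host add --subsystem nqn.2016-06.io.spdk:test_cli --host "*"

# step 4: delete the service again
ceph orch rm nvmeof.rbd
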
[ceph: root@ceph-1sunilkumar-ol18l6-node1-installer /]# ceph orch ls
NAME                       PORTS        RUNNING  REFRESHED  AGE  PLACEMENT  
alertmanager               ?:9093,9094      1/1  8m ago     5d   count:1    
ceph-exporter                               6/6  8m ago     5d   *          
crash                                       6/6  8m ago     5d   *          
grafana                    ?:3000           1/1  8m ago     5d   count:1    
mgr                                         2/2  8m ago     5d   label:mgr  
mon                                         3/3  8m ago     5d   label:mon  
node-exporter              ?:9100           6/6  8m ago     5d   *          
osd.all-available-devices                    16  5m ago     5d   *          
prometheus                 ?:9095           1/1  8m ago     5d   count:1 

[ceph: root@ceph-1sunilkumar-ol18l6-node1-installer /]# ceph orch ps | grep nvme
[ceph: root@ceph-1sunilkumar-ol18l6-node1-installer /]# 


[ceph: root@ceph-1sunilkumar-ol18l6-node1-installer /]# rados -p rbd listomapkeys nvmeof.None.state
host_nqn.2016-06.io.spdk:test_cli_*
listener_nqn.2016-06.io.spdk:test_cli_client.nvmeof.rbd.ceph-1sunilkumar-ol18l6-node5.mnoqha_TCP_10.0.211.131_4420
listener_nqn.2016-06.io.spdk:test_cli_client.nvmeof.rbd.ceph-1sunilkumar-ol18l6-node5.mnoqha_TCP_10.0.211.32_4420
listener_nqn.2016-06.io.spdk:test_cli_client.nvmeof.rbd.ceph-1sunilkumar-ol18l6-node5.mnoqha_TCP_10.0.211.32_4421
listener_nqn.2016-06.io.spdk:test_cli_client.nvmeof.rbd.ceph-1sunilkumar-ol18l6-node6.ueawqa_TCP_10.0.211.22_4420
listener_nqn.2016-06.io.spdk:test_cli_client.nvmeof.rbd.ceph-1sunilkumar-ol18l6-node6.ueawqa_TCP_10.0.213.158_4420
namespace_nqn.2016-06.io.spdk:test_cli_2
omap_version
qos_nqn.2016-06.io.spdk:test_cli_2
subsystem_nqn.2016-06.io.spdk:test_cli

sunilkumarn417 • Jan 17 '24 08:01
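
Until the orchestrator performs this cleanup itself, the leftover state can be cleared by hand. Both commands below are plain rados operations on the object shown above; deleting the object is only safe once no gateway in the group is still running:

rados -p rbd clearomap nvmeof.None.state    # drop all OMAP keys but keep the object
rados -p rbd rm nvmeof.None.state           # or remove the state object altogether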

I agree that when we remove a GW through cephadm we should remove all the GW-specific state in the OMAP. I would not remove the entire OMAP when the last GW of a GW group is deleted, but would instead introduce another command that explicitly allows removing a GW group.

PepperJo • Jan 23 '24 15:01
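
To make that distinction concrete: in the listing above, only the listener keys embed a specific gateway name, so removing one GW's state could be as narrow as the sketch below, while the subsystem, namespace and host keys would survive until the GW group itself is removed. The pool, object and gateway name are taken verbatim from the reporter's output:

GW=client.nvmeof.rbd.ceph-1sunilkumar-ol18l6-node5.mnoqha
rados -p rbd listomapkeys nvmeof.None.state | grep -F "$GW" |
while read -r key; do
    rados -p rbd rmomapkey nvmeof.None.state "$key"
done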

Isn't the way the service is deleted a behaviour determined by cephadm? If so, this issue needs to be raised under ceph/ceph for discussion with the cephadm maintainers, right? For example, cephadm implements the nvmeof service via a class which already has a post_remove method ... but it's empty :)

pcuzner • Jan 24 '24 01:01

Yes @pcuzner, we do need to involve cephadm, but in this discussion we are trying to agree on the expected behavior. Also, I think that post_remove, for example, will need some kind of CLI to perform the required cleanup.

caroav • Jan 24 '24 07:01

I'm not clear on the CLI requirement for cleanup. For example, if the service is removed with --force (i.e. ceph orch rm nvmeof.gw1 --force), the mgr could just delete the rados objects (the class has both post_remove and purge methods).

pcuzner • Jan 24 '24 21:01
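
Expressed as CLI, that --force path would boil down to something like the following; whether a purge hook should delete the whole state object (as sketched here) or only the removed service's share of it is exactly the open question above:

ceph orch rm nvmeof.gw1 --force     # removes the daemons and the service spec
rados -p rbd rm nvmeof.None.state   # what a purge/post_remove hook could then do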