ceph-iscsi icon indicating copy to clipboard operation
ceph-iscsi copied to clipboard

do not upgrade config to backend storage in api/config

Open kevinzs2048 opened this issue 4 years ago • 8 comments

Fix #213

kevinzs2048 avatar Aug 27 '20 12:08 kevinzs2048

@dillaman ping for review, thanks!

kevinzs2048 avatar Aug 27 '20 12:08 kevinzs2048

The situation is: when we run update target/client with multiple threads, at the same time there are api/config calls.

The metadata update at api/config side will overide the metadata that need to be changed, but not write to DB.

The scenario I find is: When we unexport + unregister the disk AA, remove it from gw, and remote owner from it. The next step is to delete the volume. Concurrently we have assigned more disks to the client and trigger the update DB, update target/client. When we delete it, the metadata has shown that the volume is still mapped to a special client. That means that the metadata update has conflicts.

kevinzs2048 avatar Aug 27 '20 13:08 kevinzs2048

@dillaman is there a method to slow down the frequency of api/config calls? If so I think the situation will be better.

kevinzs2048 avatar Aug 27 '20 13:08 kevinzs2048

Hmm. It's almost sounding like any API methods that will update the config should first grab the RADOS lock on the config via a with clause to ensure only one GW can update the config concurrently. I think that's a much larger change but it would resolve any race conditions.

dillaman avatar Aug 28 '20 01:08 dillaman

@dillaman Sorry for late response. I wonder why api/config method need to refresh the backend DB? Looks the method is not a "GET" and just get config from DB and then commit the config to DB.

kevinzs2048 avatar Sep 11 '20 07:09 kevinzs2048

@dillaman Sorry for late response. I wonder why api/config method need to refresh the backend DB? Looks the method is not a "GET" and just get config from DB and then commit the config to DB.

I think it's because another GW API might have updated something so it can race w/ the true state. We just need to do a better job w/ the locking of the config object.

dillaman avatar Sep 11 '20 13:09 dillaman

@dillaman Yeah it sounds like the problem. Do you mean every changing of the "config object " and write to backend need to be locked?

kevinzs2048 avatar Sep 11 '20 14:09 kevinzs2048

The Config class already has support for globally locking the on-RADOS config object via [1], but the updating of the config is currently a piecemeal operation (i.e. update parts and then commit). Instead, I would recommend adding __enter__/__exit__ methods to lock/unlock the config and make sure any reads or writes the config are only down via context manager with ... blocks.

[1] https://github.com/ceph/ceph-iscsi/blob/master/ceph_iscsi_config/common.py#L445

dillaman avatar Sep 11 '20 18:09 dillaman