ceph-iscsi
ceph-iscsi copied to clipboard
do not upgrade config to backend storage in api/config
Fix #213
@dillaman ping for review, thanks!
The situation is: when we run update target/client with multiple threads, at the same time there are api/config calls.
The metadata update at api/config side will overide the metadata that need to be changed, but not write to DB.
The scenario I find is: When we unexport + unregister the disk AA, remove it from gw, and remote owner from it. The next step is to delete the volume. Concurrently we have assigned more disks to the client and trigger the update DB, update target/client. When we delete it, the metadata has shown that the volume is still mapped to a special client. That means that the metadata update has conflicts.
@dillaman is there a method to slow down the frequency of api/config calls? If so I think the situation will be better.
Hmm. It's almost sounding like any API methods that will update the config should first grab the RADOS lock on the config via a with
clause to ensure only one GW can update the config concurrently. I think that's a much larger change but it would resolve any race conditions.
@dillaman Sorry for late response. I wonder why api/config method need to refresh the backend DB? Looks the method is not a "GET" and just get config from DB and then commit the config to DB.
@dillaman Sorry for late response. I wonder why api/config method need to refresh the backend DB? Looks the method is not a "GET" and just get config from DB and then commit the config to DB.
I think it's because another GW API might have updated something so it can race w/ the true state. We just need to do a better job w/ the locking of the config object.
@dillaman Yeah it sounds like the problem. Do you mean every changing of the "config object " and write to backend need to be locked?
The Config
class already has support for globally locking the on-RADOS config object via [1], but the updating of the config is currently a piecemeal operation (i.e. update parts and then commit). Instead, I would recommend adding __enter__
/__exit__
methods to lock/unlock the config and make sure any reads or writes the config are only down via context manager with ...
blocks.
[1] https://github.com/ceph/ceph-iscsi/blob/master/ceph_iscsi_config/common.py#L445