DIRAC
Cannot update CS after finding dead CS worker
When updating the CS from the WebApp, we occasionally get the following error:
ERROR: ERROR: AutoMerge failed: Could not AutoMerge. Could not retrieve original committer's version
  File "/opt/dirac/versions/v11.0.52-1733134524/Linux-x86_64/lib/python3.11/threading.py", line 1002, in _bootstrap
    self._bootstrap_inner()
  File "/opt/dirac/versions/v11.0.52-1733134524/Linux-x86_64/lib/python3.11/threading.py", line 1045, in _bootstrap_inner
    self.run()
  File "/opt/dirac/versions/v11.0.52-1733134524/Linux-x86_64/lib/python3.11/threading.py", line 982, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/dirac/versions/v11.0.52-1733134524/Linux-x86_64/lib/python3.11/concurrent/futures/thread.py", line 83, in _worker
    work_item.run()
  File "/opt/dirac/versions/v11.0.52-1733134524/Linux-x86_64/lib/python3.11/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/opt/dirac/versions/v11.0.52-1733134524/Linux-x86_64/lib/python3.11/site-packages/DIRAC/Core/DISET/private/Service.py", line 349, in _processInThread
    result = self._processProposal(trid, proposalTuple, handlerObj)
  File "/opt/dirac/versions/v11.0.52-1733134524/Linux-x86_64/lib/python3.11/site-packages/DIRAC/Core/DISET/private/Service.py", line 536, in _processProposal
    result = self._executeAction(trid, proposalTuple, handlerObj)
  File "/opt/dirac/versions/v11.0.52-1733134524/Linux-x86_64/lib/python3.11/site-packages/DIRAC/Core/DISET/private/Service.py", line 556, in _executeAction
    response = handlerObj._rh_executeAction(proposalTuple)
  File "/opt/dirac/versions/v11.0.52-1733134524/Linux-x86_64/lib/python3.11/site-packages/DIRAC/Core/DISET/RequestHandler.py", line 120, in _rh_executeAction
    retVal = self.__doRPC(actionTuple[1])
  File "/opt/dirac/versions/v11.0.52-1733134524/Linux-x86_64/lib/python3.11/site-packages/DIRAC/Core/DISET/RequestHandler.py", line 251, in __doRPC
    return self.__RPCCallFunction(method, args)
  File "/opt/dirac/versions/v11.0.52-1733134524/Linux-x86_64/lib/python3.11/site-packages/DIRAC/Core/DISET/RequestHandler.py", line 292, in __RPCCallFunction
    uReturnValue = oMethod(*args)
  File "/opt/dirac/versions/v11.0.52-1733134524/Linux-x86_64/lib/python3.11/site-packages/DIRAC/ConfigurationSystem/Service/ConfigurationHandler.py", line 71, in export_commitNewData
    return gServiceInterface.updateConfiguration(sData, credDict["username"])
  File "/opt/dirac/versions/v11.0.52-1733134524/Linux-x86_64/lib/python3.11/site-packages/DIRAC/ConfigurationSystem/private/ServiceInterfaceBase.py", line 219, in updateConfiguration
    return S_ERROR(f"AutoMerge failed: {result['Message']}")
This happens because the CS does not find a matching backup in the method at https://github.com/DIRACGrid/DIRAC/blob/c4b7a6e009e03570cecfff2b8499356d6e9551e7/src/DIRAC/ConfigurationSystem/private/ServiceInterfaceBase.py#L267
This method finds the latest backup by looking at the zip files containing the date found in the client's DIRAC/Configuration/Version.
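To make the lookup concrete, here is a minimal sketch of a date-based backup search. It is not the DIRAC implementation: the function name `find_backup_for_version`, the file naming scheme, and the way the version string is normalised into a file name token are all assumptions made for illustration.

```python
from pathlib import Path
from typing import Optional


def find_backup_for_version(backup_dir: Path, version: str) -> Optional[Path]:
    """Return the newest .zip in backup_dir whose name contains the version date.

    Hypothetical helper: the real naming scheme of the CS backups may differ.
    """
    # Assumed normalisation so the version can appear inside a file name,
    # e.g. "2024-12-09 04:46:51.020183" -> "2024-12-09_04-46-51.020183".
    token = version.replace(" ", "_").replace(":", "-")
    # Collect every zip file whose name embeds the token.
    matches = sorted(p for p in backup_dir.glob("*.zip") if token in p.name)
    # No match means there is no "original committer's version" to merge against.
    return matches[-1] if matches else None
```

With such a lookup, a version string that was never written to a backup file simply yields `None`, which is the situation that leads to the AutoMerge error above.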
This version is distributed to the client by the CS, so there's no real reason it would be wrong.
Except when a slave is found dead.
In that case, a new version is generated:
@400000006756768d1d106ff4.s-57426-2024-12-09 04:46:50 UTC Configuration/Server [140072925378112] WARN: Found dead slave dips://speen.nikhef.nl:9135/Configuration/Server
@400000006756768d1d106ff4.s:57428:2024-12-09 04:46:51 UTC Configuration/Server [140072925378112] INFO: Generated new version 2024-12-09 04:46:51.020183
But this version is never actually committed (and we do not want it to be), so there is no backup file corresponding to that date.
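The sequence can be reproduced with a toy model. This is only a sketch of the behaviour described in this report, not DIRAC code: `ToyConfigServer` and its methods are hypothetical, and it assumes that every commit stores one backup keyed by the version string while a dead-slave event only bumps the version.

```python
from datetime import datetime, timedelta


class ToyConfigServer:
    """Toy stand-in for the CS master (hypothetical, for illustration only)."""

    def __init__(self, start: datetime):
        self._clock = start
        self.version = None
        self.backups = {}  # version string -> snapshot of the configuration

    def _new_version(self) -> str:
        # Versions are timestamps, as in DIRAC/Configuration/Version.
        self._clock += timedelta(seconds=1)
        self.version = str(self._clock)
        return self.version

    def commit(self, config: dict) -> str:
        # A real commit: new version AND a backup keyed by that version.
        version = self._new_version()
        self.backups[version] = dict(config)
        return version

    def handle_dead_slave(self) -> str:
        # Dead slave found: a new version is generated, but nothing is
        # committed, so no backup is written for it.
        return self._new_version()

    def get_original_for(self, client_version: str):
        # What a merge needs: the backup matching the client's version.
        return self.backups.get(client_version)
```

In this model, a client that picked up the version generated by `handle_dead_slave()` gets `None` from `get_original_for()`, while a client holding a version returned by `commit()` gets its snapshot back.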