Button to disable the HR DB check per account
For the LHC experiments, IAM depends on the CERN HR DB to decide whether a person has the right to VO membership, and on the experiment secretariats to ensure that members of their experiment have the correct status in that DB at all times. Unfortunately, mismatches have frequently occurred for people whose contracts recently changed, often lasting for several days. These led to cases that the VO admins, the IAM service team and the secretariats had to investigate, and to significant dissatisfaction for all parties concerned. Such cases are likely to keep happening at a similar rate. It would therefore be good if a VO admin had an easy way to override, per account, what the HR DB returns for the person in question: a button to disable the HR DB check for that account. There is already a button to restore or disable an account when needed.
We should be careful to do this in a way that means we don't end up with users' lifecycles being overridden indefinitely. Perhaps there should be an end date for the override.
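To make the end-date idea concrete, here is a minimal sketch of how an override with an expiry could be evaluated. It is an illustration under stated assumptions, not existing IAM behaviour: it reuses the `hr.cern.ignore` label name mentioned later in this thread, and it hypothetically stores an ISO end date (YYYY-MM-DD) as the label value. `hr_check_applies` is a hypothetical helper name.

```python
from datetime import date
from typing import Mapping, Optional

# Label name taken from this thread; storing an end date in its value is a
# hypothetical extension proposed here, not confirmed IAM behaviour.
IGNORE_LABEL = "hr.cern.ignore"


def hr_check_applies(labels: Mapping[str, Optional[str]], today: date) -> bool:
    """Return True if the HR DB check should run for an account.

    `labels` maps label names to optional values. If the override label
    carries an ISO end date, the override holds up to and including that
    day; with no value, the override is open-ended.
    """
    if IGNORE_LABEL not in labels:
        return True  # no override: check against the HR DB as usual
    end = labels[IGNORE_LABEL]
    if end is None:
        return False  # open-ended override: the risk raised above
    # Override expired once today is past the end date: check again.
    return today > date.fromisoformat(end)
```

With an end date, a forgotten override lapses by itself, so a user's lifecycle cannot stay overridden indefinitely.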
Do we understand in the first place why the CERN HR DB doesn't contain the correct information? Do we have a "picture" showing which systems are involved (does the experiment secretariat write directly to the CERN HR DB)? Should we synchronize the state more often, so as not to wait a day to see whether the system recovers by itself? Do we need push notifications instead of simple periodic synchronization? Yes, these suspensions are quite annoying, and VO administrators don't even have a way to check the state in the CERN HR DB directly...
Even if we had a full picture now, it could silently change, as we rely on third parties we have no real relationships with. The bottom line is that the chain is complex and can fail in mysterious ways, due to bugs, (mis)features, lack of communication, ambiguities or human errors, and we do not want users to remain stranded until things finally look OK in the HR DB...
The HR DB check is not done for accounts that have the label hr.cern.ignore set, so we already have a mechanism in place, available via the API. The example shows exactly how to set this label.
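As a rough sketch of what such an API call could look like, the Python below builds (but does not send) requests to set and clear the label. Everything specific here is an assumption, not taken from this thread: the base URL, the `/iam/account/{id}/labels` path, bearer-token authentication, and splitting `hr.cern.ignore` into a prefix `hr.cern` and a name `ignore`. Check the IAM API documentation before relying on any of it.

```python
import json
import urllib.parse
import urllib.request

# Assumed label encoding: "hr.cern.ignore" as prefix + name (hypothetical).
LABEL_PREFIX = "hr.cern"
LABEL_NAME = "ignore"


def labels_url(base_url: str, account_id: str) -> str:
    """Return the assumed labels endpoint for one account."""
    account = urllib.parse.quote(account_id)
    return f"{base_url.rstrip('/')}/iam/account/{account}/labels"


def set_ignore_label_request(base_url: str, account_id: str, token: str) -> urllib.request.Request:
    """Build, without sending, a PUT request that sets the override label."""
    body = json.dumps({"prefix": LABEL_PREFIX, "name": LABEL_NAME}).encode()
    return urllib.request.Request(
        labels_url(base_url, account_id),
        data=body,
        method="PUT",
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
    )


def clear_ignore_label_request(base_url: str, account_id: str, token: str) -> urllib.request.Request:
    """Build, without sending, a DELETE request that removes the label again."""
    query = urllib.parse.urlencode({"prefix": LABEL_PREFIX, "name": LABEL_NAME})
    return urllib.request.Request(
        f"{labels_url(base_url, account_id)}?{query}",
        method="DELETE",
        headers={"Authorization": f"Bearer {token}"},
    )
```

A request built this way would be sent with `urllib.request.urlopen(req)`. Keeping the clear operation next to the set operation makes it easy to restore the HR DB check once the HR side is fixed.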
There is also an API endpoint to search users by label, which partly addresses @hshort's concerns. Unfortunately, this endpoint doesn't currently work (sigh), but we can easily fix it for the next release.
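Until the search endpoint is fixed, overrides could be audited on the client side. The sketch below assumes the account records, including their labels, have already been fetched, and that each label is a dict with `prefix` and `name` keys; that record shape and the helper name `accounts_with_label` are hypothetical.

```python
from typing import Any, Iterable, List, Mapping


def accounts_with_label(
    accounts: Iterable[Mapping[str, Any]],
    prefix: str = "hr.cern",
    name: str = "ignore",
) -> List[str]:
    """Return the ids of accounts carrying the given label.

    Each account is assumed (hypothetically) to look like
    {"id": ..., "labels": [{"prefix": ..., "name": ...}, ...]};
    accounts without a "labels" key are treated as unlabelled.
    """
    return [
        account["id"]
        for account in accounts
        if any(
            label.get("prefix") == prefix and label.get("name") == name
            for label in account.get("labels", [])
        )
    ]
```

Running this periodically over all accounts would give VO admins a list of active overrides to review, which is the concern about overrides lingering indefinitely.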
We are reluctant to add widgets to the existing dashboard and prefer to postpone them to the new one under development, unless the workarounds described above are deemed insufficient.
Ciao IAM devs, what is the timescale for the new dashboard to be deployable on our instances?
It's a few months away. I was thinking about presenting an alpha/beta version at the hackathon we'll have at CERN in February. The development happens here.
That means we would have to survive for another half year with the currently available functionality: hopefully it is tolerable for the affected experiments...
This delay is unfortunate and a bit annoying for admins who don't have HR DB access and must keep bothering other people or groups that do (e.g., the secretariat) in order to fix IAM membership suspensions.
Could this be an item for the hackathon at the end of this month?
Of course it could be. But there is a general point here: we can't add a widget to the dashboard for every possible functionality available via the API. In this specific case, is it really important to have a button on the dashboard instead of running a simple curl command (which we can provide)?
It sounds like a reasonable workaround. ATLAS experts, please comment...
Are you suggesting that we should start to use hr.cern.ignore whenever an account gets accidentally suspended by the CERN HR synchronization? It would be even easier for us to set this flag for all accounts to avoid problems with CERN. I have to admit I haven't fully understood this ticket from the beginning: pushing a button to disable a broken CERN HR DB synchronization is still a waste of time. Do we fully understand where these issues come from, and why it is not possible to fix them at the source? Why do we have to wait 12 hours to see that a CERN HR DB account is still in a bad state? Could you propagate the CERN HR DB account status right away, without any delay?
As I described earlier, the chain towards a proper status in the HR DB is complex and can break for various reasons. I think it is unrealistic to expect this to be improved any time soon and therefore expect such incidents to keep coming our way.
Case in point: a ticket opened asking for more information from the HR side could not be answered because someone was on holiday.
On the IAM side we need a way to compensate for such problems. For now, the API is the only way, but I would very much like to see a button in the GUI for that.