
Slurm plugin: Update default slurm account

Open dsajdak opened this issue 4 years ago • 1 comments

If a user belongs to multiple Slurm accounts and you try to remove their access to the account that is also their default Slurm account, the removal fails. You first have to modify the user to change the default account, then remove their access to the original account. If this can't be automated, the failure needs to be logged and viewable so the sys admin can handle it before re-running the slurm sync tool. https://github.com/ubccr/coldfront/issues/273

dsajdak, Mar 18 '21 18:03

A user has associations under more than one account:

sacctmgr show user ccrgst72 -s list format=user,defaultaccount,account,cluster,qos%45
      User   Def Acct    Account    Cluster                                           QOS
---------- ---------- ---------- ---------- ---------------------------------------------
  ccrgst72 ccrgsttest   ccrgst72      alpha    debug,general-compute,normal,scavenger,viz
  ccrgst72 ccrgsttest ccrgsttest      alpha    debug,general-compute,normal,scavenger,viz
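A listing like the one above could be used to find a replacement default account before a removal is attempted. The following is only a minimal sketch, not ColdFront code: it assumes the same query is re-run with sacctmgr's `-n -P` options (no header, pipe-delimited output) and `format=user,defaultaccount,account,cluster`, and the names `Assoc`, `parse_associations`, and `fallback_accounts` are hypothetical.

```python
from typing import NamedTuple


class Assoc(NamedTuple):
    """One association row (hypothetical helper type)."""
    user: str
    default_account: str
    account: str
    cluster: str


def parse_associations(text):
    """Parse pipe-delimited rows: user|defaultaccount|account|cluster."""
    rows = []
    for line in text.splitlines():
        parts = line.strip().split("|")
        # Skip blank or incomplete lines rather than guessing at their meaning.
        if len(parts) < 4 or not parts[0]:
            continue
        rows.append(Assoc(*parts[:4]))
    return rows


def fallback_accounts(rows, account):
    """Map (user, cluster) to another account the user belongs to,
    for users whose default account is the one being removed."""
    result = {}
    for row in rows:
        if row.default_account == account and row.account != account:
            # Keep the first alternative seen; a real tool may need a policy here.
            result.setdefault((row.user, row.cluster), row.account)
    return result


sample = "ccrgst72|ccrgsttest|ccrgst72|alpha\nccrgst72|ccrgsttest|ccrgsttest|alpha\n"
print(fallback_accounts(parse_associations(sample), "ccrgsttest"))
```

A user with no other association on that cluster does not appear in the result, which is the case that would still need manual attention.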

When that user is removed from the allocation whose Slurm account is their default account, we get an error:

COLDFRONT_ENV=.env coldfront slurm_check -c alpha -a ccrgsttest -s -x
Syncing Slurm with ColdFront
username        account cluster slurm_action    slurm_specs
Slurm command failed: /util/software/ubuntu/slurm/current/bin/sacctmgr -Q -i delete user where name=ccrgst72 cluster=alpha account=ccrgsttest
Failed removing Slurm association user ccrgst72 account ccrgsttest cluster alpha: return_value=1 stdout=b' Deleting user associations...\n  C = alpha      A = ccrgsttest U = ccrgst72 \n' stderr=b' User ccrgst72 on cluster alpha no longer has a default account.\n You must change the default account of these users or remove the users completely from the affected clusters to allow these changes.\n Changes Discarded\n'
ccrgst72        ccrgsttest      alpha   Remove
Slurm command failed: /util/software/ubuntu/slurm/current/bin/sacctmgr -Q -i delete account where name=ccrgsttest cluster=alpha
Failed removing Slurm account ccrgsttest cluster alpha: return_value=1 stdout=b'' stderr=b" Users listed below have these as their Default Accounts.\nC = alpha      A = ccrgsttest           U = ccrgst72 \n Please either remove the accounts listed above from list and resubmit,\n or change these users' default accounts to remove the account(s).\n Changes Discarded\n"
        ccrgsttest      alpha   Remove
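If the fix cannot be automated, the failure could at least be recognized and logged separately from other sacctmgr errors. The sketch below is an assumption-laden illustration: the marker strings are copied from the stderr text in the log above and may differ between Slurm versions, and `is_default_account_error` is a hypothetical helper, not part of ColdFront.

```python
# Substrings taken from the stderr output shown in the log above.
# Assumption: other Slurm versions may word these messages differently.
DEFAULT_ACCOUNT_MARKERS = (
    "no longer has a default account",
    "as their Default Accounts",
)


def is_default_account_error(stderr):
    """Return True if sacctmgr's stderr looks like a default-account conflict."""
    if isinstance(stderr, bytes):
        text = stderr.decode("utf-8", errors="replace")
    else:
        text = stderr
    return any(marker in text for marker in DEFAULT_ACCOUNT_MARKERS)


user_err = b" User ccrgst72 on cluster alpha no longer has a default account.\n Changes Discarded\n"
print(is_default_account_error(user_err))
```

A sync tool could use a check like this to emit a dedicated warning that names the user, account, and cluster, so the sys admin knows to change the default account before re-running.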

Currently, I manually modify the default account with:

sacctmgr modify user <username> set DefaultAccount=<account>

Then re-run the above slurm_check command.

This gets more complicated if the user has different default accounts on different clusters. You then have to specify the cluster as well:

sacctmgr modify user <username> set DefaultAccount=<account> where cluster=<cluster>
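The manual workaround could be scripted by building the argument list for the command given here. This is a rough sketch under stated assumptions: `build_set_default_cmd` is a hypothetical helper, the `-Q -i` flags are copied from the sacctmgr invocations in the log above, and the argument order mirrors the command as written in this issue. It only builds the command; it does not run it and is not the fix referenced elsewhere in this thread.

```python
def build_set_default_cmd(user, account, cluster=None, sacctmgr="sacctmgr"):
    """Build the argv for changing a user's default Slurm account.

    The cluster filter is added only when a cluster is given, matching
    the two command forms described in this issue.
    """
    cmd = [
        sacctmgr, "-Q", "-i",
        "modify", "user", user,
        "set", f"DefaultAccount={account}",
    ]
    if cluster:
        cmd += ["where", f"cluster={cluster}"]
    return cmd


print(" ".join(build_set_default_cmd("ccrgst72", "ccrgst72", cluster="alpha")))
```

Returning a list rather than a shell string means the result can be passed to `subprocess.run` without shell quoting concerns.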

dsajdak, Feb 10 '23 19:02

Fixed in #597

aebruno, May 09 '24 11:05