Resources not updating after node removal via linstor node lost

Open kvaps opened this issue 7 years ago • 5 comments

After the linstor node lost command, the resource on the surviving nodes still has a connection configured to the dead node. There is no way to remove those connections from the surviving nodes without touching the resource.

exampleres role:Primary
  disk:UpToDate
  node1 role:Secondary
    peer-disk:UpToDate
  node2 connection:Connecting
  node3 connection:Connecting

kvaps, Oct 02 '18 07:10

I could not reproduce the issue:

$ linstor n c bravo <ip>
$ linstor n c charlie <ip>
$ linstor n c delta <ip>

$ linstor rd c testrsc
$ linstor vd c testrsc 10m

$ linstor sp c lvm bravo testsp lspool
$ linstor sp c lvm charlie testsp lspool
$ linstor sp c lvm delta testsp lspool

$ linstor r c bravo charlie delta testrsc -s testsp

# kill satellite on delta
$ linstor n lost delta

$ ssh bravo drbdadm status
testrsc role:Secondary
  disk:UpToDate
  charlie role:Secondary
    peer-disk:UpToDate

$ ssh charlie drbdadm status
testrsc role:Secondary
  disk:UpToDate
  bravo role:Secondary
    peer-disk:UpToDate

$ ssh delta drbdadm status
testrsc role:Secondary
  disk:UpToDate
  bravo connection:Connecting
  charlie connection:Connecting

It is expected that delta is still in the Connecting state: I killed linstor-satellite on that node, so it was not able to clean that resource up. The other two nodes, however, no longer try to connect to delta.
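
For checking more than a handful of nodes, the comparison above can be scripted. What follows is a minimal sketch, not part of LINSTOR or DRBD: stuck_peers is a hypothetical helper name, and it assumes the drbdadm status layout shown in this thread (an unindented resource line, followed by indented peer lines that carry connection:Connecting on the same line as the peer name).

```shell
# stuck_peers: hypothetical helper, not a LINSTOR or DRBD command.
# Reads `drbdadm status` output on stdin and prints "<resource> <peer>"
# for every peer the local node is still trying to connect to.
stuck_peers() {
  awk '
    /^[^ \t]/ { res = $1; next }                     # unindented line opens a resource block
    $2 == "connection:Connecting" { print res, $1 }  # indented peer line still connecting
  '
}

# Usage (node names taken from the example above):
#   for n in bravo charlie; do ssh "$n" drbdadm status | stuck_peers; done
```

If the lost node was cleaned up as in the run above, the loop should print nothing for bravo and charlie. A line naming delta would point to the situation described in the original report.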

Have you perhaps missed some ErrorReports? What does linstor err list show?

ghernadi, Oct 02 '18 07:10

I think it may be connected with https://github.com/LINBIT/linstor-server/issues/19. I still have resources that are stuck in the DELETING state.

The linstor err l command is stuck too because of that. After restarting the controller the command works, but my live resources changed to the Unknown state.

kvaps, Oct 02 '18 08:10

Agreed, the problem was solved after restarting linstor-controller and all linstor-satellites together.

It seems some loop in linstor-controller was stuck on some operations because of https://github.com/LINBIT/linstor-server/issues/19

When I restarted only the linstor-satellites earlier, it did not help.
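
For reference, the workaround could be scripted roughly as follows. This is only a sketch under assumptions: restart_linstor and DRY_RUN are hypothetical names, the services are assumed to be systemd units called linstor-controller and linstor-satellite, the function is assumed to run on the controller host with ssh access to the satellites, and restarting the controller first is a guess, since the exact order I used is not recorded here.

```shell
# restart_linstor: hypothetical helper; assumes systemd units named
# linstor-controller / linstor-satellite and that this runs on the controller host.
# Arguments: satellite host names reachable via ssh.
# Set DRY_RUN=1 to print the commands instead of running them.
restart_linstor() {
  run=${DRY_RUN:+echo}
  $run systemctl restart linstor-controller
  for node in "$@"; do
    $run ssh "$node" systemctl restart linstor-satellite
  done
}

# Usage: DRY_RUN=1 restart_linstor bravo charlie
```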

kvaps, Oct 02 '18 08:10

Do I understand you correctly that after you did a linstor node lost <node name> linstor still has (or now after restarting everything "had") resources of that lost node in DELETING state?

ghernadi, Oct 02 '18 08:10

No, I removed a lot of resources at once and they got stuck in DELETING. I then tried to solve the problem via linstor node lost, but only on a small number of nodes to begin with.

So I had other nodes with DELETING resources, not specifically the one that I lost.

kvaps, Oct 02 '18 08:10