fence_kdump: monitor action does not work correctly.
The fence_kdump monitor action checks the local node only and does not check the target node. (This is described in a commit log: "monitor action checks if LOCAL node can enter kdump".)
This makes little sense, because fence_kdump would have to check the target node's configuration, and it is difficult to check a target node without ssh or another remote shell command.
I have no idea how to resolve this issue. Does anyone have ideas?
It makes sense for cluster scenarios: if fence_kdump is configured, then it is configured on every node. So checking locally works, because every target node gets checked that way.
I understand your concerns, but IMHO there is no real solution, because nothing is running before the problem occurs. Also, we test only the kernel args; it would be better to check whether fence_kdump_send is contained in the kdump kernel and whether it will actually be executed.
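To make the "local check only" limitation concrete, the monitor can only inspect the node it happens to run on. The following is a hedged sketch of that kind of local check (inspecting the local kernel command line for a crashkernel reservation); the messages and the exact check are illustrative, not the agent's actual code:

```shell
#!/bin/sh
# Hedged sketch, not the agent's real implementation: a monitor running
# on this node can only look at local state, e.g. whether a crash
# kernel is reserved on the local kernel command line. It learns
# nothing about whether the TARGET node could enter kdump.
check_local_kdump() {
    if grep -q 'crashkernel=' /proc/cmdline 2>/dev/null; then
        echo "local node: crashkernel reservation found"
    else
        echo "local node: no crashkernel reservation"
    fi
}
check_local_kdump
```

Whatever this prints, it describes the node the resource is currently running on, which is exactly the behavior being discussed above.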
OK, I understand that a fence_kdump resource must start on every node to cover all cluster members. The administrator then has to be careful that a fence_kdump monitor error means a local node error, not a target node error (this differs from the behavior of other fence agents).
A 1+1 cluster seems to be no problem, but an N+M cluster has to take care with location constraints, because there is no guarantee that the fence agents stay distributed evenly across the cluster, and fail-over of a fence_kdump resource needs to be restricted.
Yes, it is different from other agents.
But I'm not sure where you see a problem with bigger clusters. You should have fence_kdump on every node, so there is no issue at all. If (for whatever reason) you do not want fence_kdump installed on a particular node, just set pcmk_monitor_action="metadata" like it was before.
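As a hedged illustration of that workaround, the metadata fallback could be set when creating the device (the resource name and host list here are assumptions, not taken from the thread; pcmk_monitor_action is a standard Pacemaker fencing attribute):

```shell
# Illustrative sketch: on a cluster where some node lacks fence_kdump,
# make the monitor a harmless no-op by redirecting it to the agent's
# metadata action. Names/parameters are placeholders.
pcs stonith create fence_kdump_node2 fence_kdump \
    pcmk_host_list="node2" \
    pcmk_monitor_action="metadata"
```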
We configure STONITH resources as group resources, like the following.
node1: grpSTONITH_node2 (fence_kdump_node2 + ipmi_node2)
node2: grpSTONITH_node3 (fence_kdump_node3 + ipmi_node3)
node3: grpSTONITH_node1 (fence_kdump_node1 + ipmi_node1)
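A layout like the one above might be built with commands along these lines (a hedged sketch only: all IPMI parameters, names, and scores are placeholders assumed from the listing, not a tested configuration):

```shell
# One group per target node: kdump agent first, IPMI agent as fallback.
# The group must never run on the node it fences.
pcs stonith create fence_kdump_node2 fence_kdump pcmk_host_list="node2"
pcs stonith create ipmi_node2 fence_ipmilan ip="node2-ipmi" \
    username="admin" password="secret" pcmk_host_list="node2"
pcs resource group add grpSTONITH_node2 fence_kdump_node2 ipmi_node2
pcs constraint location grpSTONITH_node2 avoids node2
```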
If node2 crashes, then grpSTONITH_node3 fails over to node1 (grpSTONITH_node3 cannot run on node3). After node2 is restored, grpSTONITH_node3 keeps running on node1, and fence_kdump cannot check the restored node2.
node1: grpSTONITH_node2, grpSTONITH_node3
node2: none
node3: grpSTONITH_node1
This is no problem with other fence agents, because they can run on any node except the target node, so the administrator does not need to care about the location of the fence agents. But fence_kdump is not like that.
This is why N+M clusters have to take care with location constraints. fence_kdump needs an automatic fail-back configuration or a manual fail-back operation.
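One hedged way to get automatic fail-back in the scenario above would be to remove stickiness from the group and express its placement preferences explicitly, so it returns to node2 once node2 rejoins (resource names, node names, and scores are assumptions taken from the example layout):

```shell
# Sketch only: with zero stickiness, Pacemaker moves the group back to
# the most-preferred available node instead of staying where it
# failed over to.
pcs resource meta grpSTONITH_node3 resource-stickiness=0
pcs constraint location grpSTONITH_node3 prefers node2=100
pcs constraint location grpSTONITH_node3 avoids node3
```

The trade-off is that zero stickiness causes an extra resource move whenever a failed node returns, which is usually acceptable for fencing devices.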
I'm planning to configure a 3-node cluster with fence_kdump and fence_ipmilan. This configuration is the same as the one knakahira described above. What did you come up with to solve this problem?