linstor-server
DRBD not ready on Proxmox reboot
While booting Proxmox, the ZFS command takes too long and VMs with DRBD devices are not started.
Perhaps there should be a second service which waits for the devices to be populated, and Proxmox should depend on it.
I get this on 2 hosts in a cluster:
ERROR REPORT 5F26E34B-7E463-000000
============================================================
Application: LINBIT® LINSTOR
Module: Satellite
Version: 1.7.3
Build ID: 246c81885be6cd343667aff3c54e026f52ad0258
Build time: 2020-07-22T13:22:31+00:00
Error time: 2020-08-02 16:02:06
Node: px3.cc.private
============================================================
Reported error:
===============
Description:
Failed to query 'zfs' info
Cause:
External command timed out
Additional information:
External command: zfs list -H -p -o name,used,volsize,type -t volume,snapshot
Category: LinStorException
Class name: StorageException
Class canonical name: com.linbit.linstor.storage.StorageException
Generated at: Method 'genericExecutor', Source file 'Commands.java', Line #121
Error message: Failed to query 'zfs' info
Call backtrace:
Method Native Class:Line number
genericExecutor N com.linbit.linstor.storage.layer.provider.utils.Commands:121
genericExecutor N com.linbit.linstor.storage.layer.provider.utils.Commands:64
genericExecutor N com.linbit.linstor.storage.layer.provider.utils.Commands:52
list N com.linbit.linstor.storage.utils.ZfsCommands:20
getZfsList N com.linbit.linstor.storage.utils.ZfsUtils:127
getInfoListImpl N com.linbit.linstor.storage.layer.provider.zfs.ZfsProvider:131
updateVolumeAndSnapshotStates N com.linbit.linstor.storage.layer.provider.AbsStorageProvider:166
prepare N com.linbit.linstor.storage.layer.provider.AbsStorageProvider:158
prepare N com.linbit.linstor.storage.layer.provider.StorageLayer:161
prepare N com.linbit.linstor.core.devmgr.DeviceHandlerImpl:628
prepareLayers N com.linbit.linstor.core.devmgr.DeviceHandlerImpl:259
dispatchResources N com.linbit.linstor.core.devmgr.DeviceHandlerImpl:127
dispatchResources N com.linbit.linstor.core.devmgr.DeviceManagerImpl:258
phaseDispatchDeviceHandlers N com.linbit.linstor.core.devmgr.DeviceManagerImpl:896
devMgrLoop N com.linbit.linstor.core.devmgr.DeviceManagerImpl:618
run N com.linbit.linstor.core.devmgr.DeviceManagerImpl:535
run N java.lang.Thread:834
Caused by:
==========
Category: Exception
Class name: ChildProcessTimeoutException
Class canonical name: com.linbit.ChildProcessTimeoutException
Generated at: Method 'waitFor', Source file 'ChildProcessHandler.java', Line #133
Call backtrace:
Method Native Class:Line number
waitFor N com.linbit.extproc.ChildProcessHandler:133
syncProcess N com.linbit.extproc.ExtCmd:92
exec N com.linbit.extproc.ExtCmd:56
genericExecutor N com.linbit.linstor.storage.layer.provider.utils.Commands:80
genericExecutor N com.linbit.linstor.storage.layer.provider.utils.Commands:64
genericExecutor N com.linbit.linstor.storage.layer.provider.utils.Commands:52
list N com.linbit.linstor.storage.utils.ZfsCommands:20
getZfsList N com.linbit.linstor.storage.utils.ZfsUtils:127
getInfoListImpl N com.linbit.linstor.storage.layer.provider.zfs.ZfsProvider:131
updateVolumeAndSnapshotStates N com.linbit.linstor.storage.layer.provider.AbsStorageProvider:166
prepare N com.linbit.linstor.storage.layer.provider.AbsStorageProvider:158
prepare N com.linbit.linstor.storage.layer.provider.StorageLayer:161
prepare N com.linbit.linstor.core.devmgr.DeviceHandlerImpl:628
prepareLayers N com.linbit.linstor.core.devmgr.DeviceHandlerImpl:259
dispatchResources N com.linbit.linstor.core.devmgr.DeviceHandlerImpl:127
dispatchResources N com.linbit.linstor.core.devmgr.DeviceManagerImpl:258
phaseDispatchDeviceHandlers N com.linbit.linstor.core.devmgr.DeviceManagerImpl:896
devMgrLoop N com.linbit.linstor.core.devmgr.DeviceManagerImpl:618
run N com.linbit.linstor.core.devmgr.DeviceManagerImpl:535
run N java.lang.Thread:834
END OF ERROR REPORT.
ZFS is completely optional in LINSTOR, so if you use it, I suggest that you add a dependency on the ZFS services for LINSTOR. That would be the satellite service (i.e., systemctl edit --system --full linstor-satellite.service, depending on your actual preferences). This is something the admin has to define. We should probably document that better; patches to linbit-documentation are welcome.
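For example, a drop-in along these lines expresses that ordering (a minimal sketch; whether zfs.target or zfs-import.target is the better ordering point depends on the setup, and the follow-up comments show that ordering alone was not sufficient here because zfs list itself is slow):
# systemctl edit linstor-satellite.service
# (creates /etc/systemd/system/linstor-satellite.service.d/override.conf)
[Unit]
After=zfs.target
Wants=zfs.target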
Even when linstor-satellite.service reports ready and Proxmox starts, the devices are not populated. Is there a command which will return true once all devices are ready?
The better approach, though, would be for Proxmox to wait before starting VMs if your Proxmox interface signals that the devices are not ready.
Did you add the service dependency so that LINSTOR depends on the ZFS services? If not, then no surprise at all: obviously it gets "ready" because it is started in parallel to the ZFS stuff, but then queries ZFS information before ZFS is actually finished and ready. This has nothing to do with Proxmox and signaling whatsoever. You need to define the correct startup order between the ZFS services and LINSTOR.
First I used After=zfs-import.target, then After=zfs.target, but neither worked.
The problem is that I have more than 1800 snapshots and zfs list needs more than 2 minutes at boot time.
Isn't it possible to wait some time in LINSTORPlugin.pm for the devices to get ready, or to return something like 503 Service Unavailable so that Proxmox retries a few times?
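For illustration, the kind of wait loop meant here (a hypothetical sketch, not part of the actual plugin) boils down to something like this, using the device path from the blockdev error further below as an example:
#!/bin/bash
# wait up to 60 seconds for one resource's DRBD device node to appear
dev=/dev/drbd/by-res/vm-152-disk-1/0
for i in $(seq 60); do
    [ -b "$dev" ] && exit 0
    sleep 1
done
exit 1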
I created a service which warms the ZFS cache and delays linstor-satellite.service and pve-guests.service. But PVE starts the VMs before linstor-satellite has the devices ready, which takes roughly 30 more seconds.
blockdev: cannot open /dev/drbd/by-res/vm-152-disk-1/0: No such file or directory
Something like zfs-volume-wait.service is needed here.
In the end I had to start a non-DRBD VM with delay=180 to give the devices time to get ready. But this is not smart.
# systemctl cat zfs-warm-cache.service
# /etc/systemd/system/zfs-warm-cache.service
[Unit]
Description=ZFS warm cache
DefaultDependencies=no
After=zfs-import.target
Before=linstor-satellite.service
Before=pve-guests.service
[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/sbin/zfs list -t all
StandardOutput=null
[Install]
WantedBy=zfs.target
Here is an SVG file from systemd-analyze plot.
zfs systemd-plot.svg.zip
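For reference, such a plot is produced by systemd-analyze itself, e.g.:
systemd-analyze plot > systemd-plot.svg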
@rck Have you seen my last comment?
I think these are 2 problems:
- zfs (list or whatever) not ready when LINSTOR starts: this has to be solved via inter-service dependencies. I'd assume LINSTOR tries again and again anyways, right @ghernadi? So it would come up, but too late.
- the plugin reporting "ready" to Proxmox while the devices are not started/DRBD up/usable yet: that is something where the plugin should improve. Quite frankly, I will have to look up the API/the calls we as a plugin get, and whether we can wait there in a loop until the actual block device is ready. I'm out of office again, so I can look at this next week at the earliest. But this part is something the plugin has to improve, I agree.
So it would come up, but too late.
Yes, it comes up for sure.
Perhaps some help is needed here from Proxmox (@Fabian-Gruenbichler) to extend their API?
For Proxmox I prefer the API solution.
But in general, what about a little health check tool?
- It can check/wait for controller connections
- Wait for the devices to be populated
- Wait for resync to finish
If it reads the defaults (timeouts, pools, resources) from the environment, it's possible to create a service with a defaults file in /etc/default/linstor-wait.
[Service]
EnvironmentFile=/etc/default/linstor-wait
ExecStart=linstor-wait
# cat /etc/default/linstor-wait
linstor_controller_timeout = 60s
linstor_device_timeout = 5m
linstor_resync_timeout = 10m
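A minimal sketch of what such a hypothetical linstor-wait could look like (purely illustrative: the tool and the variable names are assumptions taken from the proposal above, plain seconds are assumed in the defaults file, and linstor resource list is used as a cheap controller reachability check):
#!/bin/bash
# Hypothetical linstor-wait sketch; the variables come from
# /etc/default/linstor-wait via EnvironmentFile=, e.g.
#   linstor_controller_timeout=60
#   linstor_device_timeout=300
: "${linstor_controller_timeout:=60}"
: "${linstor_device_timeout:=300}"

# Retry a command until it succeeds or the timeout (in seconds) expires.
wait_for() {
    local timeout=$1; shift
    local deadline=$(( SECONDS + timeout ))
    until "$@" > /dev/null 2>&1; do
        (( SECONDS < deadline )) || return 1
        sleep 3
    done
}

# Consider the node ready when every DRBD device node can actually be opened.
devices_ready() {
    shopt -s nullglob
    local dev
    for dev in /dev/drbd/by-res/*/*; do
        blockdev --getsize64 "$dev" > /dev/null 2>&1 || return 1
    done
}

wait_for "$linstor_controller_timeout" linstor resource list || exit 1
wait_for "$linstor_device_timeout" devices_ready || exit 1
# a resync check could hook in here in the same way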
I'd assume linstor tries again and again anyways, right @ghernadi ?
I was not sure about this as Linstor's fullSync has some special cases, so I did a quick test with the unfortunate result that it might depend on how zfs fails.
When I tested with
- mv /sbin/zfs /sbin/zfs_broken
- letting the startup / fullSync fail
- reverting the mv, and
- waiting for a retry,

the controller did not retry. That was due to a bug: the failure was not forwarded to the controller as a failure but as a simple message. This bug is fixed now and will be included in the next release (so thanks for making me recheck it :) ).
Hi @ggzengel,
today I looked into it a bit closer, and we came up with the following: we don't want to fix this in a Proxmox- (or plugin-) specific way; boot-up should be handled by the service file. The new semantic will be that linstor-satellite.service will only flag "ready" if all block devices on that node are usable. Given that, the rest can then be a simple/usual systemd dependency.
This will take some time, reopening this issue.
In a second step, information about the readiness of given resources on given nodes can also be exposed via the REST API. This then basically also replaces your standalone tool. The API can then be used in all plugins, e.g. to check whether a freshly created resource is actually ready to use (it is a bit more complicated than just stat-ing the device node).
Chiming in a bit late, since this got lost in my generic GitHub notifications queue. PVE uses pve-storage.target as a boot-up synchronization point, so if you hook the LINSTOR services into that and they only complete their startup once everything is accessible, this should be enough to order onboot guest and PVE API startup properly.
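To illustrate that suggestion (a minimal sketch, assuming linstor-satellite.service really only completes startup once its resources are usable, which is the semantic planned above):
SYSTEMD_EDITOR=tee systemctl edit linstor-satellite.service <<EOF
[Unit]
Before=pve-storage.target
EOF
# and let the target pull the satellite in:
systemctl add-wants pve-storage.target linstor-satellite.service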
Sorry for bumping this old thread, but I'm just wondering if there's any progress on this? Or are there any possible workarounds? This is quite a serious issue, since it basically means that not a single VM using DRBD will be started after a reboot.
For now, I wrote a script that checks if the LINSTOR controller is available, and a service file that calls this script. Then I specified the service in pve-storage.target, like @Fabian-Gruenbichler suggested. It's not a great solution, though: what if some resources become available earlier than others? But here are the script and service file anyway, in case they help anybody:
#!/bin/bash
# Poll the LINSTOR controller until it answers, or give up after
# tries * interval seconds.
tries=100
interval=3

is_ready() {
    # 'linstor r l' (resource list) exits non-zero while the controller
    # is unreachable.
    linstor r l
}

for (( i = 0; i < tries; ++i )); do
    echo "Trying"
    is_ready && exit 0
    sleep "$interval"
done
exit 1
[Unit]
Description=Periodically check if Linstor is ready
[Service]
Type=oneshot
ExecStart=/bin/bash /usr/bin/linstor-is-ready.sh
[Install]
WantedBy=multi-user.target
And systemctl edit pve-storage.target:
[Unit]
After=linstor-satellite.service
After=linstor-is-ready.service
Any suggestions on how this could be done better are very welcome!
Meanwhile I use:
# systemctl cat zfs-warm-cache.service
[Unit]
Description=ZFS warm cache
DefaultDependencies=no
After=zfs-import.target
Before=linstor-satellite.service
Before=pve-guests.service
[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/sbin/zfs list -t all
StandardOutput=null
[Install]
WantedBy=zfs.target
If you want to set up the unit above as a script:
SYSTEMD_EDITOR=tee systemctl edit --full --force zfs-warm-cache.service <<EOF
[Unit]
Description=ZFS warm cache
DefaultDependencies=no
After=zfs-import.target
Before=linstor-satellite.service
Before=pve-guests.service
[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/sbin/zfs list -t all
StandardOutput=null
[Install]
WantedBy=zfs.target
EOF
systemctl enable zfs-warm-cache.service
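To verify after a reboot that the ordering actually took effect, standard systemd tooling can help, for example:
systemd-analyze critical-chain linstor-satellite.service
systemd-analyze critical-chain pve-guests.service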