Make LVM snapshot default when no issues get reported
This is just a reminder so that we don't forget to make the LVM snapshot the default when no issues get reported: https://github.com/nextcloud/vm/blob/39e64fe07920bea14f064abaf71847cd5c7165a3/nextcloud_install_production.sh#L68 Once we do this, everyone will be able to use the built-in backup solution.
I'd say maybe, once development on that part has stopped for some time because it's rock solid. :)
I'd say it is already pretty stable, but yeah.
We could do this for Ubuntu 22.04 by making the OS disk 45 GB in size, or by extending the drive so there's only 5 GB left and keeping it 40 GB in total size.
Would that work?
cc @small1
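For reference, the sizing math from the proposal above, plus a hypothetical `lvcreate` call showing what the spare 5 GB would be used for. This is a sketch only: the `NcVM-snapshot` and `ubuntu-vg`/`ubuntu-lv` names are taken from this thread, and the command is not part of the current install script.

```shell
#!/usr/bin/env bash
# Sketch of the proposed layout: a 45 GB disk with a 40 GB OS volume,
# leaving 5 GB of free space in the volume group for the snapshot.
disk_gb=45
os_gb=40
snap_gb=$((disk_gb - os_gb))

# Hypothetical lvcreate call that could claim that space
# (NcVM-snapshot / ubuntu-vg are the names used in this thread):
echo "lvcreate --size ${snap_gb}G --snapshot --name NcVM-snapshot /dev/ubuntu-vg/ubuntu-lv"
```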
> We could do this for Ubuntu 22.04 by making the OS disk 45 GB in size, or by extending the drive so there's only 5 GB left and keeping it 40 GB in total size.
> Would that work?
From my side, yes 👍
Just tested on one of my prod instances. I don't think this is stable enough:
Last login: Thu Oct 7 12:49:11 2021 from blablabla
root@cloud:~# bash /var/scripts/update.sh
Posting notification to users that are admins, this might take a while...
Posting 'Update script started!' to: enoch85
Warning: Stopping docker.service, but it can still be activated by:
docker.socket
Maintenance mode enabled
Logical volume ubuntu-vg/NcVM-snapshot is used by another device.
Maintenance mode disabled
Starting docker...
Posting notification to users that are admins, this might take a while...
Posting 'Update failed!' to: enoch85
Logical volume ubuntu-vg/NcVM-snapshot is used by another device.
Honestly, I've never seen this issue. What did you do before it appeared? Did you reinstall Ubuntu from scratch and choose to add the partition in the install script? That's how I've been running this setup for half a year or longer without any issues...
Or in other words: what are the steps to reproduce this issue?
> Did you reinstall Ubuntu from scratch and choose to add the partition in the install script?
Yes. Since the company was sold, we moved the whole thing to a new server with a fresh install and an export/import of the DB and everything else. So it's installed by the book, "your way".
> Or in other words: what are the steps to reproduce this issue?
I don't know. I just ran an update yesterday and it happened. No automatic updates either.
Thanks! So then I will try to investigate how this could happen :)
Does it happen every time you run the update script?
Could be a bug in LVM itself... https://blog.roberthallam.org/2017/12/solved-logical-volume-is-used-by-another-device/
Could you please try the following commands and post their output here (if it still happens)?
lvremove -v /dev/ubuntu-vg/NcVM-snapshot
dmsetup info -c | grep NcVM | grep snapshot
# more to come when we have more info based on the guide linked above
@enoch85 do you have some feedback here? It is hard to debug without a way to reproduce this issue...
As it's not in the released version yet, please add a PR with the fix you proposed, and I'll run one of the auto update VMs with the new setup.
I can try. But after reading through the code, did you try to reboot the affected server once after you got the notification that the update failed because of the failed lvremove?
I've only seen this once, and I'm not sure if the server was rebooted or not.
If you think it can be improved, then do so, else leave it for now.
Thanks for the feedback! Honestly, since I still think this is a bug in LVM itself, I don't think I can improve the logic/code. I could try to work around the symptoms, but not solve the issue itself. So a reboot is probably still the best option in this case. Since you've only seen this once, I think it's fine, though. Do you agree?
I'm still not convinced it should be the default way of the VM. One more thing that could break - we want to keep those events limited.
It happened again.
- Run menu.sh --> minor
- It finished as expected
- Run menu.sh --> update again
Some debug output:
Posting 'Update script started!' to: enoch85
++ hostname -f
+ nextcloud_occ_no_check notification:generate -l 'The update script in the Nextcloud VM has been executed.
You will be notified when the update is done.
Please don'\''t shutdown or restart your server until then.' enoch85 'cloud.hanssonit.se: Update script started!'
+ sudo -u www-data php /var/www/nextcloud/occ notification:generate -l 'The update script in the Nextcloud VM has been executed.
You will be notified when the update is done.
Please don'\''t shutdown or restart your server until then.' enoch85 'cloud.hanssonit.se: Update script started!'
+ check_free_space
+ vgs
++ vgs
++ grep ubuntu-vg
++ awk '{print $7}'
++ grep -oP '[0-9]+\.[0-9]'
++ sed 's|\.||'
++ grep g
+ FREE_SPACE=
+ '[' -z '' ']'
+ FREE_SPACE=0
+ '[' -f /var/scripts/nextcloud-startup-script.sh ']'
+ does_snapshot_exist NcVM-startup
+ local SNAPSHOTS
+ local snapshot
+ lvs
++ lvs
++ grep ubuntu-vg
++ awk '{print $1}'
++ grep -v ubuntu-lv
+ SNAPSHOTS=NcVM-snapshot
+ '[' -z NcVM-snapshot ']'
+ mapfile -t SNAPSHOTS
+ for snapshot in "${SNAPSHOTS[@]}"
+ '[' NcVM-snapshot = NcVM-startup ']'
+ return 1
+ does_snapshot_exist NcVM-snapshot
+ local SNAPSHOTS
+ local snapshot
+ lvs
++ lvs
++ grep ubuntu-vg
++ awk '{print $1}'
++ grep -v ubuntu-lv
+ SNAPSHOTS=NcVM-snapshot
+ '[' -z NcVM-snapshot ']'
+ mapfile -t SNAPSHOTS
+ for snapshot in "${SNAPSHOTS[@]}"
+ '[' NcVM-snapshot = NcVM-snapshot ']'
+ return 0
+ '[' -f /var/scripts/daily-borg-backup.sh ']'
+ crontab -u root -l
+ grep -v 'lvrename /dev/ubuntu-vg/NcVM-snapshot-pending'
+ crontab -u root -
+ crontab -u root -l
+ cat
+ crontab -u root -
+ echo '@reboot /usr/sbin/lvrename /dev/ubuntu-vg/NcVM-snapshot-pending /dev/ubuntu-vg/NcVM-snapshot &>/dev/null'
+ SNAPSHOT_EXISTS=1
+ is_docker_running
+ docker ps -a
+ check_command systemctl stop docker
+ systemctl stop docker
Warning: Stopping docker.service, but it can still be activated by:
docker.socket
+ nextcloud_occ maintenance:mode --on
+ check_command sudo -u www-data php /var/www/nextcloud/occ maintenance:mode --on
+ sudo -u www-data php /var/www/nextcloud/occ maintenance:mode --on
Maintenance mode enabled
+ does_snapshot_exist NcVM-startup
+ local SNAPSHOTS
+ local snapshot
+ lvs
++ lvs
++ grep ubuntu-vg
++ awk '{print $1}'
++ grep -v ubuntu-lv
+ SNAPSHOTS=NcVM-snapshot
+ '[' -z NcVM-snapshot ']'
+ mapfile -t SNAPSHOTS
+ for snapshot in "${SNAPSHOTS[@]}"
+ '[' NcVM-snapshot = NcVM-startup ']'
+ return 1
+ does_snapshot_exist NcVM-snapshot
+ local SNAPSHOTS
+ local snapshot
+ lvs
++ lvs
++ grep ubuntu-vg
++ awk '{print $1}'
++ grep -v ubuntu-lv
+ SNAPSHOTS=NcVM-snapshot
+ '[' -z NcVM-snapshot ']'
+ mapfile -t SNAPSHOTS
+ for snapshot in "${SNAPSHOTS[@]}"
+ '[' NcVM-snapshot = NcVM-snapshot ']'
+ return 0
+ lvremove /dev/ubuntu-vg/NcVM-snapshot -y
Logical volume ubuntu-vg/NcVM-snapshot is used by another device.
+ nextcloud_occ maintenance:mode --off
+ check_command sudo -u www-data php /var/www/nextcloud/occ maintenance:mode --off
+ sudo -u www-data php /var/www/nextcloud/occ maintenance:mode --off
Maintenance mode disabled
+ start_if_stopped docker
+ pgrep docker
+ print_text_in_color '\e[0;96m' 'Starting docker...'
+ printf '%b%s%b\n' '\e[0;96m' 'Starting docker...' '\e[0m'
Starting docker...
+ systemctl start docker.service
++ date +%T
+ notify_admin_gui 'Update failed!' 'Could not remove NcVM-snapshot - Please reboot your server! 13:29:33'
+ local NC_USERS
+ local user
+ local admin
+ is_app_enabled notifications
+ sed '/Disabled/,$d'
+ awk '{print$2}'
+ nextcloud_occ app:list
+ check_command sudo -u www-data php /var/www/nextcloud/occ app:list
+ sudo -u www-data php /var/www/nextcloud/occ app:list
+ sed '/^$/d'
+ grep -q '^notifications$'
+ tr -d :
+ return 0
+ print_text_in_color '\e[0;96m' 'Posting notification to users that are admins, this might take a while...'
+ printf '%b%s%b\n' '\e[0;96m' 'Posting notification to users that are admins, this might take a while...' '\e[0m'
Posting notification to users that are admins, this might take a while...
+ send_mail 'Update failed!' 'Could not remove NcVM-snapshot - Please reboot your server! 13:29:33'
+ local RECIPIENT
+ '[' -f /etc/msmtprc ']'
+ return 1
+ '[' -z enoch85 ']'
+ for admin in "${NC_ADMIN_USER[@]}"
+ print_text_in_color '\e[0;92m' 'Posting '\''Update failed!'\'' to: enoch85'
+ printf '%b%s%b\n' '\e[0;92m' 'Posting '\''Update failed!'\'' to: enoch85' '\e[0m'
Posting 'Update failed!' to: enoch85
++ hostname -f
+ nextcloud_occ_no_check notification:generate -l 'Could not remove NcVM-snapshot - Please reboot your server! 13:29:33' enoch85 'cloud.hanssonit.se: Update failed!'
+ sudo -u www-data php /var/www/nextcloud/occ notification:generate -l 'Could not remove NcVM-snapshot - Please reboot your server! 13:29:33' enoch85 'cloud.hanssonit.se: Update failed!'
+ msg_box 'It seems like the old snapshot could not get removed.
This should work again after a reboot of your server.'
+ '[' -n '' ']'
+ whiptail --title 'Nextcloud VM - 2022 - Nextcloud Update Script' --msgbox 'It seems like the old snapshot could not get removed.
This should work again after a reboot of your server.' '' ''
+ exit 1
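As an aside, the trace above shows `FREE_SPACE=` coming back empty from the `vgs | grep | awk | grep | sed | grep` pipeline before being forced to `0`. A less fragile approach might be to ask `vgs` for just the `vg_free` column. A sketch of the parsing, run against sample output here since it assumes an `ubuntu-vg` volume group exists:

```shell
#!/usr/bin/env bash
# The real query would be something like:
#   vgs --noheadings --nosuffix --units g -o vg_free ubuntu-vg
# which prints a single number such as "  5.00". Simulate that output:
sample_output="  5.00"

# Truncate to whole gigabytes; awk's %d drops the decimals
FREE_SPACE="$(echo "$sample_output" | awk '{printf "%d", $1}')"
echo "$FREE_SPACE"
```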
Thanks for the verbose output! Please try the following and report back:
lvremove -v /dev/ubuntu-vg/NcVM-snapshot
dmsetup info -c | grep NcVM | grep snapshot
# more to come when we have more info based on the guide linked above
Already rebooted ;/
> Already rebooted ;/
hm :/
OK, managed to reproduce it and here's the output:
root@cloud:~# lvremove -v /dev/ubuntu-vg/NcVM-snapshot
Logical volume ubuntu-vg/NcVM-snapshot in use.
root@cloud:~# dmsetup info -c | grep NcVM | grep snapshot
ubuntu--vg-NcVM--snapshot 253 3 L--w 1 1 2 LVM-k9Rc3WOCi8FftbHl00Er0pzO7k7Kpttkwe5oq1zHuHZW7Ia6auXkP4fS59G1HaSX
ubuntu--vg-NcVM--snapshot-cow 253 2 L--w 1 1 2 LVM-k9Rc3WOCi8FftbHl00Er0pzO7k7Kpttkwe5oq1zHuHZW7Ia6auXkP4fS59G1HaSX-cow
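For what it's worth, the second and third columns of that `dmsetup info -c` output are the major and minor device numbers, which is where `/sys/dev/block/253:3`-style paths come from. A sketch of deriving those paths, fed with the pasted lines since it assumes this exact column layout:

```shell
#!/usr/bin/env bash
# Sample lines copied from the dmsetup output above (name, major, minor, ...)
sample='ubuntu--vg-NcVM--snapshot 253 3 L--w 1 1 2 LVM-k9Rc3WOCi8FftbHl00Er0pzO7k7Kpttkwe5oq1zHuHZW7Ia6auXkP4fS59G1HaSX
ubuntu--vg-NcVM--snapshot-cow 253 2 L--w 1 1 2 LVM-k9Rc3WOCi8FftbHl00Er0pzO7k7Kpttkwe5oq1zHuHZW7Ia6auXkP4fS59G1HaSX-cow'

# Map each device to the sysfs holders directory for its major:minor pair
paths="$(while read -r name major minor _; do
    printf '%s -> /sys/dev/block/%s:%s/holders\n' "$name" "$major" "$minor"
done <<< "$sample")"
echo "$paths"
```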
Great! As a follow-up: what's the output of
ls -la /sys/dev/block/253\:3/holders
ls -la /sys/dev/block/253\:2/holders
Once I have the output, it should only take one command to remove the blocking device, and afterwards the lvremove should finally work :) That would be a better way to solve this than rebooting, and one we could automate in case lvremove fails :)
root@cloud:~# ls -la /sys/dev/block/253\:3/holders
total 0
drwxr-xr-x 2 root root 0 jan 29 13:33 .
drwxr-xr-x 9 root root 0 jan 29 13:33 ..
root@cloud:~# ls -la /sys/dev/block/253\:2/holders
total 0
drwxr-xr-x 2 root root 0 jan 29 13:33 .
drwxr-xr-x 9 root root 0 jan 29 13:33 ..
lrwxrwxrwx 1 root root 0 jan 29 14:29 dm-3 -> ../../dm-3
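So the `-cow` device (`253:2`) is held by `dm-3`, which explains the `lvremove` refusal. Reading that holder name out of sysfs can be scripted; a sketch against a mocked directory, since the real `/sys` path only exists on an affected machine:

```shell
#!/usr/bin/env bash
# Mock /sys/dev/block/253:2/holders as it appears in the output above
mock="$(mktemp -d)/dev/block/253:2/holders"
mkdir -p "$mock"
ln -s ../../dm-3 "$mock/dm-3"

# The blocking device is simply the entry in the holders directory
holder="$(ls "$mock")"
echo "$holder"   # the name dmsetup remove would need to target
```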
Thanks! After running the following commands, the removal should work. Please report back!
dmsetup remove /dev/dm-3
lvremove -v /dev/ubuntu-vg/NcVM-snapshot
If that works, I will try to come up with a PR that fixes this once and for all :)
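Not presuming the eventual PR, but a rough sketch of what such an automated fallback might look like, based on the diagnosis in this thread. The function name is hypothetical, the `sysroot` parameter exists only to make the sketch testable, and it has not been run against a live LVM setup:

```shell
#!/usr/bin/env bash
# Hypothetical: try lvremove; if it fails, remove the holder of the
# snapshot's -cow device (as diagnosed above) and retry once.
remove_snapshot() {
    local sysroot="${1:-/sys}"   # injectable for testing only
    local lv="/dev/ubuntu-vg/NcVM-snapshot"
    if lvremove "$lv" -y; then
        return 0
    fi
    # Find the -cow device's major:minor and look up its holder in sysfs
    local majmin holder
    majmin="$(dmsetup info -c | grep 'NcVM--snapshot-cow' | awk '{print $2":"$3}')"
    holder="$(ls "${sysroot}/dev/block/${majmin}/holders" 2>/dev/null | head -n 1)"
    if [ -n "$holder" ]; then
        dmsetup remove "/dev/${holder}" && lvremove "$lv" -y
    else
        return 1
    fi
}
```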