
Overlayfs instead of device-mapper

Open: probonopd opened this issue 9 years ago • 41 comments

I have the exact same issue as described on https://lists.fedoraproject.org/pipermail/users/2015-July/463627.html and https://lists.fedoraproject.org/pipermail/devel/2015-July/212956.html

The fact is that with an overlayfs (or aufs, as in the 2014 Ubuntu release) live Linux, you can apt-get install whatever you want (i.e. write to /) until the overlay becomes full and returns -ENOSPC, and the system remains usable once some data is cleaned up. On this Fedora Live OS with device-mapper, a simple sudo dnf install of a few packages crashes the kernel. Which is more mature?

The live system crashes when installing large packages despite having 8 GB of RAM in the system. Other live systems, such as casper, do not have this issue. As far as I can see, there is no solution for this yet.

The difference for me as a user is that I can use casper-based live systems for my actual daily work (do not crash) while I cannot do the same with dracut-based Fedora live images (always crash).

Please make it possible to overlay-mount a large(!) amount of RAM so that the live system can handle large file system operations without crashing.

Thanks for your consideration.

probonopd, Nov 18 '15

Here is something that is beginning to work for me: https://github.com/probonopd/SystemImageKit/blob/master/boot/iso/additional-initramfs/dracut-overlayfs-support/usr/lib/dracut/hooks/pre-pivot/01-liveoverlayfs.sh

# Create a 1 GB file (without my change, this would crash the system)
[me@host ~]$ dd if=/dev/zero of=foo bs=1M count=1024

[me@host ~]$ ls -lh foo
-rw-rw-r-- 1 me me 1,0G 22. Nov 23:02 foo

# The space is taken in the tmpfs mounted to /run
[me@host ~]$ df -h
tmpfs           4,9G    1,1G  3,8G   22% /run

# Delete the 1 GB file
[me@host ~]$ rm -rf foo 

# The space is actually reclaimed (without my change, it would not be reclaimed)
[me@host ~]$ df -h
tmpfs           4,9G     99M  4,8G    2% /run

Now of course it would be nice if something like this would officially be integrated or even made the default (not in the hackish way I used) after some thorough testing.

In my experience, the resulting live system is way more stable and usable than without my change.

probonopd, Nov 22 '15

Thanks @probonopd, it looks promising; please keep pushing until it can become the default in the next Fedora Live release. I would like to test it.

crquan, Nov 23 '15

@FGrose : what do you think?

haraldh, Nov 24 '15

I have been running this for quite some time now and I can say that the Fedora 23 live system now is much more stable than before. It is my main OS right now, which simply wasn't possible before.

probonopd, Nov 27 '15

Well, there is much misunderstanding of Device-mapper snapshot behavior in LiveOS images, but the OverlayFS does have the potential to avoid some disadvantages of currently configured LiveOS images. If Fedora LiveOS images were built with large root filesystems and defaulted to use large transient overlays, most of the complaints about overflowing overlays invalidating the system might be eliminated. (The new Persistent with Overflow dm snapshot store type, in kernel 4.3+, will also make persistent LiveUSB systems more robust.) Since transient dm overlay files are sparse files on an ext4 tmpfs, and since the LiveOS root filesystem is sparsed before compression, carrying the extra reserved partition table space on the root filesystem would be a good compromise in exchange for the magnitude of frustration that users have when their large RAM stores cannot be used.

(By the way, the storage space of files added to a dm snapshot can be reused by deleting the file. Only files deleted from the base, read-only, root filesystem cannot have their space reused without merging the overlay.)
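Since most complaints are about the device-mapper overlay filling up, it can help to watch how full the snapshot is. The sketch below parses one line of `dmsetup status` output for a snapshot target. Per the kernel's device-mapper snapshot documentation, its status field is `<sectors_allocated>/<total_sectors> <metadata_sectors>`, or a single word such as `Invalid` once the snapshot can no longer be used. The function name `snapshot_usage` is hypothetical, and so is the device name `live-rw` in the usage note, which is an assumption about how the live image names its mapping.

```sh
#!/bin/sh
# Hypothetical helper: report how full a device-mapper snapshot (COW store) is.
# Input: one line of `dmsetup status <name>` for a snapshot target, e.g.
#   "0 8388608 snapshot 1048576/33554432 4096"
# Fields: start, length, target type, allocated/total sectors, metadata sectors.
snapshot_usage() {
    # Split the status line into positional parameters (local to this function).
    set -- $1
    case $4 in
        */*)
            alloc=${4%/*}   # sectors already used in the COW store
            total=${4#*/}   # total sectors available in the COW store
            echo $(( alloc * 100 / total ))   # integer percentage used
            ;;
        *)
            # No usage pair: print the state word (e.g. "Invalid") and fail.
            echo "$4"
            return 1
            ;;
    esac
}
```

On a running live system, `snapshot_usage "$(dmsetup status live-rw)"` (as root) would print the percentage of the overlay already consumed, which gives some warning before it overflows.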

In any case, I have prepared a test commit that enables the use of the OverlayFS for Fedora LiveOS root filesystems, see pull request #107. ~~The OverlayFS does not handle SELinux well, so NOTE WELL that a kernel command line option of enforcing=0 is currently required for these systems (I don't know how to set that early enough from the dmsquash-live module. Perhaps, that is possible.)~~ (SELinux is supported in OverlayFS since kernel 4.8.)

It would be good to see some comparative performance testing for the two types of overlays. Before submitting a pull request, more testing with other boot configurations, such as persistent overlays on vfat, ext4, btrfs, and other filesystems is needed. I'm hoping others can jump in here.

FGrose, Nov 28 '15

Thank you very much, this is awesome!

So far I can say that my Fedora 23 live ISO runs very well from a fat32 USB drive booted with GRUB2 loop-mount using overlayfs and selinux=0 (which I always use because in my use case selinux makes more trouble than it helps, and is sponsored by the NSA, but that is another topic). I could not notice any performance degradation because as mentioned before, this is the first time that the system is actually usable for any real use. In my opinion, this alone should justify inclusion (although not as the default initially). I'd love to help testing, but I guess until this lands in Rawhide I'm a bit clueless on how to actually do it.

probonopd, Nov 28 '15

I've force pushed some updates to the OverlayFS commit, see pull request #107.

The patch now supports and has been tested with persistent overlays on vfat and ext4-formatted devices.

I'm still looking for a way to allow the initramfs to sense and use a persistent overlay of either OverlayFS or DM type on the boot device with and without dracut-systemd. ~~I think this would require scheduling dmsquash-generator after dmsquash-live-root has completed or exported the condition variable.~~ (28-Jan-2017: A new version senses a persistent overlay & type in the standard LiveOS directory and uses it. If dracut-systemd is present, the dmsquash-generator is edited inline and then systemctl daemon-reload updates the sysroot.mount unit before mounting.)
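The sensing step can be pictured as a check on what sits at the overlay path inside the LiveOS directory. This is a minimal sketch under an assumption the thread does not spell out: that an OverlayFS persistent overlay is stored as a directory, while a device-mapper persistent overlay is a plain file at the same path. The function name `overlay_type` is hypothetical. The real logic lives in dracut's dmsquash-live module and may differ between versions.

```sh
#!/bin/sh
# Hypothetical sketch: classify a persistent overlay by what exists at its path.
# Assumption: OverlayFS overlay = directory, device-mapper overlay = regular file.
overlay_type() {
    if [ -d "$1" ]; then
        echo overlayfs      # directory holding the persistent upper layer
    elif [ -f "$1" ]; then
        echo dm             # file used as the dm snapshot's COW store
    else
        echo none           # no persistent overlay: use a transient one
    fi
}
```

The result could then select which mount path the initramfs takes, for example `case $(overlay_type "$ovl") in overlayfs) ... ;; dm) ... ;; esac`, where `$ovl` is a hypothetical variable holding the overlay path.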

FGrose, Dec 03 '15

Is it possible to use the live squashfs image as the bottom layer and a tmpfs as the upper layer via overlayfs? I'm currently using my own handmade patch for dracut.

vtolstov, Dec 03 '15

Yes, by requesting rd.live.image rd.live.overlay.overlayfs on the kernel command line in addition to the standard arguments.
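For a typical live ISO, the full command line might then look like `root=live:CDLABEL=<label> rd.live.image rd.live.overlay.overlayfs`, where `<label>` is the ISO volume label. The sketch below shows how a script could test for such a bare flag in a command-line string. It is a standalone illustration with a hypothetical function name, `has_cmdline_flag`. dracut's own modules use the argument helpers in dracut-lib.sh instead, and this sketch ignores `flag=value` forms.

```sh
#!/bin/sh
# Hypothetical helper: succeed if a bare flag appears as a whole word in a
# kernel command line (default: the running kernel's /proc/cmdline).
has_cmdline_flag() {
    flag=$1
    cmdline=${2:-$(cat /proc/cmdline)}
    for word in $cmdline; do
        [ "$word" = "$flag" ] && return 0
    done
    return 1
}
```

For example, `if has_cmdline_flag rd.live.overlay.overlayfs; then echo "OverlayFS overlay requested"; fi`. Comparing whole words avoids treating `rd.live` as a match for `rd.live.image`.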

FGrose, Dec 03 '15

@FGrose: I don't see a tmpfs being mounted in your patch; how can this work?

vtolstov, Dec 03 '15

Notice that the transient overlay directories are placed in /run/overlayfs and /run/ovlwork (at lines 150 & 151). /run is a tmpfs. See also line 67 in dmsquash-generator.
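The transient layout amounts to an overlay mount whose upper and work directories live on the /run tmpfs. This is a minimal dry-run sketch, not the code from the pull request. `/run/overlayfs` and `/run/ovlwork` come from the comment above. The following are assumptions: the lower directory `/run/rootfsbase` (where the squashed root filesystem is taken to be mounted), the `/sysroot` target, the source label `LiveOS_rootfs`, and the function name `mount_transient_overlay`. With `RUN=echo` (the default here) it only prints the commands; as root, set `RUN=` to execute them.

```sh
#!/bin/sh
# Dry-run sketch of a transient OverlayFS root on the /run tmpfs.
# Paths other than /run/overlayfs and /run/ovlwork are assumptions.
RUN=${RUN-echo}   # "echo" = print only; set RUN= (empty) to really run

mount_transient_overlay() {
    lower=${1:-/run/rootfsbase}   # read-only root image (assumed mount point)
    target=${2:-/sysroot}         # where the initramfs mounts the new root
    # upperdir and workdir must be on the same filesystem (here the /run tmpfs).
    $RUN mkdir -p /run/overlayfs /run/ovlwork
    $RUN mount -t overlay LiveOS_rootfs \
        -o "lowerdir=$lower,upperdir=/run/overlayfs,workdir=/run/ovlwork" "$target"
}
```

Because every write lands in a tmpfs, deleting a file frees its RAM again, which matches the reclaimed space seen in the df transcript earlier in the thread.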

FGrose, Dec 03 '15

@FGrose thanks! I missed that =)!

vtolstov, Dec 03 '15

See this trial version of livecd-iso-to-disk https://github.com/FGrose/livecd-tools/blob/litd/tools/livecd-iso-to-disk, which allows one to optionally configure a Live USB with an OverlayFS.

To use the new script on an existing LiveCD/.iso, use the --avoidsourcescript --overlayfs options to avoid using the source's livecd-iso-to-disk script and to select the option to create OverlayFS overlays on the installation device.

FGrose, Dec 03 '15

On 2015-12-03 13:17 GMT+03:00, Frederick Grose wrote:

See this trial version of livecd-iso-to-disk FGrose/livecd-tools@1dd9cb4 https://github.com/FGrose/livecd-tools/commit/1dd9cb4205c6db7fa1a45ce52e05515b318a2753, which allows one to optionally configure a Live USB with an OverlayFS.

Thanks!

Vasiliy Tolstov

vtolstov, Dec 03 '15

I've force pushed an update to the OverlayFS commit, see pull request #107.

This version implements rd.live.overlay.readonly for OverlayFS using the multiple lower layer mount option for OverlayFS. This allows a device with a persistent overlay to be mounted with a transient overlay stacked over the persistent overlay.
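OverlayFS accepts several lower directories separated by colons, with the leftmost one on top. Here is a minimal dry-run sketch of the stacking described above. It assumes the persistent overlay's upper directory is available read-only at the hypothetical path `/run/overlayfs-r` and the root image at `/run/rootfsbase`. The function name is also hypothetical, and the `RUN` switch works as a print-only default.

```sh
#!/bin/sh
# Dry-run sketch: transient overlay stacked over a read-only persistent overlay.
# Paths and the function name are hypothetical.
RUN=${RUN-echo}   # print commands by default; set RUN= to execute as root

mount_readonly_persistent() {
    persistent=${1:-/run/overlayfs-r}   # persistent overlay, used read-only
    base=${2:-/run/rootfsbase}          # original root image
    target=${3:-/sysroot}
    # lowerdir lists layers top first, so persistent changes shadow the base.
    # New writes go only to the transient upper dir on the /run tmpfs.
    $RUN mkdir -p /run/overlayfs /run/ovlwork
    $RUN mount -t overlay LiveOS_rootfs \
        -o "lowerdir=$persistent:$base,upperdir=/run/overlayfs,workdir=/run/ovlwork" "$target"
}
```

Nothing written during such a session touches the persistent overlay, so rebooting returns to its saved state.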

This version now has all the features for OverlayFS as for Device-mapper overlays.

FGrose, Dec 06 '15

Nice!

vtolstov, Dec 07 '15

Pull request submitted: https://github.com/dracutdevs/dracut/pull/107

Mailing list post: http://article.gmane.org/gmane.linux.kernel.initramfs/4308

FGrose, Dec 20 '15

Excellent, thank you!

probonopd, Dec 20 '15

Hello, @FGrose: I was wondering if overlay usage in dracut could be extended to also support a read-only /. On systems with low resources, using an NFS / instead of squashFS is really valuable; it just needs a read-write space in addition. Do I have to hack NFS support for that, or is a more generic implementation of OverlayFS possible within dracut? Thanks - Pierre

yopito, Dec 22 '15

@yopito : I suspect that what you want could be achieved with appropriate kernel command line options. See https://github.com/haraldh/dracut/blob/master/dracut.cmdline.7.asc . rd.live.overlay.overlayfs rd.live.overlay.readonly will layer a read-only root filesystem below an OverlayFS overlay in RAM.

FGrose, Dec 22 '15

Here is a new version of livecd-tools/liveimage-mount that supports LiveOS devices with OverlayFS.

And here is a new version of livecd-tools/editliveos.py that supports editing LiveOS images that use OverlayFS.

FGrose, Dec 28 '15

@FGrose: thanks for your reply. It does not work with / mounted from an NFS share (I've tested it); this is not the same "space" as the live stuff. I presume I've hijacked this thread a little? Should I open a feature request against 95nfs, for instance? I mean PXE booting with options like:

LABEL VoidInstallNFS
  MENU LABEL VoidLinux 32bit Textmode Installer via NFS (2015-11)
  KERNEL kernels/void/pxe/boot/vmlinuz
  APPEND initrd=kernels/void/pxe/boot/initrd ip=dhcp root=nfs:192.168.50.1:/share/tftpboot/kernels/void/pxe/rootfs init=/sbin/init ro rd.luks=0 rd.md=0 rd.dm=0 loglevel=4 vconsole.unicode=1 vconsole.keymap=fr locale.LANG=en_US.UTF-8

yopito, Jan 02 '16

Can someone please confirm that rd.live.overlay.overlayfs rd.live.overlay.readonly is still working on Fedora-Workstation-Live-x86_64-24_Beta-1.6.iso? I can't seem to get it to boot anymore.

probonopd, May 22 '16

Doesn't work for me using Fedora-MATE_Compiz-Live-x86_64-Rawhide-20160628.n.0.iso; can't find the persistent overlay

usdanskys, Jul 01 '16

In FAI (Fully Automatic Installation) we use dracut and overlayfs with a ramdisk. I've created a new dracut module which puts a ramdisk on top of the nfsroot using overlayfs. You can find it here: https://anonscm.debian.org/cgit/collab-maint/dracut.git/tree/debian/90overlay-root
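The general idea, a RAM-backed writable layer over a read-only root such as an NFS root, can be sketched as follows. This is not the FAI module's code. It is a dry-run illustration that assumes the read-only root is already mounted at the target (default `/sysroot`). The function name, the `/run/ovl-root` staging directory and the `size=50%` tmpfs limit are all hypothetical. The read-only root serves only as the lower layer, while the upper and work directories sit on a tmpfs.

```sh
#!/bin/sh
# Dry-run sketch: put a tmpfs-backed OverlayFS on top of a read-only root.
# Not the FAI 90overlay-root module; names, paths and sizes are hypothetical.
RUN=${RUN-echo}   # print commands by default; set RUN= to execute as root

overlay_ram_on_root() {
    newroot=${1:-/sysroot}     # read-only root already mounted here (e.g. NFS)
    stage=/run/ovl-root        # hypothetical staging directory
    $RUN mkdir -p "$stage/lower" "$stage/ram"
    # Move the read-only root aside so it can become the lower layer.
    $RUN mount --move "$newroot" "$stage/lower"
    # RAM-backed writable storage; upper and work must share this filesystem.
    $RUN mount -t tmpfs -o size=50% tmpfs "$stage/ram"
    $RUN mkdir -p "$stage/ram/upper" "$stage/ram/work"
    $RUN mount -t overlay overlay \
        -o "lowerdir=$stage/lower,upperdir=$stage/ram/upper,workdir=$stage/ram/work" \
        "$newroot"
}
```

A dedicated tmpfs is not strictly required, since /run is already a tmpfs, but it allows capping how much RAM the writable layer may use.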

Mrfai, Dec 14 '16

Further references to this issue:

  • https://bugzilla.redhat.com/show_bug.cgi?id=582490 (2010)
  • http://forums.fedoraforum.org/showthread.php?t=256403 (2010)

Isn't it about time to dump this unreliable device-mapper stuff entirely and just use overlayfs exclusively? (Yes, I know it had been discussed 10 years ago when unionfs was still bleeding edge). Can it be done now, using overlayfs?

probonopd, Dec 20 '16

@probonopd My understanding is that OverlayFS should now work with SELinux, as a result of Dan Walsh's work for Docker stuff. Are there any other blockers to switching to OverlayFS for live images?

Conan-Kudo, Jan 04 '17

With pull request https://github.com/dracutdevs/dracut/pull/107, Fedora 24 (updated) and Fedora 25 LiveOS images will boot and run with a root OverlayFS and SELinux enforcing. This constitutes a proof of concept.
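One way to confirm that a booted image really runs on a root OverlayFS (rather than a device-mapper snapshot) is to look for an `overlay` entry for `/` in `/proc/mounts`. Here is a minimal sketch with a hypothetical function name. For a quick yes/no check, `findmnt -n -o FSTYPE /` from util-linux prints the root filesystem type directly.

```sh
#!/bin/sh
# Hypothetical helper: print the mount options of "/" if it is an OverlayFS
# mount and succeed; fail otherwise. Reads /proc/mounts unless a file is given.
root_overlay_opts() {
    awk '$2 == "/" && $3 == "overlay" { print $4; found = 1 }
         END { exit !found }' "${1:-/proc/mounts}"
}
```

On an OverlayFS live boot, the printed options include the `lowerdir`, `upperdir` and `workdir` paths, which shows whether the overlay is transient (under /run) or persistent.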

There remain differences and some immaturity with OverlayFS as a root filesystem. For example, below are some errors that appear on running sudo dnf upgrade:

LiveUSB with OverlayFS with a transient overlay:
booted with kernel 4.9.3-200.fc25.x86_64 on a 32 GiB USB 3.0 flash drive:

Some errors in the transcript of sudo dnf upgrade:

  Upgrading   : pcre-8.40-1.fc25.x86_64                                                                                        17/974 
libsemanage.semanage_commit_sandbox: Error while renaming /var/lib/selinux/targeted/active to /var/lib/selinux/targeted/previous. (Invalid cross-device link).
semodule:  Failed!

  Upgrading   : selinux-policy-3.13.1-225.6.fc25.noarch                                                                       308/974 
libsemanage.semanage_commit_sandbox: Error while renaming /var/lib/selinux/targeted/active to /var/lib/selinux/targeted/previous. (Invalid cross-device link).
semodule:  Failed!

  Upgrading   : selinux-policy-targeted-3.13.1-225.6.fc25.noarch                                                              376/974 
libsemanage.semanage_commit_sandbox: Error while renaming /var/lib/selinux/targeted/active to /var/lib/selinux/targeted/previous. (Invalid cross-device link).
/usr/sbin/semodule:  Failed!

These are caused by a known deficiency (see https://patchwork.kernel.org/patch/9373113/).

There are others (a partial report):

  Upgrading   : NetworkManager-bluetooth-1:1.4.2-2.fc25.x86_64                                                                377/974 
  Upgrading   : NetworkManager-adsl-1:1.4.2-2.fc25.x86_64                                                                     378/974 
/var/tmp/rpm-tmp.kYo3cn: line 1:  4020 Segmentation fault      /usr/sbin/groupadd -r nm-openconnect &> /dev/null
/var/tmp/rpm-tmp.kYo3cn: line 4:  4021 Segmentation fault      /usr/sbin/useradd -r -s /sbin/nologin -d / -M -c 'NetworkManager user for OpenConnect' -g nm-openconnect nm-openconnect &> /dev/null
  Upgrading   : NetworkManager-openconnect-1.2.4-3.fc25.x86_64                                                                379/974 

  Upgrading   : libnl3-cli-3.2.29-1.fc25.x86_64                                                                               463/974 
warning: %post(libnl3-cli-3.2.29-1.fc25.x86_64) scriptlet failed, exit status 127
Non-fatal POSTIN scriptlet failure in rpm package libnl3-cli
Non-fatal POSTIN scriptlet failure in rpm package libnl3-cli
  Upgrading   : perl-Time-Local-1:1.250-1.fc25.noarch                                                                         464/974 
  Upgrading   : lua-posix-33.3.1-3.fc25.x86_64                                                                                465/974 
  Upgrading   : adwaita-qt4-0.97-1.fc25.x86_64                                                                                466/974 
  Upgrading   : libphodav-2.1-1.fc25.x86_64                                                                                   467/974 
warning: %post(libphodav-2.1-1.fc25.x86_64) scriptlet failed, exit status 127
Non-fatal POSTIN scriptlet failure in rpm package libphodav
Non-fatal POSTIN scriptlet failure in rpm package libphodav
  Upgrading   : augeas-libs-1.7.0-1.fc25.x86_64                                                                               468/974 
warning: %post(augeas-libs-1.7.0-1.fc25.x86_64) scriptlet failed, exit status 127
Non-fatal POSTIN scriptlet failure in rpm package augeas-libs
Non-fatal POSTIN scriptlet failure in rpm package augeas-libs
  Upgrading   : libev-4.24-1.fc25.x86_64                                                                                      469/974 
warning: %post(libev-4.24-1.fc25.x86_64) scriptlet failed, exit status 127
Non-fatal POSTIN scriptlet failure in rpm package libev
Non-fatal POSTIN scriptlet failure in rpm package libev
  Upgrading   : libsss_autofs-1.14.2-2.fc25.x86_64                                                                            470/974 
  Upgrading   : libsss_sudo-1.14.2-2.fc25.x86_64                                                                              471/974 
warning: %post(libsss_sudo-1.14.2-2.fc25.x86_64) scriptlet failed, exit status 127
Non-fatal POSTIN scriptlet failure in rpm package libsss_sudo
Non-fatal POSTIN scriptlet failure in rpm package libsss_sudo
  Upgrading   : microcode_ctl-2:2.1-13.1.fc25.x86_64                                                                          472/974 
  Upgrading   : npth-1.3-1.fc25.x86_64                                                                                        473/974 
warning: %post(npth-1.3-1.fc25.x86_64) scriptlet failed, exit status 127
Non-fatal POSTIN scriptlet failure in rpm package npth
Non-fatal POSTIN scriptlet failure in rpm package npth
  Upgrading   : pigz-2.3.4-1.fc25.x86_64                                                                                      474/974 
  Upgrading   : slang-2.3.0-7.fc25.x86_64                                                                                     475/974 
warning: %post(slang-2.3.0-7.fc25.x86_64) scriptlet failed, exit status 127
Non-fatal POSTIN scriptlet failure in rpm package slang
Non-fatal POSTIN scriptlet failure in rpm package slang
  Upgrading   : xfsprogs-4.9.0-1.fc25.x86_64                                                                                  476/974 
warning: %post(xfsprogs-4.9.0-1.fc25.x86_64) scriptlet failed, exit status 127
Non-fatal POSTIN scriptlet failure in rpm package xfsprogs
Non-fatal POSTIN scriptlet failure in rpm package xfsprogs


  Cleanup     : pcre-8.39-6.fc25.x86_64                                                                                       968/974 
warning: %postun(pcre-8.39-6.fc25.x86_64) scriptlet failed, exit status 127
Non-fatal POSTUN scriptlet failure in rpm package pcre
Non-fatal POSTUN scriptlet failure in rpm package pcre
  Cleanup     : libselinux-2.5-12.fc25.x86_64                                                                                 969/974 
warning: %postun(libselinux-2.5-12.fc25.x86_64) scriptlet failed, exit status 127
Non-fatal POSTUN scriptlet failure in rpm package libselinux
Non-fatal POSTUN scriptlet failure in rpm package libselinux
  Cleanup     : glibc-common-2.24-3.fc25.x86_64                                                                               970/974 
  Cleanup     : glibc-all-langpacks-2.24-3.fc25.x86_64                                                                        971/974 
  Cleanup     : glibc-2.24-3.fc25.x86_64                                                                                      972/974 
warning: %postun(glibc-2.24-3.fc25.x86_64) scriptlet failed, exit status 127
Non-fatal POSTUN scriptlet failure in rpm package glibc
Non-fatal POSTUN scriptlet failure in rpm package glibc

And when a new kernel needed a new initramfs, this error occurred:

dracut-install: No SOURCE argument given
Usage: dracut-install -D DESTROOTDIR [OPTION]... -a SOURCE...
or: dracut-install -D DESTROOTDIR [OPTION]... SOURCE DEST
or: dracut-install -D DESTROOTDIR [OPTION]... -m KERNELMODULE [KERNELMODULE …]

Install SOURCE to DEST in DESTROOTDIR with all needed dependencies.

  KERNELMODULE can have the format:
     <absolute path> with a leading /
     =<kernel subdir>[/<kernel subdir>…] like '=drivers/hid'
     <module name>

  -D --destrootdir  Install all files to DESTROOTDIR as the root
  -a --all          Install all SOURCE arguments to DESTROOTDIR
  -o --optional     If SOURCE does not exist, do not fail
  -d --dir          SOURCE is a directory
  -l --ldd          Also install shebang executables and libraries
  -L --logdir <DIR> Log files, which were installed from the host to <DIR>
  -R --resolvelazy  Only install shebang executables and libraries
                     for all SOURCE files
  -H --hostonly     Mark all SOURCE files as hostonly

  -f --fips         Also install all '.SOURCE.hmac' files

  --module,-m       Install kernel modules, instead of files
  --kerneldir       Specify the kernel module directory
  --firmwaredirs    Specify the firmware directory search path with : separation
  --silent          Don't display error messages for kernel module install
  -o --optional     If kernel module does not exist, do not fail
  -p --mod-filter-path      Filter kernel modules by path regexp
  -P --mod-filter-nopath    Exclude kernel modules by path regexp
  -s --mod-filter-symbol    Filter kernel modules by symbol regexp
  -S --mod-filter-nosymbol  Exclude kernel modules by symbol regexp
  -N --mod-filter-noname    Exclude kernel modules by name regexp

  -v --verbose      Show more output
     --debug        Show debug output
     --version      Show package version
  -h --help         Show this help

dracut: FAILED:  /usr/lib/dracut/dracut-install -D /var/tmp/dracut.M40ZkZ/initramfs --kerneldir /lib/modules/4.9.3-200.fc25.x86_64/ -o -m

FGrose, Jan 22 '17

Would disabling selinux solve these? (It's the first thing I do on Fedora-type systems anyways.)

probonopd, Jan 22 '17

With setenforce Permissive, only the first set of errors above show up in the transcript (from known OverlayFS issue).

Journals for the boots and sudo dnf upgrade for both Fedora 24 and 25 Workstation Live are here:
F24 Live: https://gist.github.com/FGrose/073bfbab9bb89753e27d57ef11633ee4
F25 Live: https://gist.github.com/FGrose/94bd2aad1031869ddac3d64edd0aead8

FGrose, Jan 23 '17