Installing flatpak packages on OpenZFS is too slow. OSTree over OpenZFS is too slow.
System information
| Type | Version/Name |
|---|---|
| Distribution Name | Ubuntu |
| Distribution Version | 20.04 |
| Linux Kernel | Linux laika 5.4.0-52-generic #57-Ubuntu SMP Thu Oct 15 10:57:00 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
| Architecture | x86_64 |
| ZFS Version | 0.8.3-1ubuntu12.4 |
| SPL Version | 0.8.3-1ubuntu12.4 |
Describe the problem you're observing
Installing a flatpak application takes far too long, most likely due to the way that OSTree replicates files using hard links.
A lot of iowait time can be observed during the install.
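For reference, one way to watch the iowait and pool-level write activity while the install runs (iostat comes from the sysstat package; the pool name rpool is an assumption, substitute your own):
# Per-CPU iowait and per-device utilization, refreshed every second
iostat -cx 1
# ZFS-level throughput for the pool, refreshed every second (pool name assumed)
zpool iostat -v rpool 1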
Describe how to reproduce the problem
sudo -s
apt update
apt install flatpak
time flatpak install https://dl.flathub.org/repo/appstream/ch.openboard.OpenBoard.flatpakref
Takes 30 seconds on ext4 and 6 minutes on OpenZFS.
Include any warning/errors/backtraces from the system logs
No errors, just poor performance.
I can confirm this issue on Gentoo, kernel 5.4.72, OpenZFS 0.8.5, default module parameters, two-way mirror with WD80EFZX (CMR). However, the flatpak install command doesn't finish for me even after minutes. strace shows it seems to get trapped repeating poll([{fd=23, events=POLLIN}], 1, 300) = 0 (Timeout). I can also confirm the results for ext4, and for comparison, it also works fine on an ext4-formatted loop device (file) on top of ZFS. For what it's worth: the disk is making a lot of noise until the process is interrupted. I observed a similar behavior with Docker, although tasks eventually finished in that case. At the same time the rest of the system behaves normally, especially with a populated ARC.
@behlendorf, maybe it's worth taking another look in case there's more to this than just an opportunity to improve performance for an edge case? Also, I'd like to understand what else I could check on my end, thanks!
> Also, I'd like to understand what else I could check on my end, thanks!
It sounds like the first order of business would be to determine exactly what operation on OpenZFS is significantly slower than ext4. From the original post it sounds like the suspicion is that it's hard links. Can you re-run the flatpak command under strace -c for OpenZFS and ext4 so we can get a histogram of all the system call timings for comparison?
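A possible invocation (reusing the OpenBoard flatpakref from the reproduction steps; -f follows the child processes flatpak spawns, --user keeps everything in one process tree):
# Collect a per-syscall count/time summary; run once on OpenZFS and once on ext4
strace -c -f -o strace-summary.txt \
  flatpak install --user -y https://dl.flathub.org/repo/appstream/ch.openboard.OpenBoard.flatpakref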
The flatpak command (installing GIMP from flathub-beta in this case) actually ran to completion after 65 minutes (1m24s on the ext4 loop device). Stats and timing: gh11140.txt. Since the system calls don't even remotely add up to the real time (14s in system calls vs 65min total for OpenZFS, and 5s vs 85s for ext4), is the conclusion that the time is spent in user space and that the flaw is solely in the application?
The strace output does show we're not spending an inordinate amount of time in OpenZFS-related system calls. That's good, but I don't think it entirely lets ZFS off the hook. The application appears to be polling, waiting for something to happen; figuring out what that something is would be the next step. Is there any debugging you can enable in the flatpak command which would give some indication of what it's waiting on?
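One option that might help here (hedged; these are flatpak's general --verbose and --ostree-verbose switches, not something confirmed in this thread) is rerunning the install with verbose logging to see which phase the poll loop corresponds to:
# Verbose flatpak and OSTree output during the slow install
flatpak -v --ostree-verbose install --user -y \
  https://dl.flathub.org/repo/appstream/ch.openboard.OpenBoard.flatpakref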
I'm not sure if this is relevant, but we have observed a large performance disparity installing flatpaks on SSDs with OpenZFS. Installing GIMP takes:
On SanDisk SDSSDP12 120GB
real 14m39,336s
user 0m48,155s
sys 0m48,245s
On KINGSTON SA400S3 480GB
real 1m34,298s
user 0m21,515s
sys 0m29,661s
During installation top shows a very high (80%) iowait on every core.
If you want to figure out what is slow, you need to take into account that half of a system-wide flatpak installation happens in flatpak-system-helper. The user process only does the download; the rest is the import into the system dir.
To make it easier to debug, I recommend using --user to install to the user's home directory instead, as that is easier to trace.
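For example (the remote-add line is the standard Flathub setup; GIMP is reused from the timings above):
# Per-user installation: everything runs in the flatpak process itself, so it is easy to trace
flatpak remote-add --user --if-not-exists flathub https://dl.flathub.org/repo/flathub.flatpakrepo
time flatpak install --user -y flathub org.gimp.GIMP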
@alexlarsson, thanks for this, I've always been running flatpak install with --user. I figured that the culprit (not necessarily the root cause) in this case could be ostree rather than flatpak. I'm not a software engineer, so guidance on setting up a gdb testbed or better command-line tools would certainly be useful.
Generally I'm not so sure this is a flatpak issue, as I've observed similar behavior - crazy disk noise and excessive iowait, as if multiple processes were trying to write to the same disk all at the same time - with other applications too, usually to a lesser degree. flatpak might be a good case to debug, though, as it seems to be hit harder than others. I'm a little surprised to see the variance even among NAND storage as per @vcarceler's experience; I'd have expected this issue to be a side effect of this particular workload with OpenZFS on mechanical storage.
Generally a flatpak install happens approximately like this:
Stage 1 "pull"
- mkdir stage-dir
- download required files to stage-dir/objects/*
- syncfs(stage-dir)
- rename stage-dir/objects/* to repo/objects/*
- fsync(repo/objects)
Stage 2 "deploy"
- mkdir deploy-tmpdir
- foreach $file in the app
  - mkdir -p deploy-tmpdir/$(dirname $file)
  - hardlink from repo/objects/$object to deploy-tmpdir/$file
- syncfs(deploy-tmpdir)
- rename(deploy-tmpdir, deploy-dir)
At this point we can run the app from deploy-dir.
This is fairly correct in the --user case. However, if the installation is system-wide, then things are split in the middle, where we pull to a local sub-repo similar to stage 1, but then we call out to a system helper that imports and verifies the sub-repo into the real system repo, and then run the stage 2 from that. I imagine the poll timeout you see is the main flatpak waiting for the system-helper to run stage 2.
In terms of fs ops flatpak is fairly regular, although it does rely a fair bit on hardlinks, so if those are slow that would be a problem.
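For anyone who wants to exercise roughly this I/O pattern outside of flatpak, here is a minimal shell sketch of the two stages described above. The directory names and the source tree ($appfiles) are made up for illustration, and coreutils sync -f stands in for syncfs(2):
#!/bin/sh
set -e
repo=repo; stage=stage-dir; deploytmp=deploy-tmpdir; deploy=deploy-dir
appfiles=app-files                         # pre-existing directory of test files (assumed)

# Stage 1 "pull": write objects into a staging dir, syncfs, rename into the repo
mkdir -p "$stage/objects" "$repo/objects"
cp -r "$appfiles"/. "$stage/objects"/      # stand-in for the download
sync -f "$stage"                           # syncfs on the filesystem holding stage-dir
mv "$stage"/objects/* "$repo/objects"/     # rename(2) the objects into the real repo
sync -f "$repo/objects"                    # the real code fsync()s the objects directory

# Stage 2 "deploy": hard-link every object into a temp tree, syncfs, rename the tree
mkdir -p "$deploytmp"
cp -rl "$repo/objects"/. "$deploytmp"/     # cp -l hard-links each file instead of copying
sync -f "$deploytmp"
mv "$deploytmp" "$deploy"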
Here are some results obtained from perf record ostree --repo=repo commit --branch=foo portage/ as per https://github.com/ostreedev/ostree/issues/2227#issuecomment-726895364 (thanks @dbnicholson). This indeed triggers the issue. Here I interrupted the command after a couple of minutes: ostree-commit-portage.txt. The ostree version used is 2020.7; ashift was always 12 for the 4k physical sector size in my case. I've also been playing with different recordsize values (128k, 8k, 4k) but to no avail.
Here are the results from perf record and strace -o.
New OSTree repo: ostree --repo=repo --mode=bare-user-only init
Download Linux source code: mkdir tree; cd tree; wget https://github.com/torvalds/linux/archive/v5.10-rc3.tar.gz; tar xzf v5.10-rc3.tar.gz; cd ..
And finally a full run of perf record ostree --repo=repo commit --branch=foo tree/ produces perf.data -> https://cloud.elpuig.xeill.net/index.php/s/D3JCFuoaLdL43oT
And 17 minutes of strace -o strace.log ostree --repo=repo commit --branch=foo tree/ produces strace.log -> https://cloud.elpuig.xeill.net/index.php/s/PiS3wEHTm3oIF7D
Is this useful?
Just for comparison: the same ostree --repo=repo commit --branch=foo tree/ in an ext4 filesystem mounted on a loop device runs in 17 seconds and produces:
- perf.data -> https://cloud.elpuig.xeill.net/index.php/s/SqY1HlMQcIfOHYI
- strace.log -> https://cloud.elpuig.xeill.net/index.php/s/N1HqCbDyZpgSD5e
With this performance difference it seems difficult to believe that OpenZFS works fine as a root filesystem. But we have hundreds of computers (student classrooms and laptops) with OpenZFS as the root filesystem and we haven't seen any performance problem other than this one. So @dbnicholson, do you have a clue about what ostree commit does that is particular from the filesystem's point of view?
What ostree does that doesn't happen with regularity in day-to-day use is use a lot of hardlinks. It's one of the core tenets of how ostree works. So, if hardlinks are slow on ZFS, then everything ostree-related will be slow. There are ways to make ostree use copies instead of links at the cost of disk usage, but I don't think any options like that are clearly exposed in flatpak. To confirm whether that's the case, you can try a test with cp -l. But I imagine the information you supplied will allow the ZFS developers to determine where the real issue is.
I can't think of anything else ostree really does that would cause such a slowdown. There's filesystem syncs, but that's no different than what any application does that wants to safely handle persistent state like a web browser.
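For the cp -l test mentioned above, a quick micro-benchmark could look like this (the unpacked kernel tree from the earlier comment is reused as the data set; the path is an assumption):
cd /path/on/zfs                    # dataset that shows the problem (assumed path)
time cp -rl tree tree-hardlinked   # hard-link every file in the tree
time cp -r  tree tree-copied       # plain copy, for comparison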
I tested cp -l and ln without a noticeable performance penalty.
I understand that OSTree uses a lot of hard links, but does it use hard links in the very first ostree commit? As I understand it, in the first commit there is no redundant data, so there is nothing to share, and find repo -links +1 only shows directories.
But this first commit performs very badly on OpenZFS. Checkouts with ostree --repo=repo checkout foo tree-checkout/ work very fast.
I also tested git commit to see if there is something in common, but it works fine.
I hope the ZFS developers can determine the cause of such a slowdown.
It shouldn't use hardlinks typically until you check something out. A single commit should not cause any hardlinks.
I noticed that in repo/objects all files are dated Jan 1 1970. Maybe an ostree commit makes a lot of changes to file metadata?
But running touch on files to change the date performs well.
Yesterday I tried to reproduce this with ostree v2019.5, and that behaves slightly differently: flatpak install gimp (same example as above) gets stuck the first time only at around 97%, while processing "org.gnome.Platform.Locale", whereas with 2020.7 this already happens at around 18-20%. Any significant changes between these versions that might help get closer to the root cause?
I've been wondering why flatpak is broken when e.g. trying to install GIMP.
As far as I remember, there is at least one core occupied (80-99% CPU load, via top) and constant zio_* activity visible in iotop.
The point where the progress gets stuck is somewhat random: once it was stuck at ~60% and another time at ~97%.
I ran into this when trying to install a gaming environment via flatpak.
I wonder if zfs set sync=disabled [dataset] can make a difference.
@IvanVolosyuk I tried your suggestion of disabling synchronous requests, and flatpak is now able to install applications.
Steps taken:
- zfs set sync=disabled zroot/ROOT/bootenv/var/lib/flatpak
- flatpak install org.videolan.VLC org.videolan.VLC.Plugin.bdj org.videolan.VLC.Plugin.fdkaac org.videolan.VLC.Plugin.makemkv com.makemkv.MakeMKV
I wonder if it is faster than ext4 on a zvol or in a loopback file (with sync on). This kind of confirms my suspicion, after reading the comments on this issue, that OSTree might just try to sync a lot, which is probably not needed on ZFS because of its ordering guarantees.
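A rough sketch for the zvol comparison (pool name rpool, size, and mountpoint are assumptions):
# Create a zvol, format it ext4, and mount it; then repeat the ostree/flatpak test on it
zfs create -V 10G rpool/flatpak-test
mkfs.ext4 /dev/zvol/rpool/flatpak-test
mkdir -p /mnt/flatpak-test
mount /dev/zvol/rpool/flatpak-test /mnt/flatpak-test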
I just hit this locally on a machine with a rpool on an ancient SSD. I observed two things:
- zpool iostat 1 had writes pegged at 20 MB/sec.
- /proc/spl/kstat/zfs/rpool/txgs showed txgs being done absurdly fast.
I set sync=disabled on the dataset and flatpak finished an operation that was projected to take nearly an hour in seconds.
My guess is that flatpak is using O_SYNC. If it writes in 4K chunks (for example) with O_SYNC, then we would see amplification of those writes into the record size, which is 128K by default. This would happen 32 times per 128K record. ext4 would definitely handle this better, as it would just pass through the writes to the disk and not have to deal with any write amplification.
I have not confirmed that flatpak is using O_SYNC (as I hit this while doing something else), but it fits the observations. This is technically an upstream flatpak issue, but there is some further analysis that we can do here. Specifically, we need to confirm that flatpak is using O_SYNC and determine the sizes of the writes.
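A possible way to check both points with strace (tracing only the open/write/sync family; the GIMP install is just an example target and assumes a configured Flathub remote):
# Record open flags, write sizes and sync calls from flatpak and its children
strace -f -e trace=openat,write,fsync,fdatasync,syncfs,sync_file_range \
  -o flatpak-sync.log flatpak install --user -y flathub org.gimp.GIMP
# O_SYNC/O_DSYNC opens, if any, and the write() byte counts are then visible in the log
grep -E 'O_SYNC|O_DSYNC' flatpak-sync.log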
It doesn't look like O_SYNC is used, but rather that it's very aggressive with fsync? https://github.com/ostreedev/ostree/blob/5523aee0829d0a4266047b21bab218618f77f46f/src/libostree/ostree-repo-commit.c#L51-L69
wow, that use of fsync screams "bad idea" to me. solving a symptom instead of the problem. :(
Neither flatpak nor ostree explicitly use O_SYNC to my knowledge. What ostree does is carefully sync objects during pulls and checkouts to ensure that both the repository and the installation are consistent. If you can get the ostree CLI (should be available on most distributions), then I think you can do a reasonable simulation with the commit and checkout builtins.
# Use a throwaway directory for the test
testdir=$(mktemp -d -p /somewhere/on/zfs)
repo="$testdir/repo"
files="$testdir/files"
checkout="$testdir/checkout"
# Set up a bare-user-only repository as flatpak does
ostree --repo="$repo" init --mode=bare-user-only
# Make a reasonable directory of files to commit and checkout
mkdir "$files"
tar -xf something.tar -C "$files"
# Commit them to the test branch. This will copy the objects into the repo and do various syncs.
# Experiment with --fsync=no.
# This is roughly equivalent to the disk IO from the pull part of flatpak install.
ostree --repo="$repo" commit -b test -s test "$files"
# Checkout the commit. This will hardlink the objects from the repo and do various syncs.
# Experiment with --fsync=no.
# This is roughly equivalent to the disk IO from the deploy part of flatpak install.
ostree --repo="$repo" checkout test "$checkout"
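To quantify how much of the time is spent in the syncs, one could follow the --fsync=no hints in the comments above and time two fresh repositories (the repo names here are made up so the second commit cannot deduplicate against the first):
ostree --repo="$testdir/repo-fsync"   init --mode=bare-user-only
ostree --repo="$testdir/repo-nofsync" init --mode=bare-user-only
time ostree --repo="$testdir/repo-fsync"   commit -b test -s test "$files"
time ostree --repo="$testdir/repo-nofsync" commit --fsync=no -b test -s test "$files"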
> wow, that use of fsync screams "bad idea" to me. solving a symptom instead of the problem. :(
When ostree is used to handle your OS, then carefully syncing every file is something I very much want it to do. I don't want my repo to become corrupted and hence make my system unbootable.
You could make an argument that when you're installing apps with flatpak, maybe you don't care about that level of consistency. In that case you can globally disable fsync at the repo level:
# System repo
sudo ostree --repo=/var/lib/flatpak/repo config set core.fsync false
# User repo
ostree --repo=$HOME/.local/share/flatpak/repo config set core.fsync false
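To verify the setting afterwards (hedged; ostree config also provides a get subcommand in recent versions):
ostree --repo=$HOME/.local/share/flatpak/repo config get core.fsync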
You could also make the argument to the flatpak project that it should create its ostree repos with fsync disabled by default.
You might also ask, if fsync is pointless on zfs, then why isn't fsync a no-op? As far as I know, ostree is syncing files in the recommended and most efficient way if you care about data consistency, so I'm curious what's a bad idea about what it's doing.
Yes, what are these data consistency issues that result from not running fsync every time a file is copied? What documentation is recommending fsync after every file as the most efficient way to obtain data consistency?
Hm... as far as I know: write barriers (ext4) and atomicity (from the 'ACID' philosophy - atomicity, consistency, isolation, durability; for databases, or in filesystems like reiser4).
From the ext4 mount options documentation:
barrier=<0|1(*)>, barrier(*), nobarrier: This enables/disables the use of write barriers in the jbd code. barrier=0 disables, barrier=1 enables. This also requires an IO stack which can support barriers, and if jbd gets an error on a barrier write, it will disable again with a warning. Write barriers enforce proper on-disk ordering of journal commits, making volatile disk write caches safe to use, at some performance penalty. If your disks are battery-backed in one way or another, disabling barriers may safely improve performance. The mount options "barrier" and "nobarrier" can also be used to enable or disable barriers, for consistency with other ext4 mount options.
https://en.wikipedia.org/wiki/ACID https://en.wikipedia.org/wiki/Atomicity_(database_systems)
https://en.wikipedia.org/wiki/Reiser4
> Neither flatpak nor ostree explicitly use O_SYNC to my knowledge. What ostree does is carefully sync objects during pulls and checkouts to ensure that both the repository and the installation are consistent.
I am not sure why I saw symptoms of severe write amplification if that is what it was doing. I don’t have time to look more deeply at the moment.
> wow, that use of fsync screams "bad idea" to me. solving a symptom instead of the problem. :(
> When ostree is used to handle your OS, then carefully syncing every file is something I very much want it to do. I don't want my repo to become corrupted and hence make my system unbootable.
It is preferable for package managers to use syncfs after writing a large number of files, rather than calling fsync on every file. If there is a crash during this, the package manager would need to deal with it as if it were repeating everything anyway, so using fsync so zealously does not really provide any benefit.
What ostree does in the default case when committing is:
- Download a bunch of files to a temporary directory
- syncfs on the temporary directory
- Rename all the objects into the real objects directory. This is content-addressable by sha256sum with a 2-level split after the 2nd character of the checksum, i.e. objects/1f/1482d1df7720a719c9b2a5f62f58db785fbbdef7feb78d0f3a3b1cf495e37e.file is a potential object path. Therefore, there are potentially 256 object subdirectories.
- fsync each of the objects subdirectories and the objects directory itself to ensure the renames and potential new subdirectories are on disk.
In the comment pointed to above, it talks about an optional per_object_fsync mode where each individual object is fsync'd. That isn't the default, though.
When checking out, ostree does no syncing on its own by default and flatpak does a single syncfs on the checkout directory.
> It is preferable for package managers to use syncfs after writing a large number of files, rather than calling fsync on every file. If there is a crash during this, the package manager would need to deal with it as if it were repeating everything anyway, so using fsync so zealously does not really provide any benefit.
This is more or less exactly what it does, like I said half a year ago: https://github.com/openzfs/zfs/issues/11140#issuecomment-726257799. The main difference is that after the syncfs we also fsync() some directories to push directory metadata to disk.
I'm less than impressed with the constant blaming of ostree and the speculation about what it may do wrong in this issue, with little actual analysis of the problem (or, apparently, even reading the replies).