buildkit icon indicating copy to clipboard operation
buildkit copied to clipboard

allows setting volatile overlay mount option

Open marcosnils opened this issue 1 year ago • 5 comments

given that buildkit is generally very IO intensive, some preliminary tests have shown that adding the volatile option could yield up to a 15% performance improvements in some cases.

ref: https://docs.kernel.org/filesystems/overlayfs.html#volatile-mount

Signed-off-by: Marcos Lilljedahl [email protected]

marcosnils avatar Sep 29 '24 13:09 marcosnils

Bit confused. What sets the volatile option for these mounts? Additionally it looks like containerd libs have some code to intentionally remove it before mounting atm. https://github.com/containerd/containerd/blob/v1.7.22/mount/temp.go#L66

Do we capture fsync error on unmount for this? If upperdir fails to sync, what's the point where the error would appear?

tonistiigi avatar Sep 30 '24 02:09 tonistiigi

What sets the volatile option for these mounts?

You can set this in containerd via snapshotter options.

  [plugins."io.containerd.snapshotter.v1.overlayfs"]
    root_path = ""
    upperdir_label = false
    mount_options = ["volatile"]

Additionally it looks like containerd libs have some code to intentionally remove it before mounting atm. https://github.com/containerd/containerd/blob/v1.7.22/mount/temp.go#L66

IIUC that's only for temp mounts, I see that WriteUpperDir uses those https://github.com/moby/buildkit/blob/master/util/overlay/overlay_linux.go#L115. Having said that, when running some black-box performance tests with Dagger. I was getting consistent ~+15% performance increase in build times when using the volatile option. Could it be possible that some other non-temp overlay mounts could be benefited from this setting?

I could have also been bad measure from the benchmark even though I ran the same test multiple times and took the p99 value out of all samples. I'll try re-running the tests again to see if I still get the same results.

marcosnils avatar Oct 01 '24 06:10 marcosnils

You can set this in containerd via snapshotter options.

If we think this is safe(for buildkit's use-cases) then we should add it to all our overlay mounts, not only with containerd worker, and not requiring custom config from the user.

Still, this would need some clarification:

Do we capture fsync error on unmount for this? If upperdir fails to sync, what's the point where the error would appear?

tonistiigi avatar Oct 01 '24 20:10 tonistiigi

If we think this is safe(for buildkit's use-cases) then we should add it to all our overlay mounts, not only with containerd worker, and not requiring custom config from the user.

The scope of this change is much smaller and just allows Dagger to even try this in the first place without a full fork of buildkit. All it's doing is stopping one part of code that's looking for upperdir/lowerdir in buildkit from erroring out if it sees volatile in the list of options. The change is not turning volatile on anywhere, just removing an error from happening if a mount happens to have it.

Do we capture fsync error on unmount for this? If upperdir fails to sync, what's the point where the error would appear?

From the kernel docs Marcos linked to:

If any writeback error occurs on the upperdir’s filesystem after a volatile mount takes place, all sync functions will return an error. Once this condition is reached, the filesystem will not recover, and every subsequent sync call will return an error, even if the upperdir has not experience a new error since the last sync call.

sipsma avatar Oct 02 '24 20:10 sipsma

@sipsma I get that but I much more like that we implement a feature in buildkit instead of half-complete updates that only work if library is called in specific context and can easily break as there isn't anything in BuildKit actually using it. If you want to split work up into multiple PRs and create follow-up issues then that fine if properly documented this way.

From the kernel docs Marcos linked to:

Do I understand correctly from this that in order for volatile option to be safe for buildkit we need to manually call fsync before we call umount for these mounts as that is the place where we can capture the error if some writes were broken. This shouldn't affect performance as afaik umount implicitly calls fsync anyway (but wouldn't let us capture the error).

tonistiigi avatar Oct 03 '24 17:10 tonistiigi