mergerfs
Can't use oflag=direct on mergerfs mount
Describe the bug
```
$ dd if=/dev/zero of=testA bs=8M count=1 status=progress oflag=direct
dd: error writing 'testA': Invalid argument
1+0 records in
0+0 records out
0 bytes copied, 0.0165689 s, 0.0 kB/s
```
Expected behavior
The file should be written bypassing any dirty pages/writeback, since oflag=direct is used.
System information:
- OS, kernel version: 5.13.10-200
- mergerfs version: 2.32.6
- mergerfs settings: category.action=all,func.rmdir=all,category.create=rand,category.search=ff,func.mkdir=all,func.access=ff,func.getattr=ff,moveonenospc=mfs,minfreespace=128M,async_read=true,cache.writeback=false,cache.readdir=true,cache.files=off,dropcacheonclose=true,cache.entry=10,cache.attr=30,cache.negative_entry=10,xattr=nosys,threads=4,statfs=full,cache.statfs=3,hard_remove,fsname=mergerfs,xattr=noattr,use_ino,allow_other,default_permissions,noauto,noatime,nodev,noexec,nosuid 0 0
Additional context
Ultimately, I would like mergerfs to commit every operation directly to the underlying disks, bypassing any writeback/dirty-page/cache utilization.
- I'm not sure where you got your arguments but there is some stuff there that is not necessary and not listed in the docs. hard_remove and default_permissions are both noops.
- It's not mergerfs that's returning the error. It's the filesystem. But the cause is probably mergerfs, in that few if any memory allocations are aligned properly for that to work. Well... libfuse is probably doing it. I need to look into it more.
> I'm not sure where you got your arguments but there is some stuff there that is not necessary and not listed in the docs. hard_remove and default_permissions are both noops.

It looks hairy, I know. I'll clean up the noops you mentioned. I really wish there were some way to flag noop args!
> It's not mergerfs that's returning the error. It's the filesystem. But the cause is probably mergerfs, in that few if any memory allocations are aligned properly for that to work.

Highly appreciated, as I'm really trying to find a way to effectively use direct I/O when writing.
I'm getting rid of most options anyway in favor of a config file in version 3.X, given the increased structure and complexity of the settings. I could print out warnings, but most people wouldn't notice them since mergerfs typically runs at boot. I have considered using syslog for certain rare issues that would be useful to log.
The issue was as I expected. The memory used to store the data from the kernel for the write call is just malloc'ed. No alignment. This is libfuse code. I'll have to modify that to properly round up and align the memory allocations (in the least.)
Oh I see, would that be pushed to the git master branch?
Also chapeau for deciding on external logging, it would help a lot in many cases.
Any workaround meanwhile to be able to bypass any sort of caching till the fix is pushed?
I'm not sure I want to "fix" it as I described. It would require copying every single buffer at least twice just to write the data out to the file. I need to look at the options more.
I'm not going to log generally. Only for fatal and non-fatal error conditions. It already logs when invalid arguments are passed in. It just doesn't warn of deprecated values. It doesn't because the whole system is being replaced anyway and it is harmless to set them.
You can try using splice_read,splice_write,splice_move arguments.
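For example, added to an existing option list at mount time (the splice option names are as given above; the branch paths and mountpoint here are placeholders, and you should verify the options against the docs for your mergerfs version):

```shell
# Hypothetical invocation: enable the splice options alongside the
# cache-limiting settings already discussed in this thread.
mergerfs -o splice_read,splice_write,splice_move,cache.files=off,dropcacheonclose=true \
    /mnt/disk1:/mnt/disk2 /mnt/pool
```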
Okay, I mentioned that because I had a process writing a large file (a Chia plot) that crashed in the middle, taking the mergerfs process down with it completely, so I thought more logging/visibility would make spotting the issue easier.
It's also weird that an external I/O process can take down mergerfs.
mergerfs is managing thousands of transactions a second under load, so logging all of that just isn't practical.
I can't comment on the crash without more information. There are no known bugs in mergerfs that would cause it to crash. Doesn't mean there isn't a bug but you'll have to create a reproducible example.
What is your O_DIRECT use case? Its use is generally highly discouraged. mergerfs already has a number of behaviors that limit caching: cache.files=off disables mergerfs's caching, and dropcacheonclose=true forces the kernel to drop a file's cache after it is closed.
Using splice_read,splice_write,splice_move did allow oflag=direct to work, and I can now see that writing through mergerfs reaches ~60 MB/s while writing directly to the backend disk reaches ~170 MB/s using an 8MB block size.
Not complaining about performance but interested in knowing why the overhead and if there's something to do about it.
Thanks for the splice_* tip. I'll read more about what each option does.
Great project you have here. Really :)
https://github.com/trapexit/mergerfs#performance
There are lots of things to consider. If you're using O_DIRECT then naturally you'll get worse performance generally, and especially as latency increases. Apps almost always use sync IO, and with sync IO, if latency increases then throughput decreases. That's normal. mergerfs is a FUSE filesystem; it lives in userspace. There is currently no option but to increase data hops and therefore latency.
> Okay, I replied with that cause I had a process that were writing a large file (a Chia plot) that crashed in the middle, taking mergerfs process down with it completely so I thought more logging/visibility would make spotting the issue easier.
> Also weird an IO external process can take down mergerfs.

Turns out it was an OOM event, and mergerfs was the first to die, silently.
mergerfs is using enough memory to be chosen by the OOM killer? That's not common, but it can happen depending on the usage pattern. There really isn't much that can be done until the OOM-killer overhaul that's been going on is all done.
The OP's issue should be addressed in 2.36.0.