Metadata flush scales poorly on very fast drives
This isn't really meant as a bug report to be fixed, but rather as a warning for people to heed the author's statement that this is literally "not production ready". It's probably fine to access data from Windows on a volume mainly used by Linux.
Regarding the freezes: some will resolve if you let it run long enough. For example, copying large folders can cause freezes lasting 10 minutes or more, after which everything is fine. Others require a restart. I never saw any data loss, unless you count software (and games) that crashed because it couldn't access the drive. I mainly tested with Git repositories and Steam game folders. Both let you verify file integrity, so by their nature any data loss there would have little impact.
If you use WinBtrfs for important data, you are dumb.
If you could provide a specific example of how to reproduce such freezes - like a certain action on a certain Git repository - it'd be appreciated. I've not witnessed anything similar to what you've described.
Also, which compression you're using, if any, and whether you're using RAID.
Literally half an hour ago I had to restart my computer because Explorer.exe froze when I opened the Steam volume I was moving games off of; this also crashed the Steam client. I think it is going to be very hard to make these issues reproducible. I wouldn't even know where to start. There could be multiple issues, possibly including timing issues.
Regarding compression and the volume: it's four 1 TB Samsung NVMe/U.2 SSDs connected to a HighPoint SSD7120 HBA, configured as RAID0 within the HighPoint controller (hardware RAID). This might be relevant, as there might be a) some interaction between the HBA and WinBtrfs, and b) a timing issue, because this configuration can easily deliver read speeds upwards of 7 GB/s. I have no compression enabled, but COW is disabled for my Steam subvolume; it is enabled for the Git subvolume. On the Git volume, the big freezes happened when moving folders with thousands of files: Explorer would freeze on the last file for 15 minutes or more.
I've also included a performance comparison: first Btrfs, then NTFS. Write speeds are pretty close, but read speeds are somewhat lower on WinBtrfs.
Btrfs:
NTFS:
Unfortunately I can't put any more time into this. It was a nice experiment, but on the Windows side I have all these freezes, and on the Linux side Btrfs support is actually quite poor. For example, Cockpit can't manage these volumes in any shape or form.
Keep in mind that Explorer handles folders with lots of files very poorly and will freeze or act goofy even on a standard NTFS drive, so using the other freezes as test cases, rather than Explorer, might be the best bet.
Thanks @graealex. You're using hardware that's about 40 times faster than what I have access to, which is presumably why I've never seen this. My guess would be that when the driver does its metadata flush every 30 seconds, it isn't scaling well enough for the colossal number of files you've been able to create in that time.
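To put rough numbers on this guess, the sketch below bounds how much data could accumulate between two flushes. It treats the ~7 GB/s figure quoted earlier (a read speed) as if it were sustained write throughput, which is an assumption, and it counts bytes rather than files, whereas the number of files is what the metadata flush actually has to process.

```python
# Back-of-the-envelope: how much data can pile up between two metadata
# flushes at a sustained throughput. The figures are rough assumptions from
# this thread (~7 GB/s for the RAID0 array, a drive "about 40 times" slower);
# 7 GB/s was a read figure and stands in for write speed here.

def bytes_per_flush(throughput_bytes_per_s: int, flush_interval_s: int) -> int:
    """Upper bound on bytes written between two flushes at a constant rate."""
    return throughput_bytes_per_s * flush_interval_s


FAST_ARRAY = 7_000_000_000      # ~7 GB/s (decimal units)
SLOW_DRIVE = FAST_ARRAY // 40   # ~175 MB/s

for name, rate in (("fast array", FAST_ARRAY), ("slow drive", SLOW_DRIVE)):
    for interval in (30, 10):   # 30 s flush period mentioned above, 10 s suggested
        gb = bytes_per_flush(rate, interval) / 1e9
        print(f"{name}: {interval:>2} s interval -> up to {gb:.2f} GB per flush")
```

Under these assumptions that is up to 210 GB per 30 s window on the fast array versus about 5.25 GB on the slower drive, and a 10 s interval cuts each window to a third.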
You might have better luck if you reduce `FlushInterval` in the Registry to something like 10 or so.
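A minimal sketch of making that change with Python's standard `winreg` module. It assumes the driver reads `FlushInterval` as a REG_DWORD under `HKLM\SYSTEM\CurrentControlSet\Services\btrfs`, the location the WinBtrfs README is assumed to document for its settings; check the README for your version. It has to run elevated, and the driver may only pick up the new value after a reboot (also an assumption). `set_flush_interval` and `BTRFS_KEY` are hypothetical names.

```python
import sys

# Assumed location of the WinBtrfs driver settings (under HKEY_LOCAL_MACHINE).
BTRFS_KEY = r"SYSTEM\CurrentControlSet\Services\btrfs"


def set_flush_interval(seconds: int) -> None:
    """Set FlushInterval, the number of seconds between metadata flushes."""
    # Reject values that are non-positive or don't fit in a REG_DWORD; the
    # lower bound is a conservative choice, not documented driver behavior.
    if not 1 <= seconds <= 0xFFFFFFFF:
        raise ValueError("FlushInterval must be between 1 and 2**32 - 1")
    if sys.platform != "win32":
        raise OSError("WinBtrfs registry settings only exist on Windows")

    import winreg  # standard library, but only available on Windows

    # OpenKey (rather than CreateKeyEx) fails if the driver's key is missing,
    # instead of silently creating a stray key. Writing under HKLM requires
    # an elevated process.
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, BTRFS_KEY, 0,
                        winreg.KEY_SET_VALUE) as key:
        winreg.SetValueEx(key, "FlushInterval", 0, winreg.REG_DWORD, seconds)
```

Calling `set_flush_interval(10)` from an elevated prompt would apply the value suggested above.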
For one, the freezes with Steam were permanent. I intentionally left it running for quite some time, but neither the game nor Windows itself ever recovered. Basically every process that tried to access the drive would lock up until I rebooted. The defining test for these lock-ups is whether killing Explorer.exe resolves them, and for these freezes it didn't. It was like accessing an HDD with bad sectors: the whole OS would stutter and eventually lock up.
The other thing is that running a game doesn't write many files to the volume, so the theory of too many files being written in too short a time is clearly not the answer.
Getting similar speeds for testing is possible with a RAM disk, although it can actually be a bit slower, probably because of driver limitations. I am using SoftPerfect RAM Disk 3.4.8, which is freeware; other RAM disk drivers performed worse.
Again, I have no idea how to make the issue reproducible or debuggable.
> Getting similar speeds for testing is possible with a RAM disk.
Yes, this is what I plan to do, when I get the chance.
@maharmstone I think I've run into a similar issue when copying large video files. I found a way to reproduce it easily on my test machine using fio, and the effect seems to be related to FlushInterval.
After running the fio benchmark for 3 to 5 minutes, you can see in Task Manager that the disk active time drops to zero and one CPU core is fully used by the "System" process. The longer the benchmark runs, the longer these drops last.
The interesting thing, though, is that the periods during which data is written to disk last nearly as long as the FlushInterval value set in the registry. I've tested intervals from 15 to 60 seconds. This effect cannot be seen with NTFS, or under Linux on the same machine. It occurs both on single Btrfs drives and on RAID0. Other RAID levels haven't been tested, as I only have two drives I can use for testing. From what I've seen, I would guess this also happens on SATA SSDs, but I don't have one at hand.
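To measure those drops instead of reading them off Task Manager, one option is to let fio log bandwidth, for example by adding `write_bw_log=flush` (the prefix is arbitrary) and `log_avg_msec=1000` to the job, and then scan the log for long periods of near-zero throughput. The sketch below assumes each log line begins with `time_ms, bandwidth` (the remaining columns vary between fio versions and are ignored), and it also treats long gaps between samples as stalls in case empty averaging windows aren't logged. The function names are hypothetical.

```python
from typing import Iterable, List, Tuple

Sample = Tuple[int, int]  # (time in ms, bandwidth value)


def parse_bw_log(lines: Iterable[str]) -> List[Sample]:
    """Read the first two columns (time_ms, bandwidth) of an fio bw log."""
    samples = []
    for line in lines:
        fields = line.split(",")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        samples.append((int(fields[0]), int(fields[1])))
    return samples


def find_stalls(samples: List[Sample], threshold: int = 0,
                min_stall_ms: int = 2000) -> List[Tuple[int, int]]:
    """Return (start_ms, duration_ms) for periods of at least min_stall_ms in
    which bandwidth stayed at or below `threshold`, or nothing was logged."""
    stalls = []
    stall_start = None
    prev_t = None
    for t, bw in samples:
        if bw <= threshold:
            if stall_start is None:
                stall_start = t
        else:
            # A long gap between two busy samples also counts as a stall.
            if (stall_start is None and prev_t is not None
                    and t - prev_t >= min_stall_ms):
                stall_start = prev_t
            if stall_start is not None and t - stall_start >= min_stall_ms:
                stalls.append((stall_start, t - stall_start))
            stall_start = None
        prev_t = t
    # A stall still in progress when the log ends.
    if stall_start is not None and prev_t - stall_start >= min_stall_ms:
        stalls.append((stall_start, prev_t - stall_start))
    return stalls


def active_periods(stalls: List[Tuple[int, int]]) -> List[int]:
    """Milliseconds of normal writing between the end of one stall and the
    start of the next, for comparison with FlushInterval."""
    return [nxt[0] - (cur[0] + cur[1]) for cur, nxt in zip(stalls, stalls[1:])]
```

Running `find_stalls(parse_bw_log(...))` over the lines of such a log and comparing `active_periods(...)` with the configured FlushInterval would show whether the write bursts track the flush period as described here.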
Below is a screenshot of Task Manager and the fio job file for Windows. FlushInterval was set to 30 in this example. If you need more information, please let me know!
Cheers,
Chris
```ini
[global]
name=fio-seq-write
directory=E:\fio-tests
rw=write
bs=256K
direct=1
numjobs=8
loops=1
fsync_on_close=1
group_reporting
thread
time_based
runtime=15000
rate=1m,110m

[file1]
size=230G
ioengine=windowsaio
iodepth=1
```
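The write load this job asks for can be worked out from its parameters. `rate=1m,110m` sets separate read and write caps, and since the job only writes (`rw=write`), the 110m cap is the one that matters. The sketch assumes fio's default `kb_base=1024` (so `110m` means 110 × 1024² bytes/s), that each of the `numjobs=8` clones enforces its own cap and writes its own file, and that the disk actually sustains the cap; `parse_fio_size` is a hypothetical helper covering only the suffixes used here.

```python
# Rough arithmetic for the fio job above. Assumptions: fio's default
# kb_base=1024 (so "m" and "G" are binary multiples), each numjobs clone
# has its own rate cap and file, and the writes actually reach the cap.

MULTIPLIERS = {"k": 1024, "m": 1024**2, "g": 1024**3, "t": 1024**4}


def parse_fio_size(value: str) -> int:
    """Parse a size such as '110m' or '230G' (hypothetical helper)."""
    suffix = value[-1].lower()
    if suffix in MULTIPLIERS:
        return int(value[:-1]) * MULTIPLIERS[suffix]
    return int(value)


NUMJOBS = 8
WRITE_RATE = parse_fio_size("110m")  # write field of rate=1m,110m, bytes/s
FILE_SIZE = parse_fio_size("230G")   # size= from [file1]
FLUSH_INTERVAL_S = 30                # FlushInterval used in this test

aggregate_rate = NUMJOBS * WRITE_RATE           # bytes/s across all clones
per_flush = aggregate_rate * FLUSH_INTERVAL_S   # max bytes between flushes
total_files = NUMJOBS * FILE_SIZE               # one file per clone

MIB, GIB = 1024**2, 1024**3
print(f"aggregate write cap: {aggregate_rate / MIB:.0f} MiB/s")
print(f"data per {FLUSH_INTERVAL_S} s flush interval: {per_flush / GIB:.2f} GiB")
print(f"total file footprint: {total_files / GIB:.0f} GiB")
```

At the full cap that is 880 MiB/s, or up to roughly 25.8 GiB written between two 30-second flushes.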
Thanks for this - I'll have a play with fio and see what I come up with.