btrfs
BSoD on shutdown with KERNEL_SECURITY_CHECK_FAILURE (corrupt list entry)
On Windows 10, I get a BSoD with KERNEL_SECURITY_CHECK_FAILURE (FAST_FAIL_CORRUPT_LIST_ENTRY) on roughly every second shutdown. I could not find any direct cause that determines whether a shutdown fails, but it seems to fail about half the time.
KERNEL_SECURITY_CHECK_FAILURE (139)
A kernel component has corrupted a critical data structure. The corruption
could potentially allow a malicious user to gain control of this machine.
Arguments:
Arg1: 0000000000000003, A LIST_ENTRY has been corrupted (i.e. double remove).
Arg2: ffffcb828bc5a440, Address of the trap frame for the exception that caused the bugcheck
Arg3: ffffcb828bc5a398, Address of the exception record for the exception that caused the bugcheck
Arg4: 0000000000000000, Reserved
EXCEPTION_RECORD: ffffcb828bc5a398 -- (.exr 0xffffcb828bc5a398)
ExceptionAddress: fffff80bfdd10fe5 (btrfs+0x0000000000010fe5)
ExceptionCode: c0000409 (Security check failure or stack buffer overrun)
ExceptionFlags: 00000001
NumberParameters: 1
Parameter[0]: 0000000000000003
Subcode: 0x3 FAST_FAIL_CORRUPT_LIST_ENTRY
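For readers unfamiliar with this bugcheck: Arg1 = 3 means a doubly linked list integrity check failed, and a double remove is the typical trigger. The sketch below is a user-space simulation in Python, not WinBtrfs or kernel code. The names `ListEntry`, `insert_tail`, `remove_entry` and `CorruptListEntry` are hypothetical and only mirror the flink/blink layout of a `LIST_ENTRY`. It assumes the unlink routine verifies that both neighbours still point back at the entry before unlinking, and that removal leaves the removed entry's own pointers untouched.

```python
class CorruptListEntry(Exception):
    """Stands in for the fast-fail reported as FAST_FAIL_CORRUPT_LIST_ENTRY."""


class ListEntry:
    """A node of a circular doubly linked list (flink = next, blink = previous)."""

    def __init__(self, name):
        self.name = name
        self.flink = self  # an empty list head points at itself
        self.blink = self


def insert_tail(head, entry):
    """Link `entry` in just before `head`, i.e. at the tail of the list."""
    last = head.blink
    entry.flink = head
    entry.blink = last
    last.flink = entry
    head.blink = entry


def remove_entry(entry):
    """Unlink `entry`, checking first that both neighbours still point at it."""
    flink, blink = entry.flink, entry.blink
    if flink.blink is not entry or blink.flink is not entry:
        # The real check ends in a bugcheck; here we just raise.
        raise CorruptListEntry(entry.name)
    blink.flink = flink
    flink.blink = blink
    # entry.flink / entry.blink are deliberately left stale (assumption above),
    # which is what makes a second removal detectable.


if __name__ == "__main__":
    head, a, b = ListEntry("head"), ListEntry("a"), ListEntry("b")
    insert_tail(head, a)
    insert_tail(head, b)
    remove_entry(a)      # first removal succeeds
    try:
        remove_entry(a)  # second removal trips the integrity check
    except CorruptListEntry as exc:
        print("corrupt list entry:", exc)
```

In a driver, this pattern usually points at two code paths (for example a shutdown path and a worker thread) both removing the same item, or at a removal without the lock that protects the list.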
The minidump is attached here.
I tried installing the debug version as well, but it does not seem to create a log, even with DebugLogLevel = 2 and LogFile = \??\C:\btrfs.log
I can confirm I am seeing a similar issue.
+1, I get it when there's IO activity on the BTRFS partition. OS build 16299.309. There is no obvious pattern to crashes - examples include downloading torrents (qBittorrent), extracting archives with 7zip, etc.
Can confirm. Mine is happening during a Syncthing scan of the folders that were created using Manjaro.
I got it once, when I created an NFS share on Windows 10.
@maharmstone, I admire your work on writing this, but it was quite disappointing to see you locking the thread in #88 and directing a personal attack (calling it a mistake trying to engage with me).
Having said that, I will share my findings and answer @ale5000-git's question from #88. I have tried the latest master (86ca3de), and I am still getting kernel security check BSODs. For example: open Explorer, select everything (@ and @home), right-click and select Properties. After it enumerates everything for a while, I get a kernel security check BSOD. On Linux, sudo btrfs check --check-data-csum reports everything is clean, so hardware issues are out of the question, as I claimed in #88. And I am certain that over time, WinBtrfs would corrupt my drive again.
Here is a minidump, along with the binaries and the .pdb. btrfs_crash_pdb.zip
Thanks, and bye.
Same problem
I don't use Windows 10 anymore. I'll try 1.2.1 on Windows 7 soon.
In reference to #88, I just wanted to add that on a linux-only ssd rootfs (on a LUKS-encrypted GPT partition), I recently noticed errors of the following kind:
csum failed root 1379 ino 81675 off 8478720 csum 0x323867f1 expected csum 0x98f94189 mirror 1.
I don't always get the same expected csum, but this specific csum is also present in his bug report/dmesg listing.
I'm still trying to figure out what it means, but I probably need some help doing that, and have to wait on it for the time being.
I can say for sure that many of the expected csums are identical, both (as far as I remember) across offsets within individual inodes and across some different inodes. This is from memory; I can look into it again if you really need it, but I would rather not spend the effort otherwise.
The expected csum is not always the same, but it seemed like only a few values made up all the expected csums.
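The observation that only a few expected csum values recur can be checked mechanically. Below is a minimal sketch, assuming the kernel messages have been saved to text and follow the format of the line quoted above. The helper names `parse_csum_errors` and `tally_expected` are hypothetical, and the regular expression is derived only from that one sample line, so other kernel versions may format the message differently.

```python
import re
from collections import Counter

# Pattern built from the single sample line in this thread (assumption:
# other messages of this kind share the same field order).
CSUM_RE = re.compile(
    r"csum failed root (?P<root>\d+) ino (?P<ino>\d+) off (?P<off>\d+) "
    r"csum (?P<csum>0x[0-9a-fA-F]+) "
    r"expected csum (?P<expected>0x[0-9a-fA-F]+) mirror (?P<mirror>\d+)"
)


def parse_csum_errors(text):
    """Return one dict per 'csum failed' message found in `text`."""
    errors = []
    for m in CSUM_RE.finditer(text):
        errors.append({
            "root": int(m["root"]),
            "ino": int(m["ino"]),
            "off": int(m["off"]),
            "csum": m["csum"].lower(),
            "expected": m["expected"].lower(),
            "mirror": int(m["mirror"]),
        })
    return errors


def tally_expected(errors):
    """Count how often each expected csum value occurs."""
    return Counter(e["expected"] for e in errors)


if __name__ == "__main__":
    sample = (
        "csum failed root 1379 ino 81675 off 8478720 "
        "csum 0x323867f1 expected csum 0x98f94189 mirror 1."
    )
    for value, count in tally_expected(parse_csum_errors(sample)).most_common():
        print(value, count)
```

Feeding it the saved output of several scrub runs and comparing the `(root, ino, off)` triples between runs would also show whether the errors are historical or still growing.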
I fixed something very similar to this in the latest version. Is it still an issue?
@maharmstone (Sorry for the late reply) I'm still seeing checksum errors, but I haven't checked whether they are just historical or not. I'm unfortunately somewhat limited in how well I can list errors, because there is a hardcoded rate limit in the kernel that prevents me from just dumping a list of all these errors to diff over time. I'd have to semi-manually remove almost all offending files before the last few are shown. I think that's due to some temporal clustering of these errors when running a scrub.
If you have any specific suggestion for debugging this (further), I'd be open to help.
I just don't have much time to spare, especially not time during which I can "do software dev", until early next year.
I had the same BSOD as the OP three times in a row while downloading a game from Steam (at roughly 12MB/s) using WinBtrfs v1.8. The minidump headers of all three BSODs look very similar to the one from the OP: https://owncloud.gwdg.de/index.php/s/on5PXohp7Moj50B
@Schroedingers-Cat Would mention that it seemed to go away on 1.7.2 once I downgraded.
@BluedragonMask thanks a lot! Did you also test v1.7.9, v1.7.8, ... down to v1.7.2 and found v1.7.3 to be the first version with this BSOD?
I tried downloading around 150GB of games onto my BTRFS drive via Steam with v1.8.0, v1.7.9 and v1.7.3. This took multiple hours, and here is the number of BSODs I got:
- v1.8.0 = 3 BSODs
- v1.7.9 = 3 BSODs
- v1.7.3 = 0 BSODs
I used Chocolatey to install these versions and restarted before each new try. So from these very limited tests, it could be that v1.7.3 is also not affected by this issue.
I suspect that this might be the same bug as https://github.com/maharmstone/btrfs/issues/488, which ought to be fixed by https://github.com/maharmstone/btrfs/commit/76b13080cecf8dad4ba23ca4b6e4b85e2c242dbc. The bug was introduced with v1.7.3, which matches with what's been written above.
@maharmstone sounds great! When will the fix be released?
@maharmstone any chance for a signed pre-release for testing the fix?
@maharmstone any news on releasing the fix?
Closing old issues