
BSoD on shutdown with KERNEL_SECURITY_CHECK_FAILURE (corrupt list entry)

Ralino opened this issue 7 years ago • 18 comments

On about every second shutdown I get a BSoD with KERNEL_SECURITY_CHECK_FAILURE (FAST_FAIL_CORRUPT_LIST_ENTRY) on Windows 10. I could not find anything that determines whether a given shutdown fails or not, but it seems to fail about half the time.

KERNEL_SECURITY_CHECK_FAILURE (139)
A kernel component has corrupted a critical data structure.  The corruption
could potentially allow a malicious user to gain control of this machine.
Arguments:
Arg1: 0000000000000003, A LIST_ENTRY has been corrupted (i.e. double remove).
Arg2: ffffcb828bc5a440, Address of the trap frame for the exception that caused the bugcheck
Arg3: ffffcb828bc5a398, Address of the exception record for the exception that caused the bugcheck
Arg4: 0000000000000000, Reserved

EXCEPTION_RECORD:  ffffcb828bc5a398 -- (.exr 0xffffcb828bc5a398)
ExceptionAddress: fffff80bfdd10fe5 (btrfs+0x0000000000010fe5)
   ExceptionCode: c0000409 (Security check failure or stack buffer overrun)
  ExceptionFlags: 00000001
NumberParameters: 1
   Parameter[0]: 0000000000000003
Subcode: 0x3 FAST_FAIL_CORRUPT_LIST_ENTRY

The minidump is attached here. I tried installing the debug version as well, but it does not seem to create a log with DebugLogLevel = 2 and LogFile = \??\C:\btrfs.log

Ralino, Mar 12 '18 12:03

I can confirm I am seeing a similar issue.

intelburn, Mar 19 '18 16:03

+1. I get it when there's I/O activity on the BTRFS partition (OS build 16299.309). There is no obvious pattern to the crashes; examples include downloading torrents (qBittorrent), extracting archives with 7-Zip, etc.

ojura, Mar 27 '18 16:03

Can confirm. Mine is happening during a Syncthing scan of the folders that were created using Manjaro.

intelburn, Mar 27 '18 21:03

I received it once when I created an NFS share on Windows 10.

ihipster, Apr 11 '18 08:04

@maharmstone, I admire your work on writing this, but it was quite disappointing to see you lock the thread in #88 and direct a personal attack at me (calling it a mistake to try to engage with me).

Having said that, I will share my findings and answer @ale5000-git's question from #88. I have tried the latest master (86ca3de), and I am still getting kernel security check BSODs. For example: open Explorer, select everything (@ and @home), right-click and select Properties. After it enumerates everything for a while, I get a kernel security check BSOD. On Linux, sudo btrfs check --check-data-csum reports that everything is clean, so hardware issues are out of the question, as I claimed in #88. And I am certain that, over time, WinBtrfs would corrupt my drive again.

Here is a minidump, along with the binaries and the .pdb: btrfs_crash_pdb.zip

Thanks, and bye.

ojura, Nov 25 '18 01:11

Same problem

Guimli, Dec 02 '18 09:12

I don't use Windows 10 anymore. I'll try 1.2.1 on Windows 7 soon.

Zero3K, May 09 '19 12:05

In reference to #88, I just wanted to add that on a Linux-only SSD rootfs (on a LUKS-encrypted GPT partition), I recently noticed errors of the following kind: csum failed root 1379 ino 81675 off 8478720 csum 0x323867f1 expected csum 0x98f94189 mirror 1. I don't always get the same expected csum, but this specific csum also appears in his bug report/dmesg listing. I'm still trying to figure out what it means, but I probably need some help with that, and will have to put it on hold for the time being.

I can say for sure that many of the expected csums are identical across offsets, both (as far as I remember) within individual inodes and (again from memory; if you really need it I can look into it again, but it's effort I don't want to waste) across some inodes.

It's not always the same, but it seemed like only a few values made up all the expected csums.

namibj, May 27 '19 21:05

I fixed something very similar to this in the latest version. Is it still an issue?

maharmstone, Nov 24 '19 02:11

@maharmstone (Sorry for the late reply) I'm still seeing checksum errors, but I haven't checked whether they are just historical or not. Unfortunately, I'm somewhat limited in how well I can list errors, because a hardcoded rate limit in the kernel prevents me from just dumping a list of all these errors to diff over time. I'd have to semi-manually remove almost all offending files before the last few get shown. I think that's due to some temporal clustering of these errors when running a scrub.

If you have any specific suggestions for debugging this further, I'd be happy to help.

I just don't have much time to spare until early next year, especially time during which I can "do software dev".

namibj, Nov 27 '19 05:11

Had the same BSOD as the OP three times in a row while downloading a game from Steam (at roughly 12 MB/s) using WinBtrfs v1.8. The minidump headers of all three BSODs look very similar to the OP's: https://owncloud.gwdg.de/index.php/s/on5PXohp7Moj50B

Schroedingers-Cat, May 09 '22 07:05

@Schroedingers-Cat I would mention that it seemed to go away once I downgraded to 1.7.2.

BluedragonMask, May 31 '22 04:05

@BluedragonMask thanks a lot! Did you also test v1.7.9, v1.7.8, ... down to v1.7.2 and find v1.7.3 to be the first version with this BSOD?

Schroedingers-Cat, Jun 01 '22 08:06

I tried downloading around 150 GB of games onto my BTRFS drive via Steam with v1.8.0, v1.7.9 and v1.7.3. This took multiple hours, and here's the number of BSODs I got:

  • v1.8.0 = 3 BSODs
  • v1.7.9 = 3 BSODs
  • v1.7.3 = 0 BSODs

I used Chocolatey to install these versions and restarted before each new try. So from these very limited tests, it could be that v1.7.3 is also unaffected by this issue.

Schroedingers-Cat, Jun 01 '22 18:06

I suspect that this might be the same bug as https://github.com/maharmstone/btrfs/issues/488, which ought to be fixed by https://github.com/maharmstone/btrfs/commit/76b13080cecf8dad4ba23ca4b6e4b85e2c242dbc. The bug was introduced with v1.7.3, which matches with what's been written above.

maharmstone, Jun 13 '22 17:06

@maharmstone sounds great! When will the fix be released?

Schroedingers-Cat, Jun 13 '22 19:06

@maharmstone any chance for a signed pre-release for testing the fix?

Schroedingers-Cat, Jun 18 '22 21:06

@maharmstone any news on releasing the fix?

Schroedingers-Cat, Jul 06 '22 20:07

Closing old issues

maharmstone, Nov 30 '23 01:11