freebsd-src

Don't reset aac_reset_adapter on timedout

Open mdedetrich opened this issue 4 years ago • 3 comments

Credit for this change goes to SilverNetworks from https://www.truenas.com/community/threads/idle-reboot-bug-aacraid0-command-0xfffffe00007bdfa0-timeout-after-3857-seconds-shutting-down-controller-done.86761/

The premise of the problem is that if, for some reason, a disk device behind the aacraid driver isn't sending/receiving any I/O, the timedout condition triggers and the call to aac_reset_adapter(sc) causes the FreeBSD system to lock up.

This leads to a lot of unintended behavior. For example, in my own use case a hard drive sitting behind this RAID controller failed a S.M.A.R.T. check. Since the S.M.A.R.T. check failed, FreeBSD stopped writing/reading data from the disk (as it should); however, this invariably triggered the timedout condition, which then caused the system to hang. You experience the same problem if for any other reason you don't write/read from a disk behind the RAID controller (e.g. you have some disks sitting in the system that aren't mounted but are still recognized).
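To make the described control flow concrete, here is a minimal, self-contained sketch in C. It is a toy model only, not the aacraid driver source and not either of the patches discussed in this thread: every name in it (`struct cmd`, `struct adapter`, `watchdog_tick`, `reset_adapter`, `CMD_TIMEOUT_SECS`) and the 120-second threshold are assumptions made for illustration. It shows a periodic check that counts commands outstanding for too long, and a flag that decides whether a nonzero count leads to an adapter reset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the real aacraid structures. */
#define CMD_TIMEOUT_SECS 120    /* assumed threshold, for illustration only */

struct cmd {
    long issued_at;             /* uptime (seconds) when the command was queued */
};

struct adapter {
    const struct cmd *busy;     /* commands still outstanding */
    size_t nbusy;               /* number of entries in busy */
    int resets;                 /* how many times reset_adapter() has run */
};

/* Stand-in for the reset call; it only records that a reset happened. */
static void reset_adapter(struct adapter *sc)
{
    sc->resets++;
}

/* Count commands that have been outstanding longer than the threshold. */
static int count_timedout(const struct adapter *sc, long now)
{
    int timedout = 0;

    for (size_t i = 0; i < sc->nbusy; i++) {
        if (now - sc->busy[i].issued_at > CMD_TIMEOUT_SECS)
            timedout++;
    }
    return timedout;
}

/*
 * One pass of the periodic check. With reset_on_timeout set, a stale
 * command leads to a reset (the behavior reported above); with it
 * cleared, the timeout is only counted and no reset is issued.
 */
static int watchdog_tick(struct adapter *sc, long now, bool reset_on_timeout)
{
    assert(sc != NULL);

    int timedout = count_timedout(sc, now);

    if (timedout > 0 && reset_on_timeout)
        reset_adapter(sc);
    return timedout;
}
```

In this model, a single command that has sat in the busy list for 3857 seconds is counted as timed out on every pass, and whether the adapter is reset depends only on the flag. Whether skipping the reset is the right fix for the real driver is a separate question that the sketch does not answer.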

mdedetrich avatar Aug 07 '21 16:08 mdedetrich

https://people.freebsd.org/~imp/aacraid.diff has the above instructions reduced to a patch :)

bsdimp avatar Aug 08 '21 18:08 bsdimp

Has anybody tried my alternative to see if that also fixes the problem? It's more consistent with the underlying problem and should solve the issue as well...

bsdimp avatar Sep 08 '21 03:09 bsdimp

I commented in the forum thread. I would like to do this, but there is no documentation on how to make a TrueNAS build with a patched kernel, so I created a ticket asking for the documentation to be improved: https://jira.ixsystems.com/browse/NAS-112209

Once I figure this out, I will test it on my systems.

mdedetrich avatar Sep 08 '21 05:09 mdedetrich

OK. After reading the thread followups, I'm just going to close this. It's been too long, and nobody cares to even try the patch I came up with.

bsdimp avatar Feb 05 '23 15:02 bsdimp

@bsdimp I am sorry for letting this go on my end. I am honestly not that familiar with TrueNAS/FreeBSD, and there wasn't any documentation on how to run a version of TrueNAS with a patched kernel. I tried to create a ticket for it (mentioned previously), but as you can see I didn't get any answers, and on the forums the responses weren't welcoming (they seem to have the attitude that you shouldn't even be doing this, which isn't very constructive when trying to test something like this).

On top of that, I have very little time: I am maintaining a lot of open source projects, including a massive, non-trivial one. Maybe at some point in the future, when I have more capacity, I can look into it, but I apologize for not being able to test it.

mdedetrich avatar Feb 14 '23 08:02 mdedetrich