
Add `--force-degraded` option for `zpool import` in Disaster Recovery Situations

Open ticktoo opened this issue 3 months ago • 22 comments

Checklist

  • [x] I have searched the existing issues
  • [x] I am filing a feature request, not a bug report
  • [x] I am ready to accept that this feature bypasses safety rails and may lead to data loss

Describe the feature

Add a zpool import flag that tells ZFS to import a pool in whatever degraded state is possible, ignoring missing mirrors, dead SLOGs, stale L2ARCs, corrupted siblings, etc.

Sarcastic working title:

--ignore-any-fuckshit-and-import-the-pool=on

Polite name: --force-degraded.


Motivation

Right now, importing a pool after a failure feels like an escape room puzzle:

  • Lose half a mirror → “device unavailable.”
  • Controller renumbers disks → “no such pool available.”
  • SLOG device missing → must know -m.
  • Multihost mismatch → must know -o multihost=off.
  • Uberblock dirtied → must know -F, -X, sometimes even -T.
  • Wrong guess? Try again after another reboot.
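
For reference, the incantations being guessed at are roughly these (all real flags today; which combination works depends on the failure, and pool/device names are placeholders):

$ zpool import -d /dev/disk/by-id tank            # controller renumbered the disks
$ zpool import -m tank                            # log (SLOG) device is missing
$ zpool import -f -o multihost=off tank           # hostid/multihost mismatch
$ zpool import -F tank                            # discard the last few txgs
$ zpool import -FX tank                           # extreme rewind
$ zpool import -T <txg> tank                      # import at a specific txg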

Admins end up doing low-level gymnastics: unbinding PCI devices, nuking labels with zpool labelclear, running zdb -e spelunking sessions, or giving up entirely.

All while knowing perfectly well that the surviving mirror leaf has the data and could serve it right now if only ZFS would stop clutching its pearls.


Proposed Behavior

  • New flag: --force-degraded (alias for --ignore-any-fuckshit-and-import-the-pool=on).

  • If at least one top-level vdev has a valid uberblock:

    • Import it, degraded.
    • Mark missing children as UNAVAIL.
    • Allow missing SLOGs (with a clear warning: “sync writes between txg N and N+M are lost”).
    • Drop L2ARC devices silently.
    • Set failmode=continue.
  • Print a giant red banner in zpool status warning that the pool was imported under extreme force.
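
A mock-up of how such a session might look (everything below is invented for illustration; neither the flag nor the output exists today):

$ zpool import --force-degraded tank
WARNING: importing 'tank' without full redundancy; missing children will be marked UNAVAIL.
WARNING: log device missing: sync writes between txg 8812345 and 8812399 are lost.
pool 'tank' imported, state: DEGRADED, failmode set to continue.

$ zpool status tank
  pool: tank
 state: DEGRADED
status: Pool was force-imported in degraded mode. Replace or reattach the
        missing devices and scrub the pool as soon as possible.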


Why this is important

  • Real-world reliability = integrity and availability. Refusing to import when salvageable data exists is availability sabotage.
  • Ops pressure: In an outage, admins don’t want a manpage scavenger hunt. They want their pool back now, then they’ll fix/replace devices.
  • Consistency with existing force flags: We already have -F, -X, -m. This proposal just acknowledges what operators are already doing — but in a documented, supported way.
  • Retention: Right now many users ragequit ZFS in exactly these scenarios. A clear override switch would prevent that.

Alternatives

  • Keep forcing users into zdb -e wizardry and sysfs voodoo.
  • Watch more production deployments ditch ZFS because it’s “too stubborn when things go wrong.”

Additional context

This proposal doesn’t weaken defaults. The conservative import behavior remains unchanged. But for those moments when you know what you’re doing and just need to punch through:

Give us the nuclear override switch. When disks are dying, I don’t need a lecture about transaction groups. I need my pool back, even if it’s limping and bleeding.


🔥 Suggested flag name:

--ignore-any-fuckshit-and-import-the-pool=on

Acceptable compromise: --force-degraded.

Thank you for your consideration, Sebastian.

ticktoo avatar Sep 08 '25 09:09 ticktoo

You mentioned the zpool import -FX flags already, but these zfs module params can be useful for disaster recovery as well:

spa_load_verify_data
spa_load_verify_metadata
zfs_scan_ignore_errors
zfs_max_missing_tvds
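
For anyone landing here mid-crisis, these are set before the import attempt, e.g. on Linux (a sketch; double-check the parameter names against your module version):

echo 0 > /sys/module/zfs/parameters/spa_load_verify_data       # don't verify data blocks while importing
echo 0 > /sys/module/zfs/parameters/spa_load_verify_metadata   # don't verify metadata while importing
echo 1 > /sys/module/zfs/parameters/zfs_scan_ignore_errors     # let scrub/resilver continue past errors
echo 1 > /sys/module/zfs/parameters/zfs_max_missing_tvds       # tolerate one missing top-level vdev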

tonyhutter avatar Sep 08 '25 16:09 tonyhutter

I recently had two incidents where one disk of an intact mirror died. In both cases the pool was online (degraded) and writable until the next reboot. After the reboot (probably because of disk renumbering) the pool wasn't importable in any way. I tried all suitable options based on Google, the manuals and even an LLM. In general, the available options are -m, -f, -X, -F and -d. No combination succeeded. In a disaster recovery situation, your options end with the end of the manual.

This is extremely frustrating: you know your data is intact on one disk (and it worked from that one disk until the reboot) and could be used, e.g. for a lxc copy container secondhost: (which is not possible on a read-only pool), but ZFS prevents that for integrity reasons.

On my task list for the next months are several physical machines I have to redeploy to move the mirroring layer from ZFS back to mdadm. ZFS pools will then be installed on top of mdadm, as mdadm has never refused to start an array when it was possible to do so.

I don't know whether this was ZFS's intended usage. But I have no choice. A ZFS raid must now (after my recent experience) be considered as "not redundant".

📢 Real-World User Frustrations

This isn’t just theory. Across forums, sysadmins hit the same wall: a pool that should import degraded (mirror member intact, one disk missing) simply refuses.
Here are some “famous last words” before people gave up and rebuilt from backup:

  • “After 48 hours of trying every import flag combination, the only thing I could import was my rage. Pool’s gone.” (Reddit /r/zfs)
  • “One disk failed in a mirror. The other was perfect. ZFS still locked me out. That was the moment I swore never again.” (TrueNAS forum)
  • “ZFS doesn’t fail gracefully — it fails theatrically. One missing SSD and the whole system pretends nothing exists.” (Proxmox forum)
  • “I could literally see the data sitting there on the remaining disk. ZFS just refused to let me touch it.” (ServerFault)
  • “A RAID1 that dies when one disk dies isn’t RAID1. It’s single-disk suicide with extra steps.” (Unix StackExchange)
  • “I’m not solving a storage problem, I’m solving a riddle wrapped in command-line flags. And my users are waiting.” (Reddit /r/sysadmin)
  • “At 3AM during an outage, I don’t want philosophy about data integrity. I want my damn pool online.” (ServerFault)
  • “ZFS turned a simple disk swap into an archaeological dig through manpages and mailing lists.” (OpenZFS mailing list)
  • “The last thing you want to hear from a storage admin: ‘We had backups… right?’” (TrueNAS forum)
  • “With mdadm, I’d be syncing by now. With ZFS, I’m Googling which obscure flag makes it stop lecturing me.” (Reddit /r/linuxadmin)
  • “My pool wasn’t degraded, it was held hostage by ZFS.” (Proxmox forum)

👉 These are not rare corner cases. They are recurring failure modes that turn ZFS’s greatest strength (safety) into its biggest operational weakness (unusability in degraded mode).

ticktoo avatar Sep 08 '25 22:09 ticktoo

Maybe "zpool import -o cachefile=none" would help. ZFS can import the pool without relying on the device number.

jxdking avatar Sep 09 '25 06:09 jxdking

Maybe "zpool import -o cachefile=none" would help. ZFS can import the pool without relying on the device number.

Tried that. Didn't help. :-(

ticktoo avatar Sep 09 '25 12:09 ticktoo

It would be nice to have a zpool import --disaster-recovery flag that would present you with clear Y/N prompts and try increasingly risky things to import your pool.

$ zpool import --disaster-recovery tank
Do you want to try importing the pool normally (y/n)?
y

Unable to import.

Do you want to try discarding the last few transactions to see if you can import (-Fn)?
y

Unable to import.

Do you want to try importing to the last checkpointed state (--rewind-to-checkpoint)?
This will discard any data after that checkpoint.  
y

Unable to import

Do you want to try rolling back to a TXG that may be inconsistent (-FX), leading to
possible checksum errors?
y

Unable to import

Do you want to try rolling back to a TXG that may be inconsistent (-FX), leading to
possible checksum errors, and only verify metadata, but not object data (spa_load_verify_data=0)?
y

Unable to import

Do you want to try rolling back to a TXG that may be inconsistent (-FX), leading to possible
checksum errors, and ignoring verification of metadata and object data (spa_load_verify_data=0,
spa_load_verify_metadata=0)?
y

Unable to import

Do you want to try rolling back to a TXG that may be inconsistent (-FX), leading to possible
checksum errors, and ignoring verification of metadata and object data, and ignoring missing
top level vdevs (spa_load_verify_data=0, spa_load_verify_metadata=0, zfs_max_missing_tvds=1)?
y

tonyhutter avatar Sep 09 '25 17:09 tonyhutter

I've just had to deem a degraded pool - which itself did not have any data corruption (and therefore no data loss) - irrecoverable and dead, due to ZFS's poor recovery features.

This feature is needed.

Westie avatar Sep 14 '25 08:09 Westie

$ zpool import --disaster-recovery tank
Do you want to try importing the pool normally (y/n)?

Would appreciate this, but it has to go one step further, which is unimplemented (??) right now: the read-write import of a mirror that has lost its redundancy, even if single files are marked corrupted. This feature is dangerous and shouldn't be cast by mindlessly pushing the "y" button. But if a sysadmin is sure that a mirror has an intact member (or a raidzN has sufficient ($total - N) working members), it should be possible to cast a "--force-degraded" spell to restore availability.

Any thoughts from the maintainers or contributors on that topic?

Regards, ticktoo.

ticktoo avatar Sep 16 '25 18:09 ticktoo

It's a good thought, however a bit too idealistic for an (automatic?) option. We just had a NAS ransomed and its snapshots deleted, and we managed to get most of our data back after trying different TXG rollbacks from different vdevs. You have to import the pool read-only multiple times and compare which TXG brings back the better condition (and is importable at all). It would be hard to do that automatically.
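
For reference, that manual loop looks roughly like this (a sketch; device paths and txg numbers are placeholders):

$ zdb -ul /dev/disk/by-id/ata-EXAMPLE-part1        # list uberblocks and their txg numbers
$ zpool import -f -o readonly=on -T 8812300 tank   # import read-only at a candidate txg
$ zfs list -r -t all tank                          # inspect what came back
$ zpool export tank                                # export and repeat with another txg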

jiaolovekt avatar Sep 18 '25 19:09 jiaolovekt

It would be nice to have a zpool import --disaster-recovery flag that would present you with clear Y/N prompts and try increasingly risky things to import your pool. ...

This seems like a good idea for guiding people through the existing flags during a crisis (and maybe some new options not in the flags yet) but it's also orthogonal to the problem at the top of the issue.

The problem at issue is that ZFS will refuse to import a pool with e.g. damaged leaf mirror vdevs of the kind described, with ANY combination of options, and people are abandoning ZFS because from their perspective they lost one drive and ZFS ate all their data. That's not acceptable.

owlshrimp avatar Sep 18 '25 19:09 owlshrimp

I was an early adopter and got my fingers (and data) burned after a crash of the backup server and severe data corruption (#2831) - and I also hope that ZFS would allow recovering as much data as possible in those worst cases with much more relaxed restrictions on integrity.

With that experience, I really struggle to use TrueNAS with ZFS knowing that a corruption of the MOS can nuke all the data no matter how bitrot-safe and checksummed the underlying blocks are. It feels paradoxical that ZFS protects data at rest brilliantly against silent corruption, but if the metadata is damaged, the system refuses to expose even the healthy datasets and snapshots that are still there.

ZFS would be far more usable in disaster recovery situations, where some data is always better than no data.

simonbuehler avatar Sep 30 '25 20:09 simonbuehler

I just finished a migration of 200TB net from a ZFS RAID 50 to an mdadm RAID 50 with ZFS on top. It took me 4 weeks. Let me emphasize that "Destroy and re-create the pool from a backup source." is kind of ridiculous if you are on the backup server and just trying to deal with one failed disk out of 24.
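
For the curious, the target layout is roughly this (a sketch; device names and disk counts are placeholders):

$ mdadm --create /dev/md0 --level=5 --raid-devices=12 /dev/sd[b-m]
$ mdadm --create /dev/md1 --level=5 --raid-devices=12 /dev/sd[n-y]
$ zpool create -o ashift=12 tank /dev/md0 /dev/md1   # ZFS stripes over the md RAID5 arrays ("RAID 50")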

So... any new thoughts on this?

ticktoo avatar Oct 28 '25 13:10 ticktoo

We need to understand that ZFS was designed and built for VERY large file servers, it's a petabyte file system!

It was never meant for .. "run-of-the-mill-users-that-have-never-seen-a-big-server-before" :D. I understand what you're getting at, but having a "just darn force import it" option will cause way more problems than it is meant to solve [in the long run].

There's no way you can code for every possible reason why a pool won't import, and this is where experience comes in - you need to know what you're doing! You need to understand the error messages, you need to understand a workaround, and you need to do what you need to do to solve it yourself.

This is in absolutely no way different from a crashed ext4 file system! If you run the wrong command on the wrong device, you're screwed! Granted, ZFS has a lot more moving parts than ext4 (it, ZFS, isn't just a file system!), but the logic applies.

I find any --force option extremely dangerous, and I personally don't want them anywhere near anything important. Especially not my file system! :)

FransUrbo avatar Dec 01 '25 19:12 FransUrbo

We need to understand that ZFS was designed and built for VERY large file servers, it's a petabyte file system! ... I find any --force option extremely dangerous, and I personally don't want them anywhere near anything important. Especially not my file system! :)

In fairness, there are already myriad -f parameters in ZFS, and liability is generally extra-disclaimed if you try to use them.

That said, this is quite different. By the sound of it, the pools this is intended for often WOULD be usable if the system hadn't been rebooted. For example, losing a mirror device while the system is running usually isn't an issue: everything continues running and you just swap in a replacement. But that same missing mirror leaf can stop the pool from being importable if the system is simply rebooted after the failure? That's really bad for a filesystem that in every other dimension is built to tolerate reboots/power failures.

owlshrimp avatar Dec 01 '25 19:12 owlshrimp

@FransUrbo whilst you may use ZFS in an enterprise context, ZFS is also widely used in hobby and self-hosting contexts.

You may have a fleet of servers waiting to jump into action to recover capacity after a server failure.

Those of us whose ZFS-reliant machines are hosted in our living rooms do not have that capacity; we need to get things working now so we can safely back up and repair/reinstall things quickly.

Westie avatar Dec 01 '25 19:12 Westie

In fairness, there are already myriad -f parameters in ZFS, and liability is generally extra-disclaimed if you try to use them.

Yes, but let's face it, they're not really that forceful :). They will refuse to do things if anything is even remotely out of bounds of what the code can handle..

A "absolutely force this, no matter what" is extremely dangerous, because if the code can't handle some weird edge case, it will more than likely destroy ALL your data, maybe even part of your living room, just for kicks :D :D.

THEN who do you blame!?! It can't be the coders, they did what they could to find the most obvious, and some not-so-obvious, cases and how to import them. But if there's an infinite (literally!) number of ways a pool can fail and refuse to import, it isn't the developers' fault it blew up! And "you" (not you-you, but the/a user), using this option "because it existed and I figured I'd take a chance", aren't going to blame yourself - the option was available and "you" used it!

Therefore it is better to not have the option, and force the user/admin to "figure out how to solve it" instead..

Those of us whose ZFS-reliant machines are hosted in our living rooms do not have that capacity; we need to get things working now so we can safely back up and repair/reinstall things quickly.

That is absolutely fair, and the more users we get, the better!

BUT!!! This isn't a game, this isn't something anyone can jump in and just "do"! You wouldn't expect a person off the street to fix a space shuttle, now would you!?? The same thing applies with something like ZFS - it is the space shuttle of the file system world :D :D. Now, twenty years since it was created, it is still doing things that no other file system can even come close to doing! It is still way ahead of its time, although BTRFS and others are closing in..

I'm all for making things easier, but a "force import with extreme prejudice" is NOT the option!! Better documentation, and better error messages, is. Even I, who have used ZFS for many, many years on very small AND very (VERY!!) large systems - and a few in between :) - am sometimes baffled as to what the actual problem is!

But this is where mailing lists and forums are vital - if you don't know, then ask! There is no shame in not knowing, no-one can know everything! But no matter how clever the developers are, they will NEVER (!) be able to write code that will do what's expected here.. Not safely anyway.

FransUrbo avatar Dec 01 '25 19:12 FransUrbo

And if it's not a forced import? Is there something like btrfs-restore for zfs? The approach of copying all readable files to a fresh disk and filesystem has served me well (i.e. being content with backup plus what I got).
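
There is no direct equivalent as far as I know, but when a pool can at least be imported read-only, the closest thing to that approach is something like this (paths are placeholders):

$ zpool import -f -o readonly=on tank
$ rsync -aHAX /tank/ /mnt/rescue/     # copy whatever still reads; rsync reports I/O errors and moves on
$ zpool export tank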

bar-g avatar Dec 01 '25 22:12 bar-g

No, and the way ZFS works, not sure if that's even possible to do!

Would be very nice though!

Don't think the Sun engineers thought that far, "our servers don't fail" I think they've been known to say :D :D.

But I reiterate my original comment: ZFS isn't meant for .. small, cheap, home servers. >I< use it that way, but I have lots of redundancy in drives (more than half my disks are just duplicates - I use copies, dedup and/or compression on the really important stuff, as well as several SSD disks as cache, lots of memory etc.). But that means it'll be next to impossible to get to the data in case "something" happens :( :(. OR, it will make things easier.. Not sure, and I hope never to have to find out :).

Whenever we do or use something, we need to understand WHY it is [the way it is], and only THEN can we use it to the full extent. It's perfectly possible to use ZFS on a low-mem, slow-cpu, few-disks system, but that's like trying to use a Volkswagen Beetle as a delivery van :). CAN be done, but "results may vary" :D :D.

My main storage server is an HP DL380 Gen6 with 96GB of memory and server-grade disks and hardware (although over ten years old now :), all fully redundant. I got it almost ten years ago (and it was old even then!) for a few hundred quid, but it does the job magnificently! But an old desktop with a few disks and a few GB of memory will do nicely, "but results may vary"!

FransUrbo avatar Dec 01 '25 22:12 FransUrbo

Let's say this in the completely opposite way:

You're building a storage solution for your company, where it will store hundreds of terabytes of AI data. You choose the best desktop money can buy, a few extra 16-port SATA cards and lots of disks. Then you put the disks under the control of MD and put the EXT4 file system on the devices.

Who's to blame when (not if!) this fails!? You for choosing a solution that is clearly not built/designed for that kind of use? The developers of MD and/or EXT4 (for exactly the same reason)?

You can use an old car for your deliveries, but when (!) that fails, who's to blame?! And does the customer care!?? Do YOU care that you just lost your livelihood?

You choose the right tool for the job, or if (when!?) things fail, you have no-one to blame but yourself.. It's harsh, but life IS harsh!

Not saying the issue here isn't a worthy goal, it most definitely is! We absolutely should work towards it, but it's also not really realistic in practice :(.

FransUrbo avatar Dec 01 '25 23:12 FransUrbo

Well, I understood your points, although I fear I don't agree with you on several matters.

You compare a Beetle with a van. Let's play out this metaphor. You've got a car. It has four wheels. You drive for a certain time, then you get a flat tire.

Car: Your car is defective. You can drive safely all the way, until the engine is stopped, even by accident or mistake. You: Thanks!

Car: Your car is defective. Won't drive any more. [device unavailable.] You remove the flat tire and install a spare wheel.

Car: I can't find all four wheels. Probably, I'm not a car. Won't drive. [no pools available] You: [-m]

Car: Seems like there was a change in your wheels; for safety reasons, I won't drive. Probably you should buy a new car. [Destroy and re-create the pool from a backup source.] You: [-f] That's okay. I've installed a spare wheel. It was intended for exactly this situation.

Car: Great. Let's see. Oh. Me sorry. I can't do that. This tire seems to not belong to this car. To protect any other cars, I won't drive. [cannot import 'zfs': pool was previously in use from another system. The pool can be imported, use 'zpool import -f' to import the pool.] You: [-o multihost=off] Don't panic. No other ZFS Cars got harmed. You can use this one.

Car: I don't think it's safe to drive with that configuration of wheels. You'd better walk. You: [-o multihost=off -f -F -X -m] EVERYTHING IS FINE. IT'S THE SPARE WHEEL. START ENGINE.

Car: no pools available.

Would you buy this car? It has perfectly protected the car from further damage, but its main purpose is rendered useless.

You may argue that ZFS is for enterprise usage only, but I (with 15 years of experience in Linux server infrastructure) won't rely on this filesystem if it behaves like in the dialogue above. And as you've mentioned mdadm and ext4: never ever did mdadm refuse to start an array with missing but sufficient members.

Don't get me wrong. ZFS is super-cool. But I've lost more filesystems in the past 12 months due to poor recovery options than in the 15 years before, with 100+ servers all driven by mdadm and ext4. And all pools were lost because of a defective leaf in a mirrored configuration. This must not happen. Never. Either you deliver redundancy, or you don't. But promising redundancy and then retreating to "but you have to know what you're doing, this is a space shuttle!" doesn't convince me.

And yes, it's clear that a developer can't catch all of the edge cases a disk will show, but basic error-recovery support should be in scope.

Regards, ticktoo.

ticktoo avatar Dec 02 '25 19:12 ticktoo

I'm thinking you're not a [car] mechanic?! :)

If it was that easy to diagnose problems with a car, then there would be no need for schools and training to become a mechanic.

How about "There's a weird noise coming from somewhere, fix it"!?? THEN what do you do!??? This is where my comment about knowing what you're doing comes in. I.e., experience!

FransUrbo avatar Dec 02 '25 19:12 FransUrbo

Hi FransUrbo. I don't want to sound rude, and I also don't want to ride this metaphor to death (and beyond), but to this one I'd like to reply. :-)

If I hear a weird noise coming from somewhere, I decide whether I drive to my destination, immediately stop or slowly drive to a garage. I decide, not the car.

ticktoo avatar Dec 02 '25 20:12 ticktoo

LOL, yeah, you can only take that metaphor so far :D :D.

But yeah, there ARE things that we can do to make it easier (although not easy!) to do repairs if (when!) "something" happens.

I think the first thing we need to do is create a list with cause, effect and solution. Only then can code be written.. BUT.. Maybe this could be an external repair script, not part of the zfs or zpool binaries!?

That way, it's "easy" to just keep adding to it, without having to worry about C code and (possible) incompatibilities [between versions].
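
A minimal sketch of what such an external escalation script could look like, using only flags that exist today (the ordering and prompts are invented; review every step before running it against real data):

#!/bin/sh
# rescue-import.sh POOL - try increasingly aggressive imports, asking before each step.
pool="$1"
try() {
    printf 'Try: %s  [y/N] ' "$*"
    read -r answer
    [ "$answer" = "y" ] || return 1
    "$@" && { echo "Pool '$pool' imported."; exit 0; }
}
try zpool import "$pool"
try zpool import -f "$pool"
try zpool import -f -m "$pool"                  # tolerate a missing log device
try zpool import -f -o readonly=on "$pool"      # read-only, no rewind
try zpool import -f -F "$pool"                  # discard the last few txgs
try zpool import -f -F -X "$pool"               # extreme rewind (can take a long time)
echo "All attempts failed; next stops are zdb -e and the zfs module parameters." >&2
exit 1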

FransUrbo avatar Dec 02 '25 20:12 FransUrbo