btrfs-progs Improve documentation around userspace tool behaviour and safety

Hi there,

On the mailing list for the past few days I have been shouting into the void about some issues I encountered trying to deal with a hardware failure on a btrfs raid-1 filesystem. In my last mail I made several recommendations, and while I won’t be able to become familiar enough with the internals of the btrfs tools themselves to send patches for some of them, but I am at least capable of writing a few paragraphs of documentation. So, here are some patches for that.

In the scrub documentation change I made, I am not sure if there are other edge cases beyond the split-brain one I document where running scrub would be dangerous. I adapted the line about its safety from the comment at https://github.com/kdave/btrfs-progs/issues/816#issuecomment-2166698048. I think that comment probably contains more useful information that could be integrated into the documentation too, but I am currently out of energy to make additional changes.

I also think I may be documenting some bugs here, like that btrfs-check looks at metadata from some place other than the specified block device which seems like it should not be doing since it does not take a filesystem path, but at least I would prefer the current behaviour to be documented than to have one other person be shocked when they see errors that go away after a read-write scrub.

I hope these changes are helpful. Please feel free to change or adapt as necessary, I do not anticipate focusing much more on this right now because I have other obligations.

Thank you for writing and maintaining btrfs and I hope you have a great week!

Aug 06 '24 00:08 csnover

The latest version looks good to me.

I'll still leave the PR open for any extra feedback, until it got merged.

Aug 09 '24 00:08 adam900710

Why is this not merged yet?

Sep 17 '24 16:09 csnover

This is conflicting with the latest devel branch.

Mind to rebase the branch to the latest devel?

Oct 20 '24 21:10 adam900710

This PR was broken by whatever happened at https://github.com/kdave/btrfs-progs/pull/863#commits-pushed-bc39df6 that caused extra stuff to be added to the branch by someone else. Per your request I have git rebase --skip over these and force pushed.

Oct 20 '24 21:10 csnover

CI is failing because base branch is broken, not due to changes in this PR.

Oct 20 '24 22:10 csnover

Although I have merged this series, I'm going to do a lot of updates on the words and of-course add SoB lines to all involved patches, re-order patches (some patches are fixing things of previous one, which should not happen for out-of-tree patches).

I'm a little too optimistic of this merge, and github web is not a good way to review patches, it hides the SoB line of the author, thus makes it harder to find a missing one.

I'll let you informed when the devel branch is updated so you can verify if the modification is fine.

Nov 04 '24 05:11 adam900710

The branch is now updated (devel branch). I have done the following modifications:

Add an SoB line for all patches I'm using <[email protected]> as your mail address. If you want to use a specified address, please let me know.
Merge all scrub related commits into a single one
Remove btrfs-check related warnings As I explained, btrfs-check is designed to choose the good copy (for metadata at least) just like kernel. So the warnings that bad mirror will cause false alerts is at least not the expected behavior
Explain scrub in more details It's done on a per-device basis, and full fs scrub is done by scrubbing each device in parallel.

Furthermore the behavior of scrub repair (using a good mirror) is also done for all the metadata/data reads, There is really nothing that special about scrub, thus a lot of recommendation to run btrfs scrub before btrfs-check is removed.
Remove the pid based load balance patch It's at the bad timing where we're also implementing other read policies. I'd prefer to remove all the pid based read balance mention for now.

Nov 04 '24 06:11 adam900710

To get the easier administrative part out of the way: I didn’t actually sign off on whatever was changed, so you probably shouldn’t list me in such a field, but I also don’t really care, so using that email address is totally fine if you need the field to be there to get these changes published.

Regarding the other feedback: I’m starting to wonder if there is an irreconcilable difference in philosophy here, but I will try advocating for my position one last time in the hope of at least achieving a mutual understanding of why I don’t think the outright removal of some of the notes I added is OK.

First, please know that it is not my intent to attack anyone or their work personally. I understand the many limitations caused by the under-resourced and voluntary nature of OSS work. If my feedback ever feels overly harsh or critical, it is because of my conviction that this is extremely important to get across, and so I am speaking very directly.

So: What is the purpose of end-user documentation? Is it supposed to help users to understand and operate the system safely as it actually is, or is it supposed to be a white paper that describes an idealised future version of the system? I believe it to be the former. The feedback I’m receiving on this ticket suggests a belief that it is closer to the latter.

Just because something will be changed in the future doesn’t mean its current or past behaviour should be left undocumented. btrfs is not an evergreen SaaS. People will be using the old versions for years and they need to know how those old versions work too. If the behaviour changes, then the documentation changes to read “in versions before N”. Descriptions of old behaviour can be moved to an “obsolete” page when N-1 is not still in supported LTS releases, but retaining such information somewhere official is necessary to clarify whatever outdated information is still floating out there on the internet.

Similarly, if the purpose is to help understanding and safe operation, then just because something is a bug doesn’t mean it should not be documented. There is even an industry term for this: errata. Even when a bug is self-evident (e.g. a crash), it still warrants documenting if there is a known workaround. In the case of btrfs check returning false positives, there are two reasons why I find it more critical to document than a run-of-the-mill defect:

btrfs check appears to be working when it is not, thus it is not self-evident there is a bug causing false positives; and
The need for clarity is heightened in high-stress situations like potential data loss/disaster recovery scenarios.

Whether the behaviour is intended is irrelevant to what the end user needs to know to understand and operate the system, and such information needs to be written down somewhere, in a place where end-users will think to look, which is official and up-to-date.

By way of suggested alternatives: if the btrfs bug tracker wasn’t such an unkempt graveyard, I’d be more receptive to the argument that tickets provide a reasonable stand-in for errata. But, as it stands, there are at least three separate bug trackers I’m aware of (and I am guessing SUSE has an internal one, which would make four); the kernel tracker alone has 296 open bugs, 97% of are still in NEW status. Many of them, like “Add option to check progress/status of btrfs-replace job”, are clearly obsolete. If no one is doing the minimum necessary work of project management to ensure bug trackers are reliable sources of information, they simply can’t be used as alternatives to documentation.

To look at this another way: Who is harmed by these notes? Do they suggest performing a dangerous operation? Do they enhance, or harm, the user’s understanding? Do they overwhelm other more important details, and if so, how can the same information be presented in a way that doesn’t do that, rather than just omitting unpleasant reality?

Given the inability to immediately find a code path that could be causing incorrect diagnostics from btrfs check and the false nature of many user reports of issues, I recognise the scepticism about my report that btrfs check did report errors on an array with only one corrupted mirror, which went away after btrfs scrub corrected the errors on the bad mirror. I don’t know what to say other than to fall back on my reputation as not being the kind of person who is generally incapable of telling whether a command is printing out errors or not. Trying to run btrfs check in lowmem mode even segfaulted! Surely that wouldn’t happen if the metadata it was reading was from the good mirror?

Let me know your thoughts, if you’d like.

Best,

Nov 05 '24 20:11 csnover

I think you miss one point, that an RW mount on a possible corrupted can cause problem by itself. E.g. for a corrupted extent tree/free space cache/free space tree, which breaks COW, just RW mounting with the extra update on the atime or whatever can already do damage further.

Thus I do not recommend RW scrub as a way to fix things, as it has the possibility to further corrupt the filesystem.

Except certain out-of-sync devices, the common corruption like mismatch transid can not really be repaired by using scrub. It's more possible some device are not honoring FLUSH/FUA correctly, and that will be a bigger problem by itself.

Furthermore if scrub can fix it, then there should be no observable problem except the dmesg error messages. As I have explained in the updated doc, kernel data/metadata read itself will do the repair, and we have several test cases to verify that behavior (including for scrub and for regular read).

So the idea of running RW scrub to self-heal is not any better than just running the filesystem as usual.

Nov 05 '24 21:11 adam900710

btrfs-progs btrfs-progs copied to clipboard

Improve documentation around userspace tool behaviour and safety

btrfs-progs
btrfs-progs copied to clipboard