Crash: "stream did not contain valid UTF-8" (filenames that should be invalid but still occur?)

Open D-side opened this issue 9 months ago • 1 comments

For giggles I ran backdown on a pile of my files that had been abandoned for about a decade, and it failed to handle something in there.

Last I/O before disaster struck:

Staging Question 3/1886
The {snip} directory contains 27 files which are all present elsewhere.
You can remove the whole directory without losing anything.
This would let you gain 24M.
What do you want to do with this directory?
[r] Stage the whole directory for removal
[s] Skip and go to next question
[e] End staging phase
r
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Error { kind: InvalidData, message: "stream did not contain valid UTF-8" }', {snip}/backdown-1.1.1/src/ask.rs:248:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

(I {snip}ped unimportant paths) The error points here: https://github.com/Canop/backdown/blob/v1.1.1/src/ask.rs#L248
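For reference, the error kind and message in the panic match what Rust's standard library returns when a string-reading call such as `BufRead::read_line` hits bytes that are not valid UTF-8. The sketch below only illustrates that behavior; `read_answer` and `read_answer_from_bad_bytes` are hypothetical helpers, and nothing here confirms that `ask.rs:248` does the same thing.

```rust
use std::io::{self, BufRead, Cursor};

/// Hypothetical helper: reads one line from `input`, the way an
/// interactive prompt might, and strips the trailing newline.
fn read_answer<R: BufRead>(mut input: R) -> io::Result<String> {
    let mut line = String::new();
    // read_line fails with ErrorKind::InvalidData if the bytes it read
    // are not valid UTF-8.
    input.read_line(&mut line)?;
    Ok(line.trim_end().to_string())
}

/// Feeds `read_answer` a line containing the byte 0xFF, which can never
/// appear in valid UTF-8, so the result is an error.
fn read_answer_from_bad_bytes() -> io::Result<String> {
    read_answer(Cursor::new(b"r\xff\n".to_vec()))
}
```

Calling `.unwrap()` on the result of `read_answer_from_bad_bytes()` would panic in the same style as the output above. Handling the `Err` instead (or reading raw bytes and converting lossily) would avoid the crash.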

This is backdown 1.1.1, and as best I can tell 1.1.2 didn't fix anything relevant to this, so I didn't try upgrading before posting.

The conditions I'm running it in are admittedly kinda extreme: Ubuntu > WSL2 > drvfs > Windows network drive > CIFS network file share. I'm not discounting the possibility that filenames got corrupted somewhere along the way. But considering "old and abandoned" files would seem to be a likely target for backdown, maybe it should be more resilient to anomalies such as this? It did manage a few other piles of files through this same cursed chain of bridges and did get me a few extra gigabytes of space, so I'm more inclined to think I genuinely might have a couple files named in invalid UTF-8. The question is, where… Maybe earlier staged subdirectories will point me at the culprit.
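One way to hunt for the culprit without rerunning the hour-long analysis is to walk the tree and flag names that do not convert to `&str`. This is a minimal sketch using only the standard library; `find_non_utf8_names` is a hypothetical helper, not part of backdown, and it assumes the invalid bytes actually survive the drvfs/CIFS chain rather than being replaced along the way.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper: recursively collects every entry under `dir`
/// whose file name is not valid UTF-8.
fn find_non_utf8_names(dir: &Path, found: &mut Vec<PathBuf>) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        // OsStr::to_str returns None when the name is not valid Unicode.
        if entry.file_name().to_str().is_none() {
            found.push(entry.path());
        }
        // DirEntry::file_type does not follow symlinks, so symlinked
        // directories are not descended into.
        if entry.file_type()?.is_dir() {
            find_non_utf8_names(&entry.path(), found)?;
        }
    }
    Ok(())
}
```

When reporting the results, print paths with `{:?}` or `Path::display()`, since they cannot be turned into a `String` without loss.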

What I find curious is that the analysis phase ran just fine, and the problem happened only when printing one of the staging questions. That hopefully narrows the scope of where the problem could be to something manageable.

Unfortunately I do not yet have this isolated to a minimal case to easily reproduce, as analysis of this pile takes about an hour. I'll try the subfolders where it likely happened and if I find anything, I'll add a note. I'm posting this in its current state on the off-chance it's enough to produce a synthetic case to trigger the same problem.
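A possible starting point for a synthetic case, assuming a Unix-like system and a filesystem that accepts arbitrary bytes in names: two directories holding identical content, where one file has a name that is not valid UTF-8. `make_synthetic_case` and the directory and file names are made up for illustration, and whether this layout actually triggers the same panic is untested.

```rust
use std::ffi::OsStr;
use std::fs;
use std::io;
use std::os::unix::ffi::OsStrExt; // Unix-only: build an OsStr from raw bytes
use std::path::{Path, PathBuf};

/// Hypothetical helper: creates `root/a/copy.txt` and a duplicate under
/// `root/b/` whose name contains a byte sequence that is invalid UTF-8.
/// Returns the path of the badly named file.
fn make_synthetic_case(root: &Path) -> io::Result<PathBuf> {
    let a = root.join("a");
    let b = root.join("b");
    fs::create_dir_all(&a)?;
    fs::create_dir_all(&b)?;

    // 0xE9 is "é" in Latin-1; followed by "." it is not valid UTF-8.
    let bad_name = OsStr::from_bytes(b"caf\xe9.txt");
    let contents = b"duplicate contents\n";

    fs::write(a.join("copy.txt"), contents)?;
    let bad_path = b.join(bad_name);
    fs::write(&bad_path, contents)?;
    Ok(bad_path)
}
```

Some filesystems reject such names outright, in which case the second `fs::write` returns an error instead of creating the file.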

And despite this setback, it's still a great tool that has saved me a good chunk of time already, thank you!

D-side • Mar 08 '25 14:03