Multiple Steam Deck users are reporting random file-system corruption with SteamOS 3.6 Beta
Your system information
- Steam client version: 1722380543
- SteamOS version: 3.6.9
- Opted into Steam client beta?: Yes
- Opted into SteamOS beta?: Yes
- Have you checked for updates in Settings > System?: Yes
Please describe your issue in as much detail as possible:
When using the current SteamOS 3.6 Beta, file-system corruption has been observed by multiple Steam Deck users over the span of multiple months, which makes SteamOS unbootable and thus unusable.
Unfortunately, this very severe bug seems to occur at random, with no clear trigger, which makes it practically impossible to reproduce reliably.
Having used Linux extensively for more than a decade myself, I have a gut feeling that the Linux kernel Valve chose for SteamOS 3.6, namely Linux 6.5, most likely contains a very subtle bug somewhere in its very complex file-system code.
IMHO, Valve's choice of Linux 6.5 for SteamOS 3.6 is a poor one, because Linux 6.5 is already considered obsolete by upstream kernel developers, whereas Linux 6.6 is an LTS release and is therefore still supported and actively used in production systems.
On the other hand, I'm aware that Valve is already actively preparing to move SteamOS to Linux 6.8, so perhaps switching the current SteamOS 3.6 Beta from Linux 6.5 to 6.8 would already resolve this very serious file-system corruption bug.
Hope Valve at least considers doing so, thanks!
I don't have much to add, except that I have been affected by this three times personally in less than two weeks; it happens on nearly every reboot for me. fsck fixes the home partition, thankfully, but this is inexcusable.
Taken from a Reddit comment by user deathblade:
Yeah, so did I, many times while testing 3.6 and 3.7, to the point that I eventually had to go back to 3.5, which hasn't had the issue once. From what I saw, the most common thing it does is either delete the whole .config folder or dump the whole deck folder into lost+found, making the data basically unrecoverable.
If you run into this issue, please go to Settings->System and create a system report so we can look at the logs to diagnose what happened and work towards fixing the issue.
After creating the report, you can save it to the desktop and paste it here, or submit it to Steam Support and send me your Steam username so I can retrieve it.
One extra note: a system report only includes logs from the current boot and the previous boot, which might not cover the session where things actually went wrong. If you have rebooted more than once since that session, you can collect all the previous logs by running this command in a Desktop Mode console:
journalctl > ~/Desktop/journal.log
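If you'd rather have one file per boot than a single large dump, a minimal sketch like the one below may help. It only relies on standard `journalctl` options (`-b` with a negative offset selects an earlier boot, `--no-pager` disables the pager); the function name `dump_recent_boots`, the default of five boots and the output file names are illustrative assumptions, not something SteamOS ships. `journalctl --list-boots` shows which boot offsets are actually available.

```shell
#!/bin/sh
# Hypothetical helper (not part of SteamOS): save the journal of the current
# boot and up to COUNT-1 earlier boots, one file per boot.
# Usage: dump_recent_boots [COUNT] [OUT_DIR]   (defaults: 5, ~/Desktop)
dump_recent_boots() {
    count=${1:-5}
    out_dir=${2:-"$HOME/Desktop"}
    mkdir -p "$out_dir" || return 1
    i=0
    while [ "$i" -lt "$count" ]; do
        offset=$((0 - i))    # 0 = current boot, -1 = previous boot, ...
        file="$out_dir/journal-boot$offset.log"
        if ! journalctl --no-pager -b "$offset" > "$file" 2>/dev/null; then
            # No journal for this boot offset: drop the empty file and stop.
            rm -f "$file"
            break
        fi
        i=$((i + 1))
    done
}

# Example (run in a Desktop Mode console):
#   dump_recent_boots 5
```

Stopping at the first missing offset keeps the sketch simple; it assumes older boots are unavailable once one offset is.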
If you'd rather not paste your Steam user ID here, feel free to open a Steam Support ticket and send me the ticket ID, and I can find you that way. Also, if you have a Deck stuck in this state and are having trouble recovering it, feel free to contact me via Steam Support as well.
@lostgoat
Would it be possible for Valve to push an experimental branch of SteamOS 3.6 running on the Linux 6.8 kernel?
This way, those users affected by this file-system corruption bug on Linux 6.5 could easily verify whether it also occurs on Linux 6.8 or not.
The alternative route is Valve trying to fix an obscure file-system bug on an obsolete Linux kernel revision (good luck with that, BTW).
Your choice...
@FanOfABT thanks for bringing this issue to our attention, but comments like that don't help us get towards a resolution to the problem.
The same happened to me, about 4 times in 2 weeks. I had to reimage/format SteamOS from scratch every time. I tried to repair it with fsck by booting from an Ubuntu live USB, and it booted, but after a while I couldn't open any program, not even Firefox. I was scared that it was something hardware-related, but since I switched to stable I haven't had any issue at all.
Replying to https://github.com/ValveSoftware/SteamOS/issues/1591#issuecomment-2278457378
This is not possible, because when the error happens you can't even boot; it starts in an emergency mode loop. I would like to help, but right now I'm scared to be on beta again. I have spent about 2 days trying to resolve this issue and don't want to install everything again.
@lostgoat
Today I went to the beta channel just to collect logs for you.
What happened is that I waited for the issue to happen (on the first system reboot, actually). After the system went into emergency mode, I fired up the recovery image and copied every var/log folder that I could find.
log2.zip: the other log file. Please also check Reddit; this is affecting so many people that it would create a huge problem if released to stable. Fixing it requires above-average skills, and it might flood Steam Support with RMA requests.
I have also had this issue multiple times already, and I misinterpreted it as a failing SSD, so I RMAed it. This is a huge issue, especially for people with less technical expertise. Running fsck doesn't always fix it either; the only reliable fix seems to be reimaging the Steam Deck, which really sucks. Reinstalling SteamOS doesn't fix it either.
I just switched my Deck to the beta branch, and within days my filesystem was completely corrupted, as FanOfABT described. Trying to switch back to the previous "B" SteamOS install by holding the "..." button during power-up did not work either.
I was unable to recover the file system and had to reimage the deck.
Thanks, looking into it.
I wonder whether these FS corruption events happen when rebooting the Deck, or maybe when suspending/resuming it?
Could anyone who has had this problem let us know exactly which storage device they have?
steamos-systemreport > sysrep.txt
This will tell you (along with a bunch of other potentially useful info).
Also the following:
- Is your Deck in read-write mode or dev mode?
- Have you applied any tweaks or settings changes?
Just so we can make sure there are no bad interactions. (None of the tweaks we are aware of should be able to cause these problems, but it's always better to check).
@fledermaus
Currently unable to access my Steam Deck, so I can't tell you the exact storage model yet, but it happened on my 256 GB LCD model twice.
The first time was with all of A.B.T.'s SteamOS tweaks applied; however, the second time it happened on a stock SteamOS 3.6 Beta installation.
Therefore, I concur that software tweaks applied to SteamOS are certainly not the culprit here.
My bet still stands that the most likely culprit is a very subtle file-system bug within the outdated & obsolete Linux 6.5 kernel revision Valve insists on using for SteamOS 3.6, for very mysterious & unknown reasons...
Replying to https://github.com/ValveSoftware/SteamOS/issues/1591#issuecomment-2284194431
There were plans to upgrade to the Linux 6.10 kernel, but the expected merges didn't take place.
Replying to https://github.com/ValveSoftware/SteamOS/issues/1591#issuecomment-2284134260
I have a Stock 64GB LCD model with a 256GB SD card. SteamOS was untweaked, although I did frequently use Desktop mode and had some applications installed through Discover.
Just want to re-iterate.
If you've ever experienced this issue, please submit a system report and let us know your Steam username. This is useful even if you aren't experiencing the problem right now.
If you are experiencing the problem, run fsck to restore the unit, or restore the files from lost+found. Then submit a system report so that we can see what happened. If you are unfamiliar with how to do this, open a support ticket and we can try a couple more things to get you up and running again while preserving the error logs.
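When e2fsck cannot reconnect a file or directory to its parent, it moves it into lost+found and names it after its inode number (e.g. `#131074`), so an orphaned home folder can't be found by name. As a rough sketch under stated assumptions, the hypothetical helper below lists directories directly inside a given lost+found that contain a `.config` or `.local/share/Steam` subdirectory, which we assume are reasonable markers of the deck user's home folder. It only reads; it never moves or deletes anything. Which lost+found to pass in (e.g. the one on the home partition) is up to you, and reading it usually requires root.

```shell
#!/bin/sh
# Hypothetical helper: print entries in a lost+found directory that look like
# an orphaned home folder (they contain .config or .local/share/Steam).
# Read-only: nothing is moved or deleted.
# Usage: find_orphaned_homes /path/to/lost+found
find_orphaned_homes() {
    lf_dir=$1
    [ -d "$lf_dir" ] || { echo "not a directory: $lf_dir" >&2; return 1; }
    for entry in "$lf_dir"/*; do
        # Skip plain files (and the literal pattern if the glob matched nothing).
        [ -d "$entry" ] || continue
        if [ -d "$entry/.config" ] || [ -d "$entry/.local/share/Steam" ]; then
            echo "$entry"
        fi
    done
}
```

Any match is only a candidate; check its contents before copying anything back.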
I've also seen this issue being linked in a couple of unrelated places online, so I wanted to clarify what the actual symptoms of this failure are:
- The system will fail to boot
- In the recovery menu, choosing Current SteamOS or Previous SteamOS both fail to boot
  - This is an important detail, as many of the other cases I've seen were fixed by just choosing Previous SteamOS
- Critical system files can be found under the lost+found folder. Restoring them fixes the boot problems.
Can someone who has experienced this issue confirm whether the above is correct?
@lostgoat @fledermaus sysrep.txt: I went back to 3.5 to be able to boot. I hope my previous full logs and this system report are useful. I have faced this issue more than 10 times, on both vanilla and modified setups. Choosing Current or Previous from the recovery menu both fail. Interestingly, using "erase user data" solved it for me at least a couple of times. Restoring lost+found from the 3.5 setup doesn't solve the issue.
I just submitted a system report, but I don't know how much use it is. fsck was not able to unfsck my filesystem. It was only fixed by a complete reimage, so any potentially useful log files were gone.
The only additional thing I can add is that my Steam Deck had separate OS and Steam Client beta-participation settings: I had been running the "OS Update Channel" on "Beta" and the "Steam Client Update Channel" on "Steam Deck Stable".
Starting with the SteamOS 3.6.9 Beta, I noticed my Steam Deck behaving oddly. Every single cold boot would show "Verifying Installation" before eventually making it to the SteamOS Home screen. Otherwise, however, it worked fine. I then decided to switch the "Steam Client Update Channel" to Steam Deck Beta. It seemed to install the beta, but the Steam Deck then hung when it tried to restart the Steam Client.
I had to hard power off the Steam Deck. Upon powering back up, it would never make it past the Steam logo. I then pressed the "..." and power buttons at the same time and selected the current OS to get verbose output. The output said something about the system being in emergency mode, and "press enter to continue". I plugged in a keyboard, hit Enter, and it automatically attempted to run fsck. It got to about 25%, failed, and then said "press enter to continue" again; the same thing happened every time.
"Verifying installation" can happen, iirc, if there was an abrupt shutdown or steam didn't get a chance to clean up properly. That part doesn't necessarily indicate a deeper problem. The rest is quite weird though. More data for the investigation. Thanks.
A.B.T.'s SteamOS tweaks
I immediately wonder whether there is a correlation between people having these issues and those tweaks.
It should also be noted that Decky had an issue with a specific plugin, which they have resolved.
@Intoxicus
Is it really too much to ask to read a thread fully before diving in head-first?
As I had already stated, the file-system corruption bug happened on a stock SteamOS 3.6 Beta installation.
So no, neither Decky nor A.B.T.'s SteamOS tweaks are the culprit here.
@Intoxicus
This happens on stock installations too.
I hope we get updates on this soon.
Replying to https://github.com/ValveSoftware/SteamOS/issues/1591#issuecomment-2289869311
What the ABT Tweaks do would persist unless there's a full wipe/reset/reinstall as I understand it.
Have you ever used ABT Tweaks at all?
The Decky boot loop issue is interesting in how its timing correlates with when this bug appeared. It could be related in the sense that the same bug can cause different issues in different circumstances.
Unless a brand-new-in-box (BNIB) Deck has presented with this bug, it is valid to consider whether ABT and/or Decky is a possible cause, or a contributing factor rather than the cause in and of itself.
I can say I have not had these issues at all. I use Decky, but I did not get hit by the recent boot loop issue. I think it occurred on the same update where this bug appeared.
I do not use ABT, and would not, except for that one memory lock tweak, which I want to fact-check before applying it independently. The idea of permanently being in performance mode seems silly to me; I'll just use PowerTools from Decky to do it for games that actually need it. The thought process behind those tweaks makes me apprehensive that they're not well thought out and are short-sighted. I don't even know who "ABT" is; I hadn't even heard of these ABT Tweaks until reading this bug report thread. Right away it strikes me as sus in terms of "does doing this actually make sense?"
I've done all sorts of troubleshooting, and one thing I've learned is to always keep a beginner's mind and recheck your assumptions. If you're sure something can't be the cause, double-check anyway (within reason). I've been surprised when the thing I thought it could not possibly be turned out to be the solution. I'll double-check things that seem silly just to be absolutely certain (within reason).
Valve Devs are going to have a clearer and more complete data set and idea of what is actually going on. They're likely 23 steps ahead of us mere mortal towels. ;)
I have an LE OLED that has had neither the boot loop issue with Decky nor this config wipe bug. I haven't modded anything. I only use Decky and Cryobytes, nothing fancy or crazy, and even with Decky I keep it minimal. There are some Decky plugins that I just would not bother with, for various reasons.
Being able to compare a unit that has never had the issue with the logs of affected units could be helpful. Valve Devs let me know if you want "clean" logs from an unaffected Deck "in the wild."
I've hidden a couple of topics to avoid de-railing the discussion. For now some users have reported that they see the problem with no modifications so we are treating it that way.
So far we've been stress testing the filesystem and trying different ways to hard reset the unit to trigger an unclean filesystem exit and we haven't been able to repro.
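For anyone who wants to try something similar on their own unit, one generic technique is a write-and-verify loop: write files of random data, record their checksums, flush to disk, then re-check them after a hard reset. The sketch below is a hypothetical illustration of that technique, not Valve's actual stress test; the function names, file count and 64 KiB file size are made up, and it assumes GNU coreutils `sha256sum` and `head -c` are available.

```shell
#!/bin/sh
# Hypothetical write-and-verify loop (NOT Valve's actual stress test).
# fs_stress_write DIR [N] : write N files of random data plus a checksum manifest.
# fs_stress_verify DIR    : re-check the files against the manifest later,
#                           e.g. after a deliberate hard reset.
fs_stress_write() {
    dir=$1
    n=${2:-100}
    mkdir -p "$dir" || return 1
    i=0
    while [ "$i" -lt "$n" ]; do
        # 64 KiB of random data per file
        head -c 65536 /dev/urandom > "$dir/file$i.bin" || return 1
        i=$((i + 1))
    done
    # Manifest uses paths relative to DIR so verification can cd into it.
    (cd "$dir" && sha256sum file*.bin > MANIFEST.sha256) || return 1
    sync    # flush dirty data to disk before any deliberate power cut
}

fs_stress_verify() {
    dir=$1
    # --quiet prints only mismatches; the exit status is non-zero on any failure.
    (cd "$dir" && sha256sum --quiet -c MANIFEST.sha256)
}
```

A failed verification after an unclean shutdown only shows that data written before the `sync` did not survive; it says nothing by itself about why.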
If anyone has their hands on a unit that is in this failure mode please contact us as there could be a lot of useful information in that unit.
Replying to https://github.com/ValveSoftware/SteamOS/issues/1591#issuecomment-2290176167
I had previously provided logs from a failed system through recovery image, any good come from those?
For now I will only use stable, as I'm travelling and need my Steam Deck. But for the future, what will be needed from a failed system other than the logs provided? I could chroot into the system and provide whatever is needed.
Also, I would like to know whether the main channel has a newer kernel than beta; I could test whether this also happens on main.
Right now I think the latest main image has only a couple of display blanking/refresh rate related patches over what's in beta.
Replying to #1591 (comment)
I had previously provided logs from a failed system through recovery image, any good come from those?
Some, but not yet enough. We can see it happening, as well as a few messages which might be leads or might be unrelated; we're still chasing those down.
We cannot yet tell why it happens and our stress testing hasn't triggered the problem here (yet).