Sdcard stuck trying to write the image

Open carlonluca opened this issue 7 years ago • 23 comments

Hello! I'm using NOOBS for a project. Sometimes the procedure that writes the image suddenly hangs. When it is stuck, the system is still up and running: I can log in via ssh (I added it to buildroot), and the recovery application works and responds properly. Only the writing thread of the recovery app (the one that wgets and untars) is stuck. In that state I logged in via ssh and found that any attempt to write data to the SD card deadlocks the writing process. I tried to simply dd 1MB onto the SD card and dd never finished. The only solution is to reboot. After a reboot everything is back to normal and the operation typically completes; the system works properly from then on. This happens from time to time with many devices and SD cards. dmesg doesn't show any error from the kernel. Any idea what may be causing this? Has anyone seen this behaviour before? Thanks!
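
For anyone trying to reproduce this, the dd test can be wrapped in a small probe that reports a hang instead of blocking the shell. This is only a sketch: it assumes a Python 3 interpreter is available on the device (it may have to be added to buildroot), and `write_probe`, the probe path and the timeout are made-up names and values, not part of NOOBS.

```python
import subprocess
import sys

# Child program: write 1 MiB of zeros and force it out to the device.
PROBE = (
    "import os, sys\n"
    "with open(sys.argv[1], 'wb') as f:\n"
    "    f.write(bytes(1 << 20))\n"
    "    f.flush()\n"
    "    os.fsync(f.fileno())\n"
)


def write_probe(path, timeout_s=30.0):
    """Write 1 MiB to `path` in a child process.

    Returns "ok", "error" or "hung". The write runs in a separate process
    so that the caller stays responsive even if the write never returns.
    """
    proc = subprocess.Popen([sys.executable, "-c", PROBE, path])
    try:
        return "ok" if proc.wait(timeout=timeout_s) == 0 else "error"
    except subprocess.TimeoutExpired:
        # A process in uninterruptible sleep ignores SIGKILL until the I/O
        # completes, so do not wait for it again after this call.
        proc.kill()
        return "hung"
```

Calling `write_probe("/mnt2/probe.bin")` periodically during an install would at least timestamp the moment writes stop completing; the path here is hypothetical and should point at a mounted partition of the SD card.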

carlonluca avatar Apr 06 '18 10:04 carlonluca

I have also seen this occasional behaviour from my PINN variant, but as there are no error messages, it is not easy to see what has gone wrong. What models of RPi have you seen this happen on? What version of NOOBS did it happen on? How were you connected to the internet - Ethernet or wifi / built-in or external (which)?

procount avatar Apr 06 '18 10:04 procount

I'm using NOOBS only on the Pi 3. I've been using 9a4547c, but I also tried the latest master, where I see the kernel and firmware files were updated, and I can reproduce the same behaviour. Once the image is written, the image itself works perfectly fine, and I never had problems writing to the SD card there. I download over the LAN using the regular Ethernet interface. It does not seem to be a network/server issue, as everything works via ssh except writing to the SD card.

carlonluca avatar Apr 06 '18 10:04 carlonluca

Hmm, I've not seen it happen on v2.4 (or PINN equivalent), or earlier versions. I hope @XECDesign can come up with some ideas on how to debug this to identify where the failure comes from - Ethernet, wget, xz, bsdtar, SDcard driver? Which OS caused it to stick? (Just wondering if the type of download/tar/compression affects it)

procount avatar Apr 06 '18 10:04 procount

The image I write is a custom image based on raspbian. I use xz compression. I tried to add:

CONFIG_STACKTRACE_SUPPORT=y
CONFIG_STACKTRACE=y
CONFIG_USER_STACKTRACE_SUPPORT=y

but still I cannot get any log.

carlonluca avatar Apr 06 '18 10:04 carlonluca

Probably because it has not actually crashed, but just got stuck somewhere... 🤷‍♂️

procount avatar Apr 06 '18 10:04 procount

Also tried with CONFIG_DETECT_HUNG_TASK. I remember the kernel should be able to also print the stacktrace in case something hangs, but not sure if that is optional and if that is properly enabled by these directives...

I don't remember ever seeing anything similar in raspbian, so I guess it is probably useless to ask in https://github.com/raspberrypi/linux right? Thanks for your help.
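
Since the hung-task detector only reports after its timeout, another way to see where a stuck thread is blocked is to read it straight from /proc over ssh. The sketch below lists every thread in uninterruptible sleep (state `D`) and its kernel stack. Assumptions: Python 3 is available on the device, the script runs as root, and the kernel was built with `CONFIG_STACKTRACE` (needed for `/proc/<pid>/stack`). The function names are illustrative only.

```python
import glob
import os


def parse_stat_line(line):
    """Return (tid, comm, state) from one /proc/<pid>/task/<tid>/stat line."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    open_paren = line.index("(")
    close_paren = line.rindex(")")
    tid = int(line[:open_paren])
    comm = line[open_paren + 1:close_paren]
    state = line[close_paren + 1:].split()[0]
    return tid, comm, state


def blocked_tasks(proc_root="/proc"):
    """Yield (tid, comm, stack_text) for every thread in state D."""
    pattern = os.path.join(proc_root, "[0-9]*", "task", "[0-9]*", "stat")
    for stat_path in glob.glob(pattern):
        try:
            with open(stat_path) as f:
                tid, comm, state = parse_stat_line(f.read())
        except (OSError, ValueError):
            continue  # thread exited while we were scanning
        if state != "D":
            continue
        stack_path = os.path.join(os.path.dirname(stat_path), "stack")
        try:
            with open(stack_path) as f:
                stack = f.read()
        except OSError:
            stack = "(stack unavailable: needs root and CONFIG_STACKTRACE)"
        yield tid, comm, stack
```

Running `for tid, comm, stack in blocked_tasks(): print(tid, comm, stack, sep="\n")` while the install is stuck should show whether the writer is waiting in the block layer, the filesystem or the MMC driver. Threads are scanned individually because the stuck writer is a thread of the recovery app, not a separate process.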

carlonluca avatar Apr 06 '18 11:04 carlonluca

Might be worth investigating whether it only happens with images that are wget-ed and extracted (i.e. the way that NOOBS Lite installs Raspbian), or also happens with images extracted directly from the SD card (i.e. the way that full NOOBS installs Raspbian)?

lurch avatar Apr 06 '18 11:04 lurch

IIRC, I've only seen it when downloading, but didn't take note whether it was ethernet or wifi.

procount avatar Apr 06 '18 11:04 procount

Ah. It happened tonight in PINN v2.5.4 when installing Retropie on a 3B+ from a USB stick. Normally ctrl-alt-f2 followed by ctrl-alt-del would reboot it, but not in this stuck state

procount avatar Apr 06 '18 20:04 procount

I tried "reboot -f" once and it didn't work, but the other day the watchdog seemed to be able to reboot the system instead. Does anyone know what could be enabled in the kernel to get log messages for debugging?

Were you using xz compression by any chance? I typically use xz (-9) and it happens frequently. xz -9 requires a lot of memory, so I then tried with gz. It took me some time, but in the end I could reproduce it with gz as well. Difficult to say whether the compression makes any difference or not.

carlonluca avatar Apr 07 '18 14:04 carlonluca

My retropie image was compressed with xz but with standard compression not -9 cos it uses too much memory.

procount avatar Apr 07 '18 17:04 procount

This is how mem is seen during the procedure with gz:

# free -m
             total         used         free       shared      buffers
Mem:           231          224            6            0           13
-/+ buffers:                210           20
Swap:            0            0            0
# cat /proc/meminfo 
MemTotal:         236676 kB
MemFree:           16740 kB
MemAvailable:     178028 kB
Buffers:           21788 kB
Cached:           128740 kB
SwapCached:            0 kB
Active:            45064 kB
Inactive:         112384 kB
Active(anon):       7032 kB
Inactive(anon):      220 kB
Active(file):      38032 kB
Inactive(file):   112164 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:             0 kB
SwapFree:              0 kB
Dirty:              9164 kB
Writeback:          5116 kB
AnonPages:          6912 kB
Mapped:            15676 kB
Shmem:               336 kB
Slab:              25380 kB
SReclaimable:      18196 kB
SUnreclaim:         7184 kB
KernelStack:         872 kB
PageTables:          352 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:      118336 kB
Committed_AS:      44396 kB
VmallocTotal:    1835008 kB
VmallocUsed:           0 kB
VmallocChunk:          0 kB
CmaTotal:           8192 kB
CmaFree:            3912 kB

I see that only 231MB of memory is available. Is this written somewhere? This is a Pi 3, so 231MB does not seem like a correct value, does it? Is the limit set somewhere in the sources?

carlonluca avatar Apr 09 '18 07:04 carlonluca

> I see that only 231MB of memory is available. Is this written somewhere?

Put start.elf/fixup.dat on the SD card instead of recovery.elf if you need access to more.

maxnet avatar Apr 09 '18 10:04 maxnet

Thank you for your answer. I read in the wiki:

> Running recovery.elf then switches the firmware into "NOOBS mode" - it uses recovery.img instead of kernel.img, recovery.cmdline instead of cmdline.txt, and it sets the root filesystem to recovery.rfs.

So does it mean I cannot use start.elf/fixup.dat with noobs?

carlonluca avatar Apr 09 '18 10:04 carlonluca

I can't remember the details now, but IIRC you can tweak some settings in config.txt to get it to read the NOOBS-named files.

lurch avatar Apr 09 '18 11:04 lurch

Ah thanks, so I should:

  1. remove recovery.elf;
  2. put start.elf and fixup.dat in boot;
  3. in config.txt set cmdline=recovery.cmdline and kernel=recovery.kernel.

Is this correct? What I'm missing according to the wiki is how to set rootfs to recovery.rfs. Also can I extract start.elf and fixup.dat from any Raspbian image? I guess increasing ram won't change anything, but I'm not sure what else I could try.

carlonluca avatar Apr 09 '18 12:04 carlonluca

IIRC recovery.rfs is an initramfs, if that helps. You can also get the files you need from https://github.com/raspberrypi/firmware/tree/master/boot

lurch avatar Apr 09 '18 16:04 lurch

I tried with this in config.txt but I'm getting a kernel panic (cannot mount root fs):

cmdline=recovery.cmdline
kernel=recovery7.img
initramfs=recovery.rfs

carlonluca avatar Apr 09 '18 16:04 carlonluca

It's a long long time since I played with any of this, but I think @procount might have more recent experience?

lurch avatar Apr 09 '18 16:04 lurch

Potentially useful things to try to get more info:

You may learn something by enabling the driver's logging feature, which will record activity in the kernel message log. Add dtparam=sd_debug=on to config.txt and reboot. You can also eliminate a DMA problem as being the cause (at the cost of some performance) by adding dtparam=sd_force_pio=on.

https://github.com/raspberrypi/linux/issues/2500#issuecomment-381901814

XECDesign avatar Apr 17 '18 09:04 XECDesign

I turned on the logging in PINN but kept DMA. I was writing Raspbian from USB to the SD card on a Pi3B using Linux recovery 4.14.37-rescue-v7, but with the OLDish firmware (31st March). (EDIT: I suppose it was 22de0bb68d34fd210ba9d086c6a1fc5e90f0bfbb)

tail -f /tmp/debug

Executing: "/sbin/mkfs.fat -n prjboot -F 32 /dev/mmcblk0p6" 
Executing: "sh -o pipefail -c "xz -dc /tmp/media/sda1/os/Raspbian/boot.tar.xz | bsdtar -xf - -C /mnt2  --no-same-owner "" 
finished writing filesystem in 1.643 seconds 
Executing: "/usr/sbin/mkfs.ext4 -L prjroot -O ^huge_file /dev/mmcblk0p7" 
Executing: "sh -o pipefail -c "xz -dc /tmp/media/sda1/os/Raspbian/root.tar.xz | bsdtar -xf - -C /mnt2 "" 

tail -f /tmp/messages

Jan  1 00:01:22 recovery kern.info kernel: [   82.882741] mmc0: cmd 13 0xaaaa0000 (flags 0x195)
Jan  1 00:01:22 recovery kern.info kernel: [   82.882790] mmc0: cmd 25 0xc3ca50 (flags 0xb5) - write 760*512
Jan  1 00:01:22 recovery kern.info kernel: [   82.917723] mmc0: cmd 13 0xaaaa0000 (flags 0x195)
Jan  1 00:01:22 recovery kern.info kernel: [   82.917771] mmc0: cmd 25 0xc3cd48 (flags 0xb5) - write 648*512
Jan  1 00:01:22 recovery kern.info kernel: [   82.948890] mmc0: cmd 13 0xaaaa0000 (flags 0x195)
Jan  1 00:01:22 recovery kern.info kernel: [   82.948926] mmc0: cmd 25 0xc3cfd0 (flags 0xb5) - write 816*512
Jan  1 00:01:23 recovery kern.info kernel: [   82.999782] mmc0: cmd 13 0xaaaa0000 (flags 0x195)
Jan  1 00:01:23 recovery kern.info kernel: [   82.999834] mmc0: cmd 25 0xc3d300 (flags 0xb5) - write 1024*512
Jan  1 00:01:23 recovery kern.info kernel: [   83.043922] mmc0: cmd 13 0xaaaa0000 (flags 0x195)
Jan  1 00:01:23 recovery kern.info kernel: [   83.043959] mmc0: cmd 25 0xc3d700 (flags 0xb5) - write 1024*512

Nothing unusual in the logs :( The slides were still changing, the language and keyboard dialog was still responsive, as was ssh. Just the Imagewritethread seemed to have stopped.
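
To compare several stuck runs, the sd_debug lines can be reduced to a short summary. The sketch below pulls out the write commands, totals the bytes and checks that each write starts where the previous one ended. It assumes the log format shown above and that the command argument is a sector address (as it is for high-capacity cards); the function names are invented for illustration.

```python
import re

# Matches e.g. "[   82.882790] mmc0: cmd 25 0xc3ca50 (flags 0xb5) - write 760*512"
CMD_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\] mmc0: cmd (?P<cmd>\d+) (?P<arg>0x[0-9a-f]+) "
    r"\(flags 0x[0-9a-f]+\)"
    r"(?: - (?P<dir>read|write) (?P<blocks>\d+)\*(?P<bsize>\d+))?"
)


def parse_writes(lines):
    """Return a list of (timestamp, start_sector, blocks, block_size) per write."""
    writes = []
    for line in lines:
        m = CMD_RE.search(line)
        if m and m.group("dir") == "write":
            writes.append((
                float(m.group("ts")),
                int(m.group("arg"), 16),
                int(m.group("blocks")),
                int(m.group("bsize")),
            ))
    return writes


def summarize(writes):
    """Summarise parsed writes: count, total bytes, contiguity, last entry."""
    total = sum(blocks * bsize for _, _, blocks, bsize in writes)
    # Contiguous means every write begins at the sector after the previous one.
    contiguous = all(a[1] + a[2] == b[1] for a, b in zip(writes, writes[1:]))
    return {
        "count": len(writes),
        "bytes": total,
        "contiguous": contiguous,
        "last": writes[-1] if writes else None,
    }
```

Feeding it the contents of /tmp/messages, e.g. `summarize(parse_writes(open("/tmp/messages")))`, gives the timestamp and sector of the last write that was issued, which can then be compared against the time the install stopped making progress.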

procount avatar May 24 '18 13:05 procount

I experienced exactly the same behavior. Nothing unusual in the logs related to the sdcard. I kept DMA as well.

carlonluca avatar May 31 '18 22:05 carlonluca

Also on Pi 3 B+ the same is happening. Everything seems to be working properly but it seems the thread writing to the sd card is stuck.

carlonluca avatar Jun 12 '18 15:06 carlonluca