Frequent freezing and crashing with eMMC VCCQ mod

Open dariox86 opened this issue 2 years ago • 20 comments

  • Device: PinePhone
  • Kernel Version : 5.17.6-1-danctnix
  • UI: Phosh

Steps to reproduce

Perform eMMC VCCQ mod as described here and copy pinephone-vccq-mod.dtbo and user.scr to /boot.

Expected behavior

I expect the device to work with no stability issues.

Actual behavior

I performed the hardware modification in January 2022. Since then I have been experiencing frequent freezing and crashing, as if the eMMC suddenly becomes unreadable and unwritable during normal operation. Sometimes the device does not freeze as long as everything needed at that moment runs from RAM. As soon as I do something that touches internal storage, like launching an application that is not already in RAM, the device freezes. A sufficiently long burst of eMMC I/O is enough to reproduce the issue. This happens about ten times a day on average during normal operation.

Logfiles and additional information

I don't know what log could be useful. Suggestions are welcome.

dariox86 avatar Jun 02 '22 19:06 dariox86

As this is a hardware mod, any problem caused by the mod is outside what I can support. But it's possible that the eMMC used in your device is not compatible with the mod.

You may want to reach out to dsimic in the PinePhone chat.

Danct12 avatar Jun 03 '22 05:06 Danct12

Update: I tried to pinpoint the issue and concluded that it is specific to Arch Linux ARM DanctNIX. I connected a USB stick to the PinePhone through the hub and booted archlinux-pinephone-phosh-20220502 from a microSD card. From the live system I started copying /home/alarm from the eMMC to the USB stick. Every time, the copy froze before finishing. I tried it four times just to be sure.

Then I tried doing the same with 20220601-0442-postmarketOS-v21.12-phosh-17-pine64-pinephone. It took literally hours, but in the end it worked on the first try.

Of course, in both cases I had to copy pinephone-vccq-mod.dtbo and user.scr to the boot partition.
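The copy test above boils down to a long sequential read from the eMMC. A minimal sketch of that idea follows, assuming only that dd is available; the function name read_stress and its arguments are made up for illustration and are not part of any image mentioned in this thread. Note that a real hang would show up as the loop never printing the next pass, not as an error message.

```shell
# Read SRC from start to end and throw the data away, PASSES times.
# SRC can be a large file on the eMMC or the block device itself.
# Prints one line per completed pass; stops at the first read error.
read_stress() {
    src="$1"
    passes="${2:-1}"
    i=1
    while [ "$i" -le "$passes" ]; do
        if ! dd if="$src" of=/dev/null bs=1M 2>/dev/null; then
            echo "pass $i: read failed" >&2
            return 1
        fi
        echo "pass $i: ok"
        i=$((i + 1))
    done
}
```

For example, read_stress /dev/mmcblk2 3 (as root) would read the whole device three times. The device name is an assumption and may differ between images. Repeated passes over a small file may be served from the page cache instead of the eMMC, so a source larger than RAM is the safer choice.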

dariox86 avatar Jun 03 '22 09:06 dariox86

Does the whole system hang when the copy operation hangs? Can you please post dmesg?

Danct12 avatar Jun 03 '22 11:06 Danct12

When running from microSD I can still regain control of the system by killing the copy process. When the same problem occurs while running from eMMC, the system behaves erratically. It may or may not freeze, but even when it is not completely frozen I cannot launch any new application or load a file, because the system is unable to communicate with the eMMC. I will try again and post my dmesg for you to check.

dariox86 avatar Jun 03 '22 12:06 dariox86

I launched the copy command and when the copy froze I dumped dmesg output. The only relevant lines I see are:

[  710.656872] sunxi-mmc 1c11000.mmc: data error, sending stop command
[  711.661903] sunxi-mmc 1c11000.mmc: send stop command failed

The full dmesg log is attached: dmesg.txt.
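To pull lines like the two above out of a long kernel log, a small grep filter is usually enough. This is only a sketch: the function name filter_mmc_errors and the keyword list are assumptions chosen to match the messages quoted here, not a complete list of what the sunxi-mmc driver can print.

```shell
# Keep only kernel log lines that mention an MMC device together with
# an error-like keyword. Reads the files given as arguments, or stdin
# when called without arguments.
filter_mmc_errors() {
    grep -iE 'mmc.*(error|failed|timeout|timed out)' "$@"
}
```

It can be used as dmesg | filter_mmc_errors on a running system, or as filter_mmc_errors dmesg.txt on a saved log.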

dariox86 avatar Jun 03 '22 13:06 dariox86

That looks like the eMMC driver tried to read/write some data, but failed.

Are you sure the VCCQ patch files are installed? Non-VCCQ images do not work properly on a modded device.

Danct12 avatar Jun 21 '22 02:06 Danct12

pinephone-vccq-mod.dtbo and user.scr are in the boot partition. It would not boot without these files in place.

dariox86 avatar Jun 21 '22 07:06 dariox86

@dariox86 Any luck lately with this issue?

bfra2373 avatar Feb 02 '23 01:02 bfra2373

A while ago something changed. The device no longer freezes as long as the screen is on. It only happens when the device is idle and the screen is off. Turning off the screen with the power button even for a second can be enough to trigger the issue. If I am unlucky, it can happen up to four times in a row. If I am lucky, the device stays idle overnight and is still running in the morning. It is very random. On average, it happens a dozen times a day. It seems less frequent when the device is plugged into power via USB. On a side note, I have been experiencing a number of unrelated regressions, and I have not had time to pinpoint their causes.

dariox86 avatar Feb 02 '23 02:02 dariox86

When a new release is out I may try to reinstall everything from scratch.

dariox86 avatar Feb 02 '23 02:02 dariox86

Does it run fine on Mobian or pmOS?

bfra2373 avatar Feb 02 '23 15:02 bfra2373

Back then Arch Linux ARM DanctNIX ran fine when booted from a microSD. I could reproduce the problem by running a long copy from the eMMC; eventually the eMMC would stop responding. Doing the same from postmarketOS did not cause problems. At the moment I cannot install another operating system on the eMMC because the device is my daily driver. I would need to copy a lot of data off the device and back again.

dariox86 avatar Feb 02 '23 16:02 dariox86

I understand! I had much the same issue with data management, but now I use Syncthing to sync /home to my home computer. So no more headaches when I need to replace the OS!

bfra2373 avatar Feb 02 '23 18:02 bfra2373

I do the same with my computer, so I can afford to lose everything on it at any time, but I have yet to set up something similar for my smartphone.

dariox86 avatar Feb 02 '23 19:02 dariox86

Tried again with Arch Linux ARM DanctNIX installed from scratch from the latest release 2023/02/03: same issue all along, turning the screen off even for a split second is enough to randomly trigger the issue. Then I tried with postmarketOS: went smooth for the last twenty-four hours.

dariox86 avatar Feb 10 '23 08:02 dariox86

I read that some hardware sub-models do not support this mod. What revision do you have? I would like to do this on my device, but mine is a ubports (v1.2), so I want to be as sure as possible that I am not bricking anything.

sorry-i-am-late avatar Nov 08 '23 16:11 sorry-i-am-late

If I am not mistaken, I had version 1.2. It is the one that came with Manjaro preinstalled.

dariox86 avatar Nov 11 '23 12:11 dariox86

  1. Different batches may have different eMMC chips installed. Could you post information about yours, for statistics?

This one is on my 1.2a (3/32) revision:

root@mobian-dev:/sys/block/mmcblk2/device# cat /sys/block/mmcblk2/device/oemid
0x0100
root@mobian-dev:/sys/block/mmcblk2/device# cat /sys/block/mmcblk2/device/manfid 
0x000045
root@mobian-dev:/sys/block/mmcblk2/device# cat /sys/block/mmcblk2/device/date
06/2020
root@mobian-dev:/sys/block/mmcblk2/device# cat /sys/block/mmcblk2/device/fwrev
0x3034313430363139
root@mobian-dev:/sys/block/mmcblk2/device# cat /sys/block/mmcblk2/device/hwrev
0x0
  2. Could you take kernel logs from the serial (UART) console when the device hangs? I've modded my PP and get a freeze once every several days on Mobian with kernel 6.1. The last one was related to eMMC.

After a couple of minutes I got the following messages on the serial console:

[293876.407089] INFO: task systemd-journal:305 blocked for more than 241 seconds.
[293876.415187]       Tainted: G         C  E      6.1-sunxi64 #1
[293876.422697] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[293876.432247] task:systemd-journal state:D stack:0     pid:305   ppid:1      flags:0x0000080c
[293876.441568] Call trace:
[293876.445748]  __switch_to+0xc0/0x130
[293876.450013]  __schedule+0x388/0x994
[293876.454458]  schedule+0x54/0xdc
[293876.458566]  io_schedule+0x40/0x60
[293876.463711]  bit_wait_io+0x1c/0x70
[293876.468078]  __wait_on_bit+0x78/0xcc
[293876.472584]  out_of_line_wait_on_bit+0x8c/0xb4
[293876.478772]  __wait_on_buffer+0x3c/0x50
[293876.483552]  ext4_read_bh+0xd8/0xf0 [ext4]
[293876.488961]  ext4_read_bh_lock+0x5c/0xa0 [ext4]
[293876.495315]  ext4_bread+0x78/0xb0 [ext4]
[293876.500260]  __ext4_read_dirblock+0x5c/0x3c0 [ext4]
[293876.506904]  ext4_dx_find_entry+0x11c/0x1e4 [ext4]
[293876.512713]  __ext4_find_entry+0x3c4/0x410 [ext4]
[293876.519083]  ext4_lookup+0x1ac/0x2a0 [ext4]
[293876.524249]  __lookup_hash+0x80/0xd0
[293876.528864]  do_renameat2+0x264/0x49c
[293876.534280]  __arm64_sys_renameat+0x5c/0x70
[293876.539442]  invoke_syscall+0x4c/0x110
[293876.544188]  el0_svc_common.constprop.0+0xc8/0xf0
[293876.550498]  do_el0_svc+0x30/0xb0
[293876.554776]  el0_svc+0x14/0x4c
[293876.558825]  el0t_64_sync_handler+0x10c/0x120
[293876.564881]  el0t_64_sync+0x14c/0x150
[293876.575193] INFO: task kworker/1:2H:209260 blocked for more than 241 seconds.
[293876.583958]       Tainted: G         C  E      6.1-sunxi64 #1
[293876.590641] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[293876.600359] task:kworker/1:2H    state:D stack:0     pid:209260 ppid:2      flags:0x00000008
[293876.610372] Workqueue: kblockd blk_mq_run_work_fn
[293876.616147] Call trace:
[293876.619553]  __switch_to+0xc0/0x130
[293876.624784]  __schedule+0x388/0x994
[293876.628834]  schedule+0x54/0xdc
[293876.632890]  schedule_timeout+0x14c/0x180
[293876.637856]  __wait_for_common+0xe4/0x234
[293876.642816]  wait_for_completion+0x24/0x2c
[293876.647895]  mmc_wait_for_req_done+0x30/0xf4
[293876.653935]  mmc_wait_for_req+0xac/0xfc
[293876.658613]  mmc_wait_for_cmd+0x6c/0xb0
[293876.663387]  __mmc_send_status+0x7c/0xc0
[293876.669047]  mmc_blk_mq_rw_recovery+0x5c/0x3d0
[293876.674466]  mmc_blk_mq_poll_completion+0x7c/0x210
[293876.680249]  mmc_blk_rw_wait+0x11c/0x210
[293876.685898]  mmc_blk_mq_issue_rq+0x26c/0x8e0
[293876.691126]  mmc_mq_queue_rq+0x150/0x320
[293876.696786]  blk_mq_dispatch_rq_list+0x1b8/0x960
[293876.702361]  blk_mq_do_dispatch_sched+0x2e0/0x360
[293876.708021]  __blk_mq_sched_dispatch_requests+0x128/0x180
[293876.715135]  blk_mq_sched_dispatch_requests+0x3c/0x7c
[293876.721143]  __blk_mq_run_hw_queue+0x7c/0xb0
[293876.727165]  blk_mq_run_work_fn+0x24/0x2c
[293876.731955]  process_one_work+0x1e4/0x440
[293876.736825]  worker_thread+0x180/0x4a0
[293876.742163]  kthread+0xd8/0xe0
[293876.746202]  ret_from_fork+0x10/0x20

Make sure your kernel has the following options enabled:

CONFIG_DETECT_HUNG_TASK=y
CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=120

To check, use zcat /proc/config.gz
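The check can be narrowed down to just the two hung-task options. The sketch below assumes the running kernel exposes its configuration at /proc/config.gz, which is only the case when it was built with CONFIG_IKCONFIG_PROC; the function name check_hung_task_config is made up for illustration.

```shell
# Print the hung-task settings found in a kernel config.
# $1: path to the config (treated as gzip-compressed if it ends in .gz);
#     defaults to /proc/config.gz.
# Also matches "# CONFIG_... is not set" lines, so a disabled option
# is shown instead of silently missing.
check_hung_task_config() {
    config="${1:-/proc/config.gz}"
    case "$config" in
        *.gz) gzip -dc "$config" ;;
        *)    cat "$config" ;;
    esac | grep -E '^(# )?CONFIG_(DETECT_HUNG_TASK|DEFAULT_HUNG_TASK_TIMEOUT)[= ]'
}
```

Called without arguments it reads the running kernel's configuration. If it prints nothing, the options are absent from that configuration.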

It would be interesting to see whether your backtrace is the same as (or similar to) mine.
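The eMMC identification fields asked for in point 1 can be gathered in one go. This is a sketch, not a fixed procedure: the function names are made up, and the default path /sys/block/mmcblk2/device is simply the one from the listing above, so the device number may differ on other images. The second helper decodes a hex value byte by byte into characters. For the fwrev shown above the bytes happen to be printable ASCII, which is an observation about that value and not something guaranteed for every chip.

```shell
# Print the identification attributes of an MMC device from sysfs.
# $1: sysfs device directory; defaults to the path used in this thread.
collect_emmc_info() {
    dev="${1:-/sys/block/mmcblk2/device}"
    for field in oemid manfid date fwrev hwrev; do
        if [ -r "$dev/$field" ]; then
            printf '%s: %s\n' "$field" "$(cat "$dev/$field")"
        fi
    done
}

# Decode a hex string (with or without a 0x prefix) into characters,
# two hex digits at a time. A trailing odd digit is ignored.
hex_to_ascii() {
    hex="${1#0x}"
    while :; do
        case "$hex" in
            ??*) ;;
            *) break ;;
        esac
        rest="${hex#??}"
        byte="${hex%"$rest"}"
        hex="$rest"
        # Convert the byte to a three-digit octal escape and print it.
        printf "\\$(printf '%03o' "0x$byte")"
    done
    printf '\n'
}
```

For example, hex_to_ascii 0x3034313430363139 prints 04140619.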

AndreySV avatar Nov 12 '23 05:11 AndreySV

  1. Different batches may have different eMMC chips installed. Could you post information about yours for statistics.

This one is on my 1.2a (3/32) revision:

Mine is the ubports (2/16), and my eMMC is the same as the one shown in the original write-up for this mod. I should have time to mess around with it more either tomorrow or Tuesday.

sorry-i-am-late avatar Nov 12 '23 05:11 sorry-i-am-late