
[email protected] crash on Pixel 8 Pro when switching to telephoto lens

chenxiaolong opened this issue

When switching to the telephoto lens on my Pixel 8 Pro, the [email protected] service crashes after a few seconds. On my device, this is easily reproducible in both GrapheneOS Camera and Google Camera by zooming to the maximum level and then focusing on a bright, high-contrast object that convinces the device to use the telephoto sensor (instead of cropping the main sensor).

I'm able to reproduce the issue in GrapheneOS 2024050900, but not with stock Pixel OS (May update). I haven't investigated, but at first glance, the symptoms look like memory corruption. With GrapheneOS Camera, the viewfinder feed is sometimes visibly corrupted right before the crash. In the logcat, the process prints out a memory dump that contains many instances of 0xfa11fa11, which looks suspiciously like a canary value for detecting memory corruption.
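To put a number on "many instances", one could count how often the suspected canary word appears in the dump. This is a hypothetical sketch: it assumes the hex dump from logcat has already been parsed into raw bytes in the same byte order as the machine running it, and `count_pattern_words` is an illustrative helper, not part of any Android tool.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: counts 4-byte-aligned words equal to `pattern` in a raw
// memory dump. Assumes the dump bytes are in this machine's byte order.
std::size_t count_pattern_words(const std::vector<std::uint8_t>& dump,
                                std::uint32_t pattern = 0xfa11fa11u) {
  std::size_t count = 0;
  for (std::size_t off = 0; off + sizeof(std::uint32_t) <= dump.size();
       off += sizeof(std::uint32_t)) {
    std::uint32_t word;
    // memcpy avoids unaligned access and strict-aliasing problems.
    std::memcpy(&word, dump.data() + off, sizeof(word));
    if (word == pattern) ++count;
  }
  return count;
}
```

A high count relative to the dump size would support the canary theory, but it doesn't identify what wrote the pattern.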


When using Google Camera, this is the crash notification. When the camera HAL disconnects, the app exits.

type: crash
osVersion: google/husky/husky:14/AP1A.240505.005/2024050900:user/release-keys
uid: 1000 (u:r:hal_camera_default:s0)
cmdline: /apex/com.google.pixel.camera.hal/bin/hw/[email protected]
processUptime: 0s

abortMessage: Some nodes appear to be stuck, abort. Most likely some fence was not signaled or a hardware block got stuck, see previous errors for diagnostics

signal: 6 (SIGABRT), code -1 (SI_QUEUE)
threadName: WatchDog
MTE: enabled

backtrace:
    /apex/com.android.runtime/lib64/bionic/libc.so (abort+168, pc 680f8)
    /system/lib64/liblog.so (__android_log_default_aborter+16, pc 6350)
    /apex/com.android.vndk.v35/lib64/libbase.so (android::base::LogMessage::~LogMessage()+356, pc 1a234)
    /apex/com.google.pixel.camera.hal/lib64/liblyric_hwl.so (pc f900ac)
    /apex/com.android.runtime/lib64/bionic/libc.so (__pthread_start(void*)+204, pc d5e6c)
    /apex/com.android.runtime/lib64/bionic/libc.so (__start_thread+68, pc 69a64)
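The abort message describes a watchdog that gives up when a pipeline node never sees its fence signaled. The HAL's code isn't available, so this is only a hypothetical model of that kind of check: a one-shot flag waited on with a deadline. `Fence` and `node_is_stuck` are invented names.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical fence: a one-shot flag set by a producer (e.g. a hardware
// block's completion path) and waited on by a consumer.
class Fence {
 public:
  void signal() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      signaled_ = true;
    }
    cv_.notify_all();
  }

  // Returns true if the fence was signaled before `timeout` elapsed.
  bool wait_for(std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> lock(mu_);
    return cv_.wait_for(lock, timeout, [this] { return signaled_; });
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  bool signaled_ = false;
};

// Hypothetical watchdog check: a node whose fence is still unsignaled after
// the deadline counts as stuck. A real watchdog would then call abort(),
// which is what yields a SIGABRT crash report.
bool node_is_stuck(Fence& fence, std::chrono::milliseconds deadline) {
  return !fence.wait_for(deadline);
}
```

In this model, corruption that prevents a completion from ever being delivered surfaces as a hang-triggered abort rather than as a fault at the corrupting write.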

When using GrapheneOS Camera, this is the crash notification. When the camera HAL disconnects, the app seems to reconnect and recover (but then the camera provider crashes again as expected).

type: crash
osVersion: google/husky/husky:14/AP1A.240505.005/2024050900:user/release-keys
uid: 1000 (u:r:hal_camera_default:s0)
cmdline: /apex/com.google.pixel.camera.hal/bin/hw/[email protected]
processUptime: 0s

abortMessage: [Hang Recovery] timeout wait for front-end recovery ready. CamId : [CAM 4] , frame: 29

signal: 6 (SIGABRT), code -1 (SI_QUEUE)
threadName: RunnerH:P+:A:6
MTE: enabled

backtrace:
    /apex/com.android.runtime/lib64/bionic/libc.so (abort+168, pc 680f8)
    /system/lib64/liblog.so (__android_log_default_aborter+16, pc 6350)
    /apex/com.android.vndk.v35/lib64/libbase.so (android::base::LogMessage::~LogMessage()+356, pc 1a234)
    /apex/com.google.pixel.camera.hal/lib64/liblyric_hwl.so (pc 9ea6b0)
    /apex/com.google.pixel.camera.hal/lib64/liblyric_hwl.so (pc d74d30)
    /apex/com.google.pixel.camera.hal/lib64/liblyric_hwl.so (pc d73774)
    /apex/com.google.pixel.camera.hal/lib64/liblyric_hwl.so (pc fb1280)
    /apex/com.google.pixel.camera.hal/lib64/liblyric_hwl.so (pc fd8ea4)
    /apex/com.google.pixel.camera.hal/lib64/liblyric_hwl.so (pc fda7a8)
    /apex/com.google.pixel.camera.hal/lib64/liblyric_hwl.so (pc f8c160)
    /apex/com.google.pixel.camera.hal/lib64/liblyric_hwl.so (pc 8b94d0)
    /apex/com.google.pixel.camera.hal/lib64/liblyric_hwl.so (pc 477f88)
    /apex/com.google.pixel.camera.hal/lib64/liblyric_hwl.so (pc f8de28)
    /apex/com.android.runtime/lib64/bionic/libc.so (__pthread_start(void*)+204, pc d5e6c)
    /apex/com.android.runtime/lib64/bionic/libc.so (__start_thread+68, pc 69a64)

Logcat from one instance of the crash: logcat.txt

chenxiaolong, May 16 '24 02:05

I was able to work around this (insecurely) by re-disabling MTE for vendor processes in bionic:

diff --git a/libc/bionic/libc_init_static.cpp b/libc/bionic/libc_init_static.cpp
index 34cbad483..a7e0cf482 100644
--- a/libc/bionic/libc_init_static.cpp
+++ b/libc/bionic/libc_init_static.cpp
@@ -266,7 +266,7 @@ static bool get_environment_memtag_setting(HeapTaggingLevel* level) {
   const bool is_vendor_prog = starts_with(progname, "/vendor/") || starts_with(progname, "/apex/com.google.");
   const bool is_debug_build = is_debuggable_build();
   if (is_vendor_prog) {
-    *level = M_HEAP_TAGGING_LEVEL_ASYNC;
+    *level = M_HEAP_TAGGING_LEVEL_NONE;
     if (!is_debug_build) {
         return true;
     }
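The `/apex/com.google.` prefix in `is_vendor_prog` also matches the camera HAL's path from the crash report (`/apex/com.google.pixel.camera.hal/...`), which is why changing the vendor branch affected that process. A standalone sketch of the prefix test, where `starts_with` is a local stand-in for bionic's internal helper and the binary names in the test paths are illustrative:

```cpp
#include <cassert>
#include <string_view>

// Local stand-in for bionic's starts_with() helper used in the diff.
bool starts_with(std::string_view s, std::string_view prefix) {
  return s.substr(0, prefix.size()) == prefix;
}

// Mirrors the is_vendor_prog expression: anything under /vendor/ or under an
// /apex/com.google.* APEX is classified as a vendor program.
bool is_vendor_prog(std::string_view progname) {
  return starts_with(progname, "/vendor/") ||
         starts_with(progname, "/apex/com.google.");
}
```

Because the broad `/apex/com.google.` prefix covers every Google APEX, setting the vendor level to `M_HEAP_TAGGING_LEVEL_NONE` disables MTE for all of them at once, not just the camera HAL.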

However, scoping it to just the specific process didn't work. I can see that MTE is disabled for the [email protected] process, but it still crashes the same way. Maybe it's doing IPC with another process and it's really the other process crashing? Not sure how to check that.
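One way to check which processes actually have MTE-tagged memory (for example, candidate IPC peers of the camera provider) is `/proc/<pid>/smaps`: on arm64 Linux, a mapping's `VmFlags:` line includes `mt` when MTE allocation tags are enabled for it. The sketch below counts such mappings. The function names are hypothetical, reading another process's smaps generally requires root or the same UID, and a nonzero count only shows that tagged mappings exist, not which tag-check mode each thread uses.

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <string>

// Counts mappings whose "VmFlags:" line contains the "mt" (MTE) flag.
// Accepts any istream so it works on /proc/<pid>/smaps or a saved copy.
int count_mte_mappings(std::istream& smaps) {
  int count = 0;
  std::string line;
  while (std::getline(smaps, line)) {
    if (line.rfind("VmFlags:", 0) != 0) continue;  // only each mapping's flags line
    std::istringstream flags(line.substr(8));      // skip "VmFlags:"
    std::string flag;
    while (flags >> flag) {
      if (flag == "mt") {
        ++count;
        break;
      }
    }
  }
  return count;
}

// Convenience wrapper for a live process. Returns -1 if smaps can't be opened.
int count_mte_mappings_for_pid(int pid) {
  std::ifstream in("/proc/" + std::to_string(pid) + "/smaps");
  if (!in) return -1;
  return count_mte_mappings(in);
}
```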

diff --git a/libc/bionic/libc_init_static.cpp b/libc/bionic/libc_init_static.cpp
index 34cbad483..63c204902 100644
--- a/libc/bionic/libc_init_static.cpp
+++ b/libc/bionic/libc_init_static.cpp
@@ -263,6 +263,12 @@ static bool get_environment_memtag_setting(HeapTaggingLevel* level) {
   const char* progname = __libc_shared_globals()->init_progname;
   if (progname == nullptr) return false;
 
+  const bool is_google_camera = starts_with(progname, "/apex/com.google.pixel.camera.hal/");
+  if (is_google_camera) {
+    *level = M_HEAP_TAGGING_LEVEL_NONE;
+    return true;
+  }
+  
   const bool is_vendor_prog = starts_with(progname, "/vendor/") || starts_with(progname, "/apex/com.google.");
   const bool is_debug_build = is_debuggable_build();
   if (is_vendor_prog) {

chenxiaolong, May 16 '24 17:05

I spoke too soon. That's not it ^^.

The crashes still occur without MTE, though seemingly less frequently.

chenxiaolong, May 16 '24 18:05

https://github.com/GrapheneOS/platform_bionic/pull/45 has been working pretty well for me the past couple days. I'm seeing far fewer crashes (similar to my terrible hack of disabling MTE for all vendor processes).

chenxiaolong, May 21 '24 00:05

type: crash
osVersion: google/husky/husky:14/AP1A.240505.005/2024051500:user/release-keys
uid: 1000 (u:r:hal_camera_default:s0)
cmdline: /apex/com.google.pixel.camera.hal/bin/hw/[email protected]
processUptime: 0s

signal: 11 (SIGSEGV), code 9 (SEGV_MTESERR), faultAddr 500dccf19ac0370
cause: [MTE]: Buffer Overflow, 0 bytes right of a 4048-byte allocation at 0xdccf19abf3a0
cause: [MTE]: Buffer Underflow, 64 bytes left of a 4080-byte allocation at 0xdccf19ac03b0
cause: [MTE]: Buffer Overflow, 36960 bytes right of a 4096-byte allocation at 0xdccf19ab6310
threadName: RunnerN:P+:A:2
MTE: enabled

backtrace:
    /apex/com.google.pixel.camera.hal/lib64/libgoog_catpipe.so (pc fd2b00)
    /apex/com.google.pixel.camera.hal/lib64/libgoog_catpipe.so (pc fcff84)
    /apex/com.google.pixel.camera.hal/lib64/libgoog_catpipe.so (pc fcf538)
    /apex/com.google.pixel.camera.hal/lib64/libgoog_catpipe.so (pc fd1300)
    /apex/com.google.pixel.camera.hal/lib64/libgoog_catpipe.so (CatNodeHALCallProcessFrame+104, pc de1d78)
    /apex/com.google.pixel.camera.hal/lib64/liblyric_hwl.so (pc 9d6dec)
    /apex/com.google.pixel.camera.hal/lib64/liblyric_hwl.so (pc 9d5d7c)
    /apex/com.google.pixel.camera.hal/lib64/liblyric_hwl.so (pc 5f0970)
    /apex/com.google.pixel.camera.hal/lib64/liblyric_hwl.so (pc fb1280)
    /apex/com.google.pixel.camera.hal/lib64/liblyric_hwl.so (pc fd8ea4)
    /apex/com.google.pixel.camera.hal/lib64/liblyric_hwl.so (pc fda7a8)
    /apex/com.google.pixel.camera.hal/lib64/liblyric_hwl.so (pc f8c160)
    /apex/com.google.pixel.camera.hal/lib64/liblyric_hwl.so (pc 8b94d0)
    /apex/com.google.pixel.camera.hal/lib64/liblyric_hwl.so (pc 477f88)
    /apex/com.google.pixel.camera.hal/lib64/liblyric_hwl.so (pc 477ec8)
    /apex/com.android.runtime/lib64/bionic/libc.so (__pthread_start(void*)+204, pc d5e6c)
    /apex/com.android.runtime/lib64/bionic/libc.so (__start_thread+68, pc 69a64)
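The three `cause:` lines are alternative explanations for the same fault. Stripping the MTE tag from `faultAddr` and measuring against each candidate allocation reproduces the reported distances. A small sketch, assuming the usual arm64 layout where the logical tag sits in bits 56-59 and the top byte is ignored for addressing; the helper names are illustrative.

```cpp
#include <cassert>
#include <cstdint>

// With arm64 Top Byte Ignore, the MTE logical tag lives in bits 56-59.
// Clearing the top byte gives the address the allocator reports.
constexpr std::uint64_t kAddrMask = 0x00ffffffffffffffull;

constexpr std::uint64_t untag(std::uint64_t p) { return p & kAddrMask; }
constexpr unsigned tag_of(std::uint64_t p) { return (p >> 56) & 0xf; }

// Bytes past the end of [base, base + size); meaningful when fault >= base + size.
constexpr std::uint64_t bytes_right_of(std::uint64_t fault, std::uint64_t base,
                                       std::uint64_t size) {
  return untag(fault) - (base + size);
}

// Bytes before the start of an allocation; meaningful when fault < base.
constexpr std::uint64_t bytes_left_of(std::uint64_t fault, std::uint64_t base) {
  return base - untag(fault);
}

// Values taken from the crash report above.
constexpr std::uint64_t kFault = 0x0500dccf19ac0370ull;
static_assert(tag_of(kFault) == 0x5, "pointer tag");
static_assert(bytes_right_of(kFault, 0xdccf19abf3a0ull, 4048) == 0, "cause 1");
static_assert(bytes_left_of(kFault, 0xdccf19ac03b0ull) == 64, "cause 2");
static_assert(bytes_right_of(kFault, 0xdccf19ab6310ull, 4096) == 36960, "cause 3");
```

All three checks hold, so each candidate is consistent with the single fault address. The report alone doesn't say which allocation was actually overrun, though "0 bytes right of" the 4048-byte allocation is the tightest fit.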

donttracemebruh, May 21 '24 11:05

MTE is disabled for this process in the next release due to these upstream bugs. The next release will happen within a few days.

thestinger, May 21 '24 13:05