os-issue-tracker icon indicating copy to clipboard operation
os-issue-tracker copied to clipboard

adbd crash (segfault)

Open 0x1a8510f2 opened this issue 1 year ago • 4 comments

While attempting to connect to android studio with ABD over WiFi, via QR code:

type: crash
osVersion: google/husky/husky:14/AP2A.240605.024/2024061400:user/release-keys
uid: 2000 (u:r:adbd:s0)
cmdline: /apex/com.android.adbd/bin/adbd --root_seclabel=u:r:su:s0
processUptime: 64s

signal: 11 (SIGSEGV), code 9 (SEGV_MTESERR), faultAddr 900c162e3306fe8
threadName: adbd
MTE: enabled

backtrace:
    /apex/com.android.adbd/bin/adbd (Connection::Serial() const+12, pc 10799c)
    /apex/com.android.adbd/bin/adbd (BlockingConnectionAdapter::Stop()+824, pc 107f08)
    /apex/com.android.adbd/bin/adbd (atransport::Kick()+128, pc 109270)
    /apex/com.android.adbd/bin/adbd (void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, adbd_auth_tls_handshake(atransport*)::$_0> >(void*)+324, pc 113584)
    /apex/com.android.runtime/lib64/bionic/libc.so (__pthread_start(void*)+204, pc 795dc)
    /apex/com.android.runtime/lib64/bionic/libc.so (__start_thread+68, pc 69fa4)

0x1a8510f2 avatar Jun 15 '24 14:06 0x1a8510f2

@0x1a8510f2 Does it happen consistently, or did it only happen once?

thestinger avatar Jun 15 '24 14:06 thestinger

This is my first time using ADB over WiFi on this phone, but I have had this happen twice in a row after the phone disconnected from Android Studio and I tried pairing again.

Note that this is immediately after updating the phone and app optimisation is still running in the background. Not sure if this is relevant.

The phone appears to be connected fine for now and not disconnecting, but the crash did happen repeatedly.

0x1a8510f2 avatar Jun 15 '24 14:06 0x1a8510f2

Just seen it happen again without the app optimisation running. The whole connection process with Android Studio also seems very flakey. In particular, sometimes I'm unable to remove the debugging authorisation (I click forget but it just stays there). Not sure if this is unique to Graphene though since I've not used the feature on stock OS before.

0x1a8510f2 avatar Jun 15 '24 20:06 0x1a8510f2

I am also experiencing this consistently on a Pixel 8a.

AaronDewes avatar Jun 24 '24 11:06 AaronDewes

This is no longer happening for me on build 2024083100.

AaronDewes avatar Sep 04 '24 07:09 AaronDewes

@AaronDewes Should be no relevant changes in that particular release.

thestinger avatar Sep 04 '24 07:09 thestinger

@AaronDewes Should be no relevant changes in that particular release.

This was the first time I noticed the bug being gone, I was previously using ADB via cable since the end of June. The fix could have been at any point in between Jun 24 and today. Maybe the issue was fixed upstream at some point?

AaronDewes avatar Sep 04 '24 07:09 AaronDewes

Looks like I was too early. It's not fully fixed, a few crashes are still happening, but there's now a ~50% success rate for connections, while it was always failing previously.

AaronDewes avatar Sep 04 '24 07:09 AaronDewes

https://github.com/GrapheneOS/os-issue-tracker/issues/3770 was reported on a July release after the monthly update. The adb module last had changes in the June release.

thestinger avatar Sep 04 '24 07:09 thestinger

I don't know, I just know that it stopped happening. Maybe the ADB client received an update and that fixed it. I recently updated ADB on my laptop too.

AaronDewes avatar Sep 04 '24 16:09 AaronDewes

I was getting similar crashes in Shizuku on the previous release as well. Hasn't happened yet on the latest release. I'll try to catch the logs if it does happen.

Enovale avatar Oct 03 '24 05:10 Enovale

Could you reopen the issue? Happens constantly. Having the same problem since last update on Pixel 9 when I try to connect via scrcpy:

type: crash
flags: dev options enabled
osVersion: google/tokay/tokay:15/AP3A.241105.008/2024112700:user/release-keys
uid: 2000 (u:r:adbd:s0)
cmdline: /apex/com.android.adbd/bin/adbd --root_seclabel=u:r:su:s0
processUptime: 13394s

signal: 11 (SIGSEGV), code 9 (SEGV_MTESERR), faultAddr e00cc487dd8c9a8
threadName: adbd
MTE: enabled

backtrace:
    /apex/com.android.adbd/bin/adbd (Connection::Serial() const+16, pc c1bf0)
    /apex/com.android.adbd/bin/adbd (BlockingConnectionAdapter::Stop()+820, pc c2264)
    /apex/com.android.adbd/bin/adbd (atransport::Kick()+124, pc c3bdc)
    /apex/com.android.adbd/bin/adbd (void* std::__1::__thread_proxy[abi:nn180000]<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, adbd_auth_tls_handshake(atransport*)::$_0>>(void*)+360, pc cd528)
    /apex/com.android.runtime/lib64/bionic/libc.so (__pthread_start(void*)+200, pc 7afe8)
    /apex/com.android.runtime/lib64/bionic/libc.so (__start_thread+68, pc 6b164)

bobblkabb avatar Nov 30 '24 11:11 bobblkabb

Got the 2024121200 build on my pixel 9 but it's still happening:

type: crash
flags: dev options enabled
osVersion: google/tokay/tokay:15/AP4A.241205.013/2024121200:user/release-keys
uid: 2000 (u:r:adbd:s0)
cmdline: /apex/com.android.adbd/bin/adbd --root_seclabel=u:r:su:s0
processUptime: 22604s

signal: 11 (SIGSEGV), code 9 (SEGV_MTESERR), faultAddr 100d576d9a3ab68
threadName: adbd
MTE: enabled

backtrace:
    /apex/com.android.adbd/bin/adbd (Connection::Serial() const+16, pc c2470)
    /apex/com.android.adbd/bin/adbd (BlockingConnectionAdapter::Stop()+820, pc c2b44)
    /apex/com.android.adbd/bin/adbd (atransport::Kick()+124, pc c44ac)
    /apex/com.android.adbd/bin/adbd (void* std::__1::__thread_proxy[abi:nn190000]<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, adbd_auth_tls_handshake(atransport*)::$_0>>(void*)+360, pc cddc8)
    /apex/com.android.runtime/lib64/bionic/libc.so (__pthread_start(void*)+132, pc 7ba94)
    /apex/com.android.runtime/lib64/bionic/libc.so (__start_thread+68, pc 6b474)

bobblkabb avatar Dec 15 '24 12:12 bobblkabb

Same issue on Pixel 8 Pro running 2025032100 when enabling Wi-Fi debugging and scanning the QR code from Android Studio on an Ubuntu Desktop 24.04.2 LTS host.

type: crash
flags: dev options enabled
osVersion: google/husky/husky:15/BP1A.250305.019/2025032100:user/release-keys
uid: 2000 (u:r:adbd:s0)
cmdline: /apex/com.android.adbd/bin/adbd --root_seclabel=u:r:su:s0 --tim_seclabel=u:r:adbd_tradeinmode:s0
processUptime: 729s

signal: 11 (SIGSEGV), code 9 (SEGV_MTESERR), faultAddr 200bf381539ea68
threadName: adbd
MTE: enabled

backtrace:
    /apex/com.android.adbd/bin/adbd (Connection::Serial() const+16, pc cd8e0)
    /apex/com.android.adbd/bin/adbd (BlockingConnectionAdapter::Stop()+824, pc cdfd8)
    /apex/com.android.adbd/bin/adbd (atransport::Kick()+124, pc cf94c)
    /apex/com.android.adbd/bin/adbd (void* std::__1::__thread_proxy[abi:nn190000]<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, adbd_auth_tls_handshake(atransport*)::$_0>>(void*)+360, pc dafe8)
    /apex/com.android.runtime/lib64/bionic/libc.so (__pthread_start(void*)+180, pc 7b654)
    /apex/com.android.runtime/lib64/bionic/libc.so (__start_thread+68, pc 6bc64)

Maybe we should reopen this issue?

AlexAltea avatar Apr 04 '25 18:04 AlexAltea

@muhomorr @thestinger Unless I'm misunderstanding something, I don't think the PR at: https://github.com/GrapheneOS/platform_packages_modules_adb/pull/1 would solve this issue.

BlockingConnectionAdapter::~BlockingConnectionAdapter() {
    bool stopped = false;
    {
        std::lock_guard<std::mutex> lock(mutex_);
        stopped = stopped_;
    }
    if (stopped) {
        LOG(INFO) << "~BlockingConnectionAdapter: already stopped";
    } else {
        LOG(INFO) << "BlockingConnectionAdapter(" << Serial() << "): destructing";
        Stop();
    }
}

If two threads entered BlockingConnectionAdapter::~BlockConnectionAdapter simultaneously (which is ALREADY a problem in and of itself) and stopped_ happened to be false for both, which is what mutex_ seems to guard against, there's a chance both threads still call Stop().

However, note BlockingConnectionAdapter::Stop() already checks for stopped_ in a thread-safe manner:

void BlockingConnectionAdapter::Stop() {
    {
        std::lock_guard<std::mutex> lock(mutex_);
        if (!started_) {
            LOG(INFO) << "BlockingConnectionAdapter(" << Serial() << "): not started";
            return;
        }

        if (stopped_) {
            LOG(INFO) << "BlockingConnectionAdapter(" << Serial() << "): already stopped";
            return;
        }

        stopped_ = true;
    }

    // ...
}

Only one thread can proceed, which is what we wanted. So the PR is unnecessary IMHO.

AlexAltea avatar Apr 04 '25 18:04 AlexAltea

@thestinger I think I have found the issue!

  1. Thread A and Thread B enter BlockingConnectionAdapter::Stop() as stopped_ == false.
void BlockingConnectionAdapter::Stop() {
    {
        std::lock_guard<std::mutex> lock(mutex_);
        // ...

        if (stopped_) {
            LOG(INFO) << "BlockingConnectionAdapter(" << Serial() << "): already stopped"; // <-- Uses transport_
            return;
        }

        stopped_ = true;
    }
    // ...
    std::call_once(this->error_flag_, [this]() { transport_->HandleError("requested stop"); });
}

// ...

void atransport::HandleError(const std::string& error) {
    // ...
    fdevent_run_on_looper([this]() {
        handle_offline(this);
        transport_destroy(this); // <-- Destroys this, i.e. transport_
    });
}
  1. Thread A locks first, sets stopped_ = true; and unlocks the mutex_ to proceed with stopping the connection.
  2. Thread B resumes with stopped_ == true and is destined to return early.
  3. Here is the race:
    1. Thread A could reach transport_->HandleError("requested stop"); and destroy transport_.
    2. ...Before Thread B reaches Serial() which depends on transport_:
    std::string Connection::Serial() const {
        return transport_ ? transport_->serial_name() : "<unknown>";
    }
    

If this is correct, this bug is also present on AOSP... https://android.googlesource.com/platform/packages/modules/adb/+/refs/heads/main/transport.cpp

I think @muhomorr was on the right track after all: the PR avoided to call Serial() again in the stopped == true path.

AlexAltea avatar Apr 04 '25 19:04 AlexAltea

Reported the bug, and suggested a potentially cleaner fix: https://issuetracker.google.com/issues/409577149

AlexAltea avatar Apr 09 '25 20:04 AlexAltea