kotlinx.coroutines icon indicating copy to clipboard operation
kotlinx.coroutines copied to clipboard

Android ART Native Crash sometimes occurs when changing a threads name

Open adecker89 opened this issue 4 years ago • 22 comments


*** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
pid: 0, tid: 0 >>> com.- <<<

backtrace:
  #00  pc 00000000000830f0  /apex/com.android.runtime/lib64/bionic/libc.so (abort+160)
  #00  pc 00000000004b977c  /apex/com.android.runtime/lib64/libart.so (art::Runtime::Abort(char const*)+2280)
  #00  pc 000000000000b458  /system/lib64/libbase.so (android::base::LogMessage::~LogMessage()+580)
  #00  pc 00000000005108a0  /apex/com.android.runtime/lib64/libart.so (art::ThreadSuspendByPeerWarning(art::Thread*, android::base::LogSeverity, char const*, _jobject*)+656)
  #00  pc 000000000050fa80  /apex/com.android.runtime/lib64/libart.so (art::ThreadList::SuspendThreadByPeer(_jobject*, bool, art::SuspendReason, bool*)+1120)
  #00  pc 000000000042fa88  /apex/com.android.runtime/lib64/libart.so (art::Thread_setNativeName(_JNIEnv*, _jobject*, _jstring*)+1076)
  #00  pc 00000000000ba7e8  /system/framework/arm64/boot.oat (art_jni_trampoline+152)
  #00  pc 00000000001a5468  /system/framework/arm64/boot.oat (java.lang.Thread.setName+120)
  #00  pc 00000000002ff26c  /data/app/com.-==/oat/arm64/base.odex (kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.setIndexInArray+284)
  #00  pc 0000000000137334  /apex/com.android.runtime/lib64/libart.so (art_quick_invoke_stub+548)
  #00  pc 0000000000145fec  /apex/com.android.runtime/lib64/libart.so (art::ArtMethod::Invoke(art::Thread*, unsigned int*, unsigned int, art::JValue*, char const*)+244)
  #00  pc 00000000002e3948  /apex/com.android.runtime/lib64/libart.so (art::interpreter::ArtInterpreterToCompiledCodeBridge(art::Thread*, art::ArtMethod*, art::ShadowFrame*, unsigned short, art::JValue*)+384)
  #00  pc 00000000002deba8  /apex/com.android.runtime/lib64/libart.so (bool art::interpreter::DoCall<false, false>(art::ArtMethod*, art::Thread*, art::ShadowFrame&, art::Instruction const*, unsigned short, art::JValue*)+892)
  #00  pc 00000000005a728c  /apex/com.android.runtime/lib64/libart.so (MterpInvokeVirtualQuick+616)
  #00  pc 0000000000135594  /apex/com.android.runtime/lib64/libart.so (mterp_op_invoke_virtual_quick+20)
  #00  pc 00000000017d7a5e  /data/app/com.-==/oat/arm64/base.vdex (kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.tryTerminateWorker+174)
  #00  pc 00000000002b4c5c  /apex/com.android.runtime/lib64/libart.so (art::interpreter::Execute(art::Thread*, art::CodeItemDataAccessor const&, art::ShadowFrame&, art::JValue, bool, bool) (.llvm.4579747295037710581)+240)
  #00  pc 0000000000592688  /apex/com.android.runtime/lib64/libart.so (artQuickToInterpreterBridge+1032)
  #00  pc 0000000000140468  /apex/com.android.runtime/lib64/libart.so (art_quick_to_interpreter_bridge+88)
  #00  pc 00000000002fcd94  /data/app/com.-==/oat/arm64/base.odex (kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.park+180)
  #00  pc 00000000002fe314  /data/app/com.-==/oat/arm64/base.odex (kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.tryPark+628)
  #00  pc 00000000002fd984  /data/app/com-==/oat/arm64/base.odex (kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.runWorker+228)
  #00  pc 00000000002ff128  /data/app/com.-==/oat/arm64/base.odex (kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run+40)
  #00  pc 0000000000137334  /apex/com.android.runtime/lib64/libart.so (art_quick_invoke_stub+548)
  #00  pc 0000000000145fec  /apex/com.android.runtime/lib64/libart.so (art::ArtMethod::Invoke(art::Thread*, unsigned int*, unsigned int, art::JValue*, char const*)+244)
  #00  pc 00000000004b103c  /apex/com.android.runtime/lib64/libart.so (art::(anonymous namespace)::InvokeWithArgArray(art::ScopedObjectAccessAlreadyRunnable const&, art::ArtMethod*, art::(anonymous namespace)::ArgArray*, art::JValue*, char const*)+104)
  #00  pc 00000000004b2150  /apex/com.android.runtime/lib64/libart.so (art::InvokeVirtualOrInterfaceWithJValues(art::ScopedObjectAccessAlreadyRunnable const&, _jobject*, _jmethodID*, jvalue const*)+416)
  #00  pc 00000000004f2b18  /apex/com.android.runtime/lib64/libart.so (art::Thread::CreateCallback(void*)+1176)
  #00  pc 00000000000e6890  /apex/com.android.runtime/lib64/bionic/libc.so (__pthread_start(void*)+36)
  #00  pc 0000000000084b6c  /apex/com.android.runtime/lib64/bionic/libc.so (__start_thread+64)

Looking at the the art source it seems like this timeout when trying to set the thread name may be the culprit https://android.googlesource.com/platform/art/+/master/runtime/thread_list.cc#970

It's not really clear to me whether this timeout would be triggered based on a specific state of the Thread or if it's just a race condition that is sometimes triggered.

It seems like the thread name changing is mostly cosmetic? I wonder if an option to disable or reduce thread name changes might be a quick solution--as much as I would hate giving that up for debug purposes :(

adecker89 avatar Sep 09 '20 00:09 adecker89

Hi, this is clearly an Android bug that should be reported to Google bug tracker, we cannot do much on our side here.

I wonder if an option to disable or reduce thread name changes might be a quick solution

Yes, you can disable debug mode to workaround that problem. If this problem becomes too frequent and/or widespread, we'll consider adding a separate flag to disable thread renaming only.

qwwdfsad avatar Sep 13 '20 13:09 qwwdfsad

I do agree that it's ultimately an android issue and plan to file a bug there as well. But, since such a fix would take quite some time to propagat through the ecosystem, I think that a solution on the library side is sorta needed just due to practicality.

I'll take a look at disabling debug mode and see if it resolves our problem (I've only seen this crash reported in the wild). Whether a separate flag is worth the effort may be a matter of how much pain we feel when debugging other production crash reports with the debug flag off.

Thanks for your help!

adecker89 avatar Sep 15 '20 19:09 adecker89

Thanks!

Please keep us informed about this problem. Let's leave it open for a while to see if there are other similar reports

qwwdfsad avatar Sep 16 '20 11:09 qwwdfsad

Dunno if related, I also get a hard crash, but only from an app deployed to the app store. If the app is built locally it works fine (both in debug/release mode). My code uses coroutines.

backtrace:
  #00  pc 00000000000603ca  /apex/com.android.runtime/lib/bionic/libc.so (abort+166)
  #00  pc 0000000000155cb7  /data/app/com.nuvopoint.tillty-Qdr1nI2XUQzzDA_hin4rlg==/split_config.armeabi_v7a.apk!lib/armeabi-v7a/libflutter.so (offset 0x1401000)
  #00  pc 00000000002d1c1f  /apex/com.android.runtime/lib/libart.so (art::JNI::SetBooleanArrayRegion(_JNIEnv*, _jbooleanArray*, int, int, unsigned char const*)+1650)

larssn avatar Dec 15 '20 10:12 larssn

I think it's same crash backtrace: #00 pc 0000000000083860 /apex/com.android.runtime/lib64/bionic/libc.so (abort+176) #00 pc 00000000004b997c /apex/com.android.runtime/lib64/libart.so (art::Runtime::Abort(char const*)+2172) #00 pc 000000000000c650 /system/lib64/libbase.so (android::base::LogMessage::~LogMessage()+608) #00 pc 00000000004475e8 /apex/com.android.runtime/lib64/libart.so (art::OatHeader::GetCompilerFilter() const+280) #00 pc 000000000044eed8 /apex/com.android.runtime/lib64/libart.so (art::OatFile::GetCompilerFilter() const+40) #00 pc 000000000045a578 /apex/com.android.runtime/lib64/libart.so (art::OatFileManager::DumpForSigQuit(std::__1::basic_ostream<char, std::__1::char_traits>&)+376) #00 pc 00000000004c70a8 /apex/com.android.runtime/lib64/libart.so (art::Runtime::DumpForSigQuit(std::__1::basic_ostream<char, std::__1::char_traits>&)+104) #00 pc 00000000004db2e4 /apex/com.android.runtime/lib64/libart.so (art::SignalCatcher::HandleSigQuit()+1356) #00 pc 00000000004da334 /apex/com.android.runtime/lib64/libart.so (art::SignalCatcher::Run(void*)+252) #00 pc 00000000000e5614 /apex/com.android.runtime/lib64/bionic/libc.so (__pthread_start(void*)+36) #00 pc 0000000000085458 /apex/com.android.runtime/lib64/bionic/libc.so (__start_thread+64)

dan15h avatar Dec 15 '20 10:12 dan15h

Mine is related to the R8 minifier (three shaking code it shouldnt), so I think my crash is unrelated to this. EDIT: It was: some custom proguard rules fixed it.

larssn avatar Dec 15 '20 16:12 larssn

Mine is related to the R8 minifier (three shaking code it shouldnt), so I think my crash is unrelated to this. EDIT: It was: some custom proguard rules fixed it.

Did you solve the problem?How did you confirm that the problem was introduced by R8?

viviant1224 avatar Dec 22 '20 06:12 viviant1224

I think it's same crash backtrace: #00 pc 0000000000083860 /apex/com.android.runtime/lib64/bionic/libc.so (abort+176) #00 pc 00000000004b997c /apex/com.android.runtime/lib64/libart.so (art::Runtime::Abort(char const*)+2172) #00 pc 000000000000c650 /system/lib64/libbase.so (android::base::LogMessage::~LogMessage()+608) #00 pc 00000000004475e8 /apex/com.android.runtime/lib64/libart.so (art::OatHeader::GetCompilerFilter() const+280) #00 pc 000000000044eed8 /apex/com.android.runtime/lib64/libart.so (art::OatFile::GetCompilerFilter() const+40) #00 pc 000000000045a578 /apex/com.android.runtime/lib64/libart.so (art::OatFileManager::DumpForSigQuit(std::__1::basic_ostream<char, std::__1::char_traits>&)+376) #00 pc 00000000004c70a8 /apex/com.android.runtime/lib64/libart.so (art::Runtime::DumpForSigQuit(std::__1::basic_ostream<char, std::__1::char_traits>&)+104) #00 pc 00000000004db2e4 /apex/com.android.runtime/lib64/libart.so (art::SignalCatcher::HandleSigQuit()+1356) #00 pc 00000000004da334 /apex/com.android.runtime/lib64/libart.so (art::SignalCatcher::Run(void*)+252) #00 pc 00000000000e5614 /apex/com.android.runtime/lib64/bionic/libc.so (__pthread_start(void*)+36) #00 pc 0000000000085458 /apex/com.android.runtime/lib64/bionic/libc.so (__start_thread+64)

We have the same stack trace, without using coroutines and only for Android 10 devices:

backtrace:
  #00  pc 000000000006f06c  /apex/com.android.runtime/lib64/bionic/libc.so (abort+160)
  #00  pc 00000000004d08d0  /apex/com.android.runtime/lib64/libart.so (art::Runtime::Abort(char const*)+2512)
  #00  pc 000000000000c600  /system/lib64/libbase.so (android::base::LogMessage::~LogMessage()+684)
  #00  pc 000000000045c978  /apex/com.android.runtime/lib64/libart.so (art::OatHeader::GetCompilerFilter() const+280)
  #00  pc 00000000004643ec  /apex/com.android.runtime/lib64/libart.so (art::OatFile::GetCompilerFilter() const+40)
  #00  pc 00000000004725c8  /apex/com.android.runtime/lib64/libart.so (art::OatFileManager::DumpForSigQuit(std::__1::basic_ostream<char, std::__1::char_traits<char>>&)+376)
  #00  pc 00000000004ddd34  /apex/com.android.runtime/lib64/libart.so (art::Runtime::DumpForSigQuit(std::__1::basic_ostream<char, std::__1::char_traits<char>>&)+104)
  #00  pc 00000000004f1c4c  /apex/com.android.runtime/lib64/libart.so (art::SignalCatcher::HandleSigQuit()+1356)
  #00  pc 00000000004f0ce0  /apex/com.android.runtime/lib64/libart.so (art::SignalCatcher::Run(void*)+252)
  #00  pc 00000000000ce1b0  /apex/com.android.runtime/lib64/bionic/libc.so (__pthread_start(void*)+36)
  #00  pc 0000000000070ba8  /apex/com.android.runtime/lib64/bionic/libc.so (__start_thread+64)

In fact, it was reported months ago here but according to this other issue comment, it looks like something that was solved for Android 11 onwards.

nimeacuerdo avatar Jan 11 '21 14:01 nimeacuerdo

Exactly the same crash as nimeacuerdo. Android 10 - 99.8 % Android 11 - 0.2

Turalllb avatar Mar 17 '21 10:03 Turalllb

I am also facing the same not able to identify the issue. It occurs in production.

Aravindh1505 avatar Mar 19 '21 08:03 Aravindh1505

Thanks for pointing it out, it seems it's worth adding a feature flag to disable thread name change.

As a workaround, it's possible to disable the debug mode at all in the production builds to avoid that crash

qwwdfsad avatar May 27 '21 16:05 qwwdfsad

One more harmful evidence about this feature enabled by default, the following code is purely dominated by setName calls:

@Test
  fun test() {
    (0..100).map {
      runBlocking(Dispatchers.Default) {
        val n = 100_000
        measureTime {
          val (s1: SendChannel<Int>, r1: ReceiveChannel<Int>) = Channel<Int>().let { it to it }
          val (s2: SendChannel<Int>, r2: ReceiveChannel<Int>) = Channel<Int>().let { it to it }
          val (s3: SendChannel<Int>, r3: ReceiveChannel<Int>) = Channel<Int>().let { it to it }
          launch {
            for (i in 0..n) {
              s1.send(i)
            }
          }.invokeOnCompletion {
            s1.close(it)
          }
          launch {
            r1.consumeEach {
              s2.send(it)
            }
          }.invokeOnCompletion {
            s2.close(it)
          }
          launch {
            r2.consumeEach {
              s3.send(it)
            }
          }.invokeOnCompletion {
            s3.close(it)
          }
          assertEquals(n, r3.consumeAsFlow().last())
        }
      }
    }.toList().let {
      print("TIMES: $it")
    }
  }
Flamegraph image

qwwdfsad avatar Aug 04 '22 08:08 qwwdfsad

Only occurs on android 10.

Check failed: key_value != nullptr compiler-filter not found in oat header: #00 pc 0x00000000000830f0 /apex/com.android.runtime/lib64/bionic/libc.so (abort) #00 pc 0x00000000004b8358 /apex/com.android.runtime/lib64/libart.so (art::Runtime::Abort(char const*)) #00 pc 0x000000000000b458 /system/lib64/libbase.so (android::base::LogMessage::~LogMessage()) #00 pc 0x000000000044528c /apex/com.android.runtime/lib64/libart.so (art::OatHeader::GetCompilerFilter() const) #00 pc 0x000000000044cbb4 /apex/com.android.runtime/lib64/libart.so (art::OatFile::GetCompilerFilter() const) #00 pc 0x00000000004582b8 /apex/com.android.runtime/lib64/libart.so (art::OatFileManager::DumpForSigQuit(std::__1::basic_ostream<char, std::__1::char_traits>&)) #00 pc 0x00000000004c5718 /apex/com.android.runtime/lib64/libart.so (art::Runtime::DumpForSigQuit(std::__1::basic_ostream<char, std::__1::char_traits>&)) #00 pc 0x00000000004d9bb0 /apex/com.android.runtime/lib64/libart.so (art::SignalCatcher::HandleSigQuit()) #00 pc 0x00000000004d8c5c /apex/com.android.runtime/lib64/libart.so (art::SignalCatcher::Run(void*)) #00 pc 0x00000000000e6ce0 /apex/com.android.runtime/lib64/bionic/libc.so (__pthread_start(void*)) #00 pc 0x0000000000084b6c /apex/com.android.runtime/lib64/bionic/libc.so (__start_thread)

NB-Fred avatar Sep 02 '22 06:09 NB-Fred

Thread suspension timed out: 0x<sanitized>:DefaultDispatcher-worker-4

*** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
backtrace:
  #00  pc 0x000000000008be20  /apex/com.android.runtime/lib64/bionic/libc.so (abort+168)
  #01  pc 0x00000000006d29ec  /apex/com.android.art/lib64/libart.so (art::Runtime::Abort(char const*)+704)
  #02  pc 0x0000000000016ea8  /apex/com.android.art/lib64/libbase.so (android::base::SetAborter(std::__1::function<void (char const*)>&&)::$_3::__invoke(char const*)+80)
  #03  pc 0x0000000000016450  /apex/com.android.art/lib64/libbase.so (android::base::LogMessage::~LogMessage()+352)
  #04  pc 0x00000000006fe104  /apex/com.android.art/lib64/libart.so (art::ThreadSuspendByPeerWarning(art::Thread*, android::base::LogSeverity, char const*, _jobject*) (.__uniq.215660552210357940630679712151551015321)+552)
  #05  pc 0x0000000000374048  /apex/com.android.art/lib64/libart.so (art::ThreadList::SuspendThreadByPeer(_jobject*, art::SuspendReason, bool*)+4016)
  #06  pc 0x000000000060a90c  /apex/com.android.art/lib64/libart.so (art::Thread_setNativeName(_JNIEnv*, _jobject*, _jstring*) (.__uniq.300150332875289415499171563183413458937)+492)
  #07  pc 0x00000000003ff1c0  /data/misc/apexdata/com.android.art/dalvik-cache/arm64/boot.oat (art_jni_trampoline+128)

Happens on many Android 12 and 12L devices

mtwalli avatar Oct 09 '22 21:10 mtwalli

We'll provide either a workaround or will change the default behaviour in the next major version, thanks everyone for the reports!

qwwdfsad avatar Oct 10 '22 08:10 qwwdfsad

Thanks for pointing it out, it seems it's worth adding a feature flag to disable thread name change.

As a workaround, it's possible to disable the debug mode at all in the production builds to avoid that crash

@qwwdfsad how to disable this debug mode in prodution build?

douglasjunior avatar Oct 27 '22 13:10 douglasjunior

@qwwdfsad is this fixed ?

lirongwei avatar Mar 05 '23 15:03 lirongwei

Any updates? Happens only on Android 10, various devices, 96% occurences when app in background. Only visible via Google Play console (~2k occurences per day).

backtrace:
  #00  pc 0x00000000005bb9f0  /apex/com.android.runtime/lib64/libart.so (artQuickToInterpreterBridge+880)
  #01  pc 0x0000000000150468  /apex/com.android.runtime/lib64/libart.so (art_quick_to_interpreter_bridge+88)
  #02  pc 0x00000000003d8d08  /system/framework/arm64/boot.oat (java.util.concurrent.ThreadPoolExecutor.runWorker+984)
  #03  pc 0x00000000003d5880  /system/framework/arm64/boot.oat (java.util.concurrent.ThreadPoolExecutor$Worker.run+64)
  #04  pc 0x0000000000217fc8  /system/framework/arm64/boot.oat (java.lang.Thread.run+72)
  #05  pc 0x0000000000147334  /apex/com.android.runtime/lib64/libart.so (art_quick_invoke_stub+548)
  #06  pc 0x00000000001561b4  /apex/com.android.runtime/lib64/libart.so (art::ArtMethod::Invoke(art::Thread*, unsigned int*, unsigned int, art::JValue*, char const*)+252)
  #07  pc 0x00000000004d8208  /apex/com.android.runtime/lib64/libart.so (art::(anonymous namespace)::InvokeWithArgArray(art::ScopedObjectAccessAlreadyRunnable const&, art::ArtMethod*, art::(anonymous namespace)::ArgArray*, art::JValue*, char const*)+104)
  #08  pc 0x00000000004d929c  /apex/com.android.runtime/lib64/libart.so (art::InvokeVirtualOrInterfaceWithJValues(art::ScopedObjectAccessAlreadyRunnable const&, _jobject*, _jmethodID*, jvalue const*)+416)
  #09  pc 0x000000000051c470  /apex/com.android.runtime/lib64/libart.so (art::Thread::CreateCallback(void*)+1232)
  #10  pc 0x00000000000cf7c0  /apex/com.android.runtime/lib64/bionic/libc.so (__pthread_start(void*)+36)
  #11  pc 0x00000000000721a8  /apex/com.android.runtime/lib64/bionic/libc.so (__start_thread+64)

Note: Also, this is not the only backtrace. Lines #02 and #03 are changing based on what thread is running inside - we use facebook sdk as well so a lot of times these 2 lines look like this (the rest remains the same):

  #02  pc 0x0000000000afd354  /data/app/obfuckated_package_name-9JniyMgIWMHr25wQXKTrVw==/oat/arm64/base.odex (com.facebook.ads.internal.dynamicloading.DynamicLoaderFactory.doCallInitialize+372)
  #03  pc 0x0000000000afd004  /data/app/obfuckated_package_name-9JniyMgIWMHr25wQXKTrVw==/oat/arm64/base.odex (com.facebook.ads.internal.dynamicloading.DynamicLoaderFactory$1.run+980)

arenaq avatar Mar 13 '23 09:03 arenaq

Hi, this is clearly an Android bug that should be reported to Google bug tracker, we cannot do much on our side here.

I wonder if an option to disable or reduce thread name changes might be a quick solution

Yes, you can disable debug mode to workaround that problem. If this problem becomes too frequent and/or widespread, we'll consider adding a separate flag to disable thread renaming only.

Going trough the link provided. It seems like you use a ProGuard rules to remove some debug modules. Is there a way to disable the debug mode without using ProGuard?

arenaq avatar Mar 13 '23 10:03 arenaq

Hello @qwwdfsad, https://github.com/Kotlin/kotlinx.coroutines/issues/2234#issuecomment-1272627958 is this issue fixed?

londhep avatar Aug 14 '23 08:08 londhep