sentry-react-native

invalid pthread_t 0x<sanitized> passed to pthread_kill coming from startProfiling

Open coolsoftwaretyler opened this issue 1 month ago • 10 comments

What React Native libraries do you use?

Hermes, RN New Architecture, React Navigation, Expo Application Services (EAS), Expo (mobile only)

Are you using sentry.io or on-premise?

sentry.io (SaaS)

@sentry/react-native SDK Version

7.7.0

What does your development environment look like?

We're in a pnpm monorepo with Expo so I don't think it'll get much useful info. Here's what I get from that:

⚠️ react-native depends on @react-native-community/cli for cli commands. To fix update your package.json to include:


  "devDependencies": {
    "@react-native-community/cli": "latest",
  }

Sentry.init()

Sentry.init({
    environment: <we get this dynamically with a function call>,
    release: <we get this dynamically with a function call>,
    dist,
    dsn: sentryEnabled ? 'our-dsn' : undefined,

    // Enable this only if you're testing Sentry changes in development
    debug: false,
    tracesSampleRate: 0.1,
    profilesSampleRate: 0.1,

    replaysOnErrorSampleRate: enableSessionReplay ? 0.05 : 0,
    replaysSessionSampleRate: 0,

    integrations: [
      navigationInstrumentation, // Capture navigation breadcrumbs & performance spans
      sentryMobileReplayIntegration({
        maskAllText: true,
        maskAllImages: false,
        maskAllVectors: false,
      }),
    ],
    initialScope: {
      tags: (() => {
        const baseTags: Record<string, string | undefined> = {...someTagsHere};
        return baseTags;
      })(),
    },
});

Steps to Reproduce

I did my best to reproduce this, but it seems to be an intermittent issue in production. I think the root cause may be threading issues, which are hard to force artificially.

Here's what I know:

Google Play Console is reporting a crash with this error:

invalid pthread_t 0x<sanitized> passed to pthread_kill

Here's the stack trace in Google Play:

*** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
pid: 0, tid: 5131 >>> com.my.app <<<

backtrace:
  #00  pc 0x000000000007123c  /apex/com.android.runtime/lib64/bionic/libc.so (abort+160)
  #01  pc 0x0000000000082c28  /apex/com.android.runtime/lib64/bionic/libc.so (__pthread_internal_find(long, char const*)+196)
  #02  pc 0x0000000000082b44  /apex/com.android.runtime/lib64/bionic/libc.so (__pthread_internal_gettid(long, char const*)+12)
  #03  pc 0x0000000000083948  /apex/com.android.runtime/lib64/bionic/libc.so (pthread_kill+52)
  #04  pc 0x00000000001105e0  /data/app/~~GVAFjiv3MRHi6wN_jyFNGQ==/com.my.app-dfsyqHFTuzXmYnuve2e92g==/split_config.arm64_v8a.apk!libhermes.so (BuildId: b06c77a49801680608345e3c05bd59aba90e9f19)
  #05  pc 0x0000000000110a60  /data/app/~~GVAFjiv3MRHi6wN_jyFNGQ==/com.my.app-dfsyqHFTuzXmYnuve2e92g==/split_config.arm64_v8a.apk!libhermes.so (BuildId: b06c77a49801680608345e3c05bd59aba90e9f19)
  #06  pc 0x000000000011095c  /data/app/~~GVAFjiv3MRHi6wN_jyFNGQ==/com.my.app-dfsyqHFTuzXmYnuve2e92g==/split_config.arm64_v8a.apk!libhermes.so (BuildId: b06c77a49801680608345e3c05bd59aba90e9f19)
  #07  pc 0x0000000000110d78  /data/app/~~GVAFjiv3MRHi6wN_jyFNGQ==/com.my.app-dfsyqHFTuzXmYnuve2e92g==/split_config.arm64_v8a.apk!libhermes.so (BuildId: b06c77a49801680608345e3c05bd59aba90e9f19)
  #08  pc 0x0000000000111480  /data/app/~~GVAFjiv3MRHi6wN_jyFNGQ==/com.my.app-dfsyqHFTuzXmYnuve2e92g==/split_config.arm64_v8a.apk!libhermes.so (BuildId: b06c77a49801680608345e3c05bd59aba90e9f19)
  #09  pc 0x0000000000082600  /apex/com.android.runtime/lib64/bionic/libc.so (__pthread_start(void*)+184)
  #10  pc 0x0000000000074a58  /apex/com.android.runtime/lib64/bionic/libc.so (__start_thread+68)

And here's that same trace symbolicated with Hermes:

*** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
pid: 0, tid: 15474 >>> com.my.app <<<

backtrace:
  #00  pc 0x000000000007137c  /apex/com.android.runtime/lib64/bionic/libc.so (abort+160)
  #01  pc 0x0000000000082d68  /apex/com.android.runtime/lib64/bionic/libc.so (__pthread_internal_find(long, char const*)+196)
  #02  pc 0x0000000000082c84  /apex/com.android.runtime/lib64/bionic/libc.so (__pthread_internal_gettid(long, char const*)+12)
  #03  pc 0x0000000000083a88  /apex/com.android.runtime/lib64/bionic/libc.so (pthread_kill+52)
  #04  pc 0x00000000001105e0  libhermes.so
       hermes::vm::sampling_profiler::Sampler::platformSuspendVMAndWalkStack(hermes::vm::SamplingProfiler*)
       SamplingProfilerPosix.cpp:321
  #05  pc 0x0000000000110a60  libhermes.so
       hermes::vm::sampling_profiler::Sampler::sampleStack(hermes::vm::SamplingProfiler*)
       SamplingProfilerSampler.cpp:99
  #06  pc 0x000000000011095c  libhermes.so
       hermes::vm::sampling_profiler::Sampler::sampleStacks()
       SamplingProfilerSampler.cpp:62
  #07  pc 0x0000000000110d78  libhermes.so
       hermes::vm::sampling_profiler::Sampler::timerLoop(double)
       SamplingProfilerSampler.cpp:162
  #08  pc 0x0000000000111480  libhermes.so
       std::__ndk1::__thread_proxy<...>(void*)
       __thread/thread.h:199
  #09  pc 0x0000000000082740  /apex/com.android.runtime/lib64/bionic/libc.so (__pthread_start(void*)+184)
  #10  pc 0x0000000000074b98  /apex/com.android.runtime/lib64/bionic/libc.so (__start_thread+68)

---
Analysis: Crash in Hermes Sampling Profiler

The sampling profiler's background thread called pthread_kill() to send a signal 
to a JavaScript thread for stack sampling, but the target thread no longer exists 
(or has an invalid thread ID). This caused __pthread_internal_find() to fail and 
call abort().

This is a race condition: the profiler's timer loop tried to sample a thread that 
was terminated between when the profiler registered it and when the sample was attempted.

Hermes version: 0.79.6 (React Native 0.79.6)
Build ID: b06c77a49801680608345e3c05bd59aba90e9f19
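
The race described in the analysis can be modeled in a few lines. This is a toy TypeScript model under stated assumptions, not Hermes code: `ThreadRegistry`, `signalThread`, `sampleUnchecked`, `sampleChecked`, and the numeric thread ids are all hypothetical names for illustration. It only shows why signalling an id captured earlier fails once the thread has exited, and how re-validating the id at signal time skips the stale target instead.

```typescript
// Toy model only: every name here is hypothetical and this is not Hermes code.
type ThreadId = number;

class ThreadRegistry {
  private live = new Set<ThreadId>();

  register(id: ThreadId): void {
    this.live.add(id);
  }

  unregister(id: ThreadId): void {
    this.live.delete(id);
  }

  isLive(id: ThreadId): boolean {
    return this.live.has(id);
  }
}

// Stands in for pthread_kill: signalling a thread that no longer exists is fatal.
function signalThread(registry: ThreadRegistry, id: ThreadId): void {
  if (!registry.isLive(id)) {
    throw new Error(`invalid thread id ${id} passed to signalThread`);
  }
}

// Unsafe: trusts an id that was captured when the thread registered.
function sampleUnchecked(registry: ThreadRegistry, captured: ThreadId): string {
  signalThread(registry, captured);
  return "sampled";
}

// Safer: re-validates the id right before signalling and skips stale ones.
function sampleChecked(registry: ThreadRegistry, captured: ThreadId): string {
  if (!registry.isLive(captured)) {
    return "skipped";
  }
  signalThread(registry, captured);
  return "sampled";
}
```

In real native code the liveness check and the signal would also have to be atomic with respect to thread exit (for example, both done under the lock that thread teardown takes). Otherwise the check only narrows the window rather than closing it.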

I brought this up with the Hermes maintainers. This issue can happen when something enables the Hermes sampling profiler and the thread it tries to sample can no longer be addressed. A long time ago, React Native Reanimated had this issue in some instances. And I found that @sentry/react-native seems to be calling methods that would trigger this profiling as well.

startProfiling calls HermesSamplingProfiler.enable, which in turn calls hermesAPI->enableSamplingProfiler():

public WritableMap startProfiling(boolean platformProfilers) {
  final WritableMap result = new WritableNativeMap();
  if (androidProfiler == null && platformProfilers) {
    initializeAndroidProfiler();
  }

  try {
    HermesSamplingProfiler.enable();
    if (androidProfiler != null) {
      androidProfiler.start();
    }

    result.putBoolean("started", true);
  } catch (Throwable e) { // NOPMD - We don't want to crash in any case
    result.putBoolean("started", false);
    result.putString("error", e.toString());
  }
  return result;
}

https://github.com/getsentry/sentry-react-native/blob/0dd606fdeddc7721168502167fb2d5d5e4b87c7b/packages/core/android/src/main/java/io/sentry/react/RNSentryModuleImpl.java#L946-L964 (I found this code is the same at the 7.7.0 tag as well)

Which calls:

void HermesSamplingProfiler::enable(jni::alias_ref<jclass> /*unused*/) {
  auto* hermesAPI =
      castInterface<hermes::IHermesRootAPI>(hermes::makeHermesRootAPI());
  hermesAPI->enableSamplingProfiler();
}

The Hermes team considers this to be atypical.

Again, I don't have a formal reproducer (I believe we'd need to line up exactly the right conditions: get traces started on threads that become unaddressable - I can't figure out the code needed to reproduce). But I'm hoping this is enough information for you all to track it down. If you've got ideas of what might reproduce this, that would be helpful to me (and I'd be happy to put together a clearer reproduction).

I'm hoping the solution is as simple as adding some kind of error handling in your startProfiling method so that if we hit this pthread error, the app does not crash. Right now we seem to be getting SIGABRT errors. If we can just catch and handle those, I think that would fix my issue. Again, without a reproducer I can't quite reason out what needs to change, but I'm hoping y'all would have a good idea.

Expected Result

App should never crash when Sentry profiling is run. Right now this happens fairly infrequently (about 1-3% of users, maybe?) - but I think this should be 0%.

Actual Result

Seeing crashes in Google Play Console.

coolsoftwaretyler avatar Dec 04 '25 19:12 coolsoftwaretyler

RN-420

linear[bot] avatar Dec 04 '25 19:12 linear[bot]

Thank you for reporting this and for the detailed issue description @coolsoftwaretyler 🙇 We will investigate and get back to you.

antonis avatar Dec 05 '25 10:12 antonis

Thank you for your patience on this @coolsoftwaretyler. Just a heads up that we haven't figured out a way to reproduce this, but we are still looking at the issue. Your detailed investigation is really helpful 🙇 Please also let us know if you notice the crash again after disabling the feature.

antonis avatar Dec 10 '25 16:12 antonis

Thanks, all! I'll have more data from production soon and will follow up with what I observe.

coolsoftwaretyler avatar Dec 10 '25 23:12 coolsoftwaretyler

Hey folks, I have roughly 3 days worth of data across the same user base. In the app version with profilesSampleRate set to 0, we have not seen any more of these crashes reported to Google Play.

I have a reminder set to follow up on it next week, but early evidence continues to point at profiling as the source here.

coolsoftwaretyler avatar Dec 11 '25 18:12 coolsoftwaretyler

Thank you for iterating on this 🙇

antonis avatar Dec 12 '25 09:12 antonis

Just a heads up that we are also investigating a possible connection between this issue and https://github.com/getsentry/sentry-java/issues/2604

For this we have a troubleshooting page at https://docs.sentry.io/platforms/android/profiling/troubleshooting/#i-see-elevated-number-of-crashes-in-the-android-runtime-when-profiling-is-activated

@coolsoftwaretyler I was wondering if you have noticed a pattern in the crash reports related to the Android OS version?

antonis avatar Dec 12 '25 10:12 antonis

Hey @antonis - thanks for the link. Looks similar, although it's interesting to note that our "invalid pthread" is passed to pthread_kill, rather than pthread_getcpuclockid.

Google Play says 100% of the affected users are on Android 16 Beta (SDK 36).

coolsoftwaretyler avatar Dec 12 '25 15:12 coolsoftwaretyler

Here's some device breakdown: 20% of affected users are on the Google Komodo, 20% are on Samsung b0q, 10% on Samsung q6q, 10% on Google Tokay, and then a long tail of Samsung devices.

coolsoftwaretyler avatar Dec 12 '25 16:12 coolsoftwaretyler

Thank you for the breakdown. This may help us reproduce this.

antonis avatar Dec 12 '25 16:12 antonis

Hey folks, just following up with more data for you.

We have begun ramping up the Android app version with profiling disabled. With a full week of new usage data, and an order of magnitude more users, we still have not seen this crash come back.

It's looking more and more like disabling profiling has resolved this crash for us.
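
For anyone who wants to apply the same workaround, here is a minimal sketch of how the sample rate could be gated per platform. `resolveProfilesSampleRate`, `PlatformInfo`, and `ProfilingGate` are hypothetical names, not part of the Sentry SDK, and whether to gate on an Android API level at all is an assumption based only on the crash distribution reported in this thread.

```typescript
// Hypothetical helper, not part of @sentry/react-native.
// `os` and `apiLevel` are plain parameters so the sketch is self-contained.
interface PlatformInfo {
  os: string;
  apiLevel?: number;
}

interface ProfilingGate {
  // Rate used wherever profiling stays enabled.
  defaultRate: number;
  // Android API levels at or above this value get profiling disabled.
  // Omit to disable profiling on every Android version.
  disableFromAndroidApi?: number;
}

function resolveProfilesSampleRate(
  platform: PlatformInfo,
  gate: ProfilingGate,
): number {
  if (platform.os !== "android") {
    return gate.defaultRate;
  }
  if (gate.disableFromAndroidApi === undefined) {
    return 0;
  }
  // Unknown API level: err on the side of disabling profiling.
  if (platform.apiLevel === undefined) {
    return 0;
  }
  return platform.apiLevel >= gate.disableFromAndroidApi
    ? 0
    : gate.defaultRate;
}
```

In an app, the result would be passed as `profilesSampleRate` in `Sentry.init`, with `os` and `apiLevel` presumably read from React Native's `Platform.OS` and `Platform.Version`. Leaving `disableFromAndroidApi` unset matches what we actually shipped: profiling off for the whole Android build.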

coolsoftwaretyler avatar Dec 18 '25 18:12 coolsoftwaretyler

Thank you for iterating on this @coolsoftwaretyler 🙇

"It's looking more and more like disabling profiling has resolved this crash for us."

Glad that this solved the issue on your side. We are still looking into how to fix this on our end 🤔

"Google Play says 100% of the affected users are on Android 16 Beta (SDK 36)."

I was wondering if this is still the case.

antonis avatar Dec 19 '25 08:12 antonis

Yes, since we haven't seen the crash again, all of the metrics I listed remain true.

coolsoftwaretyler avatar Dec 19 '25 13:12 coolsoftwaretyler