sentry-electron icon indicating copy to clipboard operation
sentry-electron copied to clipboard

eventLoopBlockIntegration doesn't detect main process hangs

Open nikitakot opened this issue 4 months ago • 16 comments

Is there an existing issue for this?

  • [x] I have checked for existing issues https://github.com/getsentry/sentry-javascript/issues
  • [x] I have reviewed the documentation https://docs.sentry.io/
  • [x] I am using the latest SDK release https://github.com/getsentry/sentry-javascript/releases

How do you use Sentry?

Self-hosted/on-premise

Electron SDK Version

7.2.0

Electron Version

36.9.5

What platform are you using?

Windows

Link to Sentry event

No response

Steps to Reproduce

See gist - https://gist.github.com/nikitakot/5f6e7e111ddd77dc74d5166a3e56e4af

  1. $ npm i
  2. $ npx electron-rebuild
  3. $ npm start

The app will get hang as expected because of the blocking sync operation:

setTimeout(() => {
  console.log("Simulating event loop block in main process...");
  execSync("ping -n 5000 -w 1000 localhost > nul");
  console.log("Event loop block completed");
}, 5000);
Image

But the eventLoopBlock is not triggered.

Expected Result

eventLoopBlock is triggered.

Actual Result

eventLoopBlock is not triggered.

nikitakot avatar Oct 21 '25 12:10 nikitakot

ELECTR-34

linear[bot] avatar Oct 21 '25 12:10 linear[bot]

I don't know what platform you are on but I couldn't get your ping command to work.

When I used the following on macOS:

spawnSync('ping localhost', { shell: true, timeout: 10000 });

It captured an event with the expected stack frames:

                  {
                    function: '?',
                    filename: 'app:///src/main.mjs',
                    lineno: 32,
                    colno: 5,
                    in_app: true
                  },
                  {
                    function: 'longWork',
                    filename: 'app:///src/main.mjs',
                    lineno: 26,
                    colno: 3,
                    in_app: true
                  },
                  {
                    function: 'spawnSync',
                    filename: 'node:child_process',
                    lineno: 884,
                    colno: 24,
                    in_app: false
                  },
                  {
                    function: 'spawnSync',
                    filename: 'node:internal/child_process',
                    lineno: 1120,
                    colno: 20,
                    in_app: false
                  },

timfish avatar Oct 21 '25 19:10 timfish

@timfish I'm on Windows 11, can you give a try on Windows too?

Also how can I try to disable uploading the events and what folder should I check to exclude issues with our own Sentry instance? Like locally offline debug the sdk.

Thanks.

nikitakot avatar Oct 21 '25 19:10 nikitakot

This captures events in CI on all three platforms: https://github.com/getsentry/sentry-electron/pull/1253/files#diff-36e6cd764896407f6f6836936bae915a737ff555a63c8af3a7987ac6f3f06654R24

timfish avatar Oct 21 '25 22:10 timfish

@timfish I can confirm the spawnSync('ping localhost', { shell: true, timeout: 10000 }); does work on my end too.

What I realized though that if the process hangs completely and never "unhangs" the event is never generated. So if I change the command from my gist to a shorter time execSync("ping -n 5 -w 1000 localhost > nul"); (5 seconds) and wait for these 5 seconds for the process to unfreeze the event is sent and it seems to work as expected.

I'm not familiar with V8 but is it possible to send the hang event without waiting for the main electron javascript process to "unfreeze"? Our app sometimes suffers complete freezes and we can't figure out which part of the code causes that, eventLoopBlockIntegration would help in this case only if the events will be sent immediately without waiting the main process to unfreeze.

nikitakot avatar Oct 22 '25 11:10 nikitakot

@timfish i think the behavior i described is sort of by design, right? because Sentry in the main thread generates the event loop block events and since the main thread is blocked the event is never generated

I would like to try to upload these events to Sentry server from a worker thread directly after running the sentry-javascript-node-native-stacktrace library, can you give me some hints please on which Sentry components should I use to implement the logic in the worker thread? Thanks.

nikitakot avatar Oct 23 '25 07:10 nikitakot

This eventLoopBlockIntegration already uses a worker thread so that it can capture stack traces in any other thread (including the main thread).

I will add tests upstream in @sentry-internal/node-native-stacktrace to ensure it can in fact capture fully blocked threads.

timfish avatar Oct 23 '25 10:10 timfish

@timfish yeah you're right, I see this logic in sendBlockEvent in packages\node-native\src\event-loop-block-watchdog.ts.

However when I run the integration the event is not sent until the main thread gets unblocked, I can reproduce it every time.

[Sentry Event Loop Blocked Watchdog] Started
[13:30:31] Simulating event loop block in main process for 5 seconds...
[13:30:35] Event loop block completed
[Sentry Event Loop Blocked Watchdog] Blocked thread detected '0' last polled 1116 ms ago.
[Sentry Event Loop Blocked Watchdog] Sending abnormal session
[Sentry Event Loop Blocked Watchdog] [{"sent_at":"2025-10-23T11:30:35.157Z","sdk":{"name":"sentry.javascript.electron","version":"7.2.0"}},[[{"type":"session"},{"sid":"0cedcf430de14c949a43d7f9edfac6d0","init":true,"started":"2025-10-23T11:30:26.067Z","timestamp":"2025-10-23T11:30:35.155Z","status":"abnormal","errors":0,"duration":9.088639736175537,"abnormal_mechanism":"anr_foreground","attrs":{"release":"[email protected]","environment":"development","user_agent":"Node.js/22"}}]]]
[Sentry Event Loop Blocked Watchdog] Sending event

nikitakot avatar Oct 23 '25 11:10 nikitakot

I've tested the native module and it captures stack traces from a thread that is blocked indefinitely.

What you might be seeing is the fact that once the main thread is blocked, you will not get any console/debug output because the main thread is blocked!

The [Sentry Event Loop Blocked Watchdog] logs are only visible because the stdio is set to inherit which allows you to see the output from the worker in the main thread. Once the main thread is blocked, you will no longer see these logs passed through.

The events should still be sent to Sentry from the watchdog though!

timfish avatar Oct 23 '25 16:10 timfish

@timfish hm...and what about std::cout logs in the module.cc? they should be visible regardless, right?

what I see is that the blocked thread is correctly detected but then captureStackTrace gets stuck

[18:06:52] Simulating event loop block in main process for 5 seconds...
GetThreadsLastSeen called
GetUnbiasedMonotonicTime called
Windows time obtained: 264631511366300
GetThreadsLastSeen: checking 1 threads
GetThreadsLastSeen: 0 last seen 620 ms ago
GetThreadsLastSeen completed
GetThreadsLastSeen called
GetUnbiasedMonotonicTime called
Windows time obtained: 264632023528700
GetThreadsLastSeen: checking 1 threads
GetThreadsLastSeen: 0 last seen 1132 ms ago ----------> blocked thread gets blocked for more than the default 1000ms 
GetThreadsLastSeen completed
CaptureStackTraces called ----------------------> captureStackTrace is called in sendBlockEvent 
Total threads registered: 1
Capturing stack trace for thread: 0
Processing 1 futuresAsync task started for thread: 0
CaptureStackTrace called
Interrupt requested, waiting for result
Waiting for future result ------------------> here it gets stuck (auto result = future.get();)

ExecutionInterrupted called
Stack trace found with frames
Processing frame index: 0
Processing frame index: 1
Processing frame index: 2
Processing frame index: 3
Processing frame index: 4
Processing frame index: 5
ExecutionInterrupted completed
CaptureStackTrace completed with result size: 6
Got result for thread: 0 with frames: 6
[18:07:01] Event loop block completed
// Function to get the last seen time of all registered threads
void GetThreadsLastSeen(const FunctionCallbackInfo<Value> &args) {
  std::cout << "GetThreadsLastSeen called" << std::endl;
...
      std::cout << "GetThreadsLastSeen: " << info.thread_name << " last seen " << ms_since << " ms ago" << std::endl;
...
void CaptureStackTraces(const FunctionCallbackInfo<Value> &args) {
  std::cout << "CaptureStackTraces called" << std::endl;
 ...
  for (auto &future : futures) {
    std::cout << "Waiting for future result" << std::endl;
    auto result = future.get();
    std::cout << "Got result for thread: " << result.thread_name << " with frames: " << result.stack_frames.size() << std::endl;
...

nikitakot avatar Oct 23 '25 16:10 nikitakot

So far only tested this against https://github.com/getsentry/sentry-javascript-node-native-stacktrace/pull/24 and it doesn't get stuck but this could be because of how the locking has been changed.

I think we'll get that merged and then when I'm migrating @sentry/node-native over I'll add some more e2e tests to confirm.

timfish avatar Oct 23 '25 16:10 timfish

@timfish i just tried against the latest commit in sentry-javascript-node-native-stacktrace and i get the same output, the event generation gets stuck on return future.get();, the event is not uploaded until CaptureStackTrace gets unstuck and finishes and CaptureStackTrace execution gets stuck together with the thread

[20:57:26] Simulating event loop block in main process for 10 seconds... ------------------------> here the main thread gets stuck
[20:57:26.905] [native] Entered GetThreadsLastSeen
[20:57:26.906] [native] Entered GetUnbiasedMonotonicTime
[20:57:26.906] [native] GetUnbiasedMonotonicTime: Windows branch
[20:57:26.907] [native] GetThreadsLastSeen: locking threads
[20:57:26.908] [native] GetThreadsLastSeen: thread last seen
[20:57:26.909] [native] GetThreadsLastSeen: returning result
[20:57:27.418] [native] Entered GetThreadsLastSeen
[20:57:27.418] [native] Entered GetUnbiasedMonotonicTime
[20:57:27.419] [native] GetUnbiasedMonotonicTime: Windows branch
[20:57:27.419] [native] GetThreadsLastSeen: locking threads
[20:57:27.420] [native] GetThreadsLastSeen: thread last seen
[20:57:27.421] [native] GetThreadsLastSeen: returning result
[20:57:27.422] [native] Entered CaptureStackTraces
[20:57:27.423] [native] CaptureStackTraces: locking threads
[20:57:27.423] [native] CaptureStackTraces: launching async for thread
[20:57:27.424] [native] CaptureStackTraces: waiting for thread future
[20:57:27.424] [native] CaptureStackTraces: async thread lambda
[20:57:27.426] [native] Entered CaptureStackTrace -------------------------------> CaptureStackTrace, blocked thread was detected
[20:57:27.426] [native] CaptureStackTrace: requesting interrupt
[20:57:27.427] [native] CaptureStackTrace: waiting for future -----------------------------> it got stock
[20:57:35.777] [native] Entered ExecutionInterrupted -------------------------------------> unstuck after a bit less than 10 seconds
[20:57:35.778] [native] ExecutionInterrupted: capturing frames
[20:57:35.778] [native] Entered GetStackFrames
[20:57:35.779] [native] GetStackFrames: stack not empty
[20:57:35.779] [native] GetStackFrames: frame
[20:57:35.780] [native] GetStackFrames: frame
[20:57:35.781] [native] GetStackFrames: frame
[20:57:35.781] [native] GetStackFrames: unnamed frame
[20:57:35.782] [native] GetStackFrames: frame
[20:57:35.782] [native] GetStackFrames: unnamed frame
[20:57:35.783] [native] GetStackFrames: frame
[20:57:35.783] [native] GetStackFrames: frame
[20:57:35.784] [native] GetStackFrames: done
[20:57:35.784] [native] ExecutionInterrupted: done
[20:57:35.785] [native] CaptureStackTraces: processing thread result
[20:57:35.785] [20:57:35] Event loop block completed -------------------------------------> the main thread gets unstuck
[native] CaptureStackTraces: frame object
JsStackTrace
CaptureStackTrace(Isolate *isolate,
                  const std::optional<AsyncLocalStorageLookup> &store) {
  LOG("[native] Entered CaptureStackTrace");
  if (isolate->IsExecutionTerminating()) {
    LOG("[native] CaptureStackTrace: terminating");
    return JsStackTrace{{}, ""};
  }
  std::promise<JsStackTrace> promise;
  auto future = promise.get_future();
  LOG("[native] CaptureStackTrace: requesting interrupt");
  isolate->RequestInterrupt(ExecutionInterrupted,
                            new InterruptArgs{std::move(promise), &store});
  LOG("[native] CaptureStackTrace: waiting for future");
  return future.get();
}

nikitakot avatar Oct 23 '25 19:10 nikitakot

here's the gist - https://gist.github.com/nikitakot/f8e48073491723117309b2c048650672, the same as the previous one pretty much

and node_modules\@sentry-internal\node-native-stacktrace\module.cc with logs

I'm on node v22.21.0, Windows 11

$ npm i
$ npx electron-rebuild --force
$ npm start
#include <chrono>
#include <future>
#include <iostream>
#include <mutex>
#include <node.h>
#include <node_version.h>
#include <optional>
#include <vector>
#include <iomanip> // for std::put_time
#include <ctime>   // for std::localtime

// Helper to get current time as string
inline std::string CurrentTimeString() {
  auto now = std::chrono::system_clock::now();
  std::time_t now_c = std::chrono::system_clock::to_time_t(now);
  std::tm tm_now;
#ifdef _WIN32
  localtime_s(&tm_now, &now_c);
#else
  localtime_r(&now_c, &tm_now);
#endif
  char buf[32];
  std::strftime(buf, sizeof(buf), "%H:%M:%S", &tm_now);
  auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(now.time_since_epoch()) % 1000;
  char out[40];
  std::snprintf(out, sizeof(out), "[%s.%03lld] ", buf, static_cast<long long>(ms.count()));
  return std::string(out);
}
#define LOG(x) std::cout << CurrentTimeString() << x << std::endl

// Platform-specific includes for time functions
#ifdef _WIN32
#include <windows.h>
#elif defined(__APPLE__)
#include <time.h>
#elif defined(__linux__)
#include <time.h>
#endif

#ifndef NODE_MAJOR_VERSION
#error "NODE_MAJOR_VERSION is not defined"
#endif

#define SUPPORTS_ASYNC_CONTEXT_FRAME NODE_MAJOR_VERSION >= 22
#define GET_CONTINUATION_PRESERVED_EMBEDDER_DATA_V2 V8_MAJOR_VERSION >= 14

using namespace v8;
using namespace node;
using namespace std::chrono;

static const int kMaxStackFrames = 50;

struct AsyncLocalStorageLookup {
  // Async local storage instance associated with this thread
  v8::Global<v8::Value> async_local_storage;
  // Optional ordered array of keys (string | symbol) to traverse nested
  // Map/Object structures to fetch the final state object
  std::optional<std::vector<v8::Global<v8::Value>>> storage_keys;
};

// Structure to hold information for each thread/isolate
struct ThreadInfo {
  // Thread name
  std::string thread_name;
  // Last time this thread was seen in milliseconds since epoch
  milliseconds last_seen;
  // Optional async local storage associated with this thread
  std::optional<AsyncLocalStorageLookup> async_store;
  // Some JSON serialized state sent via threadPoll
  std::string poll_state;
};

static std::mutex threads_mutex;
// Map to hold all registered threads and their information
static std::unordered_map<v8::Isolate *, ThreadInfo> threads = {};

// Structure to hold stack frame information
struct JsStackFrame {
  std::string function_name;
  std::string filename;
  int lineno;
  int colno;
};

// Type alias for a vector of JsStackFrame
using JsStackFrames = std::vector<JsStackFrame>;

struct JsStackTrace {
  // The frames in the stack trace
  JsStackFrames frames;
  // JSON serialized string of the async state
  std::string async_state;
};

struct ThreadResult {
  std::string thread_name;
  JsStackTrace stack_trace;
  // JSON serialized string of the poll state
  std::string poll_state;
};

// Recursively sanitize a value to be safely JSON-stringifiable by:
// - Removing properties whose values are BigInt, Function, or Symbol
//   (dropped for objects, omitted from arrays)
// - Breaking cycles by omitting repeated objects (undefined -> dropped/omitted)
// - Preserving primitives and traversing arrays/objects
static v8::Local<v8::Value>
SanitizeForJSON(v8::Isolate *isolate, v8::Local<v8::Context> context,
                v8::Local<v8::Value> value,
                std::vector<v8::Local<v8::Object>> &ancestors) {
  LOG("[native] Entered SanitizeForJSON");
  if (value->IsNull() || value->IsBoolean() || value->IsNumber() ||
      value->IsString()) {
    LOG("[native] SanitizeForJSON: primitive fast-path");
    return value;
  }
  if (value->IsBigInt() || value->IsSymbol() || value->IsFunction() ||
      value->IsUndefined()) {
    LOG("[native] SanitizeForJSON: non-JSON type");
    return v8::Undefined(isolate);
  }
  if (value->IsArray()) {
    LOG("[native] SanitizeForJSON: array branch");
    auto arr = value.As<v8::Array>();
    auto arr_obj = value.As<v8::Object>();
    for (auto &a : ancestors) {
      if (a->StrictEquals(arr_obj)) {
        LOG("[native] SanitizeForJSON: array cycle detected");
        return v8::Undefined(isolate);
      }
    }
    auto length = arr->Length();
    auto out = v8::Array::New(isolate, 0);
    ancestors.push_back(arr_obj);
    uint32_t out_index = 0;
    for (uint32_t i = 0; i < length; ++i) {
      LOG("[native] SanitizeForJSON: array element");
      auto maybeEl = arr->Get(context, i);
      v8::Local<v8::Value> el;
      if (!maybeEl.ToLocal(&el)) {
        LOG("[native] SanitizeForJSON: array element empty");
        el = v8::Undefined(isolate);
      }
      auto sanitized = SanitizeForJSON(isolate, context, el, ancestors);
      if (!sanitized->IsUndefined()) {
        LOG("[native] SanitizeForJSON: array element sanitized");
        out->Set(context, out_index++, sanitized).Check();
      }
    }
    ancestors.pop_back();
    LOG("[native] SanitizeForJSON: array done");
    return out;
  }
  if (value->IsObject()) {
    LOG("[native] SanitizeForJSON: object branch");
    auto obj = value.As<v8::Object>();
    for (auto &a : ancestors) {
      if (a->StrictEquals(obj)) {
        LOG("[native] SanitizeForJSON: object cycle detected");
        return v8::Undefined(isolate);
      }
    }
    ancestors.push_back(obj);
    auto maybe_props = obj->GetPropertyNames(context);
    if (maybe_props.IsEmpty()) {
      LOG("[native] SanitizeForJSON: no enumerable props");
      ancestors.pop_back();
      return obj;
    }
    auto props = maybe_props.ToLocalChecked();
    auto out = v8::Object::New(isolate);
    auto len = props->Length();
    for (uint32_t i = 0; i < len; ++i) {
      LOG("[native] SanitizeForJSON: object property");
      auto maybeKey = props->Get(context, i);
      if (maybeKey.IsEmpty()) {
        LOG("[native] SanitizeForJSON: property key empty");
        continue;
      }
      auto key = maybeKey.ToLocalChecked();
      if (!key->IsString()) {
        LOG("[native] SanitizeForJSON: non-string key skipped");
        continue;
      }
      auto maybeVal = obj->Get(context, key);
      if (maybeVal.IsEmpty()) {
        LOG("[native] SanitizeForJSON: property value empty");
        continue;
      }
      auto val = maybeVal.ToLocalChecked();
      auto sanitized = SanitizeForJSON(isolate, context, val, ancestors);
      if (!sanitized->IsUndefined()) {
        LOG("[native] SanitizeForJSON: property sanitized");
        out->Set(context, key, sanitized).Check();
      }
    }
    ancestors.pop_back();
    LOG("[native] SanitizeForJSON: object done");
    return out;
  }
  LOG("[native] SanitizeForJSON: fallback branch");
  return value;
}

std::string JSONStringify(Isolate *isolate, Local<Value> value) {
  LOG("[native] Entered JSONStringify");
  auto context = isolate->GetCurrentContext();
  LOG("[native] JSONStringify: sanitizing value");
  std::vector<v8::Local<v8::Object>> ancestors;
  auto sanitized = SanitizeForJSON(isolate, context, value, ancestors);
  if (sanitized->IsUndefined()) {
    LOG("[native] JSONStringify: nothing serializable");
    return "";
  }
  auto maybe_json = v8::JSON::Stringify(context, sanitized);
  if (maybe_json.IsEmpty()) {
    LOG("[native] JSONStringify: stringify failed");
    return "";
  }
  v8::String::Utf8Value utf8(isolate, maybe_json.ToLocalChecked());
  LOG("[native] JSONStringify: success");
  return *utf8 ? *utf8 : "";
}

// Function to get stack frames from a V8 stack trace
JsStackFrames GetStackFrames(Isolate *isolate) {
  LOG("[native] Entered GetStackFrames");
  auto stack = StackTrace::CurrentStackTrace(isolate, kMaxStackFrames,
                                             StackTrace::kDetailed);
  JsStackFrames frames;
  if (!stack.IsEmpty()) {
    LOG("[native] GetStackFrames: stack not empty");
    for (int i = 0; i < stack->GetFrameCount(); i++) {
      LOG("[native] GetStackFrames: frame");
      auto frame = stack->GetFrame(isolate, i);
      auto fn_name = frame->GetFunctionName();
      std::string function_name;
      if (frame->IsEval()) {
        LOG("[native] GetStackFrames: eval frame");
        function_name = "[eval]";
      } else if (fn_name.IsEmpty() || fn_name->Length() == 0) {
        LOG("[native] GetStackFrames: unnamed frame");
        function_name = "?";
      } else if (frame->IsConstructor()) {
        LOG("[native] GetStackFrames: constructor frame");
        function_name = "[constructor]";
      } else {
        v8::String::Utf8Value utf8_fn(isolate, fn_name);
        function_name = *utf8_fn ? *utf8_fn : "?";
      }
      std::string filename;
      auto script_name = frame->GetScriptName();
      if (!script_name.IsEmpty()) {
        v8::String::Utf8Value utf8_filename(isolate, script_name);
        filename = *utf8_filename ? *utf8_filename : "<unknown>";
      } else {
        filename = "<unknown>";
      }
      int lineno = frame->GetLineNumber();
      int colno = frame->GetColumn();
      frames.push_back(JsStackFrame{function_name, filename, lineno, colno});
    }
  } else {
    LOG("[native] GetStackFrames: stack empty");
  }
  LOG("[native] GetStackFrames: done");
  return frames;
}

#if SUPPORTS_ASYNC_CONTEXT_FRAME
// Function to fetch the thread state from the async context store
std::string GetThreadState(Isolate *isolate,
                           const AsyncLocalStorageLookup &store) {
  LOG("[native] Entered GetThreadState");
#if GET_CONTINUATION_PRESERVED_EMBEDDER_DATA_V2
  LOG("[native] GetThreadState: using V2 embedder data");
  auto data = isolate->GetContinuationPreservedEmbedderDataV2().As<Value>();
#else
  LOG("[native] GetThreadState: using legacy embedder data");
  auto data = isolate->GetContinuationPreservedEmbedderData();
#endif
  auto async_local_storage_local = store.async_local_storage.Get(isolate);
  if (data.IsEmpty() || !data->IsMap() || async_local_storage_local.IsEmpty()) {
    LOG("[native] GetThreadState: missing map or storage");
    return "";
  }
  auto map = data.As<v8::Map>();
  auto context = isolate->GetCurrentContext();
  auto maybe_root_store = map->Get(context, async_local_storage_local);
  if (maybe_root_store.IsEmpty()) {
    LOG("[native] GetThreadState: root store empty");
    return "";
  }
  auto root_store = maybe_root_store.ToLocalChecked();
  if (store.storage_keys.has_value()) {
    LOG("[native] GetThreadState: traversing storage keys");
    const auto &keys = store.storage_keys.value();
    auto current = root_store;
    for (auto &gkey : keys) {
      auto local_key = gkey.Get(isolate);
      if (!(local_key->IsString() || local_key->IsSymbol())) {
        LOG("[native] GetThreadState: key not string/symbol");
        continue;
      }
      v8::MaybeLocal<v8::Value> maybeValue;
      if (current->IsMap()) {
        LOG("[native] GetThreadState: current is map");
        auto map_val = current.As<v8::Map>();
        maybeValue = map_val->Get(context, local_key);
      } else if (current->IsObject()) {
        LOG("[native] GetThreadState: current is object");
        auto obj_val = current.As<v8::Object>();
        maybeValue = obj_val->Get(context, local_key);
      } else {
        LOG("[native] GetThreadState: current not map/object");
        return "";
      }
      if (maybeValue.IsEmpty()) {
        LOG("[native] GetThreadState: value empty");
        return "";
      }
      current = maybeValue.ToLocalChecked();
    }
    root_store = current;
  }
  LOG("[native] GetThreadState: returning JSON");
  return JSONStringify(isolate, root_store);
}
#endif

struct InterruptArgs {
  std::promise<JsStackTrace> promise;
  const std::optional<AsyncLocalStorageLookup> *store;
};

// Function to be called when an isolate's execution is interrupted
static void ExecutionInterrupted(Isolate *isolate, void *data) {
  LOG("[native] Entered ExecutionInterrupted");
  auto args = static_cast<InterruptArgs *>(data);
  Locker locker(isolate);
  HandleScope handle_scope(isolate);
  if (isolate->IsExecutionTerminating()) {
    LOG("[native] ExecutionInterrupted: terminating");
    args->promise.set_value({{}, ""});
    delete args;
    return;
  }
  LOG("[native] ExecutionInterrupted: capturing frames");
  auto frames = GetStackFrames(isolate);
  std::string state = "";
#if SUPPORTS_ASYNC_CONTEXT_FRAME
  if (args->store && args->store->has_value()) {
    LOG("[native] ExecutionInterrupted: capturing async state");
    state = GetThreadState(isolate, args->store->value());
  }
#endif
  args->promise.set_value({frames, state});
  LOG("[native] ExecutionInterrupted: done");
  delete args;
}

// Function to capture the stack trace of a single isolate
JsStackTrace
CaptureStackTrace(Isolate *isolate,
                  const std::optional<AsyncLocalStorageLookup> &store) {
  LOG("[native] Entered CaptureStackTrace");
  if (isolate->IsExecutionTerminating()) {
    LOG("[native] CaptureStackTrace: terminating");
    return JsStackTrace{{}, ""};
  }
  std::promise<JsStackTrace> promise;
  auto future = promise.get_future();
  LOG("[native] CaptureStackTrace: requesting interrupt");
  isolate->RequestInterrupt(ExecutionInterrupted,
                            new InterruptArgs{std::move(promise), &store});
  LOG("[native] CaptureStackTrace: waiting for future");
  return future.get();
}

// Function to capture stack traces from all registered threads
void CaptureStackTraces(const FunctionCallbackInfo<Value> &args) {
  LOG("[native] Entered CaptureStackTraces");
  auto capture_from_isolate = args.GetIsolate();
  std::vector<ThreadResult> results;
  {
    LOG("[native] CaptureStackTraces: locking threads");
    std::vector<std::future<ThreadResult>> futures;
    std::lock_guard<std::mutex> lock(threads_mutex);
    for (auto &thread : threads) {
      auto thread_isolate = thread.first;
      auto &thread_info = thread.second;
      if (thread_isolate == capture_from_isolate)
        continue;
      LOG("[native] CaptureStackTraces: launching async for thread");
      auto thread_name = thread_info.thread_name;
      auto poll_state = thread_info.poll_state;
      futures.emplace_back(std::async(
          std::launch::async,
          [thread_isolate, thread_name, poll_state](
              const std::optional<AsyncLocalStorageLookup> &async_store)
              -> ThreadResult {
            LOG("[native] CaptureStackTraces: async thread lambda");
            return ThreadResult{thread_name,
                                CaptureStackTrace(thread_isolate, async_store),
                                poll_state};
          },
          std::cref(thread_info.async_store)));
    }
    for (auto &fut : futures) {
      LOG("[native] CaptureStackTraces: waiting for thread future");
      results.emplace_back(fut.get());
    }
  }
  auto current_context = capture_from_isolate->GetCurrentContext();
  Local<Object> output = Object::New(capture_from_isolate);
  for (auto &result : results) {
    LOG("[native] CaptureStackTraces: processing thread result");
    auto key =
        String::NewFromUtf8(capture_from_isolate, result.thread_name.c_str(),
                            NewStringType::kNormal)
            .ToLocalChecked();
    Local<Array> jsFrames =
        Array::New(capture_from_isolate, result.stack_trace.frames.size());
    for (size_t i = 0; i < result.stack_trace.frames.size(); ++i) {
      LOG("[native] CaptureStackTraces: frame object");
      const auto &frame = result.stack_trace.frames[i];
      Local<Object> frameObj = Object::New(capture_from_isolate);
      frameObj
          ->Set(current_context,
                String::NewFromUtf8(capture_from_isolate, "function",
                                    NewStringType::kInternalized)
                    .ToLocalChecked(),
                String::NewFromUtf8(capture_from_isolate,
                                    frame.function_name.c_str(),
                                    NewStringType::kNormal)
                    .ToLocalChecked())
          .Check();
      frameObj
          ->Set(current_context,
                String::NewFromUtf8(capture_from_isolate, "filename",
                                    NewStringType::kInternalized)
                    .ToLocalChecked(),
                String::NewFromUtf8(capture_from_isolate,
                                    frame.filename.c_str(),
                                    NewStringType::kNormal)
                    .ToLocalChecked())
          .Check();
      frameObj
          ->Set(current_context,
                String::NewFromUtf8(capture_from_isolate, "lineno",
                                    NewStringType::kInternalized)
                    .ToLocalChecked(),
                Integer::New(capture_from_isolate, frame.lineno))
          .Check();
      frameObj
          ->Set(current_context,
                String::NewFromUtf8(capture_from_isolate, "colno",
                                    NewStringType::kInternalized)
                    .ToLocalChecked(),
                Integer::New(capture_from_isolate, frame.colno))
          .Check();
      jsFrames->Set(current_context, static_cast<uint32_t>(i), frameObj)
          .Check();
    }
    Local<Object> threadObj = Object::New(capture_from_isolate);
    threadObj
        ->Set(current_context,
              String::NewFromUtf8(capture_from_isolate, "frames",
                                  NewStringType::kInternalized)
                  .ToLocalChecked(),
              jsFrames)
        .Check();
    if (!result.poll_state.empty()) {
      LOG("[native] CaptureStackTraces: pollState present");
      v8::MaybeLocal<v8::String> stateStr = v8::String::NewFromUtf8(
          capture_from_isolate, result.poll_state.c_str(),
          NewStringType::kNormal);
      if (!stateStr.IsEmpty()) {
        v8::MaybeLocal<v8::Value> maybeStateVal =
            v8::JSON::Parse(current_context, stateStr.ToLocalChecked());
        v8::Local<v8::Value> stateVal;
        if (maybeStateVal.ToLocal(&stateVal)) {
          threadObj
              ->Set(current_context,
                    String::NewFromUtf8(capture_from_isolate, "pollState",
                                        NewStringType::kInternalized)
                        .ToLocalChecked(),
                    stateVal)
              .Check();
        }
      }
    }
    if (!result.stack_trace.async_state.empty()) {
      LOG("[native] CaptureStackTraces: asyncState present");
      v8::MaybeLocal<v8::String> stateStr = v8::String::NewFromUtf8(
          capture_from_isolate, result.stack_trace.async_state.c_str(),
          NewStringType::kNormal);
      if (!stateStr.IsEmpty()) {
        v8::MaybeLocal<v8::Value> maybeStateVal =
            v8::JSON::Parse(current_context, stateStr.ToLocalChecked());
        v8::Local<v8::Value> stateVal;
        if (maybeStateVal.ToLocal(&stateVal)) {
          threadObj
              ->Set(current_context,
                    String::NewFromUtf8(capture_from_isolate, "asyncState",
                                        NewStringType::kInternalized)
                        .ToLocalChecked(),
                    stateVal)
              .Check();
        }
      }
    }
    output->Set(current_context, key, threadObj).Check();
  }
  LOG("[native] CaptureStackTraces: returning output");
  args.GetReturnValue().Set(output);
}

// Cleanup function to remove the thread from the map when the isolate is
// destroyed
void Cleanup(void *arg) {
  LOG("[native] Entered Cleanup");
  auto isolate = static_cast<Isolate *>(arg);
  LOG("[native] Cleanup: locking threads");
  std::lock_guard<std::mutex> lock(threads_mutex);
  LOG("[native] Cleanup: erasing isolate");
  threads.erase(isolate);
  LOG("[native] Cleanup: done");
}

void RegisterThreadInternal(
    Isolate *isolate, const std::string &thread_name,
    std::optional<AsyncLocalStorageLookup> async_store) {
  LOG("[native] Entered RegisterThreadInternal");
  std::lock_guard<std::mutex> lock(threads_mutex);
  auto found = threads.find(isolate);
  if (found == threads.end()) {
    LOG("[native] RegisterThreadInternal: emplacing thread");
    threads.emplace(isolate, ThreadInfo{thread_name, milliseconds::zero(),
                                        std::move(async_store), ""});
    node::AddEnvironmentCleanupHook(isolate, Cleanup, isolate);
    LOG("[native] RegisterThreadInternal: cleanup hook registered");
  } else {
    LOG("[native] RegisterThreadInternal: thread already registered");
  }
}

void RegisterThread(const FunctionCallbackInfo<Value> &args) {
  LOG("[native] Entered RegisterThread");
  auto isolate = args.GetIsolate();
  auto context = isolate->GetCurrentContext();
  if (args.Length() == 1 && args[0]->IsString()) {
    LOG("[native] RegisterThread: string arg");
    v8::String::Utf8Value utf8(isolate, args[0]);
    std::string thread_name(*utf8 ? *utf8 : "");
    RegisterThreadInternal(isolate, thread_name, std::nullopt);
  } else if (args.Length() == 2 && args[0]->IsObject() && args[1]->IsString()) {
    LOG("[native] RegisterThread: object + string args");
    v8::String::Utf8Value utf8(isolate, args[1]);
    std::string thread_name(*utf8 ? *utf8 : "");
    auto obj = args[0].As<Object>();
    auto async_local_storage_val =
        obj->Get(context, String::NewFromUtf8(isolate, "asyncLocalStorage",
                                              NewStringType::kInternalized)
                              .ToLocalChecked());
    if (async_local_storage_val.IsEmpty() ||
        !async_local_storage_val.ToLocalChecked()->IsObject()) {
      LOG("[native] RegisterThread: missing asyncLocalStorage");
      isolate->ThrowException(Exception::Error(
          String::NewFromUtf8(isolate,
                              "The first argument must be an object with an "
                              "asyncLocalStorage property",
                              NewStringType::kInternalized)
              .ToLocalChecked()));
      return;
    }
    std::optional<std::vector<v8::Global<v8::Value>>> storage_keys =
        std::nullopt;
    auto storage_key_val =
        obj->Get(context, String::NewFromUtf8(isolate, "stateLookup",
                                              NewStringType::kInternalized)
                              .ToLocalChecked());
    if (!storage_key_val.IsEmpty()) {
      LOG("[native] RegisterThread: stateLookup present");
      auto local_val = storage_key_val.ToLocalChecked();
      if (!local_val->IsUndefined() && !local_val->IsNull()) {
        if (local_val->IsArray()) {
          LOG("[native] RegisterThread: stateLookup is array");
          auto arr = local_val.As<v8::Array>();
          std::vector<v8::Global<v8::Value>> keys_vec;
          uint32_t length = arr->Length();
          for (uint32_t i = 0; i < length; ++i) {
            auto maybeEl = arr->Get(context, i);
            if (maybeEl.IsEmpty())
              continue;
            auto el = maybeEl.ToLocalChecked();
            if (el->IsString() || el->IsSymbol()) {
              LOG("[native] RegisterThread: stateLookup key valid");
              keys_vec.emplace_back(isolate, el);
            }
          }
          if (!keys_vec.empty()) {
            LOG("[native] RegisterThread: keys_vec not empty");
            storage_keys = std::move(keys_vec);
          }
        }
      }
    }
    auto store = AsyncLocalStorageLookup{
        v8::Global<v8::Value>(isolate,
                              async_local_storage_val.ToLocalChecked()),
        std::move(storage_keys)};
    RegisterThreadInternal(isolate, thread_name, std::move(store));
  } else {
    LOG("[native] RegisterThread: incorrect arguments");
    isolate->ThrowException(Exception::Error(
        String::NewFromUtf8(
            isolate,
            "Incorrect arguments. Expected: \n"
            "- registerThread(threadName: string) or \n"
            "- registerThread(storage: {asyncLocalStorage: AsyncLocalStorage; "
            "stateLookup?: Array<string | symbol>}, "
            "threadName: string)",
            NewStringType::kInternalized)
            .ToLocalChecked()));
  }
}

steady_clock::time_point GetUnbiasedMonotonicTime() {
  LOG("[native] Entered GetUnbiasedMonotonicTime");
#ifdef _WIN32
  LOG("[native] GetUnbiasedMonotonicTime: Windows branch");
  ULONGLONG interrupt_time;
  QueryUnbiasedInterruptTimePrecise(&interrupt_time);
  uint64_t time_ns = interrupt_time * 100;
  return steady_clock::time_point(nanoseconds(time_ns));
#elif defined(__APPLE__)
  LOG("[native] GetUnbiasedMonotonicTime: Apple branch");
  uint64_t time_ns = clock_gettime_nsec_np(CLOCK_UPTIME_RAW);
  return steady_clock::time_point(nanoseconds(time_ns));
#elif defined(__linux__)
  LOG("[native] GetUnbiasedMonotonicTime: Linux branch");
  struct timespec ts;
  clock_gettime(CLOCK_MONOTONIC, &ts);
  return steady_clock::time_point(seconds(ts.tv_sec) + nanoseconds(ts.tv_nsec));
#else
  LOG("[native] GetUnbiasedMonotonicTime: fallback branch");
  return steady_clock::now();
#endif
}

void ThreadPoll(const FunctionCallbackInfo<Value> &args) {
  LOG("[native] Entered ThreadPoll");
  auto isolate = args.GetIsolate();
  bool enable_last_seen = true;
  if (args.Length() > 0 && args[0]->IsBoolean()) {
    LOG("[native] ThreadPoll: enable_last_seen arg");
    enable_last_seen = args[0]->BooleanValue(isolate);
  }
  std::string poll_state = "";
  if (args.Length() > 1 && args[1]->IsObject()) {
    LOG("[native] ThreadPoll: poll_state arg");
    auto obj = args[1].As<Object>();
    poll_state = JSONStringify(isolate, obj);
  }
  {
    LOG("[native] ThreadPoll: locking threads");
    std::lock_guard<std::mutex> lock(threads_mutex);
    auto found = threads.find(isolate);
    if (found != threads.end()) {
      LOG("[native] ThreadPoll: found thread");
      auto &thread_info = found->second;
      thread_info.poll_state = std::move(poll_state);
      if (enable_last_seen) {
        LOG("[native] ThreadPoll: updating last_seen");
        thread_info.last_seen = duration_cast<milliseconds>(
            GetUnbiasedMonotonicTime().time_since_epoch());
      } else {
        LOG("[native] ThreadPoll: disabling last_seen");
        thread_info.last_seen = milliseconds::zero();
      }
    } else {
      LOG("[native] ThreadPoll: thread not found");
    }
  }
}

void GetThreadsLastSeen(const FunctionCallbackInfo<Value> &args) {
  LOG("[native] Entered GetThreadsLastSeen");
  Isolate *isolate = args.GetIsolate();
  Local<Object> result = Object::New(isolate);
  milliseconds now = duration_cast<milliseconds>(
      GetUnbiasedMonotonicTime().time_since_epoch());
  {
    LOG("[native] GetThreadsLastSeen: locking threads");
    std::lock_guard<std::mutex> lock(threads_mutex);
    for (const auto &[thread_isolate, info] : threads) {
      if (info.last_seen == milliseconds::zero()) {
        LOG("[native] GetThreadsLastSeen: skipping thread");
        continue;
      }
      int64_t ms_since = (now - info.last_seen).count();
      LOG("[native] GetThreadsLastSeen: thread last seen");
      result
          ->Set(isolate->GetCurrentContext(),
                String::NewFromUtf8(isolate, info.thread_name.c_str(),
                                    NewStringType::kNormal)
                    .ToLocalChecked(),
                Number::New(isolate, ms_since))
          .Check();
    }
  }
  LOG("[native] GetThreadsLastSeen: returning result");
  args.GetReturnValue().Set(result);
}

extern "C" NODE_MODULE_EXPORT void
NODE_MODULE_INITIALIZER(Local<Object> exports, Local<Value> module,
                        Local<Context> context) {
  LOG("[native] Entered NODE_MODULE_INITIALIZER");
  auto isolate = v8::Isolate::GetCurrent();
  LOG("[native] NODE_MODULE_INITIALIZER: setting exports");
  exports
      ->Set(context,
            String::NewFromUtf8(isolate, "captureStackTrace",
                                NewStringType::kInternalized)
                .ToLocalChecked(),
            FunctionTemplate::New(isolate, CaptureStackTraces)
                ->GetFunction(context)
                .ToLocalChecked())
      .Check();
  exports
      ->Set(context,
            String::NewFromUtf8(isolate, "registerThread",
                                NewStringType::kInternalized)
                .ToLocalChecked(),
            FunctionTemplate::New(isolate, RegisterThread)
                ->GetFunction(context)
                .ToLocalChecked())
      .Check();
  exports
      ->Set(context,
            String::NewFromUtf8(isolate, "threadPoll",
                                NewStringType::kInternalized)
                .ToLocalChecked(),
            FunctionTemplate::New(isolate, ThreadPoll)
                ->GetFunction(context)
                .ToLocalChecked())
      .Check();
  exports
      ->Set(context,
            String::NewFromUtf8(isolate, "getThreadsLastSeen",
                                NewStringType::kInternalized)
                .ToLocalChecked(),
            FunctionTemplate::New(isolate, GetThreadsLastSeen)
                ->GetFunction(context)
                .ToLocalChecked())
      .Check();
  LOG("[native] NODE_MODULE_INITIALIZER: done");
}

nikitakot avatar Oct 23 '25 19:10 nikitakot

i think it's simply impossible to collect the trace for the indefinitely stuck js thread, "interrupt" won't happen until the main js thread gets unblocked, i did another simple test, but i think this topic can be closed and documentation maybe should be also changed stating that traces for the infinite even loop blocks wont' be generated

(this is a 30 seconds block)

[21:43:26.162] [native] Entered CaptureStackTrace
[21:43:26.162] [native] CaptureStackTrace: requesting interrupt
[21:43:26.163] [native] CaptureStackTrace: waiting for future
[21:43:28.165] [native] CaptureStackTrace: isolate blocked for 2 seconds
[21:43:30.176] [native] CaptureStackTrace: isolate blocked for 4 seconds
[21:43:32.181] [native] CaptureStackTrace: isolate blocked for 6 seconds
[21:43:34.190] [native] CaptureStackTrace: isolate blocked for 8 seconds
[21:43:36.199] [native] CaptureStackTrace: isolate blocked for 10 seconds
[21:43:38.208] [native] CaptureStackTrace: isolate blocked for 12 seconds
[21:43:40.223] [native] CaptureStackTrace: isolate blocked for 14 seconds
[21:43:42.241] [native] CaptureStackTrace: isolate blocked for 16 seconds
[21:43:44.243] [native] CaptureStackTrace: isolate blocked for 18 seconds
[21:43:46.246] [native] CaptureStackTrace: isolate blocked for 20 seconds
[21:43:48.251] [native] CaptureStackTrace: isolate blocked for 22 seconds
[21:43:50.266] [native] CaptureStackTrace: isolate blocked for 24 seconds
[21:43:52.273] [native] CaptureStackTrace: isolate blocked for 26 seconds
[21:43:54.277] [native] CaptureStackTrace: isolate blocked for 28 seconds
[21:43:55.480] [native] Entered ExecutionInterrupted
[21:43:55.481] [native] ExecutionInterrupted: capturing frames
[21:43:55.481] [native] Entered GetStackFrames
...
 LOG("[native] CaptureStackTrace: waiting for future");
  
  // Track blocking time and log every 2 seconds
  auto start_time = steady_clock::now();
  const auto timeout_duration = seconds(2);
  
  while (future.wait_for(timeout_duration) == std::future_status::timeout) {
    auto elapsed = duration_cast<seconds>(steady_clock::now() - start_time);
    LOG("[native] CaptureStackTrace: isolate blocked for " << elapsed.count() << " seconds");
  }
  
  auto total_elapsed = duration_cast<milliseconds>(steady_clock::now() - start_time);
  if (total_elapsed.count() > 2000) {
    LOG("[native] CaptureStackTrace: isolate was blocked for total of " << total_elapsed.count() << " ms");
  }
  
  return future.get();

This captures events in CI on all three platforms: https://github.com/getsentry/sentry-electron/pull/1253/files#diff-36e6cd764896407f6f6836936bae915a737ff555a63c8af3a7987ac6f3f06654R24

this test you sent before works because spawnSync('ping localhost', { shell: true, timeout: 10000 }); doesn't block the main thread for 10s, it only blocks it for couple seconds

[21:46:33.820] [native] Entered CaptureStackTraces
[21:46:33.821] [native] CaptureStackTraces: locking threads
[21:46:33.822] [native] CaptureStackTraces: launching async for thread
[21:46:33.823] [native] CaptureStackTraces: waiting for thread future
[21:46:33.823] [native] CaptureStackTraces: async thread lambda
[21:46:33.824] [native] Entered CaptureStackTrace
[21:46:33.824] [native] CaptureStackTrace: requesting interrupt
[21:46:33.825] [native] CaptureStackTrace: waiting for future
[21:46:35.827] [native] CaptureStackTrace: isolate blocked for 2 seconds
[21:46:36.189] [native] Entered ExecutionInterrupted
[21:46:36.189] [native] ExecutionInterrupted: capturing frames
[21:46:36.190] [native] Entered GetStackFrames
[21:46:36.190] [native] GetStackFrames: stack not empty
[21:46:36.191] [native] GetStackFrames: frame
[21:46:36.191] [native] GetStackFrames: frame
[21:46:36.191] [native] GetStackFrames: unnamed frame
[21:46:36.192] [native] GetStackFrames: frame
[21:46:36.192] [native] GetStackFrames: frame
[21:46:36.193] [native] GetStackFrames: done
[21:46:36.193] [native] ExecutionInterrupted: done
[21:46:36.193] [native] CaptureStackTrace: isolate was blocked for total of [21:46:36] Event loop block completed
2367 ms

nikitakot avatar Oct 23 '25 19:10 nikitakot

It looks like we can detect blocking by cpu intensive synchronous work like in our tests because JavaScript is still executing. With spawnSync, it blocks the event loop in a way that it won't interrupt.

documentation maybe should be also changed stating that traces for the infinite even loop blocks wont' be generated

Yes this should be updated!

I wonder what other APIs this might impact 🤔. I would assume Node sync file APIs would never block indefinitely.

timfish avatar Oct 23 '25 20:10 timfish

@timfish if i do spawnSync("ping -n 30 -w 1000 localhost > nul", { shell: true }); it gets stuck the same as my example with execSync, afaik execSync is just a wrapper around spawnSync

native modules are also blocking threads the same, fox ex. I implemented a simple native module which does this:

...
if (duration < 0)
            {
                Napi::RangeError::New(env, "Duration must be non-negative")
                    .ThrowAsJavaScriptException();
                return env.Null();
            }

            // Block the thread for the specified duration
            std::this_thread::sleep_for(std::chrono::milliseconds(duration));

            return env.Undefined();
...

and the isolate gets stuck the same

nikitakot avatar Oct 24 '25 07:10 nikitakot