node icon indicating copy to clipboard operation
node copied to clipboard

Tracking issue: support of builtins and V8 issues in run-time user-land snapshots

Open joyeecheung opened this issue 1 year ago • 6 comments

This issue is opened to track known bugs and limitations of the run-time user-land snapshot.

Currently known limitations/bugs in Node.js builtins:

  • [ ] Support modules that generate system requests e.g. net, http (without supporting the snapshotting of those requests, which we can't resurrect. But user should be able to snapshot the application after all pending requests are completed).
  • [ ] Certain global event listeners on the process object need to be removed during snapshot serialization, and re-installed during deserialization
  • [ ] Support user-land modules
  • [ ] Integration with SEA (Single Executable Application): https://github.com/nodejs/node/pull/42334
  • [ ] Support of more embedder objects and bindings: https://github.com/nodejs/node/issues/37476
  • [ ] Support diagnostics channels
  • [ ] Support vm contexts
  • [ ] Support workers

Currently known limitations/bugs that are being fixed in the V8 upstream:

  • [ ] V8 hits a DCHECK when deserializing certain mutated globals, e.g. Error.stackTraceLimit (it should work fine in the release build, however): https://bugs.chromium.org/p/v8/issues/detail?id=12772 (fix in https://chromium-review.googlesource.com/c/v8/v8/+/3319481)
  • [x] Layout of V8's read-only heap can be inconsistent after deserialization, resulting in memory corruption: https://github.com/nodejs/node/issues/40832 & https://bugs.chromium.org/p/v8/issues/detail?id=12921 (fixed)

joyeecheung avatar Jul 28 '22 04:07 joyeecheung

Hey, big thanks for work. I've checkout https://github.com/nodejs/node/pull/38905 and applied https://bugs.chromium.org/p/v8/issues/detail?id=12921 patch on top of it. Generally it works good, but I found another segfault. It seems like it's related to GC. I've reproduced it two times.

  1. 50 mb bundle, snapshot created, restored, segfault on running function which fs.readFileSync(this50mbFile, 'utf8')
  2. 150 mb bundle, snapshot created, segfault on restoration (running node --snapshot-blob).
LLDB logs

➜  snapshot git:(master) ✗ lldb -- /home/badim/github/node/out/Debug/node --snapshot-blob snapshot.blob script.js
(lldb) target create "/home/badim/github/node/out/Debug/node"
runCurrent executable set to '/home/badim/github/node/out/Debug/node' (x86_64).
(lldb) settings set -- target.run-args  "--snapshot-blob" "snapshot.blob" "script.js"
(lldb) run
Process 59768 launched: '/home/badim/github/node/out/Debug/node' (x86_64)
Process 59768 stopped
* thread #6, name = 'node', stop reason = signal SIGSEGV: invalid address (fault address: 0x559b20665e50)
    frame #0: 0x0000555556e59c40 node`v8::internal::JSArrayBuffer::MarkExtension() [inlined] std::__atomic_base<bool>::store(__m=memory_order_relaxed, __i=true, this=0x0000559b20665e50) at atomic_base.h:404:18
   401 		__glibcxx_assert(__b != memory_order_acq_rel);
   402 		__glibcxx_assert(__b != memory_order_consume);
   403
-> 404 		__atomic_store_n(&_M_i, __i, int(__m));
   405 	      }
   406
   407 	      _GLIBCXX_ALWAYS_INLINE void
(lldb) bt
* thread #6, name = 'node', stop reason = signal SIGSEGV: invalid address (fault address: 0x559b20665e50)
  * frame #0: 0x0000555556e59c40 node`v8::internal::JSArrayBuffer::MarkExtension() [inlined] std::__atomic_base<bool>::store(__m=memory_order_relaxed, __i=true, this=0x0000559b20665e50) at atomic_base.h:404:18
    frame #1: 0x0000555556e59c40 node`v8::internal::JSArrayBuffer::MarkExtension() [inlined] std::atomic<bool>::store(__m=memory_order_relaxed, __i=true, this=0x0000559b20665e50) at atomic:104
    frame #2: 0x0000555556e59c40 node`v8::internal::JSArrayBuffer::MarkExtension() [inlined] v8::internal::ArrayBufferExtension::Mark(this=0x0000559b20665e50) at js-array-buffer.h:182
    frame #3: 0x0000555556e59c40 node`v8::internal::JSArrayBuffer::MarkExtension(this=0x00007ffff5a9d930) at js-array-buffer.cc:202
    frame #4: 0x0000555556b22b17 node`v8::internal::HeapVisitor<int, v8::internal::ConcurrentMarkingVisitor>::Visit(v8::internal::Map, v8::internal::HeapObject) [inlined] v8::internal::MarkingVisitorBase<v8::internal::ConcurrentMarkingVisitor, v8::internal::ConcurrentMarkingState>::VisitJSArrayBuffer(object=JSArrayBuffer @ 0x00007ffff5a9d930, map=<unavailable>, this=0x00007ffff5a9dc40) at marking-visitor-inl.h:321:23
    frame #5: 0x0000555556b22b08 node`v8::internal::HeapVisitor<int, v8::internal::ConcurrentMarkingVisitor>::Visit(this=0x00007ffff5a9dc40, map=Map @ rbp, object=<unavailable>) at objects-visiting-inl.h:64
    frame #6: 0x0000555556b2519a node`v8::internal::ConcurrentMarking::Run(this=0x000055555b85e560, delegate=<unavailable>, code_flush_mode=<unavailable>, mark_compact_epoch=<unavailable>, should_keep_ages_unchanged=<unavailable>) at concurrent-marking.cc:536:46
    frame #7: 0x0000555556b25cf1 node`v8::internal::ConcurrentMarking::JobTask::Run(this=0x000055555edad1c0, delegate=0x00007ffff5a9ed70) at concurrent-marking.cc:425:31
    frame #8: 0x0000555557b974e8 node`v8::platform::DefaultJobWorker::Run(this=0x000055555edad250) at default-job.h:147:21
    frame #9: 0x0000555556616f22 node`node::(anonymous namespace)::PlatformWorkerThread(data=0x000055555b4394f0) at node_platform.cc:43:14
    frame #10: 0x00007ffff7c76ea7 libpthread.so.0`start_thread + 215
    frame #11: 0x00007ffff7ba6def libc.so.6`__clone + 63
(lldb)

It looks like it starting GC snapshotted objects and there is a null pointer or smth. --max-old-space-size=12000 --max-semi-space-size=10000 didn't helped. Maybe I need to compile node with GC disabled to check if it's 100% GC. Ping me If I can help with debugging.

goloveychuk avatar Jul 29 '22 14:07 goloveychuk

@goloveychuk Thanks for the report, this is just a guess but can you try it with https://chromium-review.googlesource.com/c/v8/v8/+/3793525 ? If it shows up again, can you try building Node.js with the --v8-non-optimized-debug configure option and see if you can have a more detailed stack trace?

joyeecheung avatar Jul 29 '22 14:07 joyeecheung

applied patch, built, run, segfault. Invalid v8heapconst could not cause this bug, right? Because I cannot generate it... :)

goloveychuk avatar Jul 29 '22 17:07 goloveychuk

@joyeecheung took 30 mins to generate snapshot :)

lldb for --v8-non-optimized-debug

➜  snapshot git:(master) ✗ lldb -- /home/badim/github/node/out/Debug/node --snapshot-blob snapshot.blob script.js
(lldb) target create "/home/badim/github/node/out/Debug/node"
Current executable set to '/home/badim/github/node/out/Debug/node' (x86_64).
(lldb) settings set -- target.run-args  "--snapshot-blob" "snapshot.blob" "script.js"
(lldb) run
Process 61239 launched: '/home/badim/github/node/out/Debug/node' (x86_64)
Process 61239 stopped
* thread #1, name = 'node', stop reason = signal SIGSEGV: invalid address (fault address: 0x55c339d43a70)
    frame #0: 0x000055555791a3d5 node`std::atomic<bool>::store(bool, std::memory_order) [inlined] std::__atomic_base<bool>::store(__m=memory_order_relaxed, __i=true, this=0x000055c339d43a70) at atomic_base.h:404:18
   401 		__glibcxx_assert(__b != memory_order_acq_rel);
   402 		__glibcxx_assert(__b != memory_order_consume);
   403
-> 404 		__atomic_store_n(&_M_i, __i, int(__m));
   405 	      }
   406
   407 	      _GLIBCXX_ALWAYS_INLINE void
(lldb) bt
* thread #1, name = 'node', stop reason = signal SIGSEGV: invalid address (fault address: 0x55c339d43a70)
  * frame #0: 0x000055555791a3d5 node`std::atomic<bool>::store(bool, std::memory_order) [inlined] std::__atomic_base<bool>::store(__m=memory_order_relaxed, __i=true, this=0x000055c339d43a70) at atomic_base.h:404:18
    frame #1: 0x000055555791a3bb node`std::atomic<bool>::store(this=0x000055c339d43a70, __i=true, __m=memory_order_relaxed) at atomic:104
    frame #2: 0x000055555827c678 node`v8::internal::ArrayBufferExtension::Mark(this=0x000055c339d43a70) at js-array-buffer.h:182:30
    frame #3: 0x000055555857140d node`v8::internal::JSArrayBuffer::MarkExtension(this=0x00007fffffffbb88) at js-array-buffer.cc:202:20
    frame #4: 0x0000555558246538 node`v8::internal::MarkingVisitorBase<v8::internal::MainMarkingVisitor<v8::internal::MajorMarkingState>, v8::internal::MajorMarkingState>::VisitJSArrayBuffer(this=0x0000555560e258d0, map=Map @ 0x00007fffffffbb90, object=JSArrayBuffer @ 0x00007fffffffbb88) at marking-visitor-inl.h:321:23
    frame #5: 0x000055555823a284 node`v8::internal::HeapVisitor<int, v8::internal::MainMarkingVisitor<v8::internal::MajorMarkingState> >::Visit(this=0x0000555560e258d0, map=Map @ 0x00007fffffffbbc0, object=HeapObject @ 0x00007fffffffbbb8) at objects-visiting-inl.h:64:5
    frame #6: 0x0000555558223cd1 node`std::pair<unsigned long, unsigned long> v8::internal::MarkCompactCollector::ProcessMarkingWorklist<(this=0x000055555dadbd40, bytes_to_process=65536)0>(unsigned long) at mark-compact.cc:2256:50
    frame #7: 0x0000555558208ab9 node`v8::internal::IncrementalMarking::Step(this=0x000055555d623ab0, max_step_size_in_ms=5, action=GC_VIA_STACK_GUARD, step_origin=kV8) at incremental-marking.cc:928:60
    frame #8: 0x0000555558208849 node`v8::internal::IncrementalMarking::AdvanceOnAllocation(this=0x000055555d623ab0) at incremental-marking.cc:878:7
    frame #9: 0x000055555820479b node`v8::internal::IncrementalMarking::Observer::Step(this=0x000055555d623b70, bytes_allocated=262128, addr=47321084409928, size=40) at incremental-marking.cc:49:44
    frame #10: 0x00005555581063e9 node`v8::internal::AllocationCounter::InvokeAllocationObservers(this=0x000055555d61c4a8, soon_object=47321084409928, object_size=40, aligned_object_size=40) at allocation-observer.cc:104:28
    frame #11: 0x00005555582e7014 node`v8::internal::SpaceWithLinearArea::InvokeAllocationObservers(this=0x000055555d61c480, soon_object=47321084409928, size_in_bytes=40, aligned_size_in_bytes=40, allocation_size=40) at spaces.cc:431:50
    frame #12: 0x000055555816c264 node`v8::internal::SpaceWithLinearArea::AllocateRawUnaligned(this=0x000055555d61c480, size_in_bytes=40, origin=kRuntime) at spaces-inl.h:272:28
    frame #13: 0x000055555816c29d node`v8::internal::SpaceWithLinearArea::AllocateRawSlow(this=0x000055555d61c480, size_in_bytes=40, alignment=kTaggedAligned, origin=kRuntime) at spaces-inl.h:307:55
    frame #14: 0x000055555816c0a4 node`v8::internal::SpaceWithLinearArea::AllocateRaw(this=0x000055555d61c480, size_in_bytes=40, alignment=kTaggedAligned, origin=kRuntime) at spaces-inl.h:253:79
    frame #15: 0x000055555818a83e node`v8::internal::AllocationResult v8::internal::HeapAllocator::AllocateRaw<(v8::internal::AllocationType)1>(this=0x000055555da70da0, size_in_bytes=40, origin=kRuntime, alignment=kTaggedAligned) at heap-allocator-inl.h:103:39
    frame #16: 0x000055555818ba25 node`v8::internal::HeapObject v8::internal::HeapAllocator::AllocateRawWith<(v8::internal::HeapAllocator::AllocationRetryMode)1>(this=0x000055555da70da0, size=40, allocation=kOld, origin=kRuntime, alignment=kTaggedAligned) at heap-allocator-inl.h:224:47
    frame #17: 0x000055555894d4b8 node`v8::internal::Heap::AllocateRawOrFail(this=0x000055555da70d48, size=40, allocation=kOld, origin=kRuntime, alignment=kTaggedAligned) at heap-inl.h:211:62
    frame #18: 0x0000555558950fde node`v8::internal::Deserializer<v8::internal::Isolate>::Allocate(this=0x00007fffffffd5b0, allocation=kOld, size=40, alignment=kTaggedAligned) at deserializer.cc:1317:43
    frame #19: 0x000055555894fb6b node`v8::internal::Deserializer<v8::internal::Isolate>::ReadObject(this=0x00007fffffffd5b0, space=kOld) at deserializer.cc:670:15
    frame #20: 0x00005555589570dc node`int v8::internal::Deserializer<v8::internal::Isolate>::ReadSingleBytecodeData<v8::internal::SlotAccessorForHeapObject>(this=0x00007fffffffd5b0, data='\x01', slot_accessor=SlotAccessorForHeapObject @ 0x00007fffffffc3f0) at deserializer.cc:938:50
    frame #21: 0x000055555894f827 node`v8::internal::Deserializer<v8::internal::Isolate>::ReadData(this=0x00007fffffffd5b0, object=Handle<v8::internal::HeapObject> @ 0x00007fffffffc700, start_slot_index=1, end_slot_index=7) at deserializer.cc:908:38
    frame #22: 0x000055555894ff4a node`v8::internal::Deserializer<v8::internal::Isolate>::ReadObject(this=0x00007fffffffd5b0, space=kOld) at deserializer.cc:709:11
    frame #23: 0x00005555589570dc node`int v8::internal::Deserializer<v8::internal::Isolate>::ReadSingleBytecodeData<v8::internal::SlotAccessorForHeapObject>(this=0x00007fffffffd5b0, data='\x01', slot_accessor=SlotAccessorForHeapObject @ 0x00007fffffffc820) at deserializer.cc:938:50
    frame #24: 0x000055555894f827 node`v8::internal::Deserializer<v8::internal::Isolate>::ReadData(this=0x00007fffffffd5b0, object=Handle<v8::internal::HeapObject> @ 0x00007fffffffcb30, start_slot_index=1, end_slot_index=3) at deserializer.cc:908:38
    frame #25: 0x000055555894ff4a node`v8::internal::Deserializer<v8::internal::Isolate>::ReadObject(this=0x00007fffffffd5b0, space=kOld) at deserializer.cc:709:11
    frame #26: 0x00005555589570dc node`int v8::internal::Deserializer<v8::internal::Isolate>::ReadSingleBytecodeData<v8::internal::SlotAccessorForHeapObject>(this=0x00007fffffffd5b0, data='\x01', slot_accessor=SlotAccessorForHeapObject @ 0x00007fffffffcc50) at deserializer.cc:938:50
    frame #27: 0x000055555894f827 node`v8::internal::Deserializer<v8::internal::Isolate>::ReadData(this=0x00007fffffffd5b0, object=Handle<v8::internal::HeapObject> @ 0x00007fffffffcf60, start_slot_index=1, end_slot_index=6151) at deserializer.cc:908:38
    frame #28: 0x000055555894ff4a node`v8::internal::Deserializer<v8::internal::Isolate>::ReadObject(this=0x00007fffffffd5b0, space=kOld) at deserializer.cc:709:11
    frame #29: 0x00005555589570dc node`int v8::internal::Deserializer<v8::internal::Isolate>::ReadSingleBytecodeData<v8::internal::SlotAccessorForHeapObject>(this=0x00007fffffffd5b0, data='\x01', slot_accessor=SlotAccessorForHeapObject @ 0x00007fffffffd080) at deserializer.cc:938:50
    frame #30: 0x000055555894f827 node`v8::internal::Deserializer<v8::internal::Isolate>::ReadData(this=0x00007fffffffd5b0, object=Handle<v8::internal::HeapObject> @ 0x00007fffffffd390, start_slot_index=1, end_slot_index=3) at deserializer.cc:908:38
    frame #31: 0x000055555894ff4a node`v8::internal::Deserializer<v8::internal::Isolate>::ReadObject(this=0x00007fffffffd5b0, space=kOld) at deserializer.cc:709:11
    frame #32: 0x000055555894eebc node`v8::internal::Deserializer<v8::internal::Isolate>::DeserializeDeferredObjects(this=0x00007fffffffd5b0) at deserializer.cc:306:15
    frame #33: 0x000055555894a0e5 node`v8::internal::ContextDeserializer::Deserialize(this=0x00007fffffffd5b0, isolate=0x000055555da64720, global_proxy=Handle<v8::internal::JSGlobalProxy> @ 0x00007fffffffd4f8, embedder_fields_deserializer=v8::DeserializeEmbedderFieldsCallback @ 0x00007fffffffd4e0) at context-deserializer.cc:47:31
    frame #34: 0x0000555558949f3c node`v8::internal::ContextDeserializer::DeserializeContext(isolate=0x000055555da64720, data=0x00007fffffffd7d0, can_rehash=true, global_proxy=Handle<v8::internal::JSGlobalProxy> @ 0x00007fffffffd580, embedder_fields_deserializer=v8::DeserializeEmbedderFieldsCallback @ 0x00007fffffffd570) at context-deserializer.cc:24:72
    frame #35: 0x0000555558987c89 node`v8::internal::Snapshot::NewContextFromSnapshot(isolate=0x000055555da64720, global_proxy=Handle<v8::internal::JSGlobalProxy> @ 0x00007fffffffd7b0, context_index=2, embedder_fields_deserializer=v8::DeserializeEmbedderFieldsCallback @ 0x00007fffffffd790) at snapshot.cc:212:35
    frame #36: 0x000055555834d5af node`v8::internal::Genesis::Genesis(this=0x00007fffffffdad0, isolate=0x000055555da64720, maybe_global_proxy=MaybeHandle<v8::internal::JSGlobalProxy> @ 0x00007fffffffd8c8, global_proxy_template=(val_ = 0x0000000000000000), context_snapshot_index=2, embedder_fields_deserializer=v8::DeserializeEmbedderFieldsCallback @ 0x00007fffffffda90, microtask_queue=0x0000000000000000) at bootstrapper.cc:6331:70
    frame #37: 0x00005555583239ea node`v8::internal::Bootstrapper::CreateEnvironment(this=0x000055555da62540, maybe_global_proxy=MaybeHandle<v8::internal::JSGlobalProxy> @ 0x00007fffffffdac0, global_proxy_template=(val_ = 0x0000000000000000), extensions=0x00007fffffffdd90, context_snapshot_index=2, embedder_fields_deserializer=v8::DeserializeEmbedderFieldsCallback @ 0x00007fffffffdb50, microtask_queue=0x0000000000000000) at bootstrapper.cc:332:36
    frame #38: 0x0000555557d7a925 node`v8::InvokeBootstrapper<v8::internal::Context>::Invoke(this=0x00007fffffffdbff, isolate=0x000055555da64720, maybe_global_proxy=MaybeHandle<v8::internal::JSGlobalProxy> @ 0x00007fffffffdb78, global_proxy_template=(val_ = 0x0000000000000000), extensions=0x00007fffffffdd90, context_snapshot_index=2, embedder_fields_deserializer=DeserializeInternalFieldsCallback @ 0x00007fffffffdba0, microtask_queue=0x0000000000000000) at api.cc:6240:78
    frame #39: 0x0000555557da3cff node`v8::CreateEnvironment<v8::internal::Context>(isolate=0x000055555da64720, extensions=0x00007fffffffdd90, maybe_global_template=(val_ = 0x0000000000000000), maybe_global_proxy=(val_ = 0x0000000000000000), context_snapshot_index=2, embedder_fields_deserializer=DeserializeInternalFieldsCallback @ 0x00007fffffffdd40, microtask_queue=0x0000000000000000) at api.cc:6340:27
    frame #40: 0x0000555557d7aa85 node`v8::NewContext(external_isolate=0x000055555da64720, extensions=0x00007fffffffdd90, global_template=(val_ = 0x0000000000000000), global_object=(val_ = 0x0000000000000000), context_snapshot_index=2, embedder_fields_deserializer=DeserializeInternalFieldsCallback @ 0x00007fffffffddf0, microtask_queue=0x0000000000000000) at api.cc:6383:76
    frame #41: 0x0000555557d7abd9 node`v8::Context::FromSnapshot(external_isolate=0x000055555da64720, context_snapshot_index=1, embedder_fields_deserializer=DeserializeInternalFieldsCallback @ 0x00007fffffffde10, extensions=0x0000000000000000, global_object=(val_ = 0x0000000000000000), microtask_queue=0x0000000000000000) at api.cc:6415:66
    frame #42: 0x0000555557adbbf9 node`node::NodeMainInstance::CreateMainEnvironment(this=0x00007fffffffe070, exit_code=0x00007fffffffdfdc) at node_main_instance.cc:188:79
    frame #43: 0x0000555557adb824 node`node::NodeMainInstance::Run(this=0x00007fffffffe070) at node_main_instance.cc:128:39
    frame #44: 0x0000555557a0f36c node`node::LoadSnapshotDataAndRun(snapshot_data_ptr=0x00007fffffffe138, result=0x00007fffffffe140) at node.cc:1255:40
    frame #45: 0x0000555557a0f517 node`node::Start(argc=4, argv=0x00007fffffffe2b8) at node.cc:1291:32
    frame #46: 0x0000555559411d7e node`main(argc=4, argv=0x00007fffffffe2b8) at node_main.cc:127:21
    frame #47: 0x00007ffff7acfd0a libc.so.6`__libc_start_main + 234
    frame #48: 0x00005555578c602a node`_start + 42
(lldb)

goloveychuk avatar Jul 30 '22 13:07 goloveychuk

@goloveychuk Thanks for the follow-up. Is the code causing this error related to https://github.com/nodejs/node/issues/40832#issuecomment-1155320869 by any chance?

joyeecheung avatar Aug 02 '22 17:08 joyeecheung

No, that error is "bus error", which was 30% error rate. I'm not able to reproduce that in current node master. We can ignore that, I think. This error I'm able to reproduce 100% of time

goloveychuk avatar Aug 02 '22 19:08 goloveychuk

@goloveychuk This looks like a duplicate of https://bugs.chromium.org/p/v8/issues/detail?id=13149 - can you try building with https://chromium-review.googlesource.com/c/v8/v8/+/3813068 and see if the crash goes away?

joyeecheung avatar Aug 11 '22 19:08 joyeecheung

// Remove global event listeners on the process object during snapshot serialization const listeners = process.listeners('uncaughtException'); process.removeAllListeners('uncaughtException');

// Take a snapshot of the application const snapshot = process.binding('v8').createSnapshot();

// Reinstall global event listeners on the process object during deserialization listeners.forEach(listener => { process.on('uncaughtException', listener); });

// Restore the application from the snapshot process.binding('v8').setFlagsFromString('--no-wasm-tier-up'); process.binding('v8').setSnapshot(snapshot);

ManojKumar-1204 avatar Mar 08 '23 11:03 ManojKumar-1204

I just want to voice my enthusiasm for supporting user-land modules in snapshots, after today I found out postcss-preset-env takes 210 ms to require() 😅

JoshuaWise avatar Apr 28 '23 07:04 JoshuaWise