node icon indicating copy to clipboard operation
node copied to clipboard

bootstrap: make snapshot reproducible

Open joyeecheung opened this issue 1 year ago • 4 comments

deps: V8: cherry-pick b639938e99fa

Original commit message:

[snapshot] support context embedder data serialization

Previously all context embedder data are serialized verbatim,
so if a context embedder data slot contains a pointer, embedders
need to figure out how to store data for it on the side for the
snapshot. They also have to reset the pointers to null to make
the snapshot reproducible.

To address this issue this patch adds
v8::(Des|S)erializeContextDataCallback as part
of v8::(Des|S)erializeEmbedderFieldsCallback and use them during
(de)serialization of context embedder data. The bahavior is similar
to that of v8::(Des|S)erializeInternalFieldsCallback: If the
returned data is not empty, the returned data is going to be
written into the snapshot blob and passed to the embedder during
deserialization. Otherwise the data is serialized verbatim.

Change-Id: I4113c76fc6dc8e3503cde5533320402454e62097
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/5074252
Reviewed-by: Igor Sheludko <[email protected]>
Reviewed-by: Leszek Swirski <[email protected]>
Commit-Queue: Joyee Cheung <[email protected]>
Cr-Commit-Position: refs/heads/main@{#92675}

Refs: https://github.com/v8/v8/commit/b639938e99fa6b5ffa9c859b18c72a251fd56942

deps: V8: cherry-pick 7fc698c9039d

Original commit message:

[snapshot] allow NativeContext serialization to be deferred

To serialize the context embedder data, we just need the back
reference to the embedder data array. The context can be deferred.

Bug: v8:14662
Change-Id: I6a3c9f35596cd17b7e6bc773cf598922b5b7488d
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/5367362
Reviewed-by: Igor Sheludko <[email protected]>
Commit-Queue: Joyee Cheung <[email protected]>
Reviewed-by: Leszek Swirski <[email protected]>
Cr-Commit-Position: refs/heads/main@{#92892}

Refs: https://github.com/v8/v8/commit/7fc698c9039dc9764512823c46d9805824ec8427

src: add utilities to help debugging reproducibility of snapshots

  • Print offsets in blob serializer
  • Add a special node:generate_default_snapshot ID to generate the built-in snapshot.
  • Improve logging

bootstrap: make snapshot reproducible

This patch uses the new V8 API to {de}serialize context slots for snapshot in order to make the snapshot reproducible. Also added a test for the reproducibility of snapshots.

Refs: https://github.com/nodejs/build/issues/3043

joyeecheung avatar Nov 30 '23 19:11 joyeecheung

Review requested:

  • [ ] @nodejs/gyp
  • [ ] @nodejs/startup
  • [ ] @nodejs/v8-update

nodejs-github-bot avatar Nov 30 '23 19:11 nodejs-github-bot

great work! keep it up!

haraldh avatar Jan 31 '24 15:01 haraldh

CI: https://ci.nodejs.org/job/node-test-pull-request/57908/

nodejs-github-bot avatar Mar 22 '24 19:03 nodejs-github-bot

Somehow this is reproducible locally (macOS + x64 & arm64), and in the CI, on FreeBSD and SmartOS, but not on others. There are still 3-4 words of differences in the snapshot on those platforms. I'll investigate.

joyeecheung avatar Mar 23 '24 15:03 joyeecheung

Split the context data callback part to https://github.com/nodejs/node/pull/53217 - at least that makes the context data slots serialized with some non-gibberish (empty values), otherwise V8 might run into issues serializing the gibberish pointer values.

joyeecheung avatar May 30 '24 13:05 joyeecheung

Is this PR still useful after https://github.com/nodejs/node/pull/53217 being splitted?

legendecas avatar May 31 '24 10:05 legendecas

Yes, in this PR are still some utilities for checking reproducibility and a test for that (but it's not currently passing because a few other bytes are still not reproducible, I will work on those bytes here).

joyeecheung avatar May 31 '24 20:05 joyeecheung

Some discoveries:

  1. We need to return non-empty data for pointer values in context data otherwise V8 would still serialize them verbatim (that was what this PR used to do, in https://github.com/nodejs/node/pull/53217 I changed it to return empty data and that added back some more variance)
  2. The MemoryMode in EmbedderTypeInfo and the EmbedderObjectType in InternalFieldInfoBase are padded. We need to get rid of the padding - the easiest way to do so is to just make the enums uint64_t (a harder way would be to implement custom serialization for all the wrapper structs but I feel that's an overkill for a couple of bytes of overhead for now).

After fixing the two on Linux I am getting reproducible snapshots. Will update the branch after I clean it up..

joyeecheung avatar Jun 12 '24 23:06 joyeecheung

CI: https://ci.nodejs.org/job/node-test-pull-request/59759/

nodejs-github-bot avatar Jun 13 '24 16:06 nodejs-github-bot

CI: https://ci.nodejs.org/job/node-test-pull-request/59762/

nodejs-github-bot avatar Jun 13 '24 17:06 nodejs-github-bot

Oof, CI is finally green. Can I get some reviews please? @nodejs/cpp-reviewers @nodejs/startup @nodejs/build @legendecas

joyeecheung avatar Jun 13 '24 20:06 joyeecheung

Landed in 922feb1ff5527f4442e85d7f61cb95b120df677c...1872167bb7b2cd69fb6e507bfcb6b9444d505904

nodejs-github-bot avatar Jun 14 '24 09:06 nodejs-github-bot