
Pointer Compression and Isolate Groups

Open jasnell opened this issue 1 year ago • 35 comments

For a while now we've been trying to work towards the ability to enable v8 pointer compression (https://v8.dev/blog/pointer-compression) by default in Node.js (https://github.com/search?q=repo%3Anodejs%2Fnode+%22Pointer+compression%22&type=issues). The key limitation that we have had up to this point is that enabling pointer compression has historically forced a process-wide maximum v8 heap limit of 4 GB, which obviously would be a significant breaking change in the runtime, despite the fact that enabling pointer compression has been benchmarked to provide a 40% performance improvement overall.

The key reason for the 4 GB limitation is that enabling pointer compression introduces the concept of a "pointer cage" within which the compressed pointers are held. This pointer cage is limited in size at compile time (if I'm remembering correctly, the exact limit may be tunable at compile time, but it still ends up placing a fairly strict cap on heap size, which would be a breaking change). This size limit is inherent in the concept of pointer compression and cannot be avoided if pointer compression is to function at all.
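To illustrate why the cage implies a heap cap, here is a toy sketch (not v8's actual implementation): once every heap object lives inside one contiguous 4 GB region, any pointer into the heap can be stored as a 32-bit offset from the cage base instead of a full 64-bit address. The CAGE_BASE value below is made up for the example.

```javascript
// Toy model of a pointer cage: all "heap" addresses live in one 4 GB
// region, so pointers can be stored as 32-bit offsets from the base.
// CAGE_BASE is a made-up address; v8's real scheme differs in detail.
const CAGE_BASE = 0x7f00_0000_0000n;
const CAGE_SIZE = 4n * 1024n ** 3n; // 4 GB

function compress(fullPointer) {
  const offset = fullPointer - CAGE_BASE;
  if (offset < 0n || offset >= CAGE_SIZE) {
    throw new RangeError('pointer outside the cage');
  }
  return Number(offset); // fits in 32 bits, halving pointer storage
}

function decompress(offset) {
  return CAGE_BASE + BigInt(offset);
}

const p = CAGE_BASE + 0x1234n;
console.log(compress(p)); // 4660
console.log(decompress(compress(p)) === p); // true
```

Anything outside the cage cannot be expressed as a 32-bit offset, which is exactly why the cage size becomes a hard heap limit.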

For a while now, Cloudflare Workers (which also uses v8) has been running in a generally unsupported configuration of v8 that enables pointer compression but disables the shared pointer cage. As v8 development has progressed, solving the limitations of the shared pointer cage has become critical, so that v8 does not have to keep maintaining this unsupported configuration mode that Workers has relied on.

So earlier this year Cloudflare partnered with Igalia to implement a new v8 feature, called an "Isolate Group", that will be fully landing soon. Most of the work has already landed in v8.

An Isolate Group represents a whole pointer cage with the 4 GB limit. The key difference is that we can now create multiple Isolate Groups within a single process, essentially allowing us to create multiple groupings of isolates with each group having a maximum of 4 GB but removing the process-wide maximum heap restriction.

Any number of Isolates can be created within an Isolate Group, all of which would sit within the same pointer cage.

So whereas today in Node.js, with pointer compression enabled, the process would look something like...

  +----------------------------------------------------------------------------+
  |                           Pointer Cage   (4GB)                             |
  |   +-----------------+     +-----------------+    +-----------------+       |
  |   |     Isolate     |     |    Isolate      |    |     Isolate     |       |
  |   |  (main thread)  |     |   (worker 1)    |    |    (worker 2)   |       |
  |   +-----------------+     +-----------------+    +-----------------+       |
  +----------------------------------------------------------------------------+

With Isolate Groups it could look more along the lines of ...

  +------------------------+------------------------+--------------------------+
  |   Pointer Cage   (4GB) |   Pointer Cage   (4GB) |  Pointer Cage   (4GB)    |
  |   +-----------------+  |   +-----------------+  |  +-----------------+     |
  |   |     Isolate     |  |   |    Isolate      |  |  |     Isolate     |     |
  |   |  (main thread)  |  |   |   (worker 1)    |  |  |    (worker 2)   |     |
  |   +-----------------+  |   +-----------------+  |  +-----------------+     |
  +------------------------+------------------------+--------------------------+

In other words, rather than creating all isolates (main thread + worker thread isolates) in a single shared process-wide pointer cage, each isolate can be created in a separate pointer cage ("isolate group"), of which we can now have any number, meaning the entire process is no longer limited to just a single 4 GB heap.

Obviously, this would still be a breaking change because individual isolates will have the imposed 4 GB limit, but with pointer compression this is far less of a limitation than it may first appear. The vast majority of Node.js applications that exceed a 4 GB heap without pointer compression will run just fine in a 4 GB heap of compressed pointers.

When Node.js finally updates to a version of v8 that has the Isolate Groups work included, I plan to open a PR that adds an experimental compile-time flag to enable automatic use of Isolate Groups. The code change for this is simple:

With this new compile flag enabled, rather than:

auto isolate = v8::Isolate::New(...);

We would have...

auto group = v8::IsolateGroup::New();
auto isolate = v8::Isolate::New(group, ...);

And compile the Node.js process with pointer compression enabled.

My hope is that we would soon thereafter be able to make this the default build mode for Node.js in a later phase 2.

In that later phase, if no issues are encountered, we would flip the default so that the main release builds would compile with pointer compression and isolate groups enabled, with a compile flag option to disable these.

This ought to result in a significant memory and performance improvement across the board in Node.js, and ALSO provides the benefit of allowing every Node.js worker thread to run in its own v8 sandbox (https://v8.dev/blog/sandbox), boosting overall security.

Cloudflare and Igalia soon plan to publish a blog post covering all of this in far more detail along with some additional work we are collaborating on. There are a number of additional technical details that should be covered in extensive detail in that blog post. That blog post would likely serve as the official public announcement of this work but for now, I wanted to open this issue as a way of giving folks here a heads up and to serve as a tracking issue for the work as it progresses. It was important to us that whatever work was done here did not just benefit Cloudflare but could also be used by any runtime built on v8 including Node.js, Deno, and even browsers.

The actual changes to Node.js are expected to be minimal and the vast majority of applications will see no difference other than improved performance and memory usage. Applications that do require massive heap sizes would likely see an impact so we will need to work out some solutions for those.

I will be sharing more details as things progress so watch this thread. And if you have any questions, ask away and I'll answer whatever I can!

/cc @mcollina @joyeecheung @targos @anonrig @nodejs/v8 @nodejs/workers @nodejs/tsc

jasnell avatar Nov 05 '24 22:11 jasnell

Super cool. Would this isolate group possibly enable more efficient ways of transferring/sharing resources between isolates in the future?

On a separate note. It would be nice if we had a somewhat official pointer compression build to let people try it out, with the very restrictive memory limit. We used to have a build hidden somewhere but it is no longer built with more recent versions of Node.

ronag avatar Nov 06 '24 06:11 ronag

I've been following the isolate group work and I'm very excited about it, but I've been unable to get clear answers about whether adopting this model will lock a runtime out of using the upcoming shared structs proposal. I don't know how much cloudflare workers cares about this, but I think it will need to be supported in node and deno.

devsnek avatar Nov 06 '24 07:11 devsnek

We used to have a build hidden somewhere but it is no longer built with more recent versions of Node.

I think that is because the CI broke e.g. https://ci.nodejs.org/job/node-test-commit-linux-pointer-compression/602/nodes=rhel8-x64/console

joyeecheung avatar Nov 06 '24 11:11 joyeecheung

On a separate note. It would be nice if we had a somewhat official pointer compression build to let people try it out, with the very restrictive memory limit. We used to have a build hidden somewhere but it is no longer built with more recent versions of Node.

It's part of https://unofficial-builds.nodejs.org/. The builds broke from Node.js 22 and were disabled. Feel free to submit pull requests to the recipe to get them back.

https://github.com/nodejs/unofficial-builds/pull/158 tried to reenable for Node.js 22 but the logs for the 22.11.0 pointer compressed build show that it's still broken.

richardlau avatar Nov 06 '24 12:11 richardlau

@ronag:

Would this isolate group possibly enable more efficient ways of transferring/sharing resources between isolates in the future?

Not really, and in fact if we end up enabling v8 sandbox where each isolate group is a separate sandbox this will get even trickier, but there are options. But yeah, isolate groups won't impact this.

@devsnek:

... but I've been unable to get clear answers about whether adopting this model will lock a runtime out of using the upcoming shared structs proposal.

Isolates running within the same isolate group ought to be able to use shared structs. If we have a model where every worker thread runs in a separate isolate group, then there will likely be restrictions on using shared structs between them. However, if we allow multiple worker threads to be created within a single isolate group, those ought to be able to take advantage of shared structs, with the tradeoff that they collectively will be limited to the 4 GB heap limit.

jasnell avatar Nov 06 '24 22:11 jasnell

If I read this correctly, we need an API to create a new isolate group, and then pass that as an option to the Worker constructor. By default, it would use whatever the main thread uses.

mcollina avatar Nov 10 '24 19:11 mcollina

Related: #55325 (fairly sure that one is isolate groups not being fully baked yet)

bnoordhuis avatar Nov 10 '24 20:11 bnoordhuis

If I read this correctly, we need an API to create a new isolate group, and that pass that as an option to the Worker constructor. By default, it would use whatever main uses.

For worker threads, what I imagine is, by default all worker threads would run in their own isolate groups, with an option to allow spawning a new worker thread in the same isolate group as the current.

So... for example,

// main thread is in its own isolate group...

// worker is created in a separate isolate group...
const worker = new Worker('....');
// main thread is in its own isolate group...

// worker is created within this thread's isolate group...
const worker = new Worker('...', { group: 'parent' });

But that's just what I'm thinking right now. We'll need to evaluate the options.

jasnell avatar Nov 10 '24 21:11 jasnell

@bnoordhuis :

fairly sure that one is isolate groups not being fully baked yet ...

Yep. There are still a couple of todos remaining. The Igalia folks are on it but please let me know if any more pop up. I don't intend to rush this in. Want to make sure things are solid.

jasnell avatar Nov 10 '24 21:11 jasnell

Another possible approach for the JS API here would be to expose Isolate Groups as an object under the worker_threads module such as...

import { WorkerGroup } from 'node:worker_threads';

const group = new WorkerGroup();  // Wraps an IsolateGroup
const worker = group.newWorker('...', { ... });  // Creates a new worker thread within the group

The advantage of this API is that it leaves the existing Worker constructor options alone and it becomes explicitly discoverable by checking for the existence of WorkerGroup as opposed to knowing whether or not a group option is supported (as it would be silently ignored in older Node.js versions).

We've got a number of weeks before everything lands in v8 so there's lots of time to bikeshed alternatives here.

jasnell avatar Nov 11 '24 18:11 jasnell

@jasnell that's more in line to what I was thinking.

mcollina avatar Nov 11 '24 20:11 mcollina

Obviously, this would still be a breaking change because individual isolates will have the imposed 4 GB limit, but with pointer compression this is far less of a limitation than it may first appear. The vast majority of Node.js applications that exceed a 4 GB heap without pointer compression will run just fine in a 4 GB heap of compressed pointers.

I wonder if this is something we should put into the next survey or just poll it a bit, because it seems a surprising amount of applications do need more than 4GB. Previously I thought it was mostly needed for giant servers (I have heard from some enterprise users about this), but then I learned running tsc type checking on large code bases like Babel can already take about 12GB, and there are probably a lot of large TypeScript monorepos out there? I have also extended it to ~10GB when running c8 on big enough coverage reports. Arguably the tools can make use of workers more to spread out the loads to multiple cages, but that will take time and in the mean time for end users, updating CLI dev dependencies to do it can be a lot harder than just updating your own application code to use workers.

Not to say that this is a blocker for releasing pointer compression as the benefit is too tempting to ignore for apps that don't need 4GB, but if the survey/poll shows that it can regress too many people that do need >4GB then I think we need to prioritize it a bit to have a good story about how to make the transition less painful. That might need better support from the version managers (e.g. imagine instead of using 26 in the setup-node github action, it's possible to use 26/unlimited, which I believe just needs an extra release channel + support in nvm to work)

joyeecheung avatar Dec 04 '24 05:12 joyeecheung

Since memory has become cheaper with the AI hype, I still think Node.js without a memory limit would be a perfect default. Providing an official slim version (pointer compression on) is a good idea from my point of view.

gengjiawen avatar Dec 04 '24 07:12 gengjiawen

@jasnell Do you have a schedule for the change? I wonder, if the Node TSC ends up disabling this feature by default, whether I might be able to help get the unofficial builds back up & running once the internals work again.

For our company, pointer compression brought so many benefits that we rather keep on running legacy versions of Node until there is a working alternative. Memory is cheap only as long as you don't buy it in form of virtual machines :)

laurisvan avatar Jan 10 '25 08:01 laurisvan

@laurisvan Pointer compression has already been a configure-time option, disabled by default, for years, though the configure-time option has recently been broken by (likely) recent V8 changes, which is what we were talking about in https://github.com/nodejs/node/issues/55735#issuecomment-2459607626

joyeecheung avatar Jan 10 '25 11:01 joyeecheung

@joyeecheung Yes, that is exactly what we have been using for a while, and the broken build has blocked us from upgrading. I tried to investigate the breakage and tune some of the node.js internals - but when I spotted this issue, I understood that the underlying problems are likely more profound (e.g. even if I managed to get the builds up & running again, it still likely wouldn't do things right). I actually do have a hint of this with our current node.js pointer-compressed builds - I am almost sure that either V8 or its node.js integration does not correctly keep track of available memory, as we constantly run out of memory when doing a full sweep GC.

laurisvan avatar Jan 13 '25 11:01 laurisvan

@laurisvan .... no definitive schedule. We have a status update call later this week that should help narrow down expected timeframe.

jasnell avatar Jan 13 '25 15:01 jasnell

Blog post briefly discussing isolate groups .... https://dbezhetskov.dev/multi-sandboxes/

jasnell avatar Jan 17 '25 15:01 jasnell

Quick status update... the isolate groups API is nearly stable in v8. Most of the key pieces are there with just a handful of small fixes/patches necessary to land over hopefully the next week. Current estimation is that isolate groups and multiple v8 sandboxes should be available generally in v8 13.4 but there's always a chance it could slip to 13.5.

The blog post from Igalia that I mention in the previous comment covers the basic details. The implementation is far enough along in v8 that it ought to be possible to start experimental/exploratory use in Node.js

jasnell avatar Jan 17 '25 15:01 jasnell

@jasnell can you post an update on this?

mcollina avatar Mar 08 '25 12:03 mcollina

I'd like to raise a voice in opposition to enabling pointer compression by default. When Electron did it, they broke everyone and everything, and the mess is still unsettled; a lot of people were forced to start compiling Electron themselves, resulting in all sorts of unsupported scenarios - https://github.com/electron/electron/issues/35801

Besides breaking the ABI guarantees, this change will break all applications requiring more than 4GB of RAM, using mmap(), or using any C++ libraries that rely on malloc/new().

I understand the allure for Cloudflare workers specifically, however it is not a good reason to break everybody else.

WillAvudim avatar May 05 '25 17:05 WillAvudim

As far as I understand it, the intent was to just enable this for Workers behind a configuration flag or something like that. So worker_threads using apps can opt-in where desired and no other code would be impacted.

Qard avatar May 05 '25 17:05 Qard

As far as I understand it, the intent was to just enable this for Workers behind a configuration flag or something like that. So worker_threads using apps can opt-in where desired and no other code would be impacted.

You would like to ship two V8 builds with node, one with pointer compression/cage specifically for workers, and another one for the main thread? How exactly are you going to organize the interactions between the compiled V8 engines (e.g. SharedArrayBuffer requires global support from the GC, as does Atomics.notify/waitAsync)? I don't think your suggestion is technically feasible; it would require massive rewrites in V8 to support both V8 cores in multithreaded scenarios.

WillAvudim avatar May 05 '25 18:05 WillAvudim

this change will break all applications requiring more than 4GB of RAM

Not really correct - only applications currently using more than ~6-8GB of heap today (uncompressed) would be unable to use pointer compression, which I would argue is quite a small number. Devs who need more than 4GB (uncompressed) today already need to manually increase the old space size, so using the no-compression node binary (or compiling their own) is something that can be documented for old-space-size > ~8GB.

however it is not a good reason to break everybody else.

This is not going to "break everybody else". The reason this has been discussed since 2019 and still isn't enabled by default is specifically because the TSC wants to ensure this is done properly and tested thoroughly. I'm not going to pretend like I know the impact of all native modules here, but if you read through the original post from 2019 HERE

Currently, enabling pointer compression breaks native addon compatibility for Node.js applications (unless N-API is being used), but this is likely going to change in the near future, making builds of Node.js with and without pointer compression almost fully interchangeable.

I would imagine the plan would be to release pointer compression as optional in an odd numbered release in order to get feedback and allow testing - with the goal of enabling by default once we're confident that it is non breaking.

The intent was to just enable this for Workers behind a configuration flag or something like that

I think you may be confusing the option to spawn worker threads with a new/existing isolate group? Pointer compression is a compile time flag, so all worker threads would use the same, but spawning a new worker thread would allow a user to specify a new isolate group or not so that the thread can use the full 4GB (compressed).

Just my 2 cents: I can't wait for pointer compression. We are currently running hundreds of NodeJS processes in Kubernetes, with total RAM requests ~1TB (this fluctuates depending on load). Being able to reduce reserved RAM by 40% would save us $75k/yr in EC2 costs while improving performance. Just think of the millions (billions?) of NodeJS processes running in the wild and how this change will impact global energy usage (RAM is quite power hungry) and just the general perception of NodeJS as a technology.

SeanReece avatar May 08 '25 15:05 SeanReece

this change will break all applications requiring more than 4GB of RAM, using mmap(), or using any C++ libraries that rely on malloc/new().

My, simplistic, understanding of pointer compression from https://youtu.be/TPm-UhWkiq8?si=kExaHSHxVztaaTne&t=1054 was that only the JS heap is 'compressed'. So the process would still be 64bit and native add-ons could allocate more than 4GB, and things like ArrayBuffers wouldn't count?

acutmore avatar May 08 '25 15:05 acutmore

this change will break all applications requiring more than 4GB of RAM, using mmap(), or using any C++ libraries that rely on malloc/new().

My, simplistic, understanding of pointer compression from https://youtu.be/TPm-UhWkiq8?si=kExaHSHxVztaaTne&t=1054 was that only the JS heap is 'compressed'. So the process would still be 64bit and native add-ons could allocate more than 4GB, and things like ArrayBuffers wouldn't count?

Only for as long as there is no interaction between the addons and Javascript. However, once you decide to actually use the C++ addon from Javascript, you'll face "external ArrayBuffer is not allowed" and other surprises. Any interaction that requires accessing memory directly becomes impossible; e.g. you will kill any possibility for https://nodejs.org/docs/latest-v22.x/api/n-api.html#napi_create_external_arraybuffer among other things.

A bit unrelated, however I'm going to bet that you guys will go through the same cycle as we did just recently. We spent several months investigating pointer compression for our massive app; we are dealing with neural networks and large images, and performance is absolutely crucial for us. The touted performance and memory improvements for pointer compression, while looking attractive in benchmarks, never materialized in practice.

The biggest trick for us was to switch to memory-mapped IO (shm_open(), mmap()), which skips the Linux kernel (no pwrite/pread), Node.js, and libuv entirely, and that provided a massive boost in I/O. We also switched to atomics and futexes across processes, and started using pthreads in addition to node workers. And for cross-machine communication, there is nothing faster than RDMA, which lets you write directly into memory on a remote host, also skipping the kernel/Node.js/libuv/sockets/async machinery. That alone gave us almost two orders of magnitude speedup under real load. Rewriting CPU-intensive code in C++ and CUDA also helped a lot.

By enabling the pointer compression cage you cut yourself off from the biggest opportunities for massive speedups. There is a reason why other browsers and JS engines are not pursuing pointer compression - it does not speed things up beyond certain benchmarks that are not necessarily representative of real code.

For Cloudflare workers specifically, I'm going to bet that the biggest perf gain you can get would come from the ability to reset the V8 VM to its original state. That will let you avoid startup and GC costs; GC will effectively act as an arena, and you won't need to initialize lots of environments to spin off workers. V8 environments are too heavy, whether you use pointer compression or not.

You can read up on the problems you're about to get in https://github.com/electron/electron/issues/35241, https://github.com/WiseLibs/better-sqlite3/issues/981, people were jumping the ship, dealing with unexplainable bugs, it was (and still is) a real mess.

WillAvudim avatar May 08 '25 19:05 WillAvudim

Just my 2 cents: I can't wait for pointer compression. We are currently running hundreds of NodeJS processes in Kubernetes, with total RAM requests ~1TB (this fluctuates depending on load). Being able to reduce reserved RAM by 40% would save us $75k/yr in EC2 costs while improving performance. Just think of the millions (billions?) of NodeJS processes running in the wild and how this change will impact global energy usage (RAM is quite power hungry) and just the general perception of NodeJS as a technology.

All our machines have at least 64 GB of RAM and our app (Node.js/Typescript) currently needs at least 48GB to function. We run neural network computations on every mouse move, and that requires a large number of precomputed data structures and images to be immediately available for rendering.

We did evaluate pointer compression for our workload as well, and the gains never materialized. We didn't get much in performance or memory savings, even when we tried to reduce the memory load to fit into 8GB.

The way I see it, it will only prevent you from unleashing the true potential of your machine, because you won't be able to use mmap()'ed segments or any C++ addons that allocate memory externally and need to share it with code in Typescript. The end result will be slower and less efficient implementation.

lmdb is much faster than sqlite, same for libmdbx, both rely on mmap() and they let you access database records in-place, without deserialization or marshalling it into Node.js types. Why do you want to eliminate direct external memory access? You'll lose both in performance and memory consumption if you can't access data directly.

WillAvudim avatar May 08 '25 20:05 WillAvudim

However, once you decide to actually use the C++ addon from Javascript, you'll face "external ArrayBuffer is not allowed" and other surprises.

I think the ArrayBuffer issue comes from sandboxing, not pointer compression per se. I was trying to fix the pointer compression builds in https://github.com/nodejs/node/pull/58171 to bring node-daily-master back to green, and just realized Node.js currently cannot enable the sandbox since it has quite a lot of array buffer backing stores pointing directly at some off-heap memory (some are in static storage, e.g. the fast toggles for trace events, and aren't even allocated by Node.js, so they can't be put into the sandbox even if we wanted to - probably similar to the mmap-ed use cases you described). It works otherwise okay (i.e. passes the tests) if the sandbox is disabled.

joyeecheung avatar May 08 '25 20:05 joyeecheung

There are some misconceptions to clear up. The reason enabling pointer compression previously limited the process as a whole to a 4 GB heap is that there was one cage per process; no matter how many isolates were created, they were all created as part of the same cage. With the introduction of isolate groups we allow for multiple cages per process. Each still has the 4 GB limit by default, but the entire process is no longer limited to a single 4 GB limit. We have not yet decided whether and how best to take advantage of this in Node.js, and absolutely no decisions have been made yet one way or the other. I'm waiting for the upstream bits in v8 to stabilize and to make their way into Node.js via v8 updates before I start prototyping things here.

The basic idea I have to try initially is for worker threads to optionally be created in separate isolate groups (cages), with each worker getting its own group by default. This would mean that the main thread is limited to 4 GB and each worker has its own 4 GB limit. Then, optionally, allow multiple workers to be created within a single isolate group, etc.

IF things work out and if it's not too big of a breaking change then we will make a decision about whether this is the approach we want to take, and it's still a very big if. It might be that we continue to only have pointer compression be an opt-in compile time flag, or we might have a compile time flag to turn off pointer compression, etc. Too early to say.

jasnell avatar May 08 '25 20:05 jasnell

And yeah, v8 sandbox is another thing entirely. While it does build on the pointer compression cage we can enable pointer compression without turning on v8 sandbox. It's not clear just how much Node.js will be able to take advantage of the sandbox without a number of significant changes.. the most notable being the fact that all backing stores for ArrayBuffer and SharedArrayBuffer are required to be allocated within the sandbox -- some of the tricks we do with externally held data allocation aren't compatible and would need to be refactored.

jasnell avatar May 08 '25 20:05 jasnell