RFC: ArrayBuffer support in TurboModules
Proposal: Adding first-class ArrayBuffer support to Codegen and TurboModules to enable zero-copy binary data exchange between JavaScript and Native modules.
Firstly - congrats on a very thorough and well written RFC!
do we need to implement any additional synchronization mechanism to make this change thread-safe?
It's my understanding that the JS engine and JSI themselves aren't thread-safe either.
In that case, I'd consider the thread-safety limitation of ArrayBuffer an extension of that.
Should we introduce two kinds of buffers, mutable and read-only
I think that could be a nice addition indeed. I'd actually imagine a read-only variant would be used in most cases 🤔
Should this RFC focus on introducing only the basic and most valuable synchronous support for ArrayBuffer
Personally - I'd apply the 80-20 rule here and go for the least amount of work bringing most of the value and stick with the basic sync support.
Did you consider "views" (DataView and typed arrays) and how those would interact with this feature? Could these be passed between JS and Native? If not, what's the failure case like? And should typed arrays be supported in codegen?
I really like this. For real-time media or GPU pipelines, say camera ---> ML ---> WebRTC, efficient ArrayBuffer bridging would make a world of difference. It'd enable moving small binary payloads (LUTs, masks, uniform buffers) without having to serialize or clone data just to cross the bridge.
A few things worth clarifying though:
- thread safety: how do concurrent module calls handle shared buffers? Media pipelines often run multiple workers in parallel, so locking semantics matter.
- read-only vs writable: some data (e.g. frame masks or GPU uploads) should likely be immutable once passed to JS; being explicit here could save a lot of edge-case debugging imo.
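To make the locking question concrete: if several native workers touch the same buffer, one option is a per-buffer lock that every native access goes through. This is a minimal sketch assuming a hypothetical `GuardedBuffer` type that is not part of JSI or Codegen. It only coordinates native threads: plain JS reads and writes to an `ArrayBuffer` never take this lock.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical per-buffer lock for native workers sharing one buffer.
// JS code reading or writing the same ArrayBuffer does NOT take this lock.
class GuardedBuffer {
 public:
  explicit GuardedBuffer(std::vector<std::uint8_t> bytes) : bytes_(std::move(bytes)) {}

  // Runs fn(bytes) while holding the lock, so two native workers
  // never read/write the buffer at the same time.
  template <typename Fn>
  decltype(auto) withLock(Fn&& fn) {
    std::lock_guard<std::mutex> lock(mutex_);
    return std::forward<Fn>(fn)(bytes_);
  }

 private:
  std::mutex mutex_;
  std::vector<std::uint8_t> bytes_;
};

// Example worker step: brighten a mask in place, clamping at 255.
void brighten(GuardedBuffer& buffer, std::uint8_t amount) {
  buffer.withLock([amount](std::vector<std::uint8_t>& bytes) {
    for (auto& b : bytes) {
      int v = b + amount;
      b = static_cast<std::uint8_t>(v > 255 ? 255 : v);
    }
  });
}
```

Whether such a lock belongs in the API or in module code is exactly the open question; the sketch only shows what "locking semantics" would have to cover.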
Firstly - congrats on a very thorough and well written RFC!
Thanks ❤️
do we need to implement any additional synchronization mechanism to make this change thread-safe?
It's my understanding that the JS engine and JSI themselves aren't thread-safe either. In that case, I'd consider the thread-safety limitation of ArrayBuffer an extension of that.
Agree with that.
Should we introduce two kinds of buffers, mutable and read-only
I think that could be a nice addition indeed. I'd actually imagine a read-only variant would be used in most cases 🤔
I agree with that, but after deeper investigation I couldn't find an easy and clean way to achieve it. One option would be to create a read-only view over the buffer when processing it, but that's left to developers. There is also an active TC39 proposal for an Immutable ArrayBuffer, which would provide a standardized, runtime-enforced way to prevent modifications to the buffer contents.
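Even without JS-side immutability, the native side of a module can at least state its intent in the type system. A minimal sketch, assuming hypothetical `ReadOnlyBytes` and `MutableBytes` wrappers (not part of JSI or Codegen). It only constrains native code and does nothing to stop JS from writing to the same buffer, which is the gap the Immutable ArrayBuffer proposal targets.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical read-only view: native code can read but not write through it.
class ReadOnlyBytes {
 public:
  ReadOnlyBytes(const std::uint8_t* data, std::size_t size) : data_(data), size_(size) {
    assert(data_ != nullptr || size_ == 0);
  }
  const std::uint8_t* data() const { return data_; }
  std::size_t size() const { return size_; }

 private:
  const std::uint8_t* data_;
  std::size_t size_;
};

// Hypothetical mutable view over the same kind of raw bytes.
class MutableBytes {
 public:
  MutableBytes(std::uint8_t* data, std::size_t size) : data_(data), size_(size) {
    assert(data_ != nullptr || size_ == 0);
  }
  std::uint8_t* data() const { return data_; }
  std::size_t size() const { return size_; }
  // Mutable can always be narrowed to read-only, never the other way round.
  ReadOnlyBytes asReadOnly() const { return ReadOnlyBytes(data_, size_); }

 private:
  std::uint8_t* data_;
  std::size_t size_;
};

// A method that only reads declares ReadOnlyBytes.
std::uint32_t checksum(ReadOnlyBytes bytes) {
  return std::accumulate(bytes.data(), bytes.data() + bytes.size(), std::uint32_t{0});
}

// A method that writes must ask for MutableBytes explicitly.
void invertInPlace(MutableBytes bytes) {
  for (std::size_t i = 0; i < bytes.size(); ++i) {
    bytes.data()[i] = static_cast<std::uint8_t>(~bytes.data()[i]);
  }
}
```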
Should this RFC focus on introducing only the basic and most valuable synchronous support for ArrayBuffer
Personally - I'd apply the 80-20 rule here and go for the least amount of work bringing most of the value and stick with the basic sync support.
👍
Did you consider "views" (DataView and typed arrays) and how those would interact with this feature? Could these be passed between JS and Native? If not, what's the failure case like? And should typed arrays be supported in codegen?
My idea is to keep this solution type-agnostic, as the underlying native classes, such as NSMutableData, java.nio.ByteBuffer and jsi::ArrayBuffer, are opaque containers for raw, uninterpreted bytes. If a DataView or TypedArray is passed instead of an ArrayBuffer, the developer should receive an appropriate warning.
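One reason to warn rather than silently unwrap: a view carries its own byteOffset/byteLength window, and its `.buffer` is the whole underlying ArrayBuffer, so a bytes-only API that just took `.buffer` would hand native code more bytes than the caller meant. A minimal sketch of the argument check, assuming hypothetical `BinaryKind`, `BinaryArg` and `toRawBytes` names (not Codegen or JSI APIs):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical summary of what the runtime reports about a JS argument.
enum class BinaryKind { ArrayBuffer, DataView, TypedArray, Other };

struct BinaryArg {
  BinaryKind kind;
  std::vector<std::uint8_t>* bytes;  // backing store of the (underlying) ArrayBuffer
};

// Opaque, uninterpreted bytes, as NSMutableData / ByteBuffer / jsi::ArrayBuffer hold.
struct RawBytes {
  std::uint8_t* data;
  std::size_t size;
};

// Accepts only a plain ArrayBuffer; views produce a developer-facing warning.
std::optional<RawBytes> toRawBytes(const BinaryArg& arg, std::string& warning) {
  warning.clear();
  switch (arg.kind) {
    case BinaryKind::ArrayBuffer:
      assert(arg.bytes != nullptr);
      return RawBytes{arg.bytes->data(), arg.bytes->size()};
    case BinaryKind::DataView:
    case BinaryKind::TypedArray:
      warning = "Expected an ArrayBuffer but received a DataView/TypedArray.";
      return std::nullopt;
    default:
      warning = "Expected an ArrayBuffer.";
      return std::nullopt;
  }
}
```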
The semantics around memory ownership seem to deviate from the spec of ArrayBuffer in regards to transferring/detaching. I'm not sure of what the material consequences are, especially with existing code that handles ArrayBuffers, but it seems like this could break developer expectations in many ways.
I'd expect the buffer to be moved, not borrowed, when passing between JS and native (in both directions).
ArrayBuffer implementations are not thread-safe; if multiple threads simultaneously read from or write to an ArrayBuffer, race conditions can occur. To prevent this, developers must ensure that an ArrayBuffer is not accessed concurrently from different threads
This problem goes away if ownership is moved to the receiving thread. You shouldn't be able to even read an ArrayBuffer from multiple threads.
- Thread-safety of the ArrayBuffer - do we need to implement any additional synchronization mechanism to make this change thread-safe?
So in summary and to answer this unresolved question, I would say absolutely yes. And to do so by using moves and not borrows.
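For comparison, transfer semantics (as with `postMessage` transfer lists, or ES2024's `ArrayBuffer.prototype.transfer()`) leave the source detached: its `byteLength` becomes 0 and the receiver is the sole owner. A minimal sketch of that ownership rule, assuming hypothetical `DetachableJsBuffer` and `NativeReceiver` stand-ins; JSI does not expose such a detach operation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Hypothetical JS-side handle. Once transferred it is detached:
// byteLength() reports 0 and the old contents are unreachable from JS.
class DetachableJsBuffer {
 public:
  explicit DetachableJsBuffer(Bytes bytes) : store_(std::make_unique<Bytes>(std::move(bytes))) {}

  std::size_t byteLength() const { return store_ ? store_->size() : 0; }
  bool detached() const { return store_ == nullptr; }

  // Moves ownership out; the handle is left detached (store_ becomes null).
  std::unique_ptr<Bytes> transfer() {
    assert(!detached() && "cannot transfer a detached buffer");
    return std::move(store_);
  }

 private:
  std::unique_ptr<Bytes> store_;
};

// Native side of a hypothetical module method that takes ownership.
struct NativeReceiver {
  std::unique_ptr<Bytes> owned;
  void accept(DetachableJsBuffer& handle) { owned = handle.transfer(); }
};
```

Because exactly one side can reach the bytes at any time, no lock is needed; that is the property the comment above is asking for.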
Thanks @tom-sherman for your input!
Regarding this:
ArrayBuffer implementations are not thread-safe; if multiple threads simultaneously read from or write to an ArrayBuffer, race conditions can occur. To prevent this, developers must ensure that an ArrayBuffer is not accessed concurrently from different threads
This problem goes away if ownership is moved to the receiving thread. You shouldn't be able to even read an ArrayBuffer from multiple threads.
- Thread-safety of the ArrayBuffer - do we need to implement any additional synchronization mechanism to make this change thread-safe?
So in summary and to answer this unresolved question, I would say absolutely yes. And to do so by using moves and not borrows.
I agree that moving (transferring ownership of) an ArrayBuffer could be fundamentally safer and cleaner than borrowing it. However, the primary technical challenge remains: the current JSI and Hermes Runtime implementations do not expose a dedicated API for "detaching" an ArrayBuffer from the JavaScript side. Without true detachment, the only immediate way to transfer ownership is by "moving" the underlying buffer to the native thread and extending its lifetime accordingly. This addresses the memory management aspect but has a critical flaw:
- JS Validity: The ArrayBuffer remains valid on the JS side. Its properties, such as byteLength, are not cleared.
- Thread Safety Risk: The buffer can still be read or written to simultaneously from the JS thread while it is being used natively. This creates an easy opportunity for developers to violate thread-safety rules, even if documentation warns against post-transfer access.
I currently do not see a clean, safe path to fully implement buffer transfers that invalidate the JS reference. Achieving this requires dedicated changes to both the JSI specification and the underlying Hermes engine to introduce a proper detachment mechanism. Since I don't have deep expertise in this topic, input from more experienced developers is very welcome and highly appreciated.
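The "move without detach" approach can be modeled with shared ownership: native code extends the buffer's lifetime (here via `std::shared_ptr`, standing in for whatever JSI would hold), but the JS handle is never cleared. A minimal sketch with hypothetical `JsBufferHandle` and `NativeOwner` names, showing both flaws listed above: `byteLength` is unchanged, and a JS-side write is visible to the native side that supposedly owns the buffer. If the native side ran on another thread, that write would be a data race.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Hypothetical JS-side handle without any detach support.
struct JsBufferHandle {
  std::shared_ptr<Bytes> store;
  std::size_t byteLength() const { return store->size(); }
  // Models a JS write such as `view[i] = v`.
  void write(std::size_t i, std::uint8_t v) {
    assert(i < store->size());
    (*store)[i] = v;
  }
};

// "Move" without detach: native shares the storage and extends its lifetime,
// but nothing stops JS from continuing to use the same bytes.
struct NativeOwner {
  std::shared_ptr<Bytes> store;
  void take(const JsBufferHandle& handle) { store = handle.store; }
  std::uint8_t read(std::size_t i) const { return (*store)[i]; }
};
```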
I don't have any expertise as to how to solve the invalidation of JS references in Hermes and JSI, but I wanted to add another voice highlighting the importance of ownership transfer. As far as I am aware, JS does not have other instances where a developer needs to think about thread-safety - it is always thread safe by default. When working with threads (e.g. a Worker in node or a WebWorker in the browser), references are either copied or transferred (e.g. zero-copy, but the reference is removed in the source thread). A JS developer needing to be aware that they cannot modify an ArrayBuffer while it is being written in the native thread is a big ask, especially as we are likely talking about developers who are consuming native modules which use this code, and may not be familiar with thread-safety as a concept at all. This might be a really hard problem to solve though as @paradowstack says!
The semantics around memory ownership seem to deviate from the spec of ArrayBuffer in regards to transferring/detaching. I'm not sure of what the material consequences are, especially with existing code that handles ArrayBuffers, but it seems like this could break developer expectations in many ways.
That said, it's worth noting that (unless I'm mistaken) this is already how JSI, Expo Modules, and Nitro Modules handle ArrayBuffers today.
On one hand, aligning with existing community behavior might make sense for practical and compatibility reasons. On the other hand, once this behavior becomes part of the core, its reach and visibility will likely expand far beyond those ecosystems, making the current de facto behavior less relevant over time.
Leaving this as an open question and summoning a few folks from the community for feedback!
Thanks for putting this together.
Generally, I'm very much aligned with supporting a type-safe abstraction over the existing ArrayBuffer support in JSI, and there seem to be plenty of use-cases where this would be a good way to unlock cheaper data sharing between JS and native.
I agree with the concerns around thread-safety expressed in this thread here. Is there any prior art we can reference? Is the operation model similar to SharedArrayBuffer? Should we consider Atomics as a complementary but necessary capability here?
Making this fully support async JS to native invocation calls will likely increase complexity, as it would require us to keep the JS object alive for the duration of the native memory reference, but not impossible. Alternatively, we'd need to make it really obvious that ArrayBuffer args can only be used in sync calls through codegen.
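Keeping the buffer alive for an async call amounts to capturing a strong reference in the work item that runs later. A minimal sketch, assuming a hypothetical single-queue `WorkQueue` in place of a real native thread pool and `std::shared_ptr` in place of whatever reference JSI would hold. Lifetime is solved this way; thread-safety is not, since JS can still write to the bytes before the work runs.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <memory>
#include <numeric>
#include <utility>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Hypothetical stand-in for a native background queue.
struct WorkQueue {
  std::vector<std::function<void()>> tasks;
  void post(std::function<void()> task) { tasks.push_back(std::move(task)); }
  void drain() {
    std::vector<std::function<void()>> pending;
    pending.swap(tasks);  // tasks posted while draining wait for the next drain
    for (auto& task : pending) task();
  }
};

// Async module method: the lambda captures the shared_ptr by value, so the
// bytes stay alive until the task runs, even if JS drops its reference first.
void sumAsync(WorkQueue& queue, std::shared_ptr<Bytes> buffer,
              std::function<void(std::uint32_t)> resolve) {
  queue.post([buffer, resolve]() {
    resolve(std::accumulate(buffer->begin(), buffer->end(), std::uint32_t{0}));
  });
}
```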
I agree with the concerns around thread-safety expressed in this thread here. Is there any prior art we can reference?
AFAIK other implementations (such as Nitro Modules or Expo Modules) take a similar approach - ownership is not transferred from JS to Native, but borrowed. From what I see, thread safety is either manual or ensured by copying the content (e.g. Expo Blob). I am not aware of any other solution to this problem - happy to be corrected.
Is the operation model similar to SharedArrayBuffer? Should we consider Atomics as a complementary but necessary capability here?
I would say the proposed operational model isn't similar to SharedArrayBuffer - there is no synchronisation. It's more of a single-ownership model where buffers are either borrowed (JS→Native, sync only) or transferred (Native→JS). Could we somehow use JSI to provide a mechanism for safe concurrent access similar to what these primitives offer?
Making this fully support async JS to native invocation calls will likely increase complexity, as it would require us to keep the JS object alive for the duration of the native memory reference, but not impossible. Alternatively, we'd need to make it really obvious that ArrayBuffer args can only be used in sync calls through codegen.
Yes, I can see we can do it either way. Would it also be an applicable solution to the "borrowed" vs "moved" discussion? Instead, it could be "shared", by keeping the JS object alive. It would not eliminate the thread-safety problems, but perhaps could be another solution to the problem.
Thank you all for the feedback and discussion on this RFC! I've updated the proposal.
There are two open questions and fundamental design decisions that need input:
- Transfer vs. Borrowing Semantics The current proposal uses borrowing semantics for JS→Native due to JSI/Hermes API limitations. However, as pointed out, this deviates from web standards and places an unusual thread-safety burden on developers. Transfer semantics would solve this, but from my understanding they require new JSI/Hermes APIs. Should we proceed with borrowing as a pragmatic interim solution? Do you see any other feasible solutions to the problem?
- Thread-Safety Model Related to the above, we need to decide whether to require manual developer coordination or enforce thread-safety through ownership transfer or some synchronisation mechanism.
I'd appreciate guidance from the core team and community on how to proceed with these questions!
@javache did you maybe have time to take another look at this RFC and the open questions? 😊
Here are my general thoughts on the matter, and unfortunately I see a couple of fundamental problems (sorry, I should have joined here earlier):
- ArrayBuffers are simply not very suitable for passing to asynchronous native code because they can be detached or resized. AFAICT, NodeJS doesn't have any asynchronous APIs that operate on ArrayBuffer precisely for that reason.
- ArrayBuffers are also not suitable because technically there is nothing preventing the engine from allocating the buffer contents in the GC heap, thus making it movable.
- Using "transfer" semantics when calling native code would be unusual - after all we are not sending the array to another process or runtime. Again, NodeJS doesn't do that. Designing APIs around unusual techniques is questionable.
- The threading issue is not a concern. Usually it is not the APIs job to protect a buffer from simultaneous access.
Problem 1 has partial workarounds - you can check whether the buffer is resizable and reject it, but you can't prevent it from becoming asynchronously detached. Problem 2 is basically unsolvable.
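The partial workaround for problem 1 looks roughly like this: inspect the buffer when the call is made and reject what can't be used safely. A minimal sketch over a hypothetical `BufferInfo` snapshot (none of these names are JSI APIs). The check is only a snapshot at call time, so it cannot stop a later detach, which is the limitation described above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical snapshot of an ArrayBuffer's state at call time.
struct BufferInfo {
  bool detached;
  bool resizable;  // e.g. created with new ArrayBuffer(n, { maxByteLength })
  std::size_t byteLength;
};

enum class Verdict { Accept, RejectDetached, RejectResizable };

// Call-time validation. It says nothing about what JS does with the buffer
// afterwards: a buffer that passes here can still be detached later.
Verdict validateForNative(const BufferInfo& info) {
  if (info.detached) return Verdict::RejectDetached;
  if (info.resizable) return Verdict::RejectResizable;
  return Verdict::Accept;
}

std::string describe(Verdict v) {
  switch (v) {
    case Verdict::Accept: return "accepted";
    case Verdict::RejectDetached: return "rejected: buffer is detached";
    case Verdict::RejectResizable: return "rejected: buffer is resizable";
  }
  return "unknown";
}
```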
This is why NodeJS uses Buffer - it can fully control it.
My recommendation: give up on ArrayBuffer. It simply can never work well because of problem 2. It will always be somewhat of a hack. Use Buffer instead, which is well understood by everybody, compatible with NodeJS, etc, etc.
If you folks are dead-set on ArrayBuffer, I think @mrousavy's proposal for getMutableBuffer() is a solution. If the ArrayBuffer was created by native code as a MutableBuffer, then obtaining a shared_ptr to it gives sufficient guarantees, I think. If it wasn't, it should be rejected at runtime by the API. Personally I don't think that would be a great API, but we are going to add getMutableBuffer() anyway.
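The getMutableBuffer() approach can be sketched as: a buffer that native code created from a MutableBuffer hands that object back, and everything else is rejected at runtime. The `MutableBuffer` below only mirrors the shape of `jsi::MutableBuffer` (`size()`/`data()`); `VectorBuffer`, `ArrayBufferStandIn`, its `getMutableBuffer()` and `acquireForNative` are hypothetical, since the JSI method is only being proposed here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <stdexcept>
#include <vector>

// Mirrors the shape of jsi::MutableBuffer: native-owned memory.
class MutableBuffer {
 public:
  virtual ~MutableBuffer() = default;
  virtual std::size_t size() const = 0;
  virtual std::uint8_t* data() = 0;
};

class VectorBuffer : public MutableBuffer {
 public:
  explicit VectorBuffer(std::size_t n) : bytes_(n) {}
  std::size_t size() const override { return bytes_.size(); }
  std::uint8_t* data() override { return bytes_.data(); }

 private:
  std::vector<std::uint8_t> bytes_;
};

// Hypothetical stand-in for a JS ArrayBuffer as seen from native code.
// nativeBacking is set only when native code created the buffer.
struct ArrayBufferStandIn {
  std::shared_ptr<MutableBuffer> nativeBacking;
  std::shared_ptr<MutableBuffer> getMutableBuffer() const { return nativeBacking; }
};

// Module entry point: holding the shared_ptr keeps native-allocated memory
// alive (outside the GC heap, per the reasoning above); engine-allocated
// buffers are rejected at runtime.
std::shared_ptr<MutableBuffer> acquireForNative(const ArrayBufferStandIn& ab) {
  auto buffer = ab.getMutableBuffer();
  if (!buffer) {
    throw std::invalid_argument("ArrayBuffer was not created by native code");
  }
  return buffer;
}
```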