node
node copied to clipboard
Start moving to Uint8Array in new APIs?
There was a suggestion by @jasnell to use Uint8Array
s in new APIs over Buffer
s as well as a weigh-in by @sindresorhus saying it is easier to author cross-platform APIs when using Uint8Array
s.
Here is a context https://github.com/nodejs/node/pull/41553#discussion_r786284208
That is, the ask here is that Node.js should prefer Uint8Array
s over Buffer
s in new APIs.
What does everyone think? Should we stick to Buffer
(which is a subclass of Uint8Array
as a reminder) or prefer Uint8Array
s over buffers when possible in new APIs?
cc @nodejs/buffer @nodejs/streams
That will cause soo much confusion. I believe there are several methods/props that Buffer override and which act differently than Uint8Array.
In particular .slice
work differently. Not sure if there are others.
I'm fine with using Uint8Array for Web apis which define the type. But for node api's I think we should stick with Buffer.
It really depends on the subsystems we are targeting. It's impossible to make a generic call.
I have a big problem with Buffers.
-
buffer.slice()
is mutable copy,uint8array.slice()
is immutable. This is very bad. Recently got hit with a bug report. I was checking for the input to beinstanceof Uint8Array
and didarray.slice()
, then operated on the copy. But that doesn't work with buffers! The buffers were mutated even after copy. Why are they instance of Uint8Arrays if the behavior is different? https://github.com/paulmillr/noble-ed25519/issues/45 - They are not supported in browsers and will never be. Adding an additional in-browser shim for buffers is bad.
- They expose private information to a global variable. Imagine you're developing some secure software. You reason about zeroing data etc. https://github.com/nodejs/node/issues/41467
// Somewhere in your code
const privateBuf = Buffer.from(privateKey, 'hex');
// Rogue package can access
Buffer.from('1').buffer
// Which will of course show the contents of `privateBuf`
// No need in complex memory dumps!
This happens because there is 8KB shared buffer reused for all Buffer.from
calls! There is zero need in making this as subtle as it is right now. Buffer.allocUnsafe
seems like a good name for "using a part of global shared buffer", Buffer.from
is not. Search GitHub code for the snippet and tell me how many people know about the "feature".
buffer.slice() is mutable copy, uint8array.slice() is immutable. This is very bad. Recently got hit with a bug report. I was checking for the input to be instanceof Uint8Array and did array.slice(), then operated on the copy. But that doesn't work with buffers! The buffers were mutated even after copy. Why are they instance of Uint8Arrays if the behavior is different? Regression in 1.5.0 paulmillr/noble-ed25519#45
FWIW: I agree having an API that behaves like a subclass but "lies" about keeping the same API structure is super-confusing.
What about the following suggestion:
- Old modules that use
Buffer
will useBuffer
(for example readable streams that build heavily on them). - New modules and APIs will prefer
Uint8Array
s whenever possible for example the new HTTP API (if that's fetch that's easy). - Every API that takes a
Buffer
will accept aUint8Array
(that might already be the case). - Every API that returns a
Buffer
will continue returning aBuffer
to not break the ecosystem (stuff like.slice
).
Less likely to reach consensus but I'd still like that:
- "Soft deprecate"
buffer.slice
and recommend.subarray()
much more strongly.
We will never be able to get rid of Buffer. The generic rules of thumb I have in mind are:
-
unless an API specifically calls for the unique features/differences of Buffer, it should return Uint8Array instead. ( Also keep in mind that it's trivial to create a Buffer from a Uint8Array. e.g.
Buffer.from(u8.buffer)
. ) -
Any API that accepts a Buffer should also accept Uint8Array (that is, Buffer should never be required as the only option) and docs should reflect that Uint8Array is accepted.
Examples where Buffer may be needed:
- The API needs to work with hex or base64 encoded data.
- The API needs Buffer's idea of slice
- The code path is particularly performance sensitive and Buffer's pooling/uninitialized memory is needed
"Soft deprecate" buffer.slice and recommend .subarray() much more strongly.
Yes please :)
- Also keep in mind that it's trivial to create a Buffer from a Uint8Array. e.g.
Buffer.from(u8.buffer)
.
This is a great example to show that it’s not actually trivial, because Buffer.from(u8.buffer)
works 95 % of the time, and will do the absolute wrong thing the other 5 % of the time. And unfortunately, this has become a fairly common bug to encounter. :confused: Maybe we should add something like Buffer.fromView(abv: ArrayBufferView)
as a shorthand for Buffer.from(abv.buffer, abv.byteOffset, abv.byteLength)
?
Any API that accepts a Buffer should also accept Uint8Array (that is, Buffer should never be required as the only option) and docs should reflect that Uint8Array is accepted.
That’s also the current state of things – even Buffer.prototype
’s own methods work when called on Uint8Array
s.
This is a great example to show that it’s not actually trivial, because Buffer.from(u8.buffer) works 95 % of the time, and will do the absolute wrong thing the other 5 % of the time.
I ran into that as well (even though I knew about the bug before) so I'd like to echo it's a problem.
That’s also the current state of things – even Buffer.prototype’s own methods work when called on Uint8Arrays.
That's neat I didn't know that.
...even Buffer.prototype's own methods work when called on Uint8Arrays.
Sadly not in all cases...
@jasnell That’s a bug then :)
Examples where Buffer may be needed:
The API needs to work with hex or base64 encoded data.
That needs to be implemented as a hex/base64 decoder in stdlib, no need in using Buffers for this. The new stdlib decoders will work with Uint8arrays etc.
The API needs Buffer's idea of slice
Which is the same as .subarray()
The code path is particularly performance sensitive and Buffer's pooling/uninitialized memory is needed
Pooling can be easily done with Uint8Arrays. And it could be implemented in a way which does not expose private contents to a global variable. I'm doing this in https://github.com/paulmillr/noble-hashes (see src/sha256.ts).
@paulmillr to address some of the comment status:
buffer.slice() is mutable copy, uint8array.slice() is immutable. This is very bad. Recently got hit with a bug report. I was checking for the input to be instanceof Uint8Array and did array.slice(), then operated on the copy. But that doesn't work with buffers! The buffers were mutated even after copy. Why are they instance of Uint8Arrays if the behavior is different? Regression in 1.5.0 paulmillr/noble-ed25519#45
We are (likely) going to docs-depercate (and --pending-deprecations
probably) buffer.slice()
(which effectively means everyone using TypeScript type definitions will see get warnings and suggestions to not use it). It's not great but it's a start. Progress here: https://github.com/nodejs/node/pull/41596
They are not supported in browsers and will never be. Adding an additional in-browser shim for buffers is bad.
The ask here is that any API that aims to be compatible with web APIs (for example fetch
or future modules) should strongly prefer typed arrays to buffers right? I think that's also a good point.
They expose private information to a global variable. Imagine you're developing some secure software. You reason about zeroing data etc. Buffer.from hex creates 8Kb buffers instead of 32 bytes #41467
I can see why this is done but this is very unfortunate and I did not realize this before. I'm wondering if this can be fixed while keeping the pooling behaviour? (By overriding .buffer
)
(and similarly - this is very much an area Node could use more people involved and more contributions in!)
I'm agaisnt the idea. It will not only cause confusion and compatibility issues, but also decrease in (micro) performance
I'm a bit against using node:buffer
myself for cross compatible reasons so I'm 👍 for this!
I think it have always been bad that .slice()
have been overridden to do what you don't expect from an Uint8Array-like object, there are cases where you would expect the array to truly be immutable. The Blob class for example demands that you copy the data so that it can't be changed later.
.slice()
is suppose to copy the data, period. currently the only rellyable way to do it is directly on the buffer.buffer.slice
which is a bit annoying to use
If the user really intend to not copy it then they should actually be using subarray!
I'm comparing this a bit to this article: https://jakearchibald.com/2021/function-callback-risks/
At some point somebody might just say: screw it: we are going to switch to using Uint8Array instead (cuz we already treat it as Uint8Array-like) so it can change the hole underlying infrastructure when it uses slice
I don't think that extending any built in types are such a great idea, what if buffer added a custom .at()
function in the early days when we did not have native at?
I think Buffer is bloated with stuff that we don't actually need (specially for browser) TextEncoder/TextDecoder should be used instead, it also runs much faster in browser (up to 10x faster - using this comparison) on some random web page they also handle BOM DataView can replace many things to read/edit the array.
I even wonder why Blob exist on the buffer object at all... should the buffer package on npm have to add a Blob class and whatwg streams as well?
Maybe it would be a good idea to fleshed out all of the utility method like to/from hex and base64 got added into a own sperate util module you could import from import { toHex } from node:binary-utility
We also have this binary-encoding proposal and some even argue against this proposal and string to/from binary encoding (worth reading)
honestly, i just wish to see Buffer being deprecated and gone all together. but that is likely never going to happen
I'm having a somewhat related problem with Buffer
. I need to convert Uint8Array
to a Base64 string, but this functionality is tied to the Buffer
class. So I need to first convert Uint8Array
to Buffer
and then to call Buffer.toString('base64')
. This extra operation just destroys all the performance benefits of using the native implementation of the Base64 encoding. We have to use our own function for encoding that operates on Uint8Array
directly, which is kinda ridiculous :)
I wrote a blog post on my intent to move away from Buffer
in all my packages: https://sindresorhus.com/blog/goodbye-nodejs-buffer
Examples where Buffer may be needed: The API needs to work with hex or base64 encoded data.
It looks like it's being worked on: https://github.com/tc39/proposal-arraybuffer-base64
I think the first step would be to add a note here about preferring Uint8Array
for new code. Would that be acceptable?
@sindresorhus I believe node.js is too bloated to make such decisions.
Bun replicated most hard parts of node in just a few engineer-years.
I wish that buffer never got added to the global scope in the first place. It had led to many bundler polyfilling it when it is seen as optional.
I can only hope that a new spec'ed web worker dose not include globalThis.buffer globalThis.process and globalThis.global by default. It should be explicit imported when needed
I think the first step would be to add a note here about preferring Uint8Array for new code. Would that be acceptable?
I don't believe we have consensus on that at all.
It would be quite a low effort to create a PR with that note, and probably the best way to seek consensus.
I don't think there is a good alternative for Buffer
at the moment. Uint8Array
doesn't provide all the functionality. That being said, in cases where it's not necessary and the instance is only accessible by node internals I do think it would be good to move to Uint8Array
.
https://github.com/sindresorhus/uint8array-extras does not provide helpers with the same performance.
TBH I'm not sure I understand why this issue is still open. Uint8Array
is already supposed to be supported everywhere Buffer
is (in core).
@benjamin when you say "the ask here is that Node.js should prefer Uint8Arrays over Buffers in new APIs." what do you mean by "prefer"? Is it about the type of the value returned by new APIs?
@ronag it's easy to create a Buffer
from an Uint8Array
if necessary for a certain use case.
I don't think there is a good alternative for
Buffer
at the moment.Uint8Array
doesn't provide all the functionality. That being said, in cases where it's not necessary and the instance is only accessible by node internals I do think it would be good to move toUint8Array
.https://github.com/sindresorhus/uint8array-extras does not provide helpers with the same performance.
I would be in favor of this package, if it meant that i didn't need to ship node:buffer to browser and other runtimes. even if it meant a perf loss.
TBH I'm not sure I understand why this issue is still open.
Uint8Array
is already supposed to be supported everywhereBuffer
is (in core).
Maybe stands as a reminder that newly developed api's should prefer to return Uint8Array instead?
@benjamin when you say "the ask here is that Node.js should prefer Uint8Arrays over Buffers in new APIs." what do you mean by "prefer"? Is it about the type of the value returned by new APIs?
Can only speak for myself, but i guess he means that ppl should avoid spending time on reading up on the Buffer class and all stuff in it and resort to using Uint8Array/DataView/TextEncoder/Decoder instead as it's better cross compatible with other env. not about changing the return type of what new api's should return.
it's quite funny how atob/btoa and this contradicts each other. this says avoid buffer and atob/btoa suggest that you should use Buffer -
i think this statment should be removed or changed:
Use buf.toString('base64') instead
Use Buffer.from(data, 'base64') instead
@jimmywarting I don't think it's appropriate to open the atob
/btoa
can of worms yet again. We already discussed this at length, see https://github.com/nodejs/node/issues/40754#issuecomment-1546914616 for a summary. Buffer
was added because there was no appropriate web API at the time, and while there is Uint8Array
now, there's still no good standardized API for base64 conversion.
@jimmywarting I don't think it's appropriate to open the
atob
/btoa
can of worms yet again. We already discussed this at length, see #40754 (comment) for a summary.Buffer
was added because there was no appropriate web API at the time, and while there isUint8Array
now, there's still no good standardized API for base64 conversion.
+
hex and url-safe base64