node Start moving to Uint8Array in new APIs?

Start moving to Uint8Array in new APIs?

Open benjamingr opened this issue 2 years ago • 18 comments

There was a suggestion by @jasnell to use Uint8Arrays in new APIs over Buffers as well as a weigh-in by @sindresorhus saying it is easier to author cross-platform APIs when using Uint8Arrays.

Here is a context https://github.com/nodejs/node/pull/41553#discussion_r786284208

That is, the ask here is that Node.js should prefer Uint8Arrays over Buffers in new APIs.

What does everyone think? Should we stick to Buffer (which is a subclass of Uint8Array as a reminder) or prefer Uint8Arrays over buffers when possible in new APIs?

cc @nodejs/buffer @nodejs/streams

Jan 19 '22 07:01 benjamingr

That will cause soo much confusion. I believe there are several methods/props that Buffer override and which act differently than Uint8Array.

Jan 19 '22 08:01 ronag

In particular .slice work differently. Not sure if there are others.

Jan 19 '22 08:01 ronag

I'm fine with using Uint8Array for Web apis which define the type. But for node api's I think we should stick with Buffer.

Jan 19 '22 08:01 ronag

It really depends on the subsystems we are targeting. It's impossible to make a generic call.

Jan 19 '22 08:01 mcollina

I have a big problem with Buffers.

buffer.slice() is mutable copy, uint8array.slice() is immutable. This is very bad. Recently got hit with a bug report. I was checking for the input to be instanceof Uint8Array and did array.slice(), then operated on the copy. But that doesn't work with buffers! The buffers were mutated even after copy. Why are they instance of Uint8Arrays if the behavior is different? https://github.com/paulmillr/noble-ed25519/issues/45
They are not supported in browsers and will never be. Adding an additional in-browser shim for buffers is bad.
They expose private information to a global variable. Imagine you're developing some secure software. You reason about zeroing data etc. https://github.com/nodejs/node/issues/41467

// Somewhere in your code
const privateBuf = Buffer.from(privateKey, 'hex');

// Rogue package can access
Buffer.from('1').buffer
// Which will of course show the contents of `privateBuf`
// No need in complex memory dumps!

This happens because there is 8KB shared buffer reused for all Buffer.from calls! There is zero need in making this as subtle as it is right now. Buffer.allocUnsafe seems like a good name for "using a part of global shared buffer", Buffer.from is not. Search GitHub code for the snippet and tell me how many people know about the "feature".

Jan 19 '22 09:01 paulmillr

buffer.slice() is mutable copy, uint8array.slice() is immutable. This is very bad. Recently got hit with a bug report. I was checking for the input to be instanceof Uint8Array and did array.slice(), then operated on the copy. But that doesn't work with buffers! The buffers were mutated even after copy. Why are they instance of Uint8Arrays if the behavior is different? Regression in 1.5.0 paulmillr/noble-ed25519#45

FWIW: I agree having an API that behaves like a subclass but "lies" about keeping the same API structure is super-confusing.

Jan 19 '22 10:01 benjamingr

What about the following suggestion:

Old modules that use Buffer will use Buffer (for example readable streams that build heavily on them).
New modules and APIs will prefer Uint8Arrays whenever possible for example the new HTTP API (if that's fetch that's easy).
Every API that takes a Buffer will accept a Uint8Array (that might already be the case).
Every API that returns a Buffer will continue returning a Buffer to not break the ecosystem (stuff like .slice).

Less likely to reach consensus but I'd still like that:

"Soft deprecate" buffer.slice and recommend .subarray() much more strongly.

Jan 19 '22 14:01 benjamingr

We will never be able to get rid of Buffer. The generic rules of thumb I have in mind are:

unless an API specifically calls for the unique features/differences of Buffer, it should return Uint8Array instead. ( Also keep in mind that it's trivial to create a Buffer from a Uint8Array. e.g. Buffer.from(u8.buffer). )
Any API that accepts a Buffer should also accept Uint8Array (that is, Buffer should never be required as the only option) and docs should reflect that Uint8Array is accepted.

Examples where Buffer may be needed:

The API needs to work with hex or base64 encoded data.
The API needs Buffer's idea of slice
The code path is particularly performance sensitive and Buffer's pooling/uninitialized memory is needed

Jan 19 '22 14:01 jasnell

"Soft deprecate" buffer.slice and recommend .subarray() much more strongly.

Yes please :)

Also keep in mind that it's trivial to create a Buffer from a Uint8Array. e.g. Buffer.from(u8.buffer).

This is a great example to show that it’s not actually trivial, because Buffer.from(u8.buffer) works 95 % of the time, and will do the absolute wrong thing the other 5 % of the time. And unfortunately, this has become a fairly common bug to encounter. :confused: Maybe we should add something like Buffer.fromView(abv: ArrayBufferView) as a shorthand for Buffer.from(abv.buffer, abv.byteOffset, abv.byteLength)?

Any API that accepts a Buffer should also accept Uint8Array (that is, Buffer should never be required as the only option) and docs should reflect that Uint8Array is accepted.

That’s also the current state of things – even Buffer.prototype’s own methods work when called on Uint8Arrays.

Jan 19 '22 15:01 addaleax

This is a great example to show that it’s not actually trivial, because Buffer.from(u8.buffer) works 95 % of the time, and will do the absolute wrong thing the other 5 % of the time.

I ran into that as well (even though I knew about the bug before) so I'd like to echo it's a problem.

That’s also the current state of things – even Buffer.prototype’s own methods work when called on Uint8Arrays.

That's neat I didn't know that.

Jan 19 '22 16:01 benjamingr

...even Buffer.prototype's own methods work when called on Uint8Arrays.

Sadly not in all cases...

Jan 19 '22 16:01 jasnell

@jasnell That’s a bug then :)

Jan 19 '22 17:01 addaleax

Examples where Buffer may be needed:

The API needs to work with hex or base64 encoded data.

That needs to be implemented as a hex/base64 decoder in stdlib, no need in using Buffers for this. The new stdlib decoders will work with Uint8arrays etc.

The API needs Buffer's idea of slice

Which is the same as .subarray()

The code path is particularly performance sensitive and Buffer's pooling/uninitialized memory is needed

Pooling can be easily done with Uint8Arrays. And it could be implemented in a way which does not expose private contents to a global variable. I'm doing this in https://github.com/paulmillr/noble-hashes (see src/sha256.ts).

Jan 19 '22 23:01 paulmillr

@paulmillr to address some of the comment status:

buffer.slice() is mutable copy, uint8array.slice() is immutable. This is very bad. Recently got hit with a bug report. I was checking for the input to be instanceof Uint8Array and did array.slice(), then operated on the copy. But that doesn't work with buffers! The buffers were mutated even after copy. Why are they instance of Uint8Arrays if the behavior is different? Regression in 1.5.0 paulmillr/noble-ed25519#45

We are (likely) going to docs-depercate (and --pending-deprecations probably) buffer.slice() (which effectively means everyone using TypeScript type definitions will see get warnings and suggestions to not use it). It's not great but it's a start. Progress here: https://github.com/nodejs/node/pull/41596

They are not supported in browsers and will never be. Adding an additional in-browser shim for buffers is bad.

The ask here is that any API that aims to be compatible with web APIs (for example fetch or future modules) should strongly prefer typed arrays to buffers right? I think that's also a good point.

They expose private information to a global variable. Imagine you're developing some secure software. You reason about zeroing data etc. Buffer.from hex creates 8Kb buffers instead of 32 bytes #41467

I can see why this is done but this is very unfortunate and I did not realize this before. I'm wondering if this can be fixed while keeping the pooling behaviour? (By overriding .buffer)

Jan 20 '22 16:01 benjamingr

(and similarly - this is very much an area Node could use more people involved and more contributions in!)

Jan 20 '22 16:01 benjamingr

I'm agaisnt the idea. It will not only cause confusion and compatibility issues, but also decrease in (micro) performance

Jan 20 '22 23:01 peq42

I'm a bit against using node:buffer myself for cross compatible reasons so I'm 👍 for this! I think it have always been bad that .slice() have been overridden to do what you don't expect from an Uint8Array-like object, there are cases where you would expect the array to truly be immutable. The Blob class for example demands that you copy the data so that it can't be changed later.

.slice() is suppose to copy the data, period. currently the only rellyable way to do it is directly on the buffer.buffer.slice which is a bit annoying to use If the user really intend to not copy it then they should actually be using subarray!

I'm comparing this a bit to this article: https://jakearchibald.com/2021/function-callback-risks/

At some point somebody might just say: screw it: we are going to switch to using Uint8Array instead (cuz we already treat it as Uint8Array-like) so it can change the hole underlying infrastructure when it uses slice

I don't think that extending any built in types are such a great idea, what if buffer added a custom .at() function in the early days when we did not have native at?

I think Buffer is bloated with stuff that we don't actually need (specially for browser) TextEncoder/TextDecoder should be used instead, it also runs much faster in browser (up to 10x faster - using this comparison) on some random web page they also handle BOM DataView can replace many things to read/edit the array.

I even wonder why Blob exist on the buffer object at all... should the buffer package on npm have to add a Blob class and whatwg streams as well?

Maybe it would be a good idea to fleshed out all of the utility method like to/from hex and base64 got added into a own sperate util module you could import from import { toHex } from node:binary-utility We also have this binary-encoding proposal and some even argue against this proposal and string to/from binary encoding (worth reading)

honestly, i just wish to see Buffer being deprecated and gone all together. but that is likely never going to happen

Mar 08 '22 19:03 jimmywarting

I'm having a somewhat related problem with Buffer. I need to convert Uint8Array to a Base64 string, but this functionality is tied to the Buffer class. So I need to first convert Uint8Array to Buffer and then to call Buffer.toString('base64'). This extra operation just destroys all the performance benefits of using the native implementation of the Base64 encoding. We have to use our own function for encoding that operates on Uint8Array directly, which is kinda ridiculous :)

Mar 26 '22 18:03 slavafomin

I wrote a blog post on my intent to move away from Buffer in all my packages: https://sindresorhus.com/blog/goodbye-nodejs-buffer

Oct 24 '23 13:10 sindresorhus

Examples where Buffer may be needed: The API needs to work with hex or base64 encoded data.

It looks like it's being worked on: https://github.com/tc39/proposal-arraybuffer-base64

Oct 24 '23 13:10 sindresorhus

I think the first step would be to add a note here about preferring Uint8Array for new code. Would that be acceptable?

Oct 24 '23 13:10 sindresorhus

@sindresorhus I believe node.js is too bloated to make such decisions.

Bun replicated most hard parts of node in just a few engineer-years.

Oct 24 '23 13:10 paulmillr

I wish that buffer never got added to the global scope in the first place. It had led to many bundler polyfilling it when it is seen as optional.

I can only hope that a new spec'ed web worker dose not include globalThis.buffer globalThis.process and globalThis.global by default. It should be explicit imported when needed

Oct 24 '23 15:10 jimmywarting

I think the first step would be to add a note here about preferring Uint8Array for new code. Would that be acceptable?

I don't believe we have consensus on that at all.

Oct 24 '23 16:10 benjamingr

It would be quite a low effort to create a PR with that note, and probably the best way to seek consensus.

Oct 24 '23 17:10 targos

I don't think there is a good alternative for Buffer at the moment. Uint8Array doesn't provide all the functionality. That being said, in cases where it's not necessary and the instance is only accessible by node internals I do think it would be good to move to Uint8Array.

https://github.com/sindresorhus/uint8array-extras does not provide helpers with the same performance.

Oct 24 '23 17:10 ronag

TBH I'm not sure I understand why this issue is still open. Uint8Array is already supposed to be supported everywhere Buffer is (in core).

@benjamin when you say "the ask here is that Node.js should prefer Uint8Arrays over Buffers in new APIs." what do you mean by "prefer"? Is it about the type of the value returned by new APIs?

@ronag it's easy to create a Buffer from an Uint8Array if necessary for a certain use case.

Oct 24 '23 17:10 targos

I don't think there is a good alternative for Buffer at the moment. Uint8Array doesn't provide all the functionality. That being said, in cases where it's not necessary and the instance is only accessible by node internals I do think it would be good to move to Uint8Array.

https://github.com/sindresorhus/uint8array-extras does not provide helpers with the same performance.

I would be in favor of this package, if it meant that i didn't need to ship node:buffer to browser and other runtimes. even if it meant a perf loss.

TBH I'm not sure I understand why this issue is still open. Uint8Array is already supposed to be supported everywhere Buffer is (in core).

Maybe stands as a reminder that newly developed api's should prefer to return Uint8Array instead?

@benjamin when you say "the ask here is that Node.js should prefer Uint8Arrays over Buffers in new APIs." what do you mean by "prefer"? Is it about the type of the value returned by new APIs?

Can only speak for myself, but i guess he means that ppl should avoid spending time on reading up on the Buffer class and all stuff in it and resort to using Uint8Array/DataView/TextEncoder/Decoder instead as it's better cross compatible with other env. not about changing the return type of what new api's should return.

it's quite funny how atob/btoa and this contradicts each other. this says avoid buffer and atob/btoa suggest that you should use Buffer -

i think this statment should be removed or changed: Use buf.toString('base64') instead Use Buffer.from(data, 'base64') instead

Oct 24 '23 18:10 jimmywarting

@jimmywarting I don't think it's appropriate to open the atob/btoa can of worms yet again. We already discussed this at length, see https://github.com/nodejs/node/issues/40754#issuecomment-1546914616 for a summary. Buffer was added because there was no appropriate web API at the time, and while there is Uint8Array now, there's still no good standardized API for base64 conversion.

Oct 24 '23 21:10 tniessen

@jimmywarting I don't think it's appropriate to open the atob/btoa can of worms yet again. We already discussed this at length, see #40754 (comment) for a summary. Buffer was added because there was no appropriate web API at the time, and while there is Uint8Array now, there's still no good standardized API for base64 conversion.

+ hex and url-safe base64

Oct 25 '23 07:10 panva

node node copied to clipboard

Start moving to Uint8Array in new APIs?

node
node copied to clipboard