node icon indicating copy to clipboard operation
node copied to clipboard

src,lib,buffer: improve atob / btoa performance

Open XadillaX opened this issue 3 years ago • 10 comments

Extract modp_b64 dependence from Chromium to improve performance of atob() and btoa().

Refs: https://github.com/chromium/chromium/tree/92.0.4490.1/third_party/modp_b64

Benchmark

  • 12 Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz
  • MemTotal: 32657380 kB

Old logic

$ ./node benchmark/run.js --filter buffer-atob.js buffers

buffers/buffer-atob.js
buffers/buffer-atob.js n=32 len=65536: 487.7280162769471

$ ./node benchmark/run.js --filter buffer-btoa.js buffers

buffers/buffer-btoa.js
buffers/buffer-btoa.js n=32 len=67108864: 2.4594857665824494

Logic of this PR

$ ./node benchmark/run.js --filter buffer-atob.js buffers

buffers/buffer-atob.js
buffers/buffer-atob.js n=32 len=65536: 10,588.320156389489

$ ./node benchmark/run.js --filter buffer-btoa.js buffers

buffers/buffer-btoa.js
buffers/buffer-btoa.js n=32 len=67108864: 3.503510145848807

XadillaX avatar Apr 27 '21 10:04 XadillaX

Despite the performance boost, I'm generally -1 on introducing a new dependency and a different codepath for base64 encoding/decoding only for atob/btoa. If anything, we should be looking at improving the base64/base64url encoding performance for Buffer. /cc @addaleax

I think I can try to use modp b64 in Buffer's Base64 if it's really faster.

This dep is extracted from Chromium.

XadillaX avatar Apr 27 '21 14:04 XadillaX

According to https://github.com/nodejs/node/blob/master/doc/guides/contributing/pull-requests.md#dependencies policy, changes on deps/* should be sent to upstream first. In that case, as Node.js already have their own implementation of base64 functionality, adding a new dependency for base64 solely doesn't seem to be the best effort to reduce the burden of collaborators. While the base64 codebase isn't quite large, are there any improvements that are worth to be made on Node.js own base64 implementation?

legendecas avatar Apr 27 '21 16:04 legendecas

@addaleax @jasnell I've updated, removing modp_b64 and using base64_encode. And got the new benchmark:

$ ./node benchmark/run.js --filter buffer-btoa.js buffers

buffers/buffer-btoa.js
buffers/buffer-btoa.js n=32 len=67108864: 3.3501380480392817

$ ./node benchmark/run.js --filter buffer-atob.js buffers

buffers/buffer-atob.js
buffers/buffer-atob.js n=32 len=65536: 6,966.175950365997

And I've found that modp_b64's decoding is faster then code in Node.js'.

Maybe we can merge this PR first (if possible), and then use modp_b64 to instead of Node.js' current Base64 code.

XadillaX avatar Apr 28 '21 12:04 XadillaX

@XadillaX In order to merge this, I think we should:

  • Remove the C++-backed atob and btoa implementations
  • Remove their benchmarks
  • Undo the stylistic changes that are happening to unrelated code

addaleax avatar Apr 28 '21 12:04 addaleax

@XadillaX In order to merge this, I think we should:

  • Remove the C++-backed atob and btoa implementations
  • Remove their benchmarks
  • Undo the stylistic changes that are happening to unrelated code

The C++-backed implementations is just fasten the function to resolve:

// TODO(@jasnell): The implementation here has not been performance
// optimized in any way.

And this is not a large change, just call base64_encode directly to avoid do unnecessary Buffer call.

XadillaX avatar Apr 28 '21 12:04 XadillaX

The C++-backed implementations is just fasten the function to resolve:

Then let’s remove the TODO comment, or edit it to make clear that there’s not actually anything to do there.

Making these functions faster tells people that it’s okay to use them. It’s not, therefore let’s not worry about their performance at all.

addaleax avatar Apr 28 '21 12:04 addaleax

@addaleax I've tried to use modp_b64 to instead of current and found the performance is almost the same (https://github.com/nodejs/node/compare/c975dff3c0f0f1ecb1574f3b10dd1d135a7704db...XadillaX:base64-perf).

So the way to improve this PR is just to make atob() / btoa() using base64_encode / base64_decode directly.

But as you said before, there's no need to improve them.

That means the only thing I should do is to remove @jasnell's TODO comment?

/cc @jasnell

XadillaX avatar May 01 '21 15:05 XadillaX

We are thinking about using atob and btoa in https://github.com/feross/buffer/issues/339 for web compatibility. We have seen quite quickly that the node implementations are poorly optimised.

Although the buffer module is directed at browsers, if user code unintentionally finds it's way to using the buffer module in a node environment and subsequently ends up using atob or btoa, this is only a loss for users of node, not our library.

dcousens avatar Feb 05 '24 09:02 dcousens

Note that it is planned that simdutf could provide fast routines for base64 in the near future.

See https://github.com/nodejs/performance/issues/128

lemire avatar Feb 06 '24 22:02 lemire

The simdutf library (already part of Node.js) will have accelerated base64 support in the next release. Decoding will follow the WHATWG forgiving-base64 specification so that ASCII white spaces are allowed in base64 content.

https://github.com/simdutf/simdutf/pull/375

lemire avatar Mar 16 '24 18:03 lemire