buffer: fix DoS vector in atob
I know the consensus is that this function shouldn't be optimized, but the O(n*70) worst-case complexity of the loop isn't mentioned in the docs. In fact, the comments in the code don't even explain why the function is slow; they simply say it's "not optimized".
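To make the shape of the problem concrete, the loop is roughly this (a simplified sketch of the pattern, not a verbatim copy of lib/buffer.js — the forgiving-base64 allowed set of 64 alphabet characters plus '=' and ASCII whitespace comes to about 70 entries):

```js
// Simplified sketch of the slow path: the allowed-character set is an
// array of ~70 char codes, and every input character does a linear
// scan over it via indexOf().
const alphabet =
  'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/= \t\n\f\r';
const allowedChars = [...alphabet].map((c) => c.charCodeAt(0));

function validate(input) {
  for (let i = 0; i < input.length; i++) {
    // indexOf() scans up to ~70 entries per character: O(n * 70) overall.
    if (allowedChars.indexOf(input.charCodeAt(i)) === -1)
      return false;
  }
  return true;
}
```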
This is a DoS vector for the unaware user, and irresponsible IMO. We only found this while running benchmarks for the browserify buffer module.
We are currently trying to remove our dependency on base64-js by using atob in both node and the browser. Performance in Chromium is exceptional and faster than anything we could write in javascript. Node is a different story.
The mitigation for this is absolutely trivial, and I'm scratching my head as to why this wasn't done in the first place.
Anyway, I could optimize it even further with a few more lines, but that's against the rules, so I just did the absolute bare minimum to make it more palatable to everyone here. Please reconsider.
Is there already a benchmark?
Anyhow: legacy is not deprecated.
I didn't write a benchmark for the node.js codebase, but here are the benchmarks from feross/buffer:
Without this PR:
$ node perf/write-base64.js
BrowserBuffer#write(8192, "base64") x 2,770 ops/sec ±0.64% (85 runs sampled)
BrowserBuffer#toString(8192, "base64") x 17,933 ops/sec ±1.87% (89 runs sampled)
With this PR:
$ ~/node/out/Release/node perf/write-base64.js
BrowserBuffer#write(8192, "base64") x 37,795 ops/sec ±0.57% (93 runs sampled)
BrowserBuffer#toString(8192, "base64") x 21,842 ops/sec ±0.63% (93 runs sampled)
In other words, with this PR, atob is 13.6 times faster and btoa is 1.2 times faster.
Also, without this PR, the performance of atob is completely unpredictable. A worst-case base64 string could trigger 70 iterations of the linear search (ArrayPrototypeIndexOf) for every character, which means the new loop is up to 70 times faster against such input. It's just 13.6 times faster in the common case (with randomized data).
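For the curious, the mitigation is nothing exotic: precompute a lookup table once so that each character check is a single array load. Something like this sketch (names are mine, not the actual patch):

```js
// Sketch of the fix: precompute a 256-entry table once, so each
// character check is one array load instead of an indexOf() scan.
const allowed = new Uint8Array(256);
for (const ch of
  'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/= \t\n\f\r')
  allowed[ch.charCodeAt(0)] = 1;

function validateFast(input) {
  for (let i = 0; i < input.length; i++) {
    const c = input.charCodeAt(i);
    if (c > 255 || allowed[c] === 0) // O(1) per character
      return false;
  }
  return true;
}
```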
@nodejs/performance
Should there be benchmarks? Maybe we should still consider the perf for btoa/atob.
I am fine with landing this (since it's an improvement) but no benchmarks.
(as a side note nice seeing you back!)
It would be useful to have some clarification.
@addaleax stated that these functions should not be used, and should therefore not be optimized.
It seems that this PR attempts to reverse an earlier decision. Am I correct?
(I am not expressing any opinion, I just want to understand.)
@addaleax stated that these functions should not be used, and should therefore not be optimized.
I really don't understand this logic. If it's never supposed to be used, why implement atob at all then? It's a liability to have it be 70 times slower than it needs to be. You might as well remove it at that point.
If it's truly the case that we're not allowed to optimize this function, then I will submit another PR which removes it, because it's irresponsible to have it in node.js given its performance issues (I still maintain that it is indeed a DoS vector).
As far as I know, atob is the only portable way to synchronously parse base64 in a web browser. Over the past several years, I've watched node.js go to great lengths to replicate web APIs so programmers have the ability to write code that runs in both node and the browser. I ask you:
- Is EventTarget unoptimized in favor of EventEmitter?
- Is the Web Crypto API unoptimized in favor of require('crypto')?
- Is TextEncoder unoptimized in favor of string_decoder?
The answer to all of this is no. So why is atob unoptimized in favor of Buffer.from(str, 'base64')?
I understand that Buffer is a saner way to decode base64, but the web APIs don't offer any synchronous options aside from atob. It's certainly an unnecessary function for node.js to have, but note that MDN does not consider atob "legacy" or deprecated in any way. It's a standardized web API.
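To make the interoperability point concrete, this is the kind of universal code we want to be able to write (illustrative only, not taken from the library):

```js
// Illustrative: base64 -> bytes in code that must run unchanged in
// both Node.js and browsers. atob is the only standard sync option.
function decodeBase64(b64) {
  const s = atob(b64); // available in browsers and in Node.js >= 16
  const out = new Uint8Array(s.length);
  for (let i = 0; i < s.length; i++)
    out[i] = s.charCodeAt(i);
  return out;
}
```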
I suppose we could just use Buffer and browserify everything which includes feross/buffer, but I would like to remind you that you are speaking to someone who is currently helping develop that library. In fact, it's the very reason I made this PR!
And for reference:
Here is Deno's implementation of atob. It's calling out to a SIMD-optimized base64 decoding function.
Here is Bun's implementation of atob. It's calling out to JavaScriptCore's native base64 decoder.
Basically every javascript environment has an absurdly optimized version of atob, except for node.js.
To say that I'm not allowed to make a trivial few-line change in order to make a function 70 times faster would be ridiculous. At that point I'm about ready to give up on node.js after my 14 years of using it, because it will be clear that node.js has adopted some very strange development philosophies: problems which Bun and Deno apparently do not have.
I'm +1 on landing this and further optimizing atob.
@anonrig, pleased to hear you say that. I'm going to rewrite everything as a branchless loop I think. I was going to leave it as-is, but the whole "no optimizations ever" thing being brought up got me all riled up.
(Before anyone gets angry at me... I stress that I did not oppose this PR nor do I plan to. There is no point in arguing with me about it, I am just trying to gain a better understanding. Just to make sure there is no misunderstanding I will also approve the PR after this comment.)
Here is what I think... The reason why we need to validate the string in JavaScript is because Buffer.from has effectively no error handling. It will just ignore any character it does not recognize.
> Buffer.from("aaéaa", 'base64').toString('latin1')
'i¦\x9A'
> Buffer.from("aaaa", 'base64').toString('latin1')
'i¦\x9A'
Yet the Buffer.from documentation says...
A TypeError will be thrown if string is not a string or another type appropriate for Buffer.from() variants.
I think that if Buffer.from were WHATWG-compliant (at the C++ level), this PR would not be needed. If that were addressed, this issue could be revisited later.
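For reference, the forgiving-base64 algorithm in the WHATWG Infra Standard requires roughly the following checks before decoding (my paraphrase of the spec, not the actual Node.js code):

```js
// Paraphrase of the WHATWG forgiving-base64 preconditions (Infra spec),
// i.e. the validation that currently has to happen in JavaScript.
function forgivingBase64Precheck(data) {
  data = data.replace(/[\t\n\f\r ]/g, '');      // 1. strip ASCII whitespace
  if (data.length % 4 === 0)
    data = data.replace(/={1,2}$/, '');         // 2. drop up to two '=' pads
  if (data.length % 4 === 1)
    throw new TypeError('invalid base64');      // 3. impossible length
  if (/[^+\/0-9A-Za-z]/.test(data))
    throw new TypeError('invalid character');   // 4. alphabet check
  return data;
}
```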
Please comment if you think I am wrong.
For context: https://github.com/nodejs/performance/issues/128
@lemire, I apologize if my post sounded like it was directed at you. I didn't mean for it to be. It was more generalized frustration towards the "no optimization" philosophy.
@chjj I am not offended, I just wanted to make sure that there was no disagreement.
Here is a branchless version of the loop. I don't know if everyone is okay with that, but it works well and gives a slight speedup over the current code in this PR.
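Roughly, the idea is this (a sketch of the approach rather than the exact diff):

```js
// Branchless sketch: build the 0/1 table once, then OR-accumulate an
// "invalid" bit per character and test it a single time after the loop.
const allowed = new Uint8Array(256);
for (const ch of
  'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/=')
  allowed[ch.charCodeAt(0)] = 1;

function isValidBase64(input) {
  let bad = 0;
  for (let i = 0; i < input.length; i++) {
    const c = input.charCodeAt(i);
    // (c >> 8) is nonzero for chars above 0xff; (c & 255) keeps the
    // table index in range; XOR turns allowed (1) into 0 and
    // not-allowed (0) into 1.
    bad |= (c >> 8) | (allowed[c & 255] ^ 1);
  }
  return bad === 0;
}
```

Same result as the branching version, but the loop body has no data-dependent branch for the engine to mispredict.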
edit: This is the function I wanted to write, but anticipated that some people would be averse to it.
The new function will typically be faster when you anticipate that the result is valid, which should often be the case.
Like most other collaborators, I believe I've never stood in the way of optimizing this function, so I don't appreciate the hostility in this discussion. Anyway, ...
the O(n*70) worst-case complexity of the loop isn't mentioned in the docs.
If you really want to throw O in here, be aware that O(n*70) = O(n*100000) = O(n).
This is a DoS vector to the unaware user and irresponsible IMO.
We rarely consider linear runtimes to be DoS vectors. If we did, one could argue that using a slow programming language interpreter or a slow CPU would be a DoS vector. There are likely many features in Node.js whose worst-case runtime far exceeds (any linear function of) the ideal runtime, yet we don't automatically consider those DoS vectors.
(Also, actual security issues must be reported through HackerOne according to our security policy, and must not be disclosed through GitHub until they are resolved.)
I really don't understand this logic. If it's never supposed to be used, why implement atob at all then? It's a liability to have it be 70 times slower than it needs to be. You might as well remove it at that point.
FWIW, I think most would agree with James and Anna that in an ideal world, nobody would be using this function, so optimization would be pointless, and in this non-ideal world, optimizing it encourages the use of an undesirable API. Still, the project's focus has shifted heavily towards interoperability and performance in recent years, and this PR doesn't add a new dependency like the other PR originally did, so it seems much more likely that the community will appreciate straightforward optimizations that don't add a ton of complexity.
If you really want to throw O in here, be aware that O(n*70) = O(n*100000) = O(n).
True. I suppose it's still just linear at the end of the day, but I don't know a simpler way to express it. That said, I think everyone here understands what I mean when I say O(n*70).
We rarely consider linear runtimes to be DoS vectors. If we did, one could argue that using a slow programming language interpreter or a slow CPU would be a DoS vector.
The pedantic focus on big-O semantics combined with this statement makes me worry. If there were 100,000 unnecessary iterations inside the loop, would you be okay with it? It's still just linear complexity: O(n*100000) = O(n), like you said.
Things like this are why I say I'm worried node.js has adopted some strange development philosophies. For a member of node.js to say and think this function is acceptable because "it's still just O(n)" is deeply concerning to me.
actual security issues
Again, the trivialization of this issue is concerning. Code like this would be absolutely unacceptable in my industry. People would get fired for it, it would come up in code audits, etc.
I don't appreciate the hostility in this discussion
I'll try to keep it to a minimum, but I'm the kind of guy who yells at the TV, so bear with me. It's nothing personal.
For reference, I plan to offer a solution later this year that should make this code unnecessary and greatly speed up the whole thing. The plan is to improve the base64 decoding in Buffer. With some luck, by the end of the year, Node will have the fastest decoder. :-)
See https://github.com/nodejs/node/pull/51670#issuecomment-1930884400
If there were 100,000 unnecessary iterations inside the loop, would you be okay with it?
Regardless of the constant you throw in, is the amortized/worst-case runtime not in the same order of magnitude as the average runtime? Don't get me wrong, that constant might be unacceptable from a performance point of view, but I fail to see how this qualifies as a DoS vector. Pretty much none of the string encoding and decoding functions in Node.js provide any constant-time guarantees for fixed input sizes.
Things like this are why I say I'm worried node.js has adopted some strange development philosophies. For a member of node.js to say and think this function is acceptable because "it's still just O(n)" is deeply concerning to me.
I am sorry to somehow cause you concern for the project as a whole, but I did not say that there are no problems with this function or its performance, nor did I say that this function is acceptable. I merely questioned the purported DoS vector.
Just because they're not optimized doesn't mean they're not used. The idea of not optimizing them to dissuade use was understandable and agreed upon at the time, but as @tniessen said, the general approach has shifted slightly to support interoperability.
These functions are the only interoperable way to do base64 en/decoding (at least until https://github.com/tc39/proposal-arraybuffer-base64 progresses), so they'll naturally be used by universal javascript code, and I wouldn't dare block performance improvements.
I wouldn't spend effort on zealously squeezing out every last bit of performance given that a better-suited API is on the horizon, but I am +1 on landing the low-hanging fruit.
If we decide to optimize these functions I would prefer we go with the approach in https://github.com/nodejs/node/pull/38433 and reuse code that's shared by other buffer methods, instead of creating another optimized implementation in JS land, especially if the reused C++ version is already faster.
Actually, from a glance it seems the bottleneck comes from validation rather than the actual encoding/decoding, and https://github.com/nodejs/node/pull/38433 rolls the validation into the encoding/decoding as well as skipping the intermediate Buffer, so theoretically that would already be faster and use less memory (i.e. less GC pressure) than only tweaking the validation routine in JS land.
I don't know why you bring up your industry, nor do I know much about your industry, but I do appreciate the irony of promoting proof-of-work cryptocurrencies while arguing about wasting CPU cycles in base64-decoding. Thank you, I needed that laugh.
I tried to keep the hostility to a minimum. I guess you weren't able to return the favor.
Nowhere in my post did I describe what my industry was or mention PoW cryptocurrencies. I don't know you, so I can only assume you e-stalked me (or you already know who I am?).
I'm not seeing any irony here. Are you implying that cycles are being "wasted" on PoW mining? If that's the case, I also owe you thanks. That was a good laugh.
Anyway, all I can gather from your post is that you've never written code intended to run in a truly adversarial environment.
If we decide to optimize these functions I would prefer we go with the approach in https://github.com/nodejs/node/pull/38433 and reuse code that's shared by other buffer methods, instead of creating another optimized implementation in JS land, especially if the reused C++ version is already faster.
@joyeecheung, agreed.
Should I close this if there is consensus around this?
Putting it on the TSC agenda to discuss whether we have consensus on whether the atob() and btoa() functions should be optimized.
@chjj I've self-moderated my earlier comment because it appears to derail the conversation, and will later hide this new comment as off-topic. I sincerely apologize if my remark about our opposing opinions on cryptocurrencies offended you.
I sincerely apologize if my remark about our opposing opinions on cryptocurrencies offended you.
@tniessen, no apology necessary. It's perfectly okay to disagree with my opinions on PoW. Many people do. But that discussion is probably better suited for twitter or something.
Let me know if you want to have at it on twitter. I'm always up for a PoW/PoS debate.
FYI, the TC39 proposal for base64 support in Uint8Array just got to stage 3 this week: https://github.com/tc39/proposal-arraybuffer-base64. This would probably be faster than any other alternative if it's implemented in the JS engine.
The TC39 one won't be able to do string -> string conversions directly though, presumably adding significant overhead over an optimized atob, unless I'm missing something.
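To illustrate the extra step (assuming the proposal's Uint8Array.fromBase64, which engines don't ship yet):

```js
// Hypothetical atob built on the stage-3 proposal. Uint8Array.fromBase64
// yields bytes, not a string, so a bytes -> latin1 string pass remains.
function atobViaProposal(b64) {
  const bytes = Uint8Array.fromBase64(b64); // proposal API, not yet in V8
  let out = '';
  for (let i = 0; i < bytes.length; i++)
    out += String.fromCharCode(bytes[i]);   // the extra string-building step
  return out;
}
```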
People who care a lot about the performance of these functions should contribute good benchmarks. That would be most useful.
The TC39 one won't be able to do string -> string conversions directly though, presumably adding significant overhead over an optimized atob, unless I'm missing something.
The current code seems to go from string to Buffer and back to string.
Buffer.from(input, 'base64').toString('latin1');
So it is not direct string to string. That might be a good idea... but we need benchmarks!!!
IIRC @joyeecheung had mentioned that V8 doesn't give us the ability to read and construct strings "directly", and since Node ships with a custom implementation of atob I guess V8 doesn't implement it internally, so the string to string idea should be unimplementable on Node's side 🤔