undici
fetch() performance parity
This would solve...
Undici fetch isn't as fast as the implementations in Firefox, Deno, or Cloudflare Workers.
The implementation should look like...
Measure and improve (e2e latencies, op/sec, conns/sec, etc?) to match up with other impls.
I have also considered...
I hand-rolled an http2 nodejs stdlib client and it was an order of magnitude faster than undici fetch, even without connection pooling.
I must note though, the conns were to Cloudflare and Google endpoints, which responded with 1KB or less per fetch (basically, POST and uncached GET DNS over HTTPS connections).
Additional context
This is a follow up from: https://news.ycombinator.com/item?id=30164925
Sorry that I didn't really benchmark the difference between Deno / Firefox / Undici / Cloudflare Workers, but I knew instantly that it was the cause of higher latencies when I deployed the same code in Node v16 LTS.
Possibly related: #22, #208, #952, #902
Other projects: szmarczak/http2-wrapper/issues/71
Do you have a reproducible benchmark we can use?
> Undici fetch isn't as fast as the implementations in Firefox, Deno, or Cloudflare Workers.
I took a look at the Hacker News comment, and it didn't include benchmarks either.
I'm inclined to close this unless somebody would like to prepare a cross-runtime benchmark suite.
Fair. I'll take a stab at extracting the relevant parts from my server to demonstrate this. I should note that the slowness I observed was in a production env (running in a VM in a container behind a network load balancer) that also sees a tonne of other IO.
I wrote up a tiny DoH load-tester in python and golang (don't ask), which I intend to publish as a github action by tomorrow. In the meanwhile, here's a run from my laptop:
**deno** (runs: 3748)

| percentile | p10 | p50 | p75 | p90 | p95 | p99 | p99.9 | p99.99 | p100 |
|---|---|---|---|---|---|---|---|---|---|
| milliseconds | 280 | 296 | 309 | 336 | 411 | 694 | 889 | 898 | 929 |

**undici** (runs: 3826)

| percentile | p10 | p50 | p75 | p90 | p95 | p99 | p99.9 | p99.99 | p100 |
|---|---|---|---|---|---|---|---|---|---|
| milliseconds | 1140 | 1666 | 2171 | 2811 | 3302 | 3941 | 4501 | 4540 | 4995 |

**node:naive-h2** (runs: 3791)

| percentile | p10 | p50 | p75 | p90 | p95 | p99 | p99.9 | p99.99 | p100 |
|---|---|---|---|---|---|---|---|---|---|
| milliseconds | 1713 | 2067 | 2570 | 3276 | 3975 | 4269 | 4336 | 4530 | 4534 |
After fighting a bit with github-actions, a poor man's profiler is finally up.
From the current run, here's Node:
**node** (runs: 1200)

| percentile | p10 | p50 | p75 | p90 | p95 | p99 | p99.9 | p99.99 | p100 |
|---|---|---|---|---|---|---|---|---|---|
| milliseconds | 532 | 1215 | 1665 | 3359 | 3469 | 20953 | 21094 | 21125 | 21182 |

**deno** (runs: 6300)

| percentile | p10 | p50 | p75 | p90 | p95 | p99 | p99.9 | p99.99 | p100 |
|---|---|---|---|---|---|---|---|---|---|
| milliseconds | 87 | 97 | 104 | 111 | 116 | 127 | 165 | 184 | 189 |
These are measured with `Date.now` calls wrapping the fetch call. Both local and remote caches are unlikely to interfere because every DoH query is of the form `<random-str>.dnsleaktest.com`.
Anything in the serverless-dns code-base that may significantly interfere with timing fetch calls is disabled.
The profiler forwards as many DoH requests as it can for 50s to 60s, after which it quits.
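The hand-rolled perf-counter boils down to something like this (a sketch using nearest-rank percentiles; not the actual serverless-dns code):

```javascript
// Collect wall-clock samples around an async call and report percentiles,
// similar to the tables above. Nearest-rank percentile over sorted samples.
function percentile(sorted, p) {
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, Math.min(sorted.length - 1, rank - 1))];
}

async function profile(fn, runs) {
  const samples = [];
  for (let i = 0; i < runs; i++) {
    const t0 = Date.now();
    await fn();
    samples.push(Date.now() - t0);
  }
  samples.sort((a, b) => a - b);
  return {
    runs,
    p50: percentile(samples, 50),
    p90: percentile(samples, 90),
    p99: percentile(samples, 99),
  };
}
```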
Deno's serveDoh doesn't slow down at all and blazes through 6300+ requests. Node's serverHTTPS (+ undici), on the other hand, manages 1200+ requests. I must note that the final batch of 100 queries, served after the ~1100th request, took 20s+ to complete.
Just a note: I haven't used Node's built-in perf module to profile, since it is un-importable on Deno (from what I can tell)... I hand-rolled a minimal and inefficient perf-counter of my own instead.
I might be a bit lost on your codebase, but I could not find undici being used anywhere.
I would note that there are plenty of things happening in that module. How have you identified the problem in the undici fetch implementation?
If by any chance you are testing the http/2 implementation, consider that https://github.com/serverless-dns/serverless-dns/blob/b36a9cdb5786b78a7dd8e22a2f0a767e9e340ab1/src/plugins/dns-operation/dnsResolver.js#L398 is highly inefficient, as you will incur significant overhead creating those http/2 connections. You have to use a connection pool.
> I might be a bit lost on your codebase, but I could not find undici being used anywhere.
Dang, we moved to node-fetch (to see if it was any better) and didn't revert back to undici (which we now have).
Here are results from a Node + undici run:
**node/undici** (runs: 1300)

| percentile | p10 | p50 | p75 | p90 | p95 | p99 | p99.9 | p99.99 | p100 |
|---|---|---|---|---|---|---|---|---|---|
| milliseconds | 200 | 602 | 965 | 1121 | 3083 | 12267 | 20400 | 20407 | 20467 |
> How have you identified the problem in the undici fetch implementation?
What's measured is just this method, `dnsResolver.js:resolveDnsUpstream`, which doesn't do much other than construct a Request and call the cache-api (does not exist on Node, and disabled either way in these "profiler" runs), fetch (over undici on Node), or doh2 (unused).
We then rolled our own DNS over UDP / DNS over TCP connection-pooled transport atop Node (dgram and net), and that showed considerable perf improvement (around 15x to 100x):
**node/udp** (runs: 6600)

| percentile | p10 | p50 | p75 | p90 | p95 | p99 | p99.9 | p99.99 | p100 |
|---|---|---|---|---|---|---|---|---|---|
| milliseconds | 17 | 18 | 19 | 20 | 21 | 25 | 34 | 89 | 131 |
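For illustration, such a transport just writes raw DNS packets over `dgram`/`net`; a minimal query encoder (a sketch, not the serverless-dns code) looks like:

```javascript
// Encode a DNS query packet (RFC 1035): 12-byte header + QNAME + QTYPE + QCLASS.
function encodeDnsQuery(name, id = 0x1234) {
  const header = Buffer.alloc(12);
  header.writeUInt16BE(id, 0);     // transaction ID
  header.writeUInt16BE(0x0100, 2); // flags: standard query, recursion desired
  header.writeUInt16BE(1, 4);      // QDCOUNT: one question
  const qname = Buffer.concat([
    ...name.split('.').map((label) =>
      Buffer.concat([Buffer.from([label.length]), Buffer.from(label, 'ascii')])
    ),
    Buffer.from([0]),              // root label terminator
  ]);
  const tail = Buffer.from([0, 1, 0, 1]); // QTYPE=A, QCLASS=IN
  return Buffer.concat([header, qname, tail]);
}

// A UDP transport would then do something like:
//   const sock = require('node:dgram').createSocket('udp4');
//   sock.send(encodeDnsQuery('example.dnsleaktest.com'), 53, '1.1.1.1');
```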
Of course, a standalone benchmark would be great to have, but I wanted to demonstrate the issue with the kind of workload we specifically run (lots and lots of DoH requests): bypassing the undici / node-fetch polyfills in the code-path (or running on cloudflare-workers or deno, which bring their own fetch impl to the table) meant apparent improvements in IO.
I don't think anyone has optimized either fetch or Node's WHATWG Streams implementation, so this is a good point and there is likely a lot of margin there.
Have you tried using undici.request() instead?
Are there some instructions on how to run the benchmarks? Being able to get some good diagnostics would definitely help.
Here are a few results from our own benchmarks:
| Tests | Samples | Result | Tolerance | Difference with slowest |
|---|---|---|---|---|
| undici - fetch | 20 | 1028.31 req/sec | ± 2.71 % | - |
| http - no keepalive | 10 | 3891.51 req/sec | ± 2.00 % | + 278.44 % |
| undici - pipeline | 95 | 6034.47 req/sec | ± 2.95 % | + 486.83 % |
| http - keepalive | 50 | 6382.57 req/sec | ± 2.98 % | + 520.68 % |
| undici - request | 15 | 8528.35 req/sec | ± 2.11 % | + 729.35 % |
| undici - stream | 10 | 10226.33 req/sec | ± 2.66 % | + 894.48 % |
| undici - dispatch | 50 | 11399.31 req/sec | ± 2.79 % | + 1008.54 % |
There is likely a lot of room for improvements.
> Have you tried using undici.request() instead?
I will do so over the coming days and report back.
> Are there some instructions on how to run the benchmarks?
Apart from running the profiler.yml gh-action (on your own fork of serverless-dns), you could check out the codebase and then:

```sh
# clone serverless-dns
git clone https://github.com/serverless-dns/serverless-dns.git
cd serverless-dns

# download deps
npm i

# install q, a golang doh client
# https://github.com/natesales/q
echo "deb [trusted=yes] https://repo.natesales.net/apt /" | sudo tee /etc/apt/sources.list.d/natesales.list > /dev/null
sudo apt-get update
sudo apt-get install q

# export QDOH, where the 'run' script goes looking for `q`
export QDOH=q

# install node v15+ / deno v1.17+, if missing
...

# doh query, <random-str>.dnsleaktest.com, is sent to cloudflare-dns.com
# run doh, on node + undici
timeout 60s ./run node p1

# doh query, <random-str>.dnsleaktest.com, is sent to cloudflare-dns.com
# run doh, on deno
timeout 60s ./run deno p1

# dns query, <random-str>.dnsleaktest.com, is sent to 1.1.1.1
# run udp/tcp dns, on node
timeout 60s ./run node p3
```
> Here are a few results from our own benchmarks:
Thanks. Sorry if this is a noob question, but: how do I generate such logs myself for my tests?
> Thanks. Sorry if this is a noob question, but: how do I generate such logs myself for my tests?
Check out https://github.com/nodejs/undici/pull/1214 for the fetch benchmarks.
You can run our benchmarks by running:
```sh
PORT=3001 node benchmarks/server.js
PORT=3001 node benchmarks/benchmarks.js
```
Hello folks! Recently, I spent some time investigating fetch performance and decided to create a report about my findings. I'll share an abstract (the full report will probably become a blog post at some point).
TL;DR:
After some analysis, I came up with a few PRs improving fetch performance directly:
- https://github.com/nodejs/undici/pull/1303
- https://github.com/nodejs/undici/pull/1309
- https://github.com/nodejs/undici/pull/1343
However, the major bottleneck is in the WebStreams implementation in Node.js core; thus,
I don't think there is a clear path for undici.fetch to improve its performance significantly without touching Node.js internals.
Firstly, the benchmark was unfair to undici.fetch, since it was accumulating the server response in a variable, causing event-loop blocks due to GC runs. I fixed that in: https://github.com/nodejs/undici/pull/1283.
After several analyses, I found that WebStreams is very slow (both in a microbenchmark and in the fetch benchmark) - for reference, see https://github.com/RafaelGSS/nodejs-bench-operations/blob/main/RESULTS-v17.md#stream-readablejs. I then did a comparison replacing WebStreams with native streams and got a 60% ~ 90% improvement.
Avoiding WebStreams in response
```diff
diff --git a/lib/fetch/index.js b/lib/fetch/index.js
index 9047976..778ad82 100644
--- a/lib/fetch/index.js
+++ b/lib/fetch/index.js
@@ -50,6 +50,7 @@ const {
const { kHeadersList } = require('../core/symbols')
const EE = require('events')
const { Readable, pipeline } = require('stream')
+const Readable2 = require('../api/readable')
const { isErrored, isReadable } = require('../core/util')
const { dataURLProcessor } = require('./dataURL')
const { kIsMockActive } = require('../mock/mock-symbols')
@@ -964,7 +965,7 @@ async function fetchFinale (fetchParams, response) {
})
// 4. Set response’s body to the result of piping response’s body through transformStream.
- response.body = { stream: response.body.stream.pipeThrough(transformStream) }
+ // response.body = { stream: response.body.stream.pipeThrough(transformStream) }
}
// 6. If fetchParams’s process response consume body is non-null, then:
@@ -1731,9 +1732,8 @@ async function httpNetworkFetch (
})()
}
+ const { body, status, statusText, headersList } = await dispatch({ body: requestBody })
try {
- const { body, status, statusText, headersList } = await dispatch({ body: requestBody })
-
const iterator = body[Symbol.asyncIterator]()
fetchParams.controller.next = () => iterator.next()
@@ -1797,7 +1797,7 @@ async function httpNetworkFetch (
// 17. Run these steps, but abort when the ongoing fetch is terminated:
// 1. Set response’s body to a new body whose stream is stream.
- response.body = { stream }
+ response.body = { stream: body }
// 2. If response is not a network error and request’s cache mode is
// not "no-store", then update response in httpCache for request.
@@ -1957,7 +1957,7 @@ async function httpNetworkFetch (
headers.append(key, val)
}
- this.body = new Readable({ read: resume })
+ this.body = new Readable2(resume, this.abort, headers.get('content-type'))
const decoders = []
```
If you apply the above git diff and profile the application, it will show createAbortError as one of the functions consuming the most CPU time.
And these AbortSignals are mostly created by WebStreams instances (even if the signal is never used), e.g.:

(profiler screenshots: AbortSignal allocations from WebStreams)
So, avoiding the AbortSignal creation through WebStreams improves performance even more (150% ~ 200%, based on the initial benchmarks).
Avoiding AbortSignal
```diff
diff --git a/lib/fetch/index.js b/lib/fetch/index.js
index 1fbf29b..322e5ae 100644
--- a/lib/fetch/index.js
+++ b/lib/fetch/index.js
@@ -50,6 +50,7 @@ const {
const { kHeadersList } = require('../core/symbols')
const EE = require('events')
const { Readable, pipeline } = require('stream')
+const Readable2 = require('../api/readable')
const { isErrored, isReadable } = require('../core/util')
const { dataURLProcessor } = require('./dataURL')
const { kIsMockActive } = require('../mock/mock-symbols')
@@ -957,14 +958,14 @@ async function fetchFinale (fetchParams, response) {
// 3. Set up transformStream with transformAlgorithm set to identityTransformAlgorithm
// and flushAlgorithm set to processResponseEndOfBody.
- const transformStream = new TransformStream({
- start () {},
- transform: identityTransformAlgorithm,
- flush: processResponseEndOfBody
- })
+ // const transformStream = new TransformStream({
+ // start () {},
+ // transform: identityTransformAlgorithm,
+ // flush: processResponseEndOfBody
+ // })
// 4. Set response’s body to the result of piping response’s body through transformStream.
- response.body = { stream: response.body.stream.pipeThrough(transformStream) }
+ // response.body = { stream: response.body.stream.pipeThrough(transformStream) }
}
// 6. If fetchParams’s process response consume body is non-null, then:
@@ -1731,9 +1732,8 @@ async function httpNetworkFetch (
})()
}
+ const { body, status, statusText, headersList } = await dispatch({ body: requestBody })
try {
- const { body, status, statusText, headersList } = await dispatch({ body: requestBody })
-
const iterator = body[Symbol.asyncIterator]()
fetchParams.controller.next = () => iterator.next()
@@ -1775,29 +1775,29 @@ async function httpNetworkFetch (
// 16. Set up stream with pullAlgorithm set to pullAlgorithm,
// cancelAlgorithm set to cancelAlgorithm, highWaterMark set to
// highWaterMark, and sizeAlgorithm set to sizeAlgorithm.
- if (!ReadableStream) {
- ReadableStream = require('stream/web').ReadableStream
- }
-
- const stream = new ReadableStream(
- {
- async start (controller) {
- fetchParams.controller.controller = controller
- },
- async pull (controller) {
- await pullAlgorithm(controller)
- },
- async cancel (reason) {
- await cancelAlgorithm(reason)
- }
- },
- { highWaterMark: 0 }
- )
+ // if (!ReadableStream) {
+ // ReadableStream = require('stream/web').ReadableStream
+ // }
+
+ // const stream = new ReadableStream(
+ // {
+ // async start (controller) {
+ // fetchParams.controller.controller = controller
+ // },
+ // async pull (controller) {
+ // await pullAlgorithm(controller)
+ // },
+ // async cancel (reason) {
+ // await cancelAlgorithm(reason)
+ // }
+ // },
+ // { highWaterMark: 0 }
+ // )
// 17. Run these steps, but abort when the ongoing fetch is terminated:
// 1. Set response’s body to a new body whose stream is stream.
- response.body = { stream }
+ response.body = { stream: body }
// 2. If response is not a network error and request’s cache mode is
// not "no-store", then update response in httpCache for request.
@@ -1870,10 +1870,10 @@ async function httpNetworkFetch (
fetchParams.controller.controller.enqueue(new Uint8Array(bytes))
// 8. If stream is errored, then terminate the ongoing fetch.
- if (isErrored(stream)) {
- fetchParams.controller.terminate()
- return
- }
+ // if (isErrored(stream)) {
+ // fetchParams.controller.terminate()
+ // return
+ // }
// 9. If stream doesn’t need more data ask the user agent to suspend
// the ongoing fetch.
@@ -1891,16 +1891,16 @@ async function httpNetworkFetch (
response.aborted = true
// 2. If stream is readable, error stream with an "AbortError" DOMException.
- if (isReadable(stream)) {
- fetchParams.controller.controller.error(new AbortError())
- }
+ // if (isReadable(stream)) {
+ // fetchParams.controller.controller.error(new AbortError())
+ // }
} else {
// 3. Otherwise, if stream is readable, error stream with a TypeError.
- if (isReadable(stream)) {
- fetchParams.controller.controller.error(new TypeError('terminated', {
- cause: reason instanceof Error ? reason : undefined
- }))
- }
+ // if (isReadable(stream)) {
+ // fetchParams.controller.controller.error(new TypeError('terminated', {
+ // cause: reason instanceof Error ? reason : undefined
+ // }))
+ // }
}
// 4. If connection uses HTTP/2, then transmit an RST_STREAM frame.
@@ -1957,7 +1957,7 @@ async function httpNetworkFetch (
headers.append(key, val)
}
- this.body = new Readable({ read: resume })
+ this.body = new Readable2(resume, this.abort, headers.get('content-type'))
const decoders = []
```
Then, I tried avoiding AbortController (which creates the AbortSignal under the hood), and impressively, it bumped performance by a further 20%!
Avoiding AbortController
```diff
diff --git a/lib/fetch/request.js b/lib/fetch/request.js
index 0f10e67..ae9c37c 100644
--- a/lib/fetch/request.js
+++ b/lib/fetch/request.js
@@ -29,9 +29,9 @@ let TransformStream
const kInit = Symbol('init')
-const requestFinalizer = new FinalizationRegistry(({ signal, abort }) => {
- signal.removeEventListener('abort', abort)
-})
+// const requestFinalizer = new FinalizationRegistry(({ signal, abort }) => {
+// signal.removeEventListener('abort', abort)
+// })
// https://fetch.spec.whatwg.org/#request-class
class Request {
@@ -355,7 +355,7 @@ class Request {
// 28. Set this’s signal to a new AbortSignal object with this’s relevant
// Realm.
- const ac = new AbortController()
+ const ac = { signal: { addEventListener: () => {} } }
this[kSignal] = ac.signal
this[kSignal][kRealm] = this[kRealm]
@@ -376,7 +376,7 @@ class Request {
} else {
const abort = () => ac.abort()
signal.addEventListener('abort', abort, { once: true })
- requestFinalizer.register(this, { signal, abort })
+ // requestFinalizer.register(this, { signal, abort })
}
}
@@ -741,7 +741,8 @@ class Request {
clonedRequestObject[kHeaders][kRealm] = this[kHeaders][kRealm]
// 4. Make clonedRequestObject’s signal follow this’s signal.
- const ac = new AbortController()
+ const ac = { signal: { addEventListener: () => {}, abort: () => {} } }
+ // const ac = new AbortController()
if (this.signal.aborted) {
ac.abort()
} else {
```
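For a sense of the allocation cost being avoided, here is a trivial microbenchmark (a sketch; numbers are machine- and version-dependent):

```javascript
// Time `iterations` AbortController allocations; each one also allocates
// an AbortSignal (an EventTarget) under the hood.
function timeAbortControllers(iterations) {
  const controllers = new Array(iterations); // keep refs so GC can't elide them
  const start = process.hrtime.bigint();
  for (let i = 0; i < iterations; i++) {
    controllers[i] = new AbortController();
  }
  const ms = Number(process.hrtime.bigint() - start) / 1e6;
  return { ms, perCallUs: (ms * 1000) / iterations };
}
```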
Additional Notes
In the first iteration, Clinic Doctor report says something is wrong with Active Handles
(Clinic Doctor: undici.fetch Active Handles graph)
The Active Handles are dropping often, which is far from expected. For comparison, look at the undici.request graph:
(Clinic Doctor: undici.request Active Handles graph)
There is probably a bad configuration somewhere; I have found that undici.fetch drops many packets during the benchmark, causing TCP retransmissions.
Basically, when a packet is dropped on an active connection, a retransmission is sent, and when that happens frequently, performance suffers.
TCP Retransmission using undici.fetch
```
root@rafaelgss-desktop:/usr/share/bcc/tools# ./tcpretrans -c
Tracing retransmits ... Hit Ctrl-C to end
^C
LADDR:LPORT               RADDR:RPORT                RETRANSMITS
[::ffff:127.0.0.1]#3001 <-> [::ffff:127.0.0.1]#52390   99
[::ffff:127.0.0.1]#3001 <-> [::ffff:127.0.0.1]#52400   99
[::ffff:127.0.0.1]#3001 <-> [::ffff:127.0.0.1]#52398   99
[::ffff:127.0.0.1]#3001 <-> [::ffff:127.0.0.1]#52402   99
[::ffff:127.0.0.1]#3001 <-> [::ffff:127.0.0.1]#52414   100
[::ffff:127.0.0.1]#3001 <-> [::ffff:127.0.0.1]#52424   100
[::ffff:127.0.0.1]#3001 <-> [::ffff:127.0.0.1]#52412   100
[::ffff:127.0.0.1]#3001 <-> [::ffff:127.0.0.1]#52392   100
[::ffff:127.0.0.1]#3001 <-> [::ffff:127.0.0.1]#52420   100
[::ffff:127.0.0.1]#3001 <-> [::ffff:127.0.0.1]#52396   100
[::ffff:127.0.0.1]#3001 <-> [::ffff:127.0.0.1]#52422   100
[::ffff:127.0.0.1]#3001 <-> [::ffff:127.0.0.1]#52404   100
[::ffff:127.0.0.1]#3001 <-> [::ffff:127.0.0.1]#52410   100
[::ffff:127.0.0.1]#3001 <-> [::ffff:127.0.0.1]#52428   100
[::ffff:127.0.0.1]#3001 <-> [::ffff:127.0.0.1]#52418   100
[::ffff:127.0.0.1]#3001 <-> [::ffff:127.0.0.1]#52408   100
[::ffff:127.0.0.1]#3001 <-> [::ffff:127.0.0.1]#52426   100
[::ffff:127.0.0.1]#3001 <-> [::ffff:127.0.0.1]#52416   100
[::ffff:127.0.0.1]#3001 <-> [::ffff:127.0.0.1]#52406   100
[::ffff:127.0.0.1]#3001 <-> [::ffff:127.0.0.1]#52394   101
```
For comparison, here are the retransmissions using:
- http.request (keep alive)
```
root@rafaelgss-desktop:/usr/share/bcc/tools# ./tcpretrans -c
Tracing retransmits ... Hit Ctrl-C to end
^C
LADDR:LPORT               RADDR:RPORT                RETRANSMITS
[::ffff:127.0.0.1]#3001 <-> [::ffff:127.0.0.1]#52526   1
[::ffff:127.0.0.1]#3001 <-> [::ffff:127.0.0.1]#52532   1
[::ffff:127.0.0.1]#3001 <-> [::ffff:127.0.0.1]#52506   1
[::ffff:127.0.0.1]#3001 <-> [::ffff:127.0.0.1]#52514   1
[::ffff:127.0.0.1]#3001 <-> [::ffff:127.0.0.1]#52524   1
[::ffff:127.0.0.1]#3001 <-> [::ffff:127.0.0.1]#52512   1
[::ffff:127.0.0.1]#3001 <-> [::ffff:127.0.0.1]#52518   1
```
undici.request
```
root@rafaelgss-desktop:/usr/share/bcc/tools# ./tcpretrans -c
Tracing retransmits ... Hit Ctrl-C to end
^C
EMPTY
LADDR:LPORT               RADDR:RPORT                RETRANSMITS
```
Unfortunately, I haven't found out why it's happening. I tried, I swear :P.
@benjamingr why is creating abort signals so expensive?
I would guess it is because createAbortSignal+makeTransferable does a lot of prototype mutation stuff and a call into c++ which is basically the bane of js engine performance.
i think we could move the logic of createAbortSignal into the AbortSignal constructor, sorta like this:
```js
const kDoThing = Symbol('kDoThing');

function createAbortSignal(aborted, reason) {
  return new AbortSignal(aborted, reason, kDoThing);
}

class AbortSignal extends EventTarget {
  constructor(aborted, reason, thing) {
    if (thing !== kDoThing) throw new ERR_ILLEGAL_CONSTRUCTOR();
    super();
    this[kAborted] = aborted;
    this[kReason] = reason;
  }
}
```
the makeTransferable part is harder cuz c++ but i feel like we could at least make it more static than what makeTransferable is doing, similar to what i did above. also seems weird to me that JSTransferable needs to exist at all, like we already have the api with kTransfer and kDeserialize and whatnot but I don't know all the specifics there.
@jasnell @addaleax
@devsnek I didn't mention the analysis I did in Node.js Webstreams (planning to open an issue in Node.js any time soon), but indeed, makeTransferable has something on it for sure.
makeTransferable is an ugly hack, unfortunately. I'd love to find a way to handle it more elegantly but we are fairly strictly limited by the way the structured clone algorithm works and the v8 implementation of it. I have some ideas on how to make improvements there.
> also seems weird to me that JSTransferable needs to exist at all, like we already have the api with kTransfer and kDeserialize and whatnot but I don't know all the specifics there.
The way it works is this:
The structured clone algorithm only works by default with ordinary JavaScript objects with enumerable properties. Things like JavaScript classes are not supported. The only way to make objects like WritableStream and ReadableStream consistently transferable per the spec is to treat them as "Host objects", and the JSTransferable class is exactly how we do that. (A Host object is a JavaScript object that is backed by a C++ class.)
When a Host object is passed into the structured clone algorithm implementation, v8 will call out to the serialization/deserialization delegate implementation that Node.js provides. Node.js' delegate implementation will check to see what kind of Host object it is. If the Host object is just a native C++ class that extends from BaseObject, then the C++ class will have functions on it for handling the serialization/deserialization at the native layer. If the Host object is a JSTransferable, the serialization/deserialization is handled by JavaScript functions on the JavaScript wrapper object -- specifically, the [kTransfer], [kClone], and [kDeserialize] functions. The delegate will ask the JSTransferable host object to serialize itself, and it will do so by calling out to those JavaScript functions. tl;dr: JSTransferable is how we allow a JavaScript class to act as a Host object, and kTransfer/kDeserialize do the actual work of that.
But, we have a problem here. Classes like WritableStream and ReadableStream are defined by the standards as not extending from any other class and there are Web Platform Tests that verify that the implementation matches the Web IDL definition. Specifically, we can't have class WritableStream extends JSTransferable because that would violate the standard definition of WritableStream.
The makeTransferable() utility is a hack to get around that. Instead of creating a WritableStream that extends from JSTransferable, makeTransferable() creates an instance of JSTransferable and uses the WritableStream instance as its prototype. It effectively inverts the inheritance (that is, creates a JSTransferable that extends from WritableStream). This is a bit slow for several reasons that should be fairly obvious. Performance-wise, it's not great. But it allows us to implement the streams spec correctly.
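The inversion can be sketched with plain objects (hypothetical names; the real JSTransferable is C++-backed and carries native serialization hooks):

```javascript
// Stand-in for the C++-backed JSTransferable host-object class.
class FakeJSTransferable {}

// makeTransferable-style inversion: rather than having `instance` extend
// JSTransferable, a JSTransferable instance gets `instance` as its prototype,
// so all of the instance's methods remain reachable through the chain.
function makeTransferableLike(instance) {
  const transferable = new FakeJSTransferable();
  Object.setPrototypeOf(transferable, instance);
  return transferable;
}
```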
Looking at this more, it seems like we just need to teach v8 how to recognize non-c++-backed "host" objects, and then we wouldn't need to add the JSTransferable superclass.
oop thank you for the writeup james
Yes, it would be ideal if the structured clone algorithm provided a way to allow arbitrary classes to define how they would be serialized / deserialized. I've raised this conversation with the WHATWG (https://github.com/whatwg/html/issues/7428) before and, unfortunately, it received a pretty chilly response.
An approach based on well-known Symbols (e.g. Symbol.transfer, Symbol.clone, Symbol.deserialize) would be useful, where v8 would recognize that objects that implement these should be passed to the serializer/deserializer delegate. There's some complexity on the deserialization end around how to make the deserializer aware of what classes it needs to handle but that's largely implementation detail.
Unfortunately, without these changes at either the standard (tc-39, whatwg) level or v8 level, JSTransferable will be a necessary evil.
I don't think we need to do anything at the standards level, if v8 just had an api to tag constructors with a private symbol or similar, it could forward them back to WriteHostObject when it sees them and then we could do whatever we want with them.
Yeah, doing it entirely within v8 is possible. My concern there would be interop. I would much prefer the mechanism to be standardized so that there's a better hope that it would work consistently across runtimes.
Within v8, the key limitation is that, inside v8's implementation, only C++-backed objects can be identified as host objects, and only host objects are passed into the delegate to be handled, and since a host object can be nested arbitrarily deep in object graphs, we have to depend on v8 identifying them as it walks the graph being serialized. If we can start there -- teaching v8 to recognize when a pure JavaScript object is to be treated as a host object -- then we've gone a long way towards solving the problem.
I've made a small benchmark comparing undici.fetch with node-fetch. It fetches metadata for 1500 packages from npm, and I find undici consistently slower. Average run:
```
$ node bench.js
undici: 3522ms
node-fetch: 2492ms
```
Hey!
So, I did an investigation on undici.fetch after the https://github.com/nodejs/node/pull/44048 and basically, we are still slow.
As an abstract, all the points raised in the first analysis are still valid (WebStreams, AbortController, TCP retransmission). Apart from WebStreams, the next big issue for fetch seems to be memory usage; garbage collection takes a considerable amount of time.
More details in: https://nova-option-465.notion.site/Undici-Fetch-Performance-Analysis-v2-81113b868be44b7a8f4dad03372db1db
I wonder if the AbortController creation is really necessary when signal is not provided by the user. Seems wasteful to create it when it's never used.
Generally I thought AbortController is meant to always be created in user code, not internally inside a fetch implementation.
> Generally I thought AbortController is meant to always be created in user code, not internally inside a fetch implementation.
An abort signal is created whether you want it or not; a simple web test: `new Request('/').signal`
> An abort signal is created whether you want it or not; a simple web test: `new Request('/').signal`
It's possible to only create a signal when accessed, since signal is a getter. I'm unsure if this would improve performance though.
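A lazy getter along those lines could look like this (a sketch with hypothetical names, not undici's actual Request):

```javascript
const kSignal = Symbol('kSignal');

// Allocate the AbortSignal only when `.signal` is first read.
class LazySignalRequest {
  constructor() {
    this[kSignal] = null; // nothing allocated up front
  }

  get signal() {
    if (this[kSignal] === null) {
      this[kSignal] = new AbortController().signal;
    }
    return this[kSignal];
  }
}
```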
There isn't much we can do with it; it's public and used directly below where we create it. I assume this is also the case for most structures in the code.
There does seem to be a performance improvement from v18.10.0 -> v18.11.0, which is probably attributed to the AbortController changes.
v18.11.0:
```
╔══════════════╤═════════╤════════════════╤═══════════╗
║ Slower tests │ Samples │ Result         │ Tolerance ║
╟──────────────┼─────────┼────────────────┼───────────╢
║ global fetch │ 3500    │ 3116.63 op/sec │ ± 0.98 %  ║
╟──────────────┼─────────┼────────────────┼───────────╢
║ Fastest test │ Samples │ Result         │ Tolerance ║
╟──────────────┼─────────┼────────────────┼───────────╢
║ undici fetch │ 3500    │ 3149.77 op/sec │ ± 0.99 %  ║
╚══════════════╧═════════╧════════════════╧═══════════╝
```
v18.10.0:
```
╔══════════════╤═════════╤════════════════╤═══════════╗
║ Slower tests │ Samples │ Result         │ Tolerance ║
╟──────────────┼─────────┼────────────────┼───────────╢
║ undici fetch │ 6000    │ 2911.89 op/sec │ ± 0.97 %  ║
╟──────────────┼─────────┼────────────────┼───────────╢
║ Fastest test │ Samples │ Result         │ Tolerance ║
╟──────────────┼─────────┼────────────────┼───────────╢
║ global fetch │ 3500    │ 2998.01 op/sec │ ± 0.90 %  ║
╚══════════════╧═════════╧════════════════╧═══════════╝
```