undici
perf(fetch): avoid redundant conversion
I'm not done yet.
We understand that some behaviors deviate from the specification and increase maintenance costs, but these APIs should still be improved.
I'm not maintaining the web stuff anymore but if I did, I would be against anything that deviates from spec due to maintenance overhead. @KhafraDev
@KhafraDev I kept the changes to a minimum. Is this okay?
Codecov Report
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 93.65%. Comparing base (3d9fcf5) to head (798d62e).
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2894   +/-   ##
=======================================
  Coverage   93.65%   93.65%
=======================================
  Files          87       87
  Lines       23899    23908    +9
=======================================
+ Hits        22383    22392    +9
  Misses       1516     1516
:umbrella: View full report in Codecov by Sentry.
@KhafraDev https://github.com/tsctx/undici/commit/06e14d93e8e20a5c09f3e5cf6f44df9c748fe133 (This change is not reflected in the PR; see the commit linked here.)
The objective of this PR is to avoid the redundant String.prototype.toWellFormed operation.
When the string is passed to the TextEncoder API, ill-formed characters (lone surrogates) are replaced during encoding, so the explicit conversion does not need to be performed.
Removing these operations improves performance by roughly 1.5x.
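As a minimal sketch of the idea (not code from the PR): a TextEncoder already replaces a lone surrogate with U+FFFD while encoding, so calling String.prototype.toWellFormed first produces exactly the same bytes.

const encoder = new TextEncoder()

// explicit conversion first, then encode
const explicit = encoder.encode('\ud801'.toWellFormed())
// encode directly; the lone surrogate is replaced during encoding
const implicit = encoder.encode('\ud801')

console.log(explicit) // Uint8Array(3) [ 239, 191, 189 ] -> UTF-8 bytes of U+FFFD
console.log(implicit) // Uint8Array(3) [ 239, 191, 189 ] -> identical output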
wow! This seems meaningful.
I would call it misleading. What is being improved by 1.5x? If it's instantiating Responses, the change is wrong (we have to convert the body to a USVString), and other changes are cold paths/indentation.
@KhafraDev This benchmark. https://github.com/oven-sh/bun/blob/main/bench/snippets/response-arrayBuffer.mjs
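For context, the hot loop of that benchmark is roughly of this shape (a hedged sketch, not a copy of the linked file; the body size and iteration count are made up): a Response is constructed from a string body and read back, which exercises exactly the string-to-bytes conversion this PR touches.

const body = 'Hello World!'.repeat(1024)

for (let i = 0; i < 10_000; i++) {
  await new Response(body).arrayBuffer()
}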
Ignore my previous comment; this will work, but in a not-so-obvious way. I'm not sure I like trading the explicit conversion to a USVString for the implicit conversion done in TextEncoder (and Buffer, as a side note), which took a little while to figure out. I would consider it working as a side effect of how the body is extracted, and something that could easily break.
How the PR works now:
- A response is created with a string body such as '\ud801'
- The body is extracted in extractBody, which returns a stream that, when pulled from, encodes the body with a TextEncoder (sketched below)
- The TextEncoder replaces lone surrogates with U+FFFD
How it worked before:
- A response is created with a string body such as '\ud801'
- The webidl converter in the Response constructor replaced lone surrogate characters with U+FFFD
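To make the new path concrete, here is a hypothetical simplification (extractBody is undici's real helper, but the body below only mirrors the relevant behaviour): the string body is encoded lazily when the stream is pulled, and the TextEncoder performs the lone-surrogate replacement at that point.

// hypothetical, simplified version of the relevant part of extractBody
function extractStringBody (body) {
  return new ReadableStream({
    pull (controller) {
      // '\ud801' is encoded here; the TextEncoder emits the bytes of U+FFFD
      controller.enqueue(new TextEncoder().encode(body))
      controller.close()
    }
  })
}

Reading that stream for a body of '\ud801' yields the bytes 0xEF 0xBF 0xBD, the UTF-8 encoding of U+FFFD, which is why the explicit conversion in the constructor became redundant.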
It's a noticeable difference when you have to hunt through multiple files to figure out why it works. I (maybe falsely) assume that there was no comment here because you were also unsure why it worked?
A test should be added to ensure that if we ever change the TextEncoder approach, lone surrogates are still replaced. Afterwards I don't have much of an issue landing this.
import assert from 'node:assert'
// a lone surrogate in the body must come back as U+FFFD
const response = new Response('\ud801')
assert.deepStrictEqual(await response.text(), '\ufffd')
I think it's worth landing anyway; the performance bump is noticeable.
Does it affect the fetch() bench at all?
Benchmarks that retrieve data show no performance improvement; the gain only appears when sending string bodies.
Benchmark added. #2905
Can you rebase on main to pick up the benchmarks?
Is it ready for review? Should @KhafraDev review again?