undici icon indicating copy to clipboard operation
undici copied to clipboard

"preprocessed" headers

Open ronag opened this issue 11 months ago • 8 comments

When using the same headers for multiple requests it would be nice to bypass much of the header validation and transformation logic we do:

i.e. an API like this should be possible:

const headers = new undici.Headers({ Foo: 'bar' })

for (let n = 0; n < 100; n++) {
  const res = await request(url, { headers })
  await res.dump()
}

Where undici.Headers guarantee that the headers are already in a valid state and we don't need to do anything further. Could even be pre computed into a headers string.

To start with I would makeundici.Headers immutable. Later we could add immutable composition, e.g:

const headers = new undici.Headers({ Foo: 'bar' })

for (let n = 0; n < 100; n++) {
  const res = await request(url, { headers: headers.assign({ Bar: 'foo' }) })
  await res.dump()
}

A quick start (just move logic from core/request constructor and make request access the state through symbols):

class Headers {
  constructor(headers) {
    if (Array.isArray(headers)) {
      if (headers.length % 2 !== 0) {
        throw new InvalidArgumentError('headers array must be even')
      }
      for (let i = 0; i < headers.length; i += 2) {
        processHeader(this, headers[i], headers[i + 1])
      }
    } else if (headers && typeof headers === 'object') {
      if (headers[Symbol.iterator]) {
        for (const header of headers) {
          if (!Array.isArray(header) || header.length !== 2) {
            throw new InvalidArgumentError('headers must be in key-value pair format')
          }
          processHeader(this, header[0], header[1])
        }
      } else {
        const keys = Object.keys(headers)
        for (let i = 0; i < keys.length; ++i) {
          processHeader(this, keys[i], headers[keys[i]])
        }
      }
    } else if (headers != null) {
      throw new InvalidArgumentError('headers must be an object or an array')
    }
  }
}


function processHeader (headers, key, val) {
  if (val && (typeof val === 'object' && !Array.isArray(val))) {
    throw new InvalidArgumentError(`invalid ${key} header`)
  } else if (val === undefined) {
    return
  }

  let headerName = headerNameLowerCasedRecord[key]

  if (headerName === undefined) {
    headerName = key.toLowerCase()
    if (headerNameLowerCasedRecord[headerName] === undefined && !isValidHTTPToken(headerName)) {
      throw new InvalidArgumentError('invalid header key')
    }
  }

  if (Array.isArray(val)) {
    const arr = []
    for (let i = 0; i < val.length; i++) {
      if (typeof val[i] === 'string') {
        if (!isValidHeaderValue(val[i])) {
          throw new InvalidArgumentError(`invalid ${key} header`)
        }
        arr.push(val[i])
      } else if (val[i] === null) {
        arr.push('')
      } else if (typeof val[i] === 'object') {
        throw new InvalidArgumentError(`invalid ${key} header`)
      } else {
        arr.push(`${val[i]}`)
      }
    }
    val = arr
  } else if (typeof val === 'string') {
    if (!isValidHeaderValue(val)) {
      throw new InvalidArgumentError(`invalid ${key} header`)
    }
  } else if (val === null) {
    val = ''
  } else {
    val = `${val}`
  }

  if (request.host === null && headerName === 'host') {
    if (typeof val !== 'string') {
      throw new InvalidArgumentError('invalid host header')
    }
    // Consumed by Client
    headers[kHost] = val
  } else if (request.contentLength === null && headerName === 'content-length') {
    headers[kContentLength] = parseInt(val, 10)
    if (!Number.isFinite(request.contentLength)) {
      throw new InvalidArgumentError('invalid content-length header')
    }
  } else if (request.contentType === null && headerName === 'content-type') {
    headers[kContentType] = val
    headers[kHeaders].push(key, val)
  } else if (headerName === 'transfer-encoding' || headerName === 'keep-alive' || headerName === 'upgrade') {
    throw new InvalidArgumentError(`invalid ${headerName} header`)
  } else if (headerName === 'connection') {
    const value = typeof val === 'string' ? val.toLowerCase() : null
    if (value !== 'close' && value !== 'keep-alive') {
      throw new InvalidArgumentError('invalid connection header')
    }

    if (value === 'close') {
      headers[kReset] = true
    }
  } else if (headerName === 'expect') {
    throw new NotSupportedError('expect header not supported')
  } else {
    request[kHeaders].push(key, val)
  }
}

Refs: https://github.com/nodejs/undici/issues/3994

ronag avatar Jan 11 '25 09:01 ronag

I would call it ImmutableRequestHeaders to not confuse it with the Headers object from the fetch spec.

mcollina avatar Jan 11 '25 10:01 mcollina

What do we need to solve this problem?

  1. Make a header structure and use it only inside the core
  2. When creating a core Request instance, immediately assemble the request and rewrite all the headers in advance (?connection, host, content-length, transfer-encoding) in order to only iterate over the headers and write them to the socket or cached h.toString() and use for performance
  3. remove more checks (https://github.com/nodejs/undici/blob/45eaed232577e9b66d2f0a596eb018c79e098541/lib/dispatcher/client-h1.js#L1332)
  4. Remove all unnecessary header checks, as we will have a structure that always returns an iterator of headers and a ready-made head row. https://github.com/nodejs/undici/blob/45eaed232577e9b66d2f0a596eb018c79e098541/lib/dispatcher/client-h1.js#L1139

It turns out that all the request data will be prepared in advance and all that remains is to write them to the socket without unnecessary checks.

I want to hear your opinion on this


import { isValidHeaderName, isValidHeaderValue } from './utils.ts';

interface IHTTPHeader {
  name: string;
  values: string[];
};

interface IHTTPHeaders {
  // TODO:
}

const kHeadersMap = Symbol('headers map');
const kCachable = Symbol('cachable');

const cacheWMap: WeakMap<HTTPHeaders, string> = new WeakMap();

export class HTTPHeader implements IHTTPHeader {
  name: string;
  values: string[];

  constructor(name: string, value: string | string[]) {
    this.name = name;
    this.values = typeof value === 'string' ? [value] : value.slice();
  }

  append(value: string | string[]) {
    const { values } = this;
    if (typeof value === 'string') {
      values.push(value);
    } else {
      values.length = (values.length + value.length);
      for (let i = 0; i < value.length; i++) {
        values.push(value[i]);
      }
    }
  }

  clone() {
    return new HTTPHeader(this.name, this.values.slice());
  }

  * entries() {
    const { name: rawName, values } = this;
    const name = rawName.toLowerCase();

    if (values.length === 1) {
      yield [rawName, values[0]];
    }

    else if (name === 'set-cookie') {
      for (let index = 0; index < values.length; index++) {
        yield [rawName, values[index]];
      }
    }
    else {
      const separator = name === 'cookie' ? '; ' : ', ';
      yield [rawName, values.join(separator)];
    }
  }

  *[Symbol.iterator]() {
    return this.entries();
  }

  toString() {
    let raw = ``;
    for (const [name, value] of this.entries()) {
      raw += `${name}: ${value}\r\n`;
    }
    return raw;
  }
}

type HTTPHeadersInit = [string, string][] | Record<string, string> | Iterable<[string, string]> | HTTPHeaders | Headers;

export class HTTPHeaders implements IHTTPHeaders {
  [kCachable] = false;
  [kHeadersMap]: Map<string, IHTTPHeader> = new Map();

  static extend(h: HTTPHeaders, init: HTTPHeadersInit) {
    for (const [name, value] of new HTTPHeaders(init)) {
      h.append(name, value);
    }

    return h;
  }

  constructor(init?: HTTPHeadersInit) {
    if (!init || typeof init !== 'object') return;

    const map = this[kHeadersMap];

    if (init instanceof HTTPHeaders) {
      for (const [name, header] of init[kHeadersMap].entries()) {
        map.set(name, copyHeader(header));
      }
    }

    else if (Symbol.iterator in init) {
      for (const [name, value] of init) {
        this.append(name, value);
      }
    }

    else {
      for (const [name, value] of Object.entries(init)) {
        this.append(name, value);
      }
    }
  }

  clone() {
    return new HTTPHeaders(this);
  }

  extend(init: HTTPHeadersInit = {}) {
    const h = this.clone();
    for (const [name, value] of new HTTPHeaders(init)) {
      h.append(name, value);
    }
    return h;
  }

  assign(...init: HTTPHeadersInit[]) {
    const h = this.clone();

    for (const entry of init) {
      for (const [name, value] of new HTTPHeaders(entry)) {
        h.append(name, value);
      }
    }

    return h;
  }

  set(name: string, value: string, raw: boolean = true) {
    name = name.trim();
    if (!isValidHeaderName(name)) {
      throw new Error(`"${name}" is an invalid header name.`);
    }

    value = value.trim();
    if (!isValidHeaderValue(value)) {
      throw new Error(`"${value}" is an invalid header value.`);
    }

    const lower = name.toLowerCase();
    const map = this[kHeadersMap];

    map.set(lower, new HTTPHeader(raw ? name : lower, value));

    return this[kCachable] ? this.cache() : this;
  }

  append(name: string, value: string, raw: boolean = true) {
    const lower = name.toLowerCase();
    const map = this[kHeadersMap];

    // Проверяем наличие
    if (map.has(lower)) {
      value = value.trim();
      if (!isValidHeaderValue(value)) {
        throw new Error(`"${value}" is an invalid header value.`);
      }

      appendHeaderValues(map.get(lower)!, value);
    } else {
      name = name.trim();
      if (!isValidHeaderName(name)) {
        throw new Error(`"${name}" is an invalid header name.`);
      }

      value = value.trim();
      if (!isValidHeaderValue(value)) {
        throw new Error(`"${value}" is an invalid header value.`);
      }
      map.set(lower, new HTTPHeader(raw ? name : lower, value));
    }

    return this[kCachable] ? this.cache() : this;
  }

  has(name: string) {
    return this[kHeadersMap].has(name.toLowerCase());
  }
  get(name: string) {
    return this[kHeadersMap].get(name.toLowerCase());
  }
  delete(name: string) {
    const deleted = this[kHeadersMap].delete(name.toLowerCase());
    if (this[kCachable]) this.cache();
    return deleted;
  }

  *entries(raw: boolean = true) {
    for (const [name, header] of this[kHeadersMap]) {
      const { name: rawName, values } = header;
      const headerName = raw ? rawName : name;

      if (values.length === 1) {
        yield [headerName, values[0]];
      }

      else if (name === 'set-cookie') {
        for (let index = 0; index < values.length; index++) {
          yield [headerName, values[index]];
        }
      }
      else {
        const separator = name === 'cookie' ? '; ' : ', ';
        yield [headerName, values.join(separator)];
      }
    }
  }

  keys() {
    return this[kHeadersMap].keys();
  }

  values(name?: string) {
    if (!name) {
      return this[kHeadersMap].values().map(h => h.values)!;
    } else {
      return this[kHeadersMap].get(name)?.values ?? [];
    }
  }

  getSetCookie() {
    return (this.values('set-cookie') ?? []) as string[];
  }

  [Symbol.iterator]() {
    return this.entries();
  }

  cache() {
    this[kCachable] = true;
    cacheWMap.set(this, this.build());
    return this;
  }

  build() {
    let raw = ``;
    for (const [name, value] of this.entries()) {
      raw += `${name}: ${value}\r\n`;
    }
    return raw;
  }

  toString() {
    return cacheWMap.has(this) ? cacheWMap.get(this)! : this.build();
  }
}


export function createHeader(name: string, value: string | string[]): IHTTPHeader {
  return new HTTPHeader(name, value);
}

export function appendHeaderValues(h: IHTTPHeader, value: string | string[]) {
  const { values } = h;
  if (typeof value === 'string') {
    values.push(value);
  } else {
    values.length = (values.length + value.length);
    for (let i = 0; i < value.length; i++) {
      values.push(value[i]);
    }
  }
}

export function copyHeader(h: IHTTPHeader) {
  const { name, values } = h;
  return new HTTPHeader(name, values.slice());
}

PandaWorker avatar Jan 11 '25 21:01 PandaWorker

Caching the finished header string greatly reduces the load

import { bench, run, summary } from 'mitata';
import { HTTPHeaders } from './HTTPHeaders.ts';

const arr = Array.from({ length: 20 }, (_, i) => [`x-user-${i}`, `${i}`]);

const h = new HTTPHeaders(arr);
const h2 = h.clone().cache();

summary(() => {
  bench('without cache', () => {
    h.toString();
  });

  bench('with cache', () => {
    h2.toString();
  });

  bench('forof', () => {
    let a = '';
    for (const [name, value] of h) {
      a += `${name}: ${value}\r\n`;
    }
  });

  bench('forof noop', () => {
    for (const [name, value] of h) {}
  });
});

run();
clk: ~2.99 GHz
cpu: Apple M1 Max
runtime: node 22.12.0 (arm64-darwin)

benchmark                   avg (min … max) p75   p99    (min … top 1%)
------------------------------------------- -------------------------------
without cache                  1.05 µs/iter   1.05 µs       ▂▄█▅▃ ▂        
                        (1.02 µs … 1.09 µs)   1.08 µs ▅▂▇▆▄▂█████▇█▅▄▁▂▂▁▁▂
                  5.50 ipc (  2.53% stalls)    NaN% L1 data cache
          3.50k cycles  19.22k instructions   0.00% retired LD/ST (   0.00)

with cache                     8.41 ns/iter   8.35 ns  █                   
                       (8.22 ns … 40.03 ns)   9.94 ns ▁██▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
                  6.11 ipc (  0.00% stalls)    NaN% L1 data cache
          27.91 cycles  170.57 instructions   0.00% retired LD/ST (   0.00)

forof                          1.05 µs/iter   1.05 µs      ▆█              
                        (1.02 µs … 1.14 µs)   1.10 µs ▃▅▂▃▃███▇▇▄▁▁▁▁▁▁▁▁▁▂
                  5.47 ipc (  2.45% stalls)    NaN% L1 data cache
          3.49k cycles  19.09k instructions   0.00% retired LD/ST (   0.00)

forof noop                   719.00 ns/iter 722.48 ns  █▂ ▄▂               
                    (707.91 ns … 767.65 ns) 744.45 ns ▆██▇██▇▆▃▃▄▅▃▃▄▂▃▂▃▂▂
                  4.86 ipc (  3.55% stalls)    NaN% L1 data cache
          2.37k cycles  11.54k instructions   0.00% retired LD/ST (   0.00)

summary
  with cache
   85.5x faster than forof noop
   124.41x faster than without cache
   124.72x faster than forof

PandaWorker avatar Jan 11 '25 21:01 PandaWorker

That's the approach I use in https://github.com/mcollina/autocannon.

mcollina avatar Jan 11 '25 22:01 mcollina

we need a similar approach, only write the request body separately, since there may be large buffer, streams, generators, and so on that do not need to be dumped into memory by default.

PandaWorker avatar Jan 12 '25 13:01 PandaWorker

Like the idea/approach for the immutable headers concept; one thing that grasped my attention was the each header being represented with a class. Do we need such fine grain representation?

Overall, we can target first undici as is and later on attempt to hop fetch somehow (we can provide a layer of portability between ImmutableHeaders and the Headers spec without needing to represent each entry as individual structure), especially as shown that constructed as string provides performance benefits.

metcoder95 avatar Jan 12 '25 14:01 metcoder95

I totally agree with you, we can just add caching to the current headers implementation. And we won't have to change much.

When assembling a request, it would be good for us to bring the headers to the same type (HTTP Headers) and then use the methods of this structure, adding system headers (host, connection, content-length, ...) and so on.

PandaWorker avatar Jan 12 '25 15:01 PandaWorker

Overall, we can target first undici as is and later on attempt to hop fetch somehow (we can provide a layer of portability between ImmutableHeaders and the Headers spec without needing to represent each entry as individual structure), especially as shown that constructed as string provides performance benefits.

The WHATWG-compliant Headers implementation is slow, we can't really use for any internals processing.

mcollina avatar Jan 13 '25 07:01 mcollina