
adapt to real world scenarios

Open shlomiassaf opened this issue 5 years ago • 12 comments

Hi,

Nice project, thanks! I've used it for my evaluations.

I've noticed a huge gap between two libraries and all the others:

  • ts-quartet
  • ts-json-validator

This huge gap is probably due to the way the project runs the tests. The two libraries above use a different strategy from all the others to create their validators.

While most others use predefined, hard-coded validator functions and build a schema by composing them, the two fastest libraries compile JS code at runtime (via eval() or new Function(...)) to produce discrete validation functions. These functions do not call other functions internally (no composition); instead, all the required validation code lives inside a single function generated specifically for the schema.
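The contrast between the two strategies can be sketched like this (hypothetical helpers, not any library's real API; the compile step here only understands the two checks defined in the sketch):

```typescript
// A minimal sketch of the two strategies. The composition style builds a
// validator out of small reusable check functions; the code-generation
// style compiles the same schema into one flat function via new Function.
type Check = (value: unknown) => boolean;

const isString: Check = (v) => typeof v === 'string';
const isNumber: Check = (v) => typeof v === 'number';

// Composition: every property check is a separate function call.
function objectOf(shape: Record<string, Check>): Check {
  return (value) => {
    if (value === null || typeof value !== 'object') return false;
    for (const key of Object.keys(shape)) {
      if (!shape[key]((value as Record<string, unknown>)[key])) return false;
    }
    return true;
  };
}

// Code generation: the schema is flattened into a single function body
// with no internal calls. This sketch only supports the two checks above.
function compile(shape: Record<string, Check>): Check {
  const lines = Object.entries(shape).map(([key, check]) => {
    const expected = check === isString ? 'string' : 'number';
    return `if (typeof value[${JSON.stringify(key)}] !== '${expected}') return false;`;
  });
  return new Function(
    'value',
    `if (value == null) return false;\n${lines.join('\n')}\nreturn true;`
  ) as Check;
}

const composed = objectOf({ foo: isString, num: isNumber });
const compiled = compile({ foo: isString, num: isNumber });

console.log(composed({ foo: 'a', num: 1 })); // true
console.log(compiled({ foo: 'a', num: 1 })); // true
console.log(compiled({ foo: 'a', num: 'x' })); // false
```

Both produce the same answers; the difference is that the compiled version runs a handful of inline checks instead of a call per property.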

For example, Quartet:

For the following schema:

const checkData = v<Data>({
  number: v.safeInteger,
  negNumber: v.negative,
  maxNumber: v.positive,
  string: v.string,
  longString: v.string,
  boolean: v.boolean,
  deeplyNested: {
    foo: v.string,
    num: v.number,
    bool: v.boolean,
  },
});

It will generate the following validator function:

function validator(value) {
  if (value == null) return false
  if (!Number.isSafeInteger(value.number)) return false
  if (value.negNumber >= 0) return false
  if (value.maxNumber <= 0) return false
  if (typeof value.string !== 'string') return false
  if (typeof value.longString !== 'string') return false
  if (typeof value.boolean !== 'boolean') return false
  if (value.deeplyNested == null) return false
  if (typeof value.deeplyNested.foo !== 'string') return false
  if (typeof value.deeplyNested.num !== 'number') return false
  if (typeof value.deeplyNested.bool !== 'boolean') return false
  return true
}

This has a deep impact on performance depending on how you run your code.

The benchmark code in this project uses one schema and iterates over it for a certain period of time. This is perfect for quartet because of how V8 works: the function becomes super hot and quickly gets inlined, and if any internal function call exists within the validator, it gets inlined as well!

In other libraries this cannot happen: due to the composition, so many functions are called that most of them are cold and nothing gets inlined.

In real-world scenarios, such a perfect order does not exist. For example, when handling an incoming request, so many functions are called that by the time we reach the validator it is no longer hot!
And of course, we also need to factor in the handling of multiple incoming requests.

The major advantage of the two libraries in question does not carry over to real-world scenarios, so their results are distorted in the benchmark.

I should also note the security risks of runtime code evaluation. For a popular and heavily used library like ts-json-validator (which is actually ajv under the hood) this is less of a concern. For an almost unused, unpopular library like quartet, I would exercise caution.

In general, such a huge gap does not make sense; otherwise everyone would be using these libraries exclusively.

Thanks again!

shlomiassaf avatar May 21 '20 19:05 shlomiassaf

BTW, you can add a vanilla JS benchmark, as a control group.

import { Case } from './abstract';

export class VanillaJsCase extends Case {
  name = 'vanilla-js';

  validate() {
    const value = this.data;
    if (value == null) return;
    if (!Number.isSafeInteger(value.number)) return;
    if (value.negNumber >= 0) return;
    if (value.maxNumber <= 0) return;
    if (typeof value.string !== 'string') return;
    if (typeof value.longString !== 'string') return;
    if (typeof value.boolean !== 'boolean') return;
    if (value.deeplyNested == null) return;
    if (typeof value.deeplyNested.foo !== 'string') return;
    if (typeof value.deeplyNested.num !== 'number') return;
    if (typeof value.deeplyNested.bool !== 'boolean') return;
    return value;
  }
}

You will see it runs 8x to 10x faster than quartet (the vanilla code is taken from quartet's runtime-generated code).

shlomiassaf avatar May 21 '20 19:05 shlomiassaf

Hey @shlomiassaf,

This is a great analysis. Thank you for that insight.

I knew that some of these libraries use eval'ed tricks for validation, but didn't realize the effect the functions becoming hot would have. It does make sense.

> The major advantage of the two libraries in question does not carry over to real-world scenarios, so their results are distorted in the benchmark.

Do you have any suggestions on how to fix this?

Randomize the data maybe?
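One way to implement the randomization idea (a sketch against the thread's example schema, not this repo's actual harness) is to pre-generate many distinct input objects and cycle through them in the benchmark loop, so the validator never sees the same object twice in a row:

```typescript
// Sketch: pre-generate varied benchmark inputs matching the thread's
// example schema, then cycle through them instead of reusing one object.
function makeSample(i: number) {
  return {
    number: i,
    negNumber: -i - 1,
    maxNumber: i + 1,
    string: 'str' + i,
    longString: 'x'.repeat(100 + (i % 10)),
    boolean: i % 2 === 0,
    deeplyNested: { foo: 'foo' + i, num: i, bool: i % 2 === 1 },
  };
}

const samples = Array.from({ length: 1000 }, (_, i) => makeSample(i));

// In the benchmark loop, feed a different sample to the validator on
// each iteration:
let cursor = 0;
function nextSample() {
  const sample = samples[cursor];
  cursor = (cursor + 1) % samples.length;
  return sample;
}
```

This keeps allocation out of the measured loop while varying the values the validator sees; note that all samples still share one object shape, which is itself a choice to make deliberately.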

> BTW, you can add a vanilla JS benchmark, as a control group.

This unfortunately does not provide type guarding.

But I think it is possible to create a type-guarded vanilla JS validation function anyway.

There is the new TS assertion-guard functionality that could be useful here. I'll open another issue for this.
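A type-guarded vanilla validator could look like this (a sketch using a user-defined type predicate plus the `asserts` form from TS 3.7+, against a simplified version of the thread's Data shape):

```typescript
// Simplified version of the benchmark's Data shape (an assumption for
// this sketch; the real suite's shape has more fields).
interface Data {
  number: number;
  string: string;
  deeplyNested: { foo: string };
}

// A user-defined type predicate: the same vanilla checks, but TypeScript
// narrows the value to Data at the call site when it returns true.
function isData(value: unknown): value is Data {
  const v = value as Data;
  return (
    value != null &&
    Number.isSafeInteger(v.number) &&
    typeof v.string === 'string' &&
    v.deeplyNested != null &&
    typeof v.deeplyNested.foo === 'string'
  );
}

// The assertion-guard form: narrows the type on success, throws on
// failure instead of returning false.
function assertData(value: unknown): asserts value is Data {
  if (!isData(value)) throw new TypeError('value does not match Data');
}
```

Either form keeps the control-group speed of hand-written checks while still giving callers the static types.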

moltar avatar May 22 '20 08:05 moltar

Hey @shlomiassaf,

Any ideas on the above?

moltar avatar Jun 12 '20 03:06 moltar

It would also be helpful to separate quick validations (that return true/false) and error-reporting validations. In the case of io-ts, the .is() method works differently than .decode() and it's faster because it doesn't return a detailed error message. Same for ts-quartet and its e and v exports.

gigobyte avatar Aug 06 '20 19:08 gigobyte

@gigobyte thank you for your input, I do think this is something we can address also.

moltar avatar Aug 07 '20 07:08 moltar

@gigobyte is right. This is the biggest performance difference. Having such simple checks without error reporting is basically useless: what should I do when quartet returns false? Throw a generic error? Not very practical in real-world code.

Also, an interesting fact regarding quartet: as soon as you activate error reporting (by using the e import instead of v), it's literally over one thousand times slower (1116x, to be precise) in my tests. So as soon as you want something serious from it, it falls apart.

@shlomiassaf, a couple of things need to be corrected, since they are simply not true.

> The function becomes super hot and quickly gets inlined, and if any internal function call exists within the validator, it gets inlined as well!

It didn't get completely inlined into the validate function; you proved that yourself with your control group. If it had been inlined, it would be nearly as fast as the control group. Being 10x slower means it wasn't inlined.

> In other libraries this cannot happen: due to the composition, so many functions are called that most of them are cold and nothing gets inlined.

Other libraries' functions get inlined as well; not completely, but parts surely are. But inlined or not is not the important reason quartet & co are so much faster. It's simply that much, much less code runs per validation. Less code means faster execution. It doesn't matter whether it was inlined or not, at least in this case. There are many factors in why code runs fast, and inlining is just one of them. Other important ones are monomorphic function calls, fast object properties, fast type unboxing, fast built-in functions, etc. When the heuristics determine that code can be optimized in certain ways, it will be optimized.

> In real-world scenarios, such a perfect order does not exist.

Again, it has nothing to do with order or with being inlined. Once a function has been inlined, that won't change, and it doesn't matter at what depth the function was called. When V8 decides a function can be inlined, it gets inlined and stays inlined.

> For example, when handling an incoming request, so many functions are called that by the time we reach the validator it is no longer hot!

The heuristic that determines whether a function is hot, and thus a candidate for optimization, doesn't work that way. The call stack doesn't matter either. The V8 engine tries to predict how useful it would be to optimize a function by estimating the execution cost of the unoptimized version. Every function is a potential candidate, even functions called in a request/response framework (and thus with some delay between calls) or deep in the call stack.

> And of course, we also need to factor in the handling of multiple incoming requests.

That won't change anything. Quartet stays the fastest, no matter how many requests and call stacks you generate in between. As soon as you execute this function a couple of times, V8 tries to optimize it. It doesn't matter whether there was 1 ms between calls or several seconds, so stretching it out artificially won't change anything.

> In general, such a huge gap does not make sense; otherwise everyone would be using these libraries exclusively.

It makes total sense, because what quartet & co do is generate code that the V8 JIT engine can further optimize perfectly: a JIT within the JIT. This is incredibly fast and stays fast, no matter how artificially you limit its function calls. The drawback, of course, is that the code behind it is much more complicated, and you need a lot more knowledge to generate code that the V8 JIT can optimize well and won't deoptimize.

marcj avatar Aug 22 '20 01:08 marcj

> It would also be helpful to separate quick validations (that return true/false) and error-reporting validations.

> In the case of io-ts, the .is() method works differently than .decode() and it's faster because it doesn't return a detailed error message.

@gigobyte I cannot get this to work. Does it require the extra fp-ts package with its Either type?

> Same for ts-quartet and its e and v exports.

I just removed quartet: I hadn't realized it, but it requires a type generic to be passed, which goes against the spirit of this project.

@marcj is there anything actionable I can do to improve this project?

moltar avatar Aug 25 '20 09:08 moltar

> I cannot get this to work. Does it require the extra fp-ts package with its Either type?

decode returns an Either, so you need fp-ts if you want to work with it (e.g. to check whether decoding succeeded).

gigobyte avatar Aug 25 '20 18:08 gigobyte

I have an io-ts benchmark here: https://github.com/super-hornet/super-hornet.ts/blob/master/packages/marshal-benchmark/tests/validation2.spec.ts, which is based on their official benchmark. They already have built-in Guard semantics, and the Either returned by decode theoretically has detailed error information available.

Start benchmark Marshal vs io-ts
 🏎 x 27,625,397.5  ops/sec ±3.64% 0.0000000361985742 sec/op 	▇▇▇▇▇▇▇▆▆▇▆▆▇▇▆▇▆▆▆▇▇ marshal guard
 🏎 x  8,675,914.33 ops/sec ±2.06% 0.0000001152616269 sec/op 	▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ marshal decode
 🏎 x    627,803.3  ops/sec ±1.49% 0.0000015928556051 sec/op 	▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ io-ts guard
 🏎 x    359,790.77 ops/sec ±1.40% 0.0000027793931574 sec/op 	▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ io-ts decode

marcj avatar Aug 25 '20 20:08 marcj

Made this change: https://github.com/moltar/typescript-runtime-type-benchmarks/commit/5501aa1ca92aa58652a00b5193ace172bcce69d3

Is this good enough?

moltar avatar Aug 26 '20 05:08 moltar

To everyone involved in this issue: @hoeck put a huge amount of effort into this. Please take a look at the published results and give feedback. If we can consider this done, I'll close the issue. Thanks!

moltar avatar Mar 02 '22 10:03 moltar

@marcj any feedback on the recent changes?

moltar avatar Mar 22 '22 02:03 moltar