Huge memory and performance gap between `parse` and `parseAsync`

Open warflash opened this issue 2 months ago • 4 comments

Running `parseAsync` instead of `parse` seems to consume far more resources for larger objects.

I'd certainly expect some overhead from async operations. However, both in our production environment and in the benchmark, the difference is 100x in time consumption and 10x in memory consumption. Larger objects even OOM the entire Node app with `parseAsync`, whereas `parse` hovers at around 300 MB.

The reproduction of a minimal bench setup can be found here: https://github.com/warflash/zod-async-perf

Tested on two machines with Node 18 and 20.

Logs:

Benchmarking with 1000 elements
Validation time sync: 17.801ms
Validation time async: 36.208ms

Benchmarking with 10000 elements
Validation time sync: 63.467ms
Validation time async: 216.365ms

Benchmarking with 50000 elements
Validation time sync: 305.788ms
Validation time async: 1.778s

Benchmarking with 100000 elements
Validation time sync: 665.208ms
Validation time async: 4.917s

Benchmarking with 150000 elements
Validation time sync: 1.164s
Validation time async: 8.275s

Benchmarking with 200000 elements
Validation time sync: 1.273s
Validation time async: 22.556s

Benchmarking with 250000 elements
Validation time sync: 1.746s
Validation time async: 1:51.280 (m:ss.mmm)

warflash avatar Apr 26 '24 13:04 warflash

Thanks for the effort you put into this benchmark! I'm broadly aware of this and am working on solutions in Zod 4. There are some very obvious performance problems in Zod's current parsing pipeline that unfortunately will require breaking changes to fix.

I'll report back with updated performance numbers against this benchmark once Zod 4 is nearing beta.

colinhacks avatar Apr 26 '24 19:04 colinhacks

Great, that sounds very nice! Appreciate the reply and really looking forward to trying out the v4 beta then to see how it performs against our production ingests 😊

warflash avatar Apr 26 '24 20:04 warflash

I wonder if it would work to get rid of `ctx.async` and just handle all results from `_parse()` as potentially async (i.e. check whether the return value is a Promise). Currently there seems to be a lot of potentially unnecessary Promise wrapping based purely on `ctx.async`.
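
For illustration, here is a minimal sketch of that idea. It is not Zod's internal code: `MaybePromise`, `Result`, `Parser`, `arrayOf`, `num`, and `asyncEven` are all hypothetical names invented for this example. The array parser stays fully synchronous unless an element parser actually returns a Promise.

```typescript
// Hypothetical types and names for illustration; this is not Zod's internal API.
type MaybePromise<T> = T | Promise<T>;
type Result<T> = { ok: true; value: T } | { ok: false; error: string };
type Parser<T> = (input: unknown) => MaybePromise<Result<T>>;

// Detect native Promises only (thenables from other libraries are not detected).
function isPromise<T>(x: MaybePromise<T>): x is Promise<T> {
  return x instanceof Promise;
}

// Combine already-settled element results into a single array result.
function collect<T>(results: Result<T>[]): Result<T[]> {
  const values: T[] = [];
  for (const r of results) {
    if (!r.ok) return r; // propagate the first failure
    values.push(r.value);
  }
  return { ok: true, value: values };
}

// Array parser that never consults an "async mode" flag: it inspects what the
// element parser returned and only goes async when a Promise shows up.
function arrayOf<T>(element: Parser<T>): Parser<T[]> {
  return (input) => {
    if (!Array.isArray(input)) return { ok: false, error: "expected array" };
    const results: MaybePromise<Result<T>>[] = input.map((item) => element(item));
    // Fast path: no Promises were produced, so no Promise is allocated here.
    if (!results.some((r) => isPromise(r))) return collect(results as Result<T>[]);
    // Slow path: at least one element was async, so wait for all of them.
    return Promise.all(results).then((settled) => collect(settled));
  };
}

// A synchronous element parser.
const num: Parser<number> = (x) =>
  typeof x === "number" ? { ok: true, value: x } : { ok: false, error: "expected number" };

// An asynchronous element parser (stand-in for a schema with an async check).
const asyncEven = async (x: unknown): Promise<Result<number>> =>
  typeof x === "number" && x % 2 === 0
    ? { ok: true, value: x }
    : { ok: false, error: "expected even number" };

const syncResult = arrayOf(num)([1, 2, 3]);
console.log(isPromise(syncResult)); // false: every element parsed synchronously

const asyncResult = arrayOf(asyncEven)([2, 4]);
console.log(isPromise(asyncResult)); // true: an element parser returned a Promise
```

The trade-off in a design like this is that callers receive a union type and must handle both cases, so a public `parseAsync`-style entry point would presumably still wrap the final value in a Promise once at the top level.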

jussisaurio avatar May 12 '24 13:05 jussisaurio

^I made a POC of not using ctx.async and instead detecting Promises during parsing. Here's the branch https://github.com/jussisaurio/zod/pull/1/files which currently is based on my other zod perf PR (https://github.com/colinhacks/zod/pull/3487)

Results using your benchmark @warflash :

[email protected]:

Benchmarking with 1000 elements
Validation time sync: 25.703ms
Validation time async: 61.169ms

Benchmarking with 10000 elements
Validation time sync: 110.733ms
Validation time async: 641.032ms

Benchmarking with 50000 elements
Validation time sync: 592.054ms
Validation time async: 4.608s

Benchmarking with 100000 elements
Validation time sync: 1.541s
Validation time async: 13.151s

branch:

Benchmarking with 1000 elements
Validation time sync: 20.666ms
Validation time async: 11.56ms

Benchmarking with 10000 elements
Validation time sync: 113.841ms
Validation time async: 100.013ms

Benchmarking with 50000 elements
Validation time sync: 510.81ms
Validation time async: 512.89ms

Benchmarking with 100000 elements
Validation time sync: 1.056s
Validation time async: 1.188s

The reason sync and async are almost the same in the branch is that it doesn't wrap values in extra Promises. A more comprehensive benchmark would include things like async refinements to see what the impact is.
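
As a rough sketch of what such a benchmark could look like, the harness below compares an always-await strategy against a Promise-detecting one, with both synchronous checks and a stand-in for an async refinement. It does not use Zod: `Check`, `syncCheck`, `asyncCheck`, `validateAlwaysAwait`, `validateDetecting`, and `bench` are hypothetical names, and the timings it prints will vary by machine.

```typescript
// Hypothetical stand-ins for schema checks; none of this is Zod's API.
type Check = (value: number) => boolean | Promise<boolean>;

const syncCheck: Check = (v) => Number.isFinite(v);
// Stand-in for an async refinement: the verdict arrives via a Promise.
const asyncCheck: Check = (v) => Promise.resolve(v >= 0);

// Always-await strategy: every check result is awaited, even plain booleans.
async function validateAlwaysAwait(data: number[], check: Check): Promise<boolean> {
  for (const v of data) {
    if (!(await check(v))) return false;
  }
  return true;
}

// Once a Promise has been seen, finish the remaining elements asynchronously.
async function finishAsync(
  first: Promise<boolean>,
  data: number[],
  start: number,
  check: Check,
): Promise<boolean> {
  if (!(await first)) return false;
  for (let i = start; i < data.length; i++) {
    if (!(await check(data[i]))) return false;
  }
  return true;
}

// Detecting strategy: stay synchronous until a check actually returns a Promise.
function validateDetecting(data: number[], check: Check): boolean | Promise<boolean> {
  for (let i = 0; i < data.length; i++) {
    const r = check(data[i]);
    if (r instanceof Promise) return finishAsync(r, data, i + 1, check);
    if (!r) return false;
  }
  return true;
}

// Time one run; Date.now() has millisecond resolution, which is enough here.
async function time(label: string, run: () => boolean | Promise<boolean>): Promise<boolean> {
  const start = Date.now();
  const result = await run();
  console.log(`${label}: ${Date.now() - start}ms`);
  return result;
}

// Run all four combinations of strategy and check type on `size` elements.
async function bench(size: number): Promise<boolean[]> {
  const data = Array.from({ length: size }, (_, i) => i);
  console.log(`Benchmarking with ${size} elements`);
  return [
    await time("always-await, sync checks", () => validateAlwaysAwait(data, syncCheck)),
    await time("detecting, sync checks", () => validateDetecting(data, syncCheck)),
    await time("always-await, async checks", () => validateAlwaysAwait(data, asyncCheck)),
    await time("detecting, async checks", () => validateDetecting(data, asyncCheck)),
  ];
}

bench(100_000).catch((err) => console.error(err));
```

With only synchronous checks, the detecting variant never allocates a Promise; with the async check, both variants await once per element, which is the case the existing numbers don't cover.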

jussisaurio avatar May 12 '24 15:05 jussisaurio