
Relative performance of CJS vs ESM

Open · achingbrain opened this issue 1 year ago

Version

v14.20.0, v16.16.0, v18.7.0

Platform

Darwin MacBook-Pro-5.localdomain 21.5.0 Darwin Kernel Version 21.5.0: Tue Apr 26 21:08:37 PDT 2022; root:xnu-8020.121.3~4/RELEASE_ARM64_T6000 arm64

Subsystem

No response

What steps will reproduce the bug?

See repro repo at https://github.com/achingbrain/esm-vs-cjs

How often does it reproduce? Is there a required condition?

Every time

What is the expected behavior?

ESM and CJS classes should have similar performance characteristics instead of ESM being 10x slower than CJS.

What do you see instead?

ESM classes are 10x slower than CJS classes.

Additional information

I was porting some CJS code to ESM and benchmarking it to ensure I hadn't accidentally removed any performance optimisations but the ESM code was always significantly slower. I started removing functionality to narrow down where the bottleneck was but I still couldn't find it.

Eventually I ended up with a benchmark suite that did pretty much nothing, yet the CJS version of the same code was still massively faster than the ESM version. So I created a fresh benchmark suite that just instantiated a class and called a simple method, and 😮 it turns out ESM performance is quite poor compared to CJS, and CJS has taken a big dip in Node 18.
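
In shape, that final suite boils down to something like the following single-file sketch (the real repro at the link above uses Benchmark.js and separate ESM/CJS modules; `SimpleClass` and `measure()` here are illustrative stand-ins):

```javascript
// Single-file sketch of what the benchmark measures: instantiate a class and
// call a trivial method in a tight loop, then report ops/sec.
class SimpleClass {
  value () {
    return 42
  }
}

function measure (iterations) {
  const start = process.hrtime.bigint()
  let sum = 0
  for (let i = 0; i < iterations; i++) {
    // instantiate and call a simple method, as in the repro
    sum += new SimpleClass().value()
  }
  const elapsedNs = Number(process.hrtime.bigint() - start)
  return { sum, opsPerSec: Math.round(iterations / (elapsedNs / 1e9)) }
}

const { sum, opsPerSec } = measure(1000000)
console.log(sum) // 42000000
console.log(opsPerSec > 0) // true
```

The interesting variable in the actual repro is only *how* the class is imported; the loop body is identical in both variants.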

Benchmark data:

Node 14

% node index.js
esm x 106,756,363 ops/sec ±0.15% (93 runs sampled)
cjs x 993,925,878 ops/sec ±0.17% (93 runs sampled)
Fastest is cjs

Node 16

% node index.js
esm x 156,061,934 ops/sec ±0.19% (94 runs sampled)
cjs x 1,034,861,972 ops/sec ±0.19% (97 runs sampled)
Fastest is cjs

Node 18

% node index.js
esm x 144,767,462 ops/sec ±0.35% (93 runs sampled)
cjs x 388,040,620 ops/sec ±0.25% (100 runs sampled)
Fastest is cjs

achingbrain avatar Aug 09 '22 07:08 achingbrain

I get the same results with node.js v18, M1:

esm x 144,790,240 ops/sec ±0.18% (96 runs sampled)
cjs x 388,853,065 ops/sec ±0.14% (100 runs sampled)

what's interesting is that you can make the cjs imported class instantiation as "slow" as esm by using a named import: import { CJSClass } from './cjs.cjs'

esm x 146,625,769 ops/sec ±0.21% (99 runs sampled)
cjs x 149,342,545 ops/sec ±0.28% (101 runs sampled)

or make the esm imported class instantiation (almost) as "fast" as cjs by assigning the class to an intermediate variable: const EsmClass = ESMClass ... const obj = new EsmClass()

esm x 371,873,528 ops/sec ±0.24% (95 runs sampled)
cjs x 390,400,627 ops/sec ±0.43% (100 runs sampled)

dnalborczyk avatar Aug 09 '22 18:08 dnalborczyk

Interesting, and wild. Switching both to use default exports slows CJS down to ESM speeds as well:

// esm.js
export default class ESMClass { /* ... */ }

// cjs.cjs
module.exports = CJSClass
% node index.js
esm x 154,451,460 ops/sec ±0.12% (97 runs sampled)
cjs x 155,192,490 ops/sec ±0.12% (97 runs sampled)
Fastest is cjs

achingbrain avatar Aug 10 '22 06:08 achingbrain

Maybe @nodejs/v8 can have a look?

targos avatar Aug 10 '22 06:08 targos

what's interesting is that you can make the cjs imported class instantiation as "slow" as esm by using a named import: import { CJSClass } from './cjs.cjs'

A named import is a live binding. I understand that it will be slower than const CJSClass = defaultImport.CJSClass, which does not update CJSClass should ./cjs.cjs's default export be changed. But I would expect V8 to optimize them when the default export is "stable" and then fall back to the slow path for situations like cyclic dependencies.
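
The difference in update semantics can be simulated in plain JavaScript (this models only the observable behaviour, not V8's implementation; the `liveView` object and its getter are stand-ins for the exporting module):

```javascript
// A named import behaves like a live view onto the exporting module's slot:
// it must observe reassignments made by the exporter. A local const captures
// the value once and never changes.
let exportedClass = class { value () { return 1 } }

const liveView = {
  // re-read on every access, like a live binding
  get CJSClass () { return exportedClass }
}

// one-time copy, like `const CJSClass = defaultImport.CJSClass`
const Snapshot = liveView.CJSClass

const before = new liveView.CJSClass().value()  // 1
exportedClass = class { value () { return 2 } } // "exporter" reassigns the binding
const afterLive = new liveView.CJSClass().value() // 2 — the live view sees the change
const afterSnapshot = new Snapshot().value()      // 1 — the snapshot does not

console.log(before, afterLive, afterSnapshot) // 1 2 1
```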

JLHwung avatar Aug 10 '22 14:08 JLHwung

@JLHwung I expected that this might have something to do with live bindings. I wonder if v8 is even aware of any modules, since it's my impression that modules (esm) were implemented in the host (node.js) directly, and not in v8 (other than the parsing)?

I imagine live bindings also apply to namespace imports?

import * as foo from './esm.js' ... const obj = new foo.ESMClass()

esm x 372,740,601 ops/sec ±0.33% (98 runs sampled)
cjs x 393,491,227 ops/sec ±0.08% (100 runs sampled)

dnalborczyk avatar Aug 10 '22 16:08 dnalborczyk

cc @GeoffreyBooth @guybedford I would think this is expected given how ESM works, but it'd be good to have a sanity check.

mcollina avatar Aug 10 '22 19:08 mcollina

@achingbrain thanks for the report. I also noticed the difference in the import statements that others have pointed out. Also the CJS version does module.exports = { CJSClass } rather than module.exports = CJSClass, which might make a difference.

Rather than the two benchmarks, what if you expanded to four?

import cjs from './cjs.cjs'
import { CJSClass } from './cjs.cjs'
import esm from './esm.js'
import { ESMClass } from './esm.js'

And see how those compare.

I think the esm.js and cjs.cjs files are equivalent, with export class ESMClass and module.exports = { CJSClass }, but you could also try testing what happens when you change those to export default class ESMClass and module.exports = CJSClass.

GeoffreyBooth avatar Aug 10 '22 19:08 GeoffreyBooth

And see how those compare.

@GeoffreyBooth I think that's what we did above, although it might be not easy to see.

curious, do live bindings apply to imported cjs modules? I would think they do, just never tried.

dnalborczyk avatar Aug 10 '22 20:08 dnalborczyk

About the regression in v18: a CPU profile shows that the process spends around 50% of its time on garbage collection (and indeed, running the snippet with --trace-gc-verbose logs many more scavenge collections than v16 does)
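
For anyone wanting to reproduce these measurements, both signals can be captured with stock Node/V8 flags (index.js being the benchmark entry point from the repro):

```shell
# Write a .cpuprofile file for the run, loadable in Chrome DevTools
node --cpu-prof index.js

# Log every scavenge / mark-sweep collection while the benchmark runs
node --trace-gc-verbose index.js
```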

joyeecheung avatar Aug 11 '22 05:08 joyeecheung

@achingbrain thanks for your work in looking into this; these are really important numbers to track. If we had had benchmarks to compare, we might have been able to catch the Node 18 performance regression sooner.

Would you be interested in submitting your case as a core benchmark to the project? I think it would be really useful to maintain these benchmarks going forward to keep track of progress. As they say: make it work, make it right, make it fast. If we're at the make-it-fast stage, this is exactly the time to put eyes on these metrics, and to keep them there as we continue to work on the module system.

guybedford avatar Aug 11 '22 07:08 guybedford

I am not seeing the performance difference. I cloned the repo from @achingbrain, and here are my numbers on Windows 10 on an old Xeon E5-1660 with 64 GB RAM.

18.6.0

repo default using classes

esm x 140,014,888 ops/sec ±1.57% (87 runs sampled)
cjs x 141,244,394 ops/sec ±0.81% (87 runs sampled)
Fastest is cjs,esm

esm x 137,750,317 ops/sec ±1.86% (83 runs sampled)
cjs x 140,985,369 ops/sec ±0.86% (87 runs sampled)
Fastest is cjs,esm

esm x 138,199,158 ops/sec ±1.90% (87 runs sampled)
cjs x 140,614,316 ops/sec ±0.97% (86 runs sampled)
Fastest is cjs,esm

functions with export default in ESM:

esm x 183,906,165 ops/sec ±1.00% (85 runs sampled)
cjs x 184,403,862 ops/sec ±0.78% (86 runs sampled)
Fastest is cjs,esm

esm x 186,102,244 ops/sec ±1.07% (89 runs sampled)
cjs x 185,937,839 ops/sec ±0.89% (86 runs sampled)
Fastest is cjs,esm

esm x 185,829,978 ops/sec ±1.03% (89 runs sampled)
cjs x 182,808,262 ops/sec ±1.31% (85 runs sampled)
Fastest is esm,cjs

Functions without export default in ESM:

esm x 184,485,626 ops/sec ±1.09% (90 runs sampled)
cjs x 183,400,204 ops/sec ±1.29% (86 runs sampled)
Fastest is esm,cjs

esm x 184,691,856 ops/sec ±1.14% (89 runs sampled)
cjs x 178,147,329 ops/sec ±1.60% (83 runs sampled)
Fastest is esm

esm x 185,284,022 ops/sec ±0.81% (88 runs sampled)
cjs x 184,844,828 ops/sec ±1.12% (86 runs sampled)
Fastest is esm,cjs

14.16.1

repo default using classes

esm x 178,850,249 ops/sec ±1.46% (84 runs sampled)
cjs x 793,577,821 ops/sec ±0.78% (89 runs sampled)
Fastest is cjs

esm x 181,141,044 ops/sec ±1.08% (85 runs sampled)
cjs x 797,327,349 ops/sec ±0.66% (89 runs sampled)
Fastest is cjs

esm x 182,353,684 ops/sec ±0.85% (85 runs sampled)
cjs x 796,942,125 ops/sec ±0.70% (88 runs sampled)
Fastest is cjs

functions with export default in ESM:

esm x 181,514,951 ops/sec ±0.96% (86 runs sampled)
cjs x 180,441,385 ops/sec ±1.41% (86 runs sampled)
Fastest is esm,cjs

esm x 180,823,619 ops/sec ±1.23% (85 runs sampled)
cjs x 184,534,397 ops/sec ±0.74% (88 runs sampled)
Fastest is cjs

esm x 181,899,153 ops/sec ±1.42% (85 runs sampled)
cjs x 183,459,404 ops/sec ±0.95% (87 runs sampled)
Fastest is cjs,esm

Functions without export default in ESM:

esm x 183,537,571 ops/sec ±0.77% (87 runs sampled)
cjs x 182,491,722 ops/sec ±0.96% (86 runs sampled)
Fastest is esm,cjs

esm x 181,996,932 ops/sec ±1.12% (86 runs sampled)
cjs x 181,842,024 ops/sec ±0.90% (86 runs sampled)
Fastest is cjs,esm

esm x 181,305,242 ops/sec ±1.11% (84 runs sampled)
cjs x 181,452,168 ops/sec ±1.22% (85 runs sampled)
Fastest is esm,cjs

Summary

  • There was a performance difference in earlier versions of Node, but it was limited to classes only.
  • The performance difference is less pronounced on Windows.
  • These are simple tests and do not account for other scenarios such as where a function may have methods of its own, as opposed to being executed by name only.

prettydiff avatar Aug 19 '22 14:08 prettydiff

Should we keep this issue open? If not, what concrete actions can we take before closing it?

targos avatar Nov 08 '22 14:11 targos

I think this can be closed. Maybe we can document it but there is absolutely nothing we can do about it.

mcollina avatar Nov 08 '22 14:11 mcollina

Would like to add more signal here. I have been tinkering with io-ts. The same io-ts code is about 20% slower in ESM mode. A synthetic benchmark that calls a simple x => x function returning the passed value shows the same picture. IMO, the issue was closed prematurely.

ukstv avatar Nov 11 '22 00:11 ukstv

I've updated the repro repo with different styles of importing - named imports, default imports and namespace imports. Also using the classes directly and also via constant bindings.

Full results are in the README there but the TLDR is that using constant bindings of your classes or accessing them as a property of a namespace import is significantly faster than not. ESM/CJS doesn't appear to actually make a difference after all. Also node 14 is almost 3x faster than node 18.

E.g. don't do:

import { MyClass } from 'some-module'

new MyClass()

or

import MyClass from 'some-module'

new MyClass()

Instead do:

import * as SomeModule from 'some-module'

const MyClass = SomeModule.MyClass

new MyClass()

or

import * as SomeModule from 'some-module'

new SomeModule.MyClass()
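
As a side note, the two recommended patterns can be exercised from a single file by importing from a data: URL, which Node supports for ES modules (MyClass and its value() method are made up for this sketch; 'some-module' above would be a real dependency):

```javascript
// Node can import ES modules from data: URLs, so a throwaway module can be
// defined inline instead of on disk.
const source = 'export class MyClass { value () { return 42 } }'

const ready = import('data:text/javascript,' + encodeURIComponent(source))
  .then((SomeModule) => {
    // Pattern 1: copy the class into a local constant once
    const MyClass = SomeModule.MyClass
    const viaConstant = new MyClass().value()

    // Pattern 2: construct via a property access on the namespace object
    const viaNamespace = new SomeModule.MyClass().value()

    console.log(viaConstant, viaNamespace) // 42 42
    return [viaConstant, viaNamespace]
  })
```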

The weird thing is the class references seem to be constant variables already - if I try to overwrite an imported class I get an error:

import MyClass from 'some-module'

MyClass = () => {}

// MyClass = () => {}
//         ^
// TypeError: Assignment to constant variable.

Is there some syntax sugar that's causing the slowness?

achingbrain avatar Nov 23 '22 17:11 achingbrain

This is likely just anecdotal, but I have also noticed a similar performance change on Chrome in DOM traversal speed.

https://jsbench.github.io/#b39045cacae8d8c4a3ec044e538533dc

Just a couple of years ago Chrome would reach a performance ceiling of around 45 mops in the fastest cases of those tests. Now it's about half that on the same machine.

(Completely irrelevant aside following)

During the same time period DOM performance remained unchanged in Firefox. On this same hardware Firefox performs DOM traversal on those same tests at fastest around 880 mops, or about 35x faster. In Chrome DOM performance is CPU bound, as discovered by comparing test results between different machine hardware; in Firefox it is memory bound. This computer uses old DDR3 memory, and my laptop with a weaker CPU but faster DDR4 memory achieves fastest numbers around 1.2 bops, with some other users reporting 5-6 bops on the latest hardware.

I have not determined whether that speed in Firefox applies to the DOM specifically or to graphed tree models generally. If the latter is true, Node would benefit from shifting data-navigation instructions to a stored cache in the JIT VM, thus making it a memory problem instead of a CPU problem.

Likewise, Node would benefit from the inverse: shifting rapid data modification from a memory problem to a CPU problem, as I discovered with WebSocket handling. By performance-testing my WebSocket message handling on different machines I found it becomes a memory problem: on all machines I tested, message processing is fast for about the first 450,000 messages over a few seconds, after which it drops to a dreadfully slow rate of about 300 messages per 5 seconds as Node waits for garbage collection.

https://github.com/prettydiff/share-file-systems/blob/master/documentation/websockets.md#challenges

prettydiff avatar Nov 30 '22 19:11 prettydiff

The weird thing is the class references seem to be constant variables already - if I try to overwrite an imported class I get an error:

import MyClass from 'some-module'

MyClass = () => {}

// MyClass = () => {}
//         ^
// TypeError: Assignment to constant variable.

Is there some syntax sugar that's causing the slowness?

@achingbrain imported bindings can't be reassigned to, but this is valid and the changed value will be reflected in all modules importing it:

export let foo = 42;

setTimeout(() => foo = 13, 1e4);

0kku avatar Jan 19 '23 23:01 0kku

With Bun, loading Babel with CommonJS is roughly 2.4x faster than with ES modules. https://bun.sh/blog/commonjs-is-not-going-away#the-case-for-commonjs

yisibl avatar Aug 24 '23 06:08 yisibl

Yeah, because CommonJS loads modules synchronously (not to mention Babel was written with CJS in mind). Enjoy your entire core grinding to a halt as it waits for all dependencies to be resolved.

GabenGar avatar Aug 24 '23 12:08 GabenGar