deno compile produces slower ARM binaries than x86 binaries on Apple M1
$ deno --version
deno 1.22.0 (release, x86_64-apple-darwin)
v8 10.0.139.17
typescript 4.6.2
Installed via Homebrew.
On an M1, the binary produced with --target aarch64-apple-darwin is consistently slower than the one produced by --target x86_64-apple-darwin.
On first run, --target x86_64-apple-darwin took 2.726 seconds as measured by the time command. On subsequent runs it took 0.097 seconds.
I missed the first run of --target aarch64-apple-darwin, but subsequent commands run consistently at about 0.31 seconds. This is more than 3x slower.
This is very surprising as one would expect the x86 binary to consistently be slower than the ARM binary.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.
Request: could issues be closed as wontfix instead of being marked stale?
Is this still an issue? I cannot reproduce this on M1:
time deno compile a.js --target x86_64-apple-darwin
Compile file:///Users/divy/a.js
Emit a
________________________________________________________
Executed in 134.79 millis fish external
usr time 27.88 millis 8.27 millis 19.61 millis
sys time 65.96 millis 1.83 millis 64.13 millis
time deno compile a.js --target aarch64-apple-darwin
Compile file:///Users/divy/a.js
Emit a
________________________________________________________
Executed in 125.10 millis fish external
usr time 29.78 millis 9.53 millis 20.25 millis
sys time 67.12 millis 1.95 millis 65.18 millis
@littledivy not the amount of time it takes to compile a file, but the amount of time it takes to run the resulting executable.
OK, it's impossible to tell without looking at the code being compiled. Just a console.log doesn't seem to reproduce it:
# x86_64
Hello
________________________________________________________
Executed in 50.00 millis fish external
usr time 32.47 millis 0.24 millis 32.23 millis
sys time 29.68 millis 1.99 millis 27.69 millis
# arm64
Hello
________________________________________________________
Executed in 38.38 millis fish external
usr time 40.74 millis 0.29 millis 40.45 millis
sys time 19.14 millis 2.22 millis 16.92 millis
I downloaded the latest version of Deno, 1.25.3, and was able to reproduce the issue.
If you'd like to reproduce it yourself, here are the steps:
- Obtain an M1 or M2 computer.
- Clone https://github.com/okTurtles/chel
- Run:
deno task build && deno task compile
- Unzip the two *-apple-darwin.tar.gz files in dist/
- Run time on the two binaries. Note that, as stated in the issue, the first time you run time, the x86 version will be slower, but on subsequent runs it will remain faster.
$ time ./dist/aarch64-apple-darwin/chel version
1.1.2
./dist/aarch64-apple-darwin/chel version 0.03s user 0.02s system 7% cpu 0.695 total
$ time ./dist/x86_64-apple-darwin/chel version
1.1.2
./dist/x86_64-apple-darwin/chel version 0.03s user 0.02s system 1% cpu 2.771 total
$ time ./dist/aarch64-apple-darwin/chel version
1.1.2
./dist/aarch64-apple-darwin/chel version 0.03s user 0.01s system 23% cpu 0.178 total
$ time ./dist/x86_64-apple-darwin/chel version
1.1.2
./dist/x86_64-apple-darwin/chel version 0.03s user 0.01s system 83% cpu 0.054 total
$ time ./dist/aarch64-apple-darwin/chel version
1.1.2
./dist/aarch64-apple-darwin/chel version 0.03s user 0.01s system 22% cpu 0.180 total
$ time ./dist/x86_64-apple-darwin/chel version
1.1.2
./dist/x86_64-apple-darwin/chel version 0.04s user 0.01s system 77% cpu 0.063 total
The results are similar for a more complicated subcommand of chel too, btw.
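For anyone comparing the pasted zsh time lines by hand, here is a minimal sketch of a parser for them. The names `TimeSample`, `parseZshTime` and `offCpuSeconds` are hypothetical helpers introduced for illustration, and the regular expression assumes the exact zsh summary format shown above.

```typescript
// Parsed fields of one zsh `time` summary line (all times in seconds).
interface TimeSample {
  user: number;
  system: number;
  cpuPercent: number;
  total: number;
}

// Parses a line such as
// "./chel version 0.03s user 0.01s system 23% cpu 0.178 total".
// Returns null when the line does not look like a zsh time summary.
function parseZshTime(line: string): TimeSample | null {
  const m = line.match(/([\d.]+)s user ([\d.]+)s system (\d+)% cpu ([\d.]+) total/);
  if (m === null) return null;
  return {
    user: Number(m[1]),
    system: Number(m[2]),
    cpuPercent: Number(m[3]),
    total: Number(m[4]),
  };
}

// Wall-clock time that is not accounted for by user + system CPU time.
function offCpuSeconds(s: TimeSample): number {
  return s.total - (s.user + s.system);
}
```

Applied to the warm runs above, both binaries report roughly 0.04 s of CPU time, while the wall-clock totals differ: 0.178 s for aarch64 (about 0.138 s off CPU) versus 0.054 s for x86_64 (about 0.014 s off CPU). The numbers alone do not say where that extra wall-clock time goes.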
Ok but I cannot reproduce the claims here: "This is more than 3x slower." The slow first run is expected because of Rosetta doing its thing / cold start.
Also, since we are measuring in <100ms, time can be very noisy depending on your system. Here are the results using hyperfine:
The slow first run is expected because rosetta doing its thing / cold start.
I mentioned this above and in the original post, yes Rosetta will cause the x86 binary to be slower on first run. But on subsequent runs the x86 binary is faster than the arm binary, when it should be the reverse.
Ok but I cannot reproduce the claims here: This is more than 3x slower.
~~I just tried with hyperfine the exact same command, and it shows the x86 binary being over 15 times faster than the arm binary!~~
EDIT: After fixing my zsh shell (it was an x86 binary - now it's arm64), I rebuilt everything and re-ran the benchmark.
Now the arm binary is only 2.36x slower than the x86 binary. Still slower though. Updated benchmarks below:
$ hyperfine --warmup 2 './dist/x86_64-apple-darwin/chel version' './dist/aarch64-apple-darwin/chel version'
Benchmark 1: ./dist/x86_64-apple-darwin/chel version
Time (mean ± σ): 53.2 ms ± 1.5 ms [User: 30.7 ms, System: 6.0 ms]
Range (min … max): 50.9 ms … 56.6 ms 44 runs
Benchmark 2: ./dist/aarch64-apple-darwin/chel version
Time (mean ± σ): 125.7 ms ± 14.0 ms [User: 28.9 ms, System: 3.9 ms]
Range (min … max): 108.6 ms … 150.6 ms 20 runs
Summary
'./dist/x86_64-apple-darwin/chel version' ran
2.36 ± 0.27 times faster than './dist/aarch64-apple-darwin/chel version'
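As a sanity check on the summary line, the ratio and its uncertainty can be recomputed from the two means and standard deviations. This is a sketch under the assumption that the ± value is a first-order propagation of the two relative errors; `Timing` and `relativeSpeed` are hypothetical names, not part of hyperfine.

```typescript
// Mean and standard deviation of one benchmark, in milliseconds.
interface Timing {
  mean: number;
  stddev: number;
}

// Ratio of the slower mean to the faster mean, with the uncertainty
// obtained by adding the two relative errors in quadrature
// (assumption: independent measurements, first-order propagation).
function relativeSpeed(slower: Timing, faster: Timing): { ratio: number; stddev: number } {
  const ratio = slower.mean / faster.mean;
  const relErr = Math.sqrt(
    (slower.stddev / slower.mean) ** 2 + (faster.stddev / faster.mean) ** 2,
  );
  return { ratio, stddev: ratio * relErr };
}

// Figures copied from the hyperfine output above.
const arm = { mean: 125.7, stddev: 14.0 };
const x86 = { mean: 53.2, stddev: 1.5 };
const speed = relativeSpeed(arm, x86);
console.log(`${speed.ratio.toFixed(2)} ± ${speed.stddev.toFixed(2)}`); // prints "2.36 ± 0.27"
```

Most of the ± 0.27 comes from the arm64 run, whose standard deviation (14.0 ms) is about 11% of its mean, compared with about 3% for the x86_64 run.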
Updated my comment above with:
EDIT: After fixing my zsh shell (it was an x86 binary - now it's arm64), I rebuilt everything and re-ran the benchmark.
Now the arm binary is only 2.36x slower than the x86 binary. Still slower though. Updated benchmarks below:
Update: I tried again using deno 1.32.4 to see if anything had changed regarding this, and here are the results:
-> % hyperfine --warmup 2 './dist/x86_64-apple-darwin/chel version' './dist/aarch64-apple-darwin/chel version'
Benchmark 1: ./dist/x86_64-apple-darwin/chel version
Time (mean ± σ): 50.0 ms ± 4.8 ms [User: 32.2 ms, System: 6.8 ms]
Range (min … max): 42.5 ms … 69.8 ms 45 runs
Benchmark 2: ./dist/aarch64-apple-darwin/chel version
Time (mean ± σ): 117.6 ms ± 10.2 ms [User: 33.4 ms, System: 3.4 ms]
Range (min … max): 101.7 ms … 144.1 ms 26 runs
Summary
'./dist/x86_64-apple-darwin/chel version' ran
2.35 ± 0.30 times faster than './dist/aarch64-apple-darwin/chel version'
-> % lipo -info dist/aarch64-apple-darwin/chel
Non-fat file: dist/aarch64-apple-darwin/chel is architecture: arm64
-> % lipo -info dist/x86_64-apple-darwin/chel
Non-fat file: dist/x86_64-apple-darwin/chel is architecture: x86_64
You can see that I'm not confusing the binaries, as lipo outputs that the aarch64-apple-darwin binary is indeed arm64.
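The same check can be done without lipo by reading the first eight bytes of the file. The following is a rough sketch, not a full Mach-O parser: `machOArch` is a hypothetical helper, and it only recognises thin 64-bit little-endian binaries and the 32-bit universal ("fat") header.

```typescript
// Constants from Apple's <mach-o/loader.h>, <mach-o/fat.h> and <mach/machine.h>.
const MH_MAGIC_64 = 0xfeedfacf; // thin 64-bit Mach-O (little-endian on disk for these targets)
const FAT_MAGIC = 0xcafebabe; // universal binary header (big-endian on disk)
const CPU_TYPE_X86_64 = 0x01000007;
const CPU_TYPE_ARM64 = 0x0100000c;

type MachOArch = "x86_64" | "arm64" | "fat" | "unknown";

// Classifies a binary from the first 8 bytes of its header.
function machOArch(header: Uint8Array): MachOArch {
  if (header.byteLength < 8) return "unknown";
  const view = new DataView(header.buffer, header.byteOffset, header.byteLength);
  // Universal binaries store their magic number big-endian.
  if (view.getUint32(0, false) === FAT_MAGIC) return "fat";
  // Thin binaries for x86_64 and arm64 are little-endian.
  if (view.getUint32(0, true) !== MH_MAGIC_64) return "unknown";
  const cpuType = view.getUint32(4, true);
  if (cpuType === CPU_TYPE_X86_64) return "x86_64";
  if (cpuType === CPU_TYPE_ARM64) return "arm64";
  return "unknown";
}
```

In a Deno script this could be fed with something like the first bytes returned by `Deno.readFile` for each file under dist/. It should agree with lipo for the two binaries above, though that has not been verified here.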
Here's how we generate these binaries:
#!/usr/bin/env -S deno run --allow-run --allow-read=. --allow-write=./dist
import { sh } from '../src/deps.ts'

function $ (command: string) {
  return sh(command, { printOutput: true })
}

const { default: { version } } = await import('../package.json', { assert: { type: "json" } })

export async function compile () {
  // NOTE: Apple ARM is slower than x86 on M1!
  // https://github.com/denoland/deno/issues/14935
  const archs = ['x86_64-unknown-linux-gnu', 'x86_64-pc-windows-msvc', 'x86_64-apple-darwin', 'aarch64-apple-darwin']
  for (const arch of archs) {
    const dir = `./dist/tmp/${arch}`
    const bin = arch.includes('windows') ? 'chel.exe' : 'chel'
    // note: could also use https://examples.deno.land/temporary-files
    await $(`mkdir -vp ${dir}`)
    await $(`deno compile --allow-read=./ --allow-write=./ --allow-net --no-remote --import-map=vendor/import_map.json -o ${dir}/${bin} --target ${arch} ./build/main.js`)
    await $(`tar -C ./dist/tmp -czvf ./dist/chel-v${version}-${arch}.tar.gz ${arch}`)
  }
  await $(`sha256sum dist/chel-v${version}-*`)
  // TODO: sign the sha256sum! pipe this to gpg and include a link to your GPG key in the release notes!
}

try {
  await compile()
} catch (e) {
  console.error('caught:', e.message)
} finally {
  await sh(`rm -rf ./dist/tmp`)
}
The relevant line is:
await $(`deno compile --allow-read=./ --allow-write=./ --allow-net --no-remote --import-map=vendor/import_map.json -o ${dir}/${bin} --target ${arch} ./build/main.js`)
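For readers who want to reproduce a single target without the whole script, here is the same invocation sketched as an argument array. `compileArgs` is a hypothetical helper; the flags are copied unchanged from the line above, and nothing else about the build is assumed.

```typescript
// Builds the argument list for one `deno compile` call, mirroring the
// flags used in the build script above. Only the target triple and the
// output directory vary between calls.
function compileArgs(arch: string, outDir: string): string[] {
  const bin = arch.includes("windows") ? "chel.exe" : "chel";
  return [
    "compile",
    "--allow-read=./",
    "--allow-write=./",
    "--allow-net",
    "--no-remote",
    "--import-map=vendor/import_map.json",
    "-o", `${outDir}/${bin}`,
    "--target", arch,
    "./build/main.js",
  ];
}

// Example: print the two macOS command lines compared in this issue.
for (const arch of ["x86_64-apple-darwin", "aarch64-apple-darwin"]) {
  console.log(["deno", ...compileArgs(arch, `./dist/tmp/${arch}`)].join(" "));
}
```

The two resulting command lines differ only in the --target value and the output path, so the runtime difference reported here comes from the target, not from any per-target flag.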
I tried again just now with Deno 1.39.2 and all of a sudden hyperfine results are looking correct:
hyperfine --warmup 2 './dist/x86_64-apple-darwin/chel version' './dist/aarch64-apple-darwin/chel version'
Benchmark 1: ./dist/x86_64-apple-darwin/chel version
Time (mean ± σ): 93.1 ms ± 2.3 ms [User: 75.2 ms, System: 22.6 ms]
Range (min … max): 87.6 ms … 97.8 ms 31 runs
Benchmark 2: ./dist/aarch64-apple-darwin/chel version
Time (mean ± σ): 53.2 ms ± 1.4 ms [User: 46.7 ms, System: 12.0 ms]
Range (min … max): 49.9 ms … 56.3 ms 54 runs
Summary
./dist/aarch64-apple-darwin/chel version ran
1.75 ± 0.06 times faster than ./dist/x86_64-apple-darwin/chel version
So, closing 🤷♂️
Glad it seems to be working!