# qdd
## Not as fast on Macs
As first reported by @zkat, and verified by others, it looks like the final number on the benchmark, referring to the primed-cache speed of `qdd`, is not significantly better on Macs. While I'm getting times of 0.5s on my machine (Linux), Mac users seem to be getting closer to 4s or 5s.
I collected `.cpuprofile` data from my machine and a Mac, and found that around 80% of the time on Macs is being spent in `(idle)`, leading me to believe it's simply waiting on filesystem operations the whole time. On Linux the `(idle)` time is closer to 20%-25%, so while this might not account for all of the overhead, it at least accounts for a huge chunk of it.
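For reference, one way to collect a `.cpuprofile` without attaching DevTools is the built-in `inspector` module. This is a minimal sketch; `runWorkload` is a hypothetical stand-in for the code being profiled:

```js
// Sketch: capture a .cpuprofile from inside a Node process using the
// built-in inspector module, then load the file into Chrome DevTools.
const inspector = require('inspector');
const fs = require('fs');

const session = new inspector.Session();
session.connect();

session.post('Profiler.enable', () => {
  session.post('Profiler.start', () => {
    // runWorkload is hypothetical: run the code you want to profile here.
    runWorkload(() => {
      session.post('Profiler.stop', (err, { profile }) => {
        if (!err) fs.writeFileSync('out.cpuprofile', JSON.stringify(profile));
      });
    });
  });
});
```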
Since almost all of that operation consists of a recursive copy, I modified that file to use standard `fs` module operations (the `qdd` version calls the binding directly), and also added `perf_hooks` marks to see which of the filesystem calls were taking up the most time. The resulting script is in this gist, which can be run from any empty directory. The test downloads a tarball, unpacks it, and measures the time to copy it recursively from one directory to another. In `qdd`, these operations happen many, many times in parallel.
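The gist has the full script, but the shape of the measurement is roughly this (a sketch, not the exact gist code: it times promisified `fs` calls with `perf_hooks` rather than reproducing the gist's exact marks, and the directory names are placeholders):

```js
// Sketch of the async measurement: time each promisified fs call with
// perf_hooks and accumulate a total per operation.
const fs = require('fs');
const path = require('path');
const { promisify } = require('util');
const { performance } = require('perf_hooks');

const readdir = promisify(fs.readdir);
const stat = promisify(fs.stat);
const mkdir = promisify(fs.mkdir);
const copyFile = promisify(fs.copyFile);

const totals = { readdir: 0, stat: 0, mkdir: 0, copyfile: 0 };

async function timed(name, promise) {
  const start = performance.now();
  const result = await promise;
  totals[name] += performance.now() - start;
  return result;
}

// Recursive copy with the per-directory operations running in parallel,
// similar in spirit to what qdd does at a much larger scale.
async function copyDir(src, dest) {
  await timed('mkdir', mkdir(dest));
  const names = await timed('readdir', readdir(src));
  await Promise.all(names.map(async (name) => {
    const s = path.join(src, name);
    const d = path.join(dest, name);
    if ((await timed('stat', stat(s))).isDirectory()) await copyDir(s, d);
    else await timed('copyfile', copyFile(s, d));
  }));
}

async function main() {
  const start = performance.now();
  await copyDir('unpacked', 'copy'); // placeholder directories
  for (const name of Object.keys(totals)) console.log(name, totals[name]);
  console.log('---');
  console.log(performance.now() - start);
}

main().catch(console.error);
```

Note that when the calls run in parallel, the per-operation totals overlap in wall-clock time, so they won't sum to the overall elapsed time.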
Here are the results (time in ms):
My Arch Linux Lenovo X1 Carbon (from 2016):
```
$ node copytest.js
readdir 1.995347
stat 16.328005827783088
mkdir 7.106336499999999
copyfile 15.482173730219255
---
77.179121
```
A Google Compute Cloud instance (TODO put specs in here):
```
$ node copytest.js
readdir 1.7026445
stat 14.900154024738319
mkdir 9.860568500000001
copyfile 15.350753629170656
---
71.24901
```
A macincloud.com Pay-as-you-Go instance (OS X High Sierra):
```
$ node copytest.js
readdir 4.9309635
stat 83.9008870846811
mkdir 38.07782
copyfile 78.84369494852221
---
353.116088
```
While this is still pretty inconclusive, `fs.stat` and `fs.copyFile` seem to be taking considerably longer on a Mac than on Linux. The same version of Node is used in all tests. For both my machine and the Google instance, the filesystem is `ext4`; for the Mac it's `HFS+`.
Maybe compare `ulimit -n`? I think it is 256 on macOS by default... not sure what benchmark you are using, but that is pretty low if you are opening a bunch of files.
Does the perf difference carry over to sync calls as well?
@evanlucas For `ulimit -n` I'm seeing 4096 on my Linux machine, 32768 on macOS.
@addaleax It looks like it does. I added a sync version of the test to the gist (sketched below, after the numbers). Here are the results:
Linux:
```
$ node copytestsync.js
readdir 0.44465699999999997
stat 0.01158147002854424
mkdir 0.056488000000000003
copyfile 0.02846236224976165
---
75.75296
```
Mac:
```
$ node copytestsync.js
readdir 0.9944815
stat 0.050528007611798355
mkdir 0.222148
copyfile 0.48516409151572953
---
673.984851
```
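The sync variant of the copy loop looks roughly like this (again a sketch under the same assumptions, using the blocking `*Sync` calls):

```js
// Sketch of the sync variant: the same recursive copy, but with the
// blocking *Sync fs calls, timing each one with perf_hooks.
const fs = require('fs');
const path = require('path');
const { performance } = require('perf_hooks');

const totals = { readdir: 0, stat: 0, mkdir: 0, copyfile: 0 };

function timed(name, fn, ...args) {
  const start = performance.now();
  const result = fn(...args);
  totals[name] += performance.now() - start;
  return result;
}

function copyDirSync(src, dest) {
  timed('mkdir', fs.mkdirSync, dest);
  for (const name of timed('readdir', fs.readdirSync, src)) {
    const s = path.join(src, name);
    const d = path.join(dest, name);
    if (timed('stat', fs.statSync, s).isDirectory()) copyDirSync(s, d);
    else timed('copyfile', fs.copyFileSync, s, d);
  }
}
```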
Since the test code here is effectively doing the same thing as a `cp -r`, I thought I'd try timing that in both environments. The script used is in the gist as `copytest.sh`.
Linux:
```
real 0m0.022s
user 0m0.003s
sys 0m0.019s
```
Mac:
```
real 0m0.318s
user 0m0.026s
sys 0m0.281s
```
I think that rules out overhead from the threadpool mechanism (which is ultimately implemented in a very platform-dependent way). (Btw, file system writes are protected by a global lock on OS X, so they can't use the threadpool effectively there – but that seems unlikely to be the culprit, too, given that this also affects sync code and other functions.)
If you want to hear my best guess, it’s probably an actual perf difference in the OS or the file system. I guess trying to reproduce this with C code using the raw syscalls could prove or disprove that?
@bengl, if you spend the entire night debugging syscalls, which I suspect you will, you should take notes and turn it into a talk ;)
Is FileVault enabled?
@LarsJK AFAIK yes, but note also that my Linux system is using LUKS, which I'd imagine is pretty similar in terms of overhead.