[Feature] Reflink NM linker
- [x] I'd be willing to implement this feature (contributing guide)
- [ ] This feature is important to have in this repository; a contrib plugin wouldn't do
Describe the user story
The current node_modules linker is slow.
Describe the solution you'd like
I came across the orogene package manager, which uses an interesting technique: https://github.com/orogene/orogene/blob/2dc8d9e9d32b9dcc8e8a33e8a729c2c08772c33f/crates/nassun/src/tarball.rs#L443
When unarchiving a tarball, it first stores the files in a cache directory, which is treated as immutable. The files are then "cloned" into the project's real node_modules directory via "reflinks", which work on copy-on-write (COW) file systems, including APFS. TL;DR: a reflink creates a new reference to the existing data blocks instead of writing the data again.
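To illustrate the cache-then-clone idea, here is a minimal Node sketch, not orogene's actual implementation. The helper name `cloneFromCache` and all paths are hypothetical. It relies on `fs.constants.COPYFILE_FICLONE`, which asks for a copy-on-write clone and falls back to a regular copy when the file system does not support reflinks. It also shows that editing the cloned file leaves the cached copy untouched.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: clone one cached file into a project directory.
// COPYFILE_FICLONE requests a copy-on-write reflink and falls back to a
// regular copy when the file system does not support it.
function cloneFromCache(cachedFile, targetFile) {
  fs.mkdirSync(path.dirname(targetFile), { recursive: true });
  fs.copyFileSync(cachedFile, targetFile, fs.constants.COPYFILE_FICLONE);
}

// Demo in a throwaway temp directory (all paths are illustrative only).
const root = fs.mkdtempSync(path.join(os.tmpdir(), 'reflink-demo-'));
const cached = path.join(root, 'cache', 'pkg', 'index.js');
const linked = path.join(root, 'project', 'node_modules', 'pkg', 'index.js');

fs.mkdirSync(path.dirname(cached), { recursive: true });
fs.writeFileSync(cached, 'module.exports = 1;\n');

cloneFromCache(cached, linked);

// Editing the clone (e.g. while debugging) does not change the cached file.
fs.appendFileSync(linked, '// local debug edit\n');

const cachedContent = fs.readFileSync(cached, 'utf8');
const linkedContent = fs.readFileSync(linked, 'utf8');
```

Unlike a hardlink, the clone is an independent file from the user's point of view, so the cache stays immutable even if someone edits node_modules.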
I ran some experiments on macOS (M1 Mac) with a 5 GB node_modules directory:
- cp -r took 5 minutes
- a Go app that makes the clonefile syscall took 17s
Code:
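package main

import (
	"log"

	"golang.org/x/sys/unix"
)

func main() {
	// Source directory to clone; the destination must not exist yet.
	dir := "/Users/vadym/github/rpcpoc/node_modules"
	// A single clonefile call clones the whole directory tree copy-on-write.
	err := unix.Clonefile(dir, "/Users/vadym/github/rpcpoc/node_modules6", 0)
	if err != nil {
		log.Fatalf("Failed to clone file: %v, path: %s", err, dir)
	}
}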
Benefits compared to the existing NM linker:
- much faster (when the cache already exists)
- lower disk usage, since clones share data blocks with the cache.
Describe the drawbacks of your solution
- Requires NAPI or other native helper which can do syscalls. Node exposes copyFile, but it does not work for directories:
const fs = require('fs');
fs.copyFile('/Users/vadym/github/rpcpoc/node_modules', '/Users/vadym/github/rpcpoc/node_modules10', fs.constants.COPYFILE_FICLONE_FORCE, console.log)
[Error: ENOSYS: function not implemented, copyfile '
- Does not speed up the first unarchiving, only subsequent installs (when the cache already exists on disk). It does help with duplicated packages, though, where hoisting has not already removed them.
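One possible way around the directory limitation, using only Node's built-in `fs` module, would be to walk the tree and clone it file by file. The sketch below is an assumption about how that could look, not an existing Yarn or Node API: `cloneTree` is a hypothetical helper. It makes one call per file, so it may not match the timing of a single clonefile call on the whole directory; that has not been benchmarked here.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: recreate srcDir under dstDir, cloning regular files
// one by one. Returns the number of files cloned.
// Pass fs.constants.COPYFILE_FICLONE_FORCE as `mode` to fail instead of
// falling back to a regular copy on file systems without reflink support.
function cloneTree(srcDir, dstDir, mode = fs.constants.COPYFILE_FICLONE) {
  fs.mkdirSync(dstDir, { recursive: true });
  let cloned = 0;
  for (const entry of fs.readdirSync(srcDir, { withFileTypes: true })) {
    const src = path.join(srcDir, entry.name);
    const dst = path.join(dstDir, entry.name);
    if (entry.isDirectory()) {
      cloned += cloneTree(src, dst, mode);
    } else if (entry.isSymbolicLink()) {
      // Recreate symlinks (such as .bin entries) instead of following them.
      fs.symlinkSync(fs.readlinkSync(src), dst);
    } else if (entry.isFile()) {
      fs.copyFileSync(src, dst, mode);
      cloned += 1;
    }
  }
  return cloned;
}

// Demo on a tiny tree in a temp directory (paths are illustrative only).
const root = fs.mkdtempSync(path.join(os.tmpdir(), 'clone-tree-'));
const srcRoot = path.join(root, 'node_modules');
const dstRoot = path.join(root, 'node_modules_clone');
fs.mkdirSync(path.join(srcRoot, 'a', 'lib'), { recursive: true });
fs.writeFileSync(path.join(srcRoot, 'a', 'package.json'), '{"name":"a"}');
fs.writeFileSync(path.join(srcRoot, 'a', 'lib', 'index.js'), 'module.exports = "a";');

const clonedCount = cloneTree(srcRoot, dstRoot);
```

This keeps everything in plain JS at the cost of more syscalls than a directory-level clone.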
Describe alternatives you've considered
FUSE. Linux support is great, and macOS is making FSKit public in 15.4. This is an alternative track that would dramatically increase speed and improve disk usage.
Tagging @arcanis because this is a design decision. I can write a PoC linker and run benchmarks. It could be an external plugin, but it would share a lot of logic with the existing NM linker, which would need to be abstracted and exported in that case. A native helper could also be used to accelerate other things, e.g. downloads and tar.gz -> zip transforms. Thanks.
That sounds like a worthwhile experiment. I think it was briefly discussed a couple of years ago, and at the time my thinking was that this could easily lead to cache corruption (since people tend to edit their node_modules for debugging); still, I don't have anything against having an option for that.
(I misread hardlinks & reflinks - of note, "Requires NAPI or other native helper which can do syscalls", this isn't something possible today, as Yarn currently must be distributed as a single JS file)
This issue by @arcanis is related to this whole topic, so I'm linking it here for reference: https://github.com/libuv/libuv/issues/2936