[Feature] Reflink NM linker

Open goloveychuk opened this issue 10 months ago • 4 comments

  • [x] I'd be willing to implement this feature (contributing guide)
  • [ ] This feature is important to have in this repository; a contrib plugin wouldn't do

Describe the user story

The node_modules linker is slow.

Describe the solution you'd like

So I've found the orogene package manager, which uses an interesting technique. https://github.com/orogene/orogene/blob/2dc8d9e9d32b9dcc8e8a33e8a729c2c08772c33f/crates/nassun/src/tarball.rs#L443

When unpacking a tarball, it first stores the files in a cache dir, which is treated as immutable. The files are then "cloned" into the project's real node_modules dir via "reflinks", which work on COW file systems (including APFS). TL;DR: the idea is that it creates a new reference to existing blocks instead of writing the data again.
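To illustrate the cache-then-clone idea for a single file, here is a minimal Node sketch. The temp directory and file names are hypothetical. It uses fs.constants.COPYFILE_FICLONE, which requests a copy-on-write clone and falls back to a regular copy where reflinks are not supported, so it shows the semantics (the clone can be edited without touching the cached file) rather than the speed.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical layout: one "cached" file and one "project" copy in a temp dir.
const root = fs.mkdtempSync(path.join(os.tmpdir(), 'reflink-demo-'));
const cached = path.join(root, 'cache-index.js');
const linked = path.join(root, 'project-index.js');
fs.writeFileSync(cached, 'module.exports = 1;\n');

// COPYFILE_FICLONE asks for a copy-on-write clone and falls back to a
// regular copy when the file system cannot do reflinks.
fs.copyFileSync(cached, linked, fs.constants.COPYFILE_FICLONE);

// Simulate a local debugging edit in the project copy.
fs.appendFileSync(linked, '// local debug edit\n');

// Unlike with a hardlink, the cached original keeps its content.
const cachedContent = fs.readFileSync(cached, 'utf8');
const linkedContent = fs.readFileSync(linked, 'utf8');
console.log('cache untouched:', cachedContent === 'module.exports = 1;\n');
console.log('project copy edited:', linkedContent.endsWith('// local debug edit\n'));
```

Whether the clone actually shares blocks with the original depends on the file system; the script behaves the same either way.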

I've run some experiments on macOS (M1 Mac) with a 5 GB node_modules dir: cp -r took 5 minutes, while a Go app that does the clonefile syscall took 17s.

code:

package main

import (
	"log"

	"golang.org/x/sys/unix"
)

func main() {
	// Clone the whole directory with a single clonefile call (macOS only).
	src := "/Users/vadym/github/rpcpoc/node_modules"
	dst := "/Users/vadym/github/rpcpoc/node_modules6"
	if err := unix.Clonefile(src, dst, 0); err != nil {
		log.Fatalf("Failed to clone file: %v, path: %s", err, src)
	}
}

So, benefits compared to existing NM linker:

  1. much faster (if cache exists)
  2. smaller space usage.

Describe the drawbacks of your solution

  1. Requires NAPI or other native helper which can do syscalls. Node exposes fs.copyFile, but it does not work for directories:
const fs = require('fs');

fs.copyFile('/Users/vadym/github/rpcpoc/node_modules', '/Users/vadym/github/rpcpoc/node_modules10', fs.constants.COPYFILE_FICLONE_FORCE, console.log)

[Error: ENOSYS: function not implemented, copyfile '

  2. Does not improve the first unarchiving, only incremental installs (when the cache already exists on disk). It does help with duplicates, though (where hoisting has not already removed them).
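One possible way around the first drawback without a native helper is to walk the tree in JS and clone file by file. The sketch below is only that: cloneTree is a hypothetical helper, the demo tree is made up, and I have not measured how this compares to a single clonefile call on the directory (it issues one call per file, so it will likely be slower). It uses COPYFILE_FICLONE rather than COPYFILE_FICLONE_FORCE, so it falls back to a normal copy where reflinks are unavailable.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: recreate the directory structure and clone each
// regular file individually. Returns the number of files cloned.
function cloneTree(src, dst) {
  fs.mkdirSync(dst, { recursive: true });
  let files = 0;
  for (const entry of fs.readdirSync(src, { withFileTypes: true })) {
    const from = path.join(src, entry.name);
    const to = path.join(dst, entry.name);
    if (entry.isDirectory()) {
      files += cloneTree(from, to);
    } else if (entry.isSymbolicLink()) {
      // Recreate symlinks as-is instead of following them.
      fs.symlinkSync(fs.readlinkSync(from), to);
    } else {
      // Reflink when possible, regular copy otherwise.
      fs.copyFileSync(from, to, fs.constants.COPYFILE_FICLONE);
      files += 1;
    }
  }
  return files;
}

// Tiny made-up "cache" tree to exercise the helper.
const root = fs.mkdtempSync(path.join(os.tmpdir(), 'clone-tree-'));
const src = path.join(root, 'cache');
fs.mkdirSync(path.join(src, 'pkg', 'lib'), { recursive: true });
fs.writeFileSync(path.join(src, 'pkg', 'package.json'), '{"name":"pkg"}');
fs.writeFileSync(path.join(src, 'pkg', 'lib', 'index.js'), 'module.exports = 42;\n');

const cloned = cloneTree(src, path.join(root, 'node_modules'));
console.log(`cloned ${cloned} files`);
```

Switching to COPYFILE_FICLONE_FORCE would make the copy fail instead of silently falling back, which could be useful for detecting file systems where the linker should not be used.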

Describe alternatives you've considered

FUSE. Linux support is great, and macOS is making FSKit public in 15.4. This is an alternative track, which dramatically increases speed and improves disk usage.

goloveychuk avatar Mar 15 '25 17:03 goloveychuk

Tagging @arcanis because this is a design decision. I can write a PoC linker and run benchmarks. It could be an external plugin, but it would share a lot of logic with the existing NM linker, which would have to be abstracted and exported in that case. A native helper could also be used to accelerate other things, e.g. downloads and tar.gz->zip transforms. Thanks.

goloveychuk avatar Mar 16 '25 17:03 goloveychuk

That sounds like a worthwhile experiment. I think it was quickly discussed a couple of years ago and at the time my thinking was that this could lead to easy cache corruption (since people tend to edit their node_modules for debugging); still, I don't have anything against having an option for that.

arcanis avatar Jun 10 '25 12:06 arcanis

(I misread hardlinks & reflinks - of note, "Requires NAPI or other native helper which can do syscalls", this isn't something possible today, as Yarn currently must be distributed as a single JS file)

arcanis avatar Jun 10 '25 12:06 arcanis

This issue by @arcanis is related to this whole topic, so I'm linking it here for reference: https://github.com/libuv/libuv/issues/2936

larixer avatar Jun 10 '25 12:06 larixer