cue icon indicating copy to clipboard operation
cue copied to clipboard

cmd/cue: CreateFile "Access is denied" errors on Windows with modcache partial files

Open mvdan opened this issue 1 year ago • 2 comments

This CI flake has shown up three times in the past few weeks, so it's time we track it and try to fix it.

 --- FAIL: TestScript (1.42s)
    --- FAIL: TestScript/registry_submodule (0.10s)
        testscript.go:585: # This test tests that submodules work OK. (0.070s)
            > exec cue eval .
            [stderr]
            main.org@v0: import failed: cannot find package "example.com/foo@v0": cannot fetch [email protected]: CreateFile $WORK\.tmp\cache\mod\download\example.com\@v\v0.0.1.partial: Access is denied.:
                ./main.cue:4:2
            [exit status 1]
            FAIL: testdata\script\registry_submodule.txtar:3: unexpected command failure
            
FAIL
FAIL	cuelang.org/go/cmd/cue/cmd	26.132s

cc @rogpeppe

mvdan avatar Aug 30 '24 14:08 mvdan

https://github.com/golang/go/issues/58697 looks extremely similar.

Note that our modcache package was imported from Go fairly recently, so it should already be pretty robust.

mvdan avatar Aug 30 '24 14:08 mvdan

Happened again today on https://review.gerrithub.io/c/cue-lang/cue/+/1214081.

mvdan avatar Apr 25 '25 14:04 mvdan

Just happened again on a master commit: https://github.com/cue-lang/cue/actions/runs/17579000657/job/49930787837

mvdan avatar Sep 09 '25 10:09 mvdan

The error seems to come from os.WriteFile in our modcache package:

	// Extract the module zip directory at its final location.
	//
	// To prevent other processes from reading the directory if we crash,
	// create a .partial file before extracting the directory, and delete
	// the .partial file afterward (all while holding the lock).
	//
	// A technique used previously was to extract to a temporary directory with a random name
	// then rename it into place with os.Rename. On Windows, this can fail with
	// ERROR_ACCESS_DENIED when another process (usually an anti-virus scanner)
	// opened files in the temporary directory.
	if err := os.MkdirAll(parentDir, 0777); err != nil {
		return module.SourceLoc{}, err
	}
	if err := os.WriteFile(partialPath, nil, 0666); err != nil {
		return module.SourceLoc{}, err
	}
	if err := modzip.Unzip(dir, mv, zipfile); err != nil {

My initial guess was that we should be using robustio for this file creation, because it knows what OS errors are ephemeral, like ERROR_ACCESS_DENIED on Windows. However, upstream Go seems to not do this; their modfetch package has:

	// Extract the module zip directory at its final location.
	//
	// To prevent other processes from reading the directory if we crash,
	// create a .partial file before extracting the directory, and delete
	// the .partial file afterward (all while holding the lock).
	//
	// Before Go 1.16, we extracted to a temporary directory with a random name
	// then renamed it into place with os.Rename. On Windows, this failed with
	// ERROR_ACCESS_DENIED when another process (usually an anti-virus scanner)
	// opened files in the temporary directory.
	//
	// Go 1.14.2 and higher respect .partial files. Older versions may use
	// partially extracted directories. 'go mod verify' can detect this,
	// and 'go clean -modcache' can fix it.
	if err := os.MkdirAll(parentDir, 0777); err != nil {
		return "", err
	}
	if err := os.WriteFile(partialPath, nil, 0666); err != nil {
		return "", err
	}
	if err := modzip.Unzip(dir, mod, zipfile); err != nil {

Interestingly, upstream Go doesn't seem to see this error very often. I could not find an open issue matching "partial" and "access denied" that has shown a failure in the last year, yet we have seen three failures in the last year already.

My intuition was to replace os.WriteFile with some sort of "robust" API we copied from Go. However, renameio.WriteFile was deleted as it was not safe and it did not really solve any problem completely. Similarly, they stopped exposing an IsEphemeralError, which seems to suggest we shouldn't be retrying here.

It's still unclear to me why we're the only ones running into this issue, or how to resolve it.

mvdan avatar Sep 09 '25 10:09 mvdan

Worth noting that Namespace said that they disable the Windows built-in anti-virus for the C: drive, which is where testscript places temporary directories, including the module cache causing the error above.

mvdan avatar Sep 10 '25 12:09 mvdan