Failure message: cp skipping file, as it was replaced while being copied
Workflow that I'm testing:
- Mount a google storage bucket with gcsfuse
- Copy a binary from that bucket to the current working directory
- Make that binary executable
- Run that binary
However, when I'm doing this for multiple simultaneous jobs, I occasionally get error messages like the following:
command--james--190429-094127-78.6 (attempt 1) failed. Retrying.
Failure message: cp: skipping file '/mnt/data/mount/gs/my_bucket/projects/jamesp/bin/my-executable', as it was replaced while being copied
One possibility is that my working directory is within the mounted bucket, in which case copying the file to itself (by another simultaneous job) could trigger that. But that doesn't seem like what I'd expect, just from mounting a bucket. To clarify, what is the current working directory by default when a job is launched in dsub?
(Current working directory is /mnt/data/workingdir, so one hypothesis as to why this occurred -- that I was somehow working in a mounted bucket -- didn't pan out.)
Finally, I should add that, with a highly parallel workload, this affects the majority of my jobs. If this seems more like a gcsfuse-specific issue, I can move this over there, but right now it's not clear to me.
command--james--190429-105744-35.145 (attempt 1) failed. Retrying.
Failure message: cp: skipping file '/mnt/data/mount/gs/my_bucket/projects/jamesp/bin/my-executable', as it was replaced while being copied
chmod: cannot access './my-executable': No such file or directory
/mnt/data/script/command.sh: line 5: ./my-executable: No such file or directory
command--james--190429-105744-35.146 (attempt 1) failed. Retrying.
Failure message: cp: skipping file '/mnt/data/mount/gs/my_bucket/projects/jamesp/bin/my-executable', as it was replaced while being copied
chmod: cannot access './my-executable': No such file or directory
/mnt/data/script/command.sh: line 5: ./my-executable: No such file or directory
command--james--190429-105744-35.147 (attempt 1) failed. Retrying.
Failure message: cp: skipping file '/mnt/data/mount/gs/my_bucket/projects/jamesp/bin/my-executable', as it was replaced while being copied
chmod: cannot access './my-executable': No such file or directory
/mnt/data/script/command.sh: line 5: ./my-executable: No such file or directory
If I throw in a sleep 15 prior to doing my copy, I reduce the number of errors by about 5-fold, which makes me think this is a gcsfuse issue. Still, since I'm not sure whether it's an intrinsic gcsfuse issue or an issue with the parameters used to invoke it from dsub, I'll leave this here for now and can open it on their repo if you all think that would be more appropriate.
Final(?) piece here: I'm now reading cp's exit status. If it isn't 0, then I'm sleeping 30 seconds and trying again. So far, this has yielded zero errors. Makes me more suspicious that this is related to gcsfuse.
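For anyone who wants to try the same workaround, here is a minimal sketch of the retry described above. The function name copy_with_retry and its defaults (5 attempts, 30 seconds between them) are hypothetical choices for illustration, not anything dsub or gcsfuse provides.

```shell
#!/bin/bash
# Hypothetical helper: retry cp until it exits 0 or the attempts run out.
# Usage: copy_with_retry SRC DST [MAX_ATTEMPTS] [DELAY_SECONDS]
copy_with_retry() {
  local src="$1" dst="$2"
  local max_attempts="${3:-5}" delay="${4:-30}"
  local attempt=1
  while true; do
    # cp exits non-zero when it skips the file, so the exit status is the signal.
    if cp -- "${src}" "${dst}"; then
      return 0
    fi
    if (( attempt >= max_attempts )); then
      echo "cp failed after ${attempt} attempts: ${src}" >&2
      return 1
    fi
    echo "cp failed (attempt ${attempt}); retrying in ${delay}s" >&2
    sleep "${delay}"
    attempt=$(( attempt + 1 ))
  done
}
```

In the job script this would replace the bare cp, for example copy_with_retry /mnt/data/mount/gs/my_bucket/projects/jamesp/bin/my-executable ./my-executable, followed by the usual chmod +x.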
Hi @carbocation !
Can you explain the use case to copy the binary from a fuse-mounted bucket versus:
1. Putting the file in a Docker image
2. Pulling the file in as an --input parameter?
Fuse support was added only recently, and with some hesitation. gcsfuse may have value in specific cases, but more often we have seen people try it out for various uses (outside of workflows) and conclude that it is a fragile solution.
I'd suggest using one of the methods provided in the Scripts, Commands, and Docker documentation.
Thanks. Re the 2 points:
- The file works fine in a docker image, but it will be easier to coach people in the lab if they don't have to create their own Docker. (Basically, too much friction.)
- I like the point about using the command as an --input parameter. That's probably a better solution! Will go with that.
Should I raise this issue that I'm seeing with gcsfuse and close this issue here?
If you are able to create a reproducible test case that demonstrates clearly that the file is in fact not being modified while you are trying to copy it, then it seems worth filing an issue with gcsfuse.
I'm going to leave this issue open for now as I think we should update the dsub documentation to more clearly indicate that using the bucket --mount flag should be done only when you've really proved out that the standard --input/--output mechanisms are insufficient.
Thanks.
For future audiences, approach #2 is very quick (though less reproducible than a properly versioned Docker).
In a pinch, I can pass --input BINARY="gs://path/to/binary" and in my command, I start with:
chmod +x "${BINARY}"
and then I'm good to go. For the most reproducible research, and for programs that aren't simply a single compiled binary, a versioned Docker is the way to go.
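As a slightly fuller sketch of that quick approach: the helper below assumes the job was submitted with --input BINARY="gs://path/to/binary", so that ${BINARY} holds the local path of the copied file when the command runs. The function name run_input_binary is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: mark a localized input file executable, then run it.
# Usage: run_input_binary PATH [ARGS...]
run_input_binary() {
  local binary="${1:?path to the localized binary is required}"
  # Files copied from a bucket do not carry the executable bit.
  chmod +x "${binary}" || return 1
  # Forward any remaining arguments to the binary unchanged.
  "${binary}" "${@:2}"
}

# In the dsub command script (assumes BINARY was set via --input):
#   run_input_binary "${BINARY}"
```

Quoting "${BINARY}" keeps the call safe if the path ever contains spaces.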
You must have recreated your source file from another gcsfuse mount.
This error occurs because gcsfuse keeps a cache of each file's metadata, and that cache has a default timeout (1s). When you run cp file1 file2, cp first stats file1, which may hit the cache. It then opens file1 and stats it again; by then the cache entry may have timed out, so this second stat gets file1's info from the backend storage. cp then compares the two stat results, and if they are not the same, it concludes that file1 may have been recreated and returns the error message "skipping file, as it was replaced while being copied".
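The comparison described above can be approximated from the shell, which may help when building a reproducible test case. This is only a sketch, not cp's actual code: it assumes GNU stat (for the -c format option) and assumes the fields that matter are the device and inode numbers. The function names file_identity and identity_changed are hypothetical.

```shell
#!/bin/bash
# Print "device:inode" for a path (GNU stat syntax; an assumption of this sketch).
file_identity() {
  stat -c '%d:%i' -- "$1"
}

# Succeed (exit 0) if the path's identity now differs from an earlier reading.
# Exit 1 if it is unchanged, and 2 if the path can no longer be stat'ed.
identity_changed() {
  local path="$1" before="$2" after
  after="$(file_identity "${path}")" || return 2
  [[ "${after}" != "${before}" ]]
}
```

Recording file_identity on the mounted file just before cp, then calling identity_changed right after a failed copy, would show whether the identity really differed between the two reads.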