Failure message: cp skipping file, as it was replaced while being copied

Open carbocation opened this issue 6 years ago • 9 comments

Workflow that I'm testing:

  1. Mount a google storage bucket with gcsfuse
  2. Copy a binary from that bucket to the current working directory
  3. Make that binary executable
  4. Run that binary
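
Steps 2 to 4 above can be sketched as a small shell function. This is only an illustration, not the actual job script: the function name `stage_and_run` is hypothetical, and the source path is whatever location the bucket is mounted at. Each step is chained so that a failed copy stops the sequence instead of falling through to `chmod` and the run.

```shell
#!/usr/bin/env bash
# stage_and_run SRC [ARGS...]   (hypothetical helper, not part of dsub)
# Copies SRC into the current working directory, marks the copy executable,
# and runs it with any remaining arguments.
stage_and_run() {
  local src="$1"
  shift
  local dest
  dest="./$(basename "${src}")"

  # Stop at the first failing step so a failed cp does not cascade into
  # "No such file or directory" errors from chmod and the run itself.
  cp "${src}" "${dest}" || return 1
  chmod +x "${dest}" || return 1
  "${dest}" "$@"
}

# Example call, using the mounted path that appears in the error messages:
# stage_and_run /mnt/data/mount/gs/my_bucket/projects/jamesp/bin/my-executable
```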

However, when I'm doing this for multiple simultaneous jobs, I occasionally get error messages like the following:

command--james--190429-094127-78.6 (attempt 1) failed. Retrying.
  Failure message: cp: skipping file '/mnt/data/mount/gs/my_bucket/projects/jamesp/bin/my-executable', as it was replaced while being copied

One possibility is that my working directory is within the mounted bucket, in which case copying the file to itself (by another simultaneous job) could trigger that. But that doesn't seem like what I'd expect, just from mounting a bucket. To clarify, what is the current working directory by default when a job is launched in dsub?

carbocation • Apr 29 '19 13:04

(Current working directory is /mnt/data/workingdir, so one hypothesis as to why this occurred -- that I was somehow working in a mounted bucket -- didn't pan out.)

carbocation • Apr 29 '19 14:04

Finally, I should add that, with a highly parallel workload, this affects the majority of my jobs. If this seems more like a gcsfuse-specific issue, I can move this over there, but right now it's not clear to me.

  command--james--190429-105744-35.145 (attempt 1) failed. Retrying.
  Failure message: cp: skipping file '/mnt/data/mount/gs/my_bucket/projects/jamesp/bin/my-executable', as it was replaced while being copied
chmod: cannot access './my-executable': No such file or directory
/mnt/data/script/command.sh: line 5: ./my-executable: No such file or directory

  command--james--190429-105744-35.146 (attempt 1) failed. Retrying.
  Failure message: cp: skipping file '/mnt/data/mount/gs/my_bucket/projects/jamesp/bin/my-executable', as it was replaced while being copied
chmod: cannot access './my-executable': No such file or directory
/mnt/data/script/command.sh: line 5: ./my-executable: No such file or directory

  command--james--190429-105744-35.147 (attempt 1) failed. Retrying.
  Failure message: cp: skipping file '/mnt/data/mount/gs/my_bucket/projects/jamesp/bin/my-executable', as it was replaced while being copied
chmod: cannot access './my-executable': No such file or directory
/mnt/data/script/command.sh: line 5: ./my-executable: No such file or directory

carbocation • Apr 29 '19 15:04

If I throw in a sleep 15 prior to doing my copy, I reduce the number of errors by about 5-fold, making me think this is a gcsfuse issue. Still, since I'm not sure if it's an intrinsic gcsfuse issue or an issue with the parameters used to invoke it from dsub, will leave this here for now and can open it on their repo if you all think that would be more appropriate.

carbocation • Apr 29 '19 15:04

Final(?) piece here: I'm now reading cp's exit status. If it isn't 0, then I'm sleeping 30 seconds and trying again. So far, this has yielded zero errors. Makes me more suspicious that this is related to gcsfuse.
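
A minimal sketch of that retry-on-failure workaround follows. The function name `copy_with_retry` and the bounded attempt count are assumptions added for illustration; the comment above only describes checking cp's exit status and sleeping 30 seconds before trying again.

```shell
#!/usr/bin/env bash
# copy_with_retry SRC DEST [MAX_ATTEMPTS] [DELAY_SECONDS]
# Hypothetical helper: retries cp when it exits non-zero.
# The 30-second default delay matches the comment above; the cap of
# 5 attempts is an assumption so the loop cannot spin forever.
copy_with_retry() {
  local src="$1"
  local dest="$2"
  local max_attempts="${3:-5}"
  local delay="${4:-30}"
  local attempt=1

  while true; do
    if cp "${src}" "${dest}"; then
      return 0
    fi
    if [ "${attempt}" -ge "${max_attempts}" ]; then
      echo "cp failed after ${attempt} attempts: ${src}" >&2
      return 1
    fi
    echo "cp failed (attempt ${attempt}); retrying in ${delay}s" >&2
    sleep "${delay}"
    attempt=$((attempt + 1))
  done
}

# Example call:
# copy_with_retry /mnt/data/mount/gs/my_bucket/projects/jamesp/bin/my-executable ./my-executable
```

Bounding the attempts keeps a genuinely missing file from stalling the job indefinitely, at the cost of one more parameter to tune.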

carbocation • Apr 29 '19 16:04

Hi @carbocation !

Can you explain the use case to copy the binary from a fuse-mounted bucket versus:

  1. Putting the file in a docker image
  2. Pulling the file as a --input parameter?

Fuse support was added only recently and with some hesitation. GCSfuse may have value in specific cases, but we have more often seen people test it out for various uses (outside of workflows) and decide that it was a fragile solution.

I'd suggest using one of the methods provided in the Scripts, Commands, and Docker documentation.

mbookman • Apr 29 '19 17:04

Thanks. Re the 2 points:

  1. The file works fine in a docker image, but it will be easier to coach people in the lab if they don't have to create their own Docker. (Basically, too much friction.)
  2. I like the point about using the command as an --input parameter. That's probably a better solution! Will go with that.

Should I raise this issue that I'm seeing with gcsfuse and close this issue here?

carbocation • Apr 29 '19 17:04

If you are able to create a reproducible test case that demonstrates clearly that the file is in fact not being modified while you are trying to copy it, then it seems worth filing an issue with gcsfuse.

I'm going to leave this issue open for now as I think we should update the dsub documentation to more clearly indicate that using the bucket --mount flag should be done only when you've really proved out that the standard --input/--output mechanisms are insufficient.

Thanks.

mbookman • Apr 29 '19 17:04

For future audiences, approach #2 is very quick (though less reproducible than a properly versioned Docker).

In a pinch, I can pass --input BINARY="gs://path/to/binary" and in my command, I start with:

chmod +x ${BINARY}

and then I'm good to go. For the most reproducible research, and for programs that aren't simply a single compiled binary, a versioned Docker is the way to go.

carbocation • Apr 29 '19 18:04

You must have recreated your source file from another gcsfuse mount.

This error occurs because gcsfuse keeps a cache of file information, and that cache has a default timeout (1s). When you run cp file1 file2, cp first stats file1, which may hit the cache. It then opens file1 and stats it again. By then the cache entry may have timed out, so the second stat gets file1's info from the backend storage. cp then compares the results of the two stats. If they are not the same, cp concludes that file1 may have been recreated and returns the error message "skipping file, as it was replaced while being copied".
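
The comparison described above can be imitated on a local filesystem. The sketch below is not gcsfuse-specific and does not reproduce the cache timing. It assumes GNU `stat` (for the `-c` format option) and assumes, as I understand GNU cp's behavior, that the check compares the device and inode numbers seen before and after opening the source. The helper name `file_identity` is hypothetical.

```shell
#!/usr/bin/env bash
# Print "device:inode" for a path. Assumes GNU stat (-c format option).
file_identity() {
  stat -c '%d:%i' "$1"
}

workdir="$(mktemp -d)"
src="${workdir}/file1"

# First look at the file, standing in for cp's initial stat.
echo "original" > "${src}"
before="$(file_identity "${src}")"

# Recreate the file under the same name. The replacement is written to a
# separate file first, so it is guaranteed to have a different inode.
echo "recreated" > "${workdir}/file1.new"
mv "${workdir}/file1.new" "${src}"

# Second look at the file, standing in for cp's stat after opening.
after="$(file_identity "${src}")"

if [ "${before}" = "${after}" ]; then
  verdict="unchanged"
else
  verdict="replaced"   # the condition under which cp would skip the file
fi
echo "before=${before} after=${after} verdict=${verdict}"

rm -rf "${workdir}"
```

On a gcsfuse mount the same mismatch could arise without any local rename, if the two lookups are answered from different sources (cache versus backend) and report different identities for the same path.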

wgqimut • Jul 29 '22 08:07