miniwdl icon indicating copy to clipboard operation
miniwdl copied to clipboard

Problem with progressive uploads interracting with call caching

Open morsecodist opened this issue 1 year ago • 1 comments

For swipe we maintain our own version of the progressive uploader that also uploads the call cache. With newer versions this is no longer working (I will expand on the specific versions and issues below). Though this is only breaking in our complex plugin I think this issue would happen when using local call caching with progressive S3 uploads in general. The issue doesn't seem to be with our cache S3 sync.

Upgrading from v1.5.2 to v1.5.3 causes runs with cache hits to fail with a file not found error for the cache result file.

Upgrading from v1.5.3 to v1.5.4 fixes that issue and now the run completes without error but the re-mapping of paths for the output json doesn't happen for the cache output. This is because when listing the output directory for some reason the cached output no longer shows up. I suspect this has been the case since v1.5.3.

From the commit history it seems like this commit from v1.5.3 is the source of the first issue and this commit from v1.5.4 fixed it partially, leaving behind just the re-mapping issue.

morsecodist avatar Oct 06 '22 19:10 morsecodist

@morsecodist I haven't yet figured out the exact problem, but here's some context for those changes that might help us zero in.

  • First, it's better to pretend the two commits you linked as one at this point; the first one was basically broken and the second one was to unbreak it.
  • Prior to this diff, if a task or workflow outputs a URI for a File output, that File output was not represented in the out/ symlinks folder, because the URI was obviously not a local file that os.path.exists().
  • The diff checks whether the download cache contains a local copy of the URI, and if so includes a symlink to it under out/ as usual.

With that summary, here are a couple of observations that might help us connect it with the uploader issues, even though it's not clear to me yet:

  • The behavior changes only where tasks or workflows are outputting File URIs, which is generally an unusual thing to do.
  • However I think SWIPE call caching makes it more common, i.e. when we get a call cache hit, we generate outputs with the cached S3 URIs right?
  • Then we have links under out/ that refer to a local copy of an already-cached file, possibly (?) leading us to re-upload it unnecessarily and/or causing other confusion.

I'm having a hard time connecting this with your observation that "when listing the output directory for some reason the cached output no longer shows up". I can't see a case where an output should show up under out/ prior to the v1.5.4 diff but not after; the goal was only to add links that had previously been missing. But there might be something confusing happening with an already-cached task (is that what we're dealing with? or are we failing to remap the paths on a cache "miss"?).

mlin avatar Oct 10 '22 10:10 mlin