`OverwriteNewerLocalError` when reading same resource in parallel
This appears to be similar to #128, with two caveats:
- This happens despite using the workaround mentioned by @jayqi in that issue
- There is no writing happening, only reading
The relevant code is:
```python
# self.image_path is a cloud path
with self.image_path.open("rb") as file:
    image_contents = typing.cast(np.ndarray, imageio.imread(file))
```
This line raises the following exception:
```
cloudpathlib.exceptions.OverwriteNewerLocalError: Local file (...) for cloud path (...) is newer on disk, but is being requested for download from cloud. Either (1) push your changes to the cloud, (2) remove the local file, or (3) pass `force_overwrite_from_cloud=True` to overwrite.
```
This is despite the fact that the program never writes to the file, nor even opens it in `'w'` mode.
What may be relevant is that the code is part of an HTTP server which receives several requests in parallel, and it often needs to read the same files for different requests.
Since reading the image file can be a lengthy process, likely spending most of its time in C code, I think it's possible that `open` gets called from another thread before the `with` block is released, which causes the issue.
Is this a known issue? Does the explanation make sense, or am I missing something else?
@Gilthans It would be helpful to have a minimally reproducible example here so we can dig in.
To me, it seems most likely similar to #49, not #128. We can potentially address things like this by not using time as our check (for example, like #12) or turning off cache checks entirely. There may be other workarounds as well (e.g., a flag to never re-download from the cloud, adding sleeps to your code on first download, manually making sure the mtime matches the cloud version, or explicitly managing the download/caching with `download_to` and `exists` checks and passing around local paths on your server).
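The last workaround mentioned above (explicitly managing the download and passing around local paths) can be sketched roughly like this. This is a hypothetical helper, not part of cloudpathlib: `fetch_once`, `_download_locks`, and the `download` callable are all illustrative names, and with cloudpathlib the `download` callable might wrap `CloudPath.download_to` (assumed wiring, not shown):

```python
import threading
from pathlib import Path

# One lock per cloud key, so concurrent requests for the same file wait
# for a single download instead of racing on a shared cache entry.
_download_locks: dict = {}
_locks_guard = threading.Lock()


def fetch_once(key: str, destination: Path, download) -> Path:
    """Download `key` to `destination` at most once; later callers
    reuse the already-downloaded local file.

    `download` is any callable taking (key, destination) -- e.g. a thin
    wrapper around CloudPath.download_to (hypothetical wiring).
    """
    with _locks_guard:
        lock = _download_locks.setdefault(key, threading.Lock())
    with lock:
        if not destination.exists():
            destination.parent.mkdir(parents=True, exist_ok=True)
            download(key, destination)
    return destination
```

The server then reads from the returned local `Path` directly, so cloudpathlib's mtime-based cache check is never consulted on the hot path.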
I'm encountering this too in 0.13.0, as part of an HTTP Flask web app.
Thanks for mentioning this, @bdc34.
I think this will continue to be an issue until we implement a parallel-safe caching layer that's independent of the file system (related issues #9, #11, #12, #128).
Here are a few mitigation strategies that might be helpful:
- Architect your parallelism to ensure you create a new `Client` on each thread/process, and make sure each one is passed a different `local_cache_dir`. This will mean independent caches per worker, but the disk-space tradeoff may be worth the simplicity of the implementation.
- If your application is just passing the file on to the end user, use a presigned URL instead of passing the file through your backend. It would be good to get #236 in to support this generally, but you can do it in the meantime with something like `S3Client.client.generate_presigned_url` as shown in that PR.
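The first strategy (one `Client` per worker, each with its own cache directory) can be sketched with thread-local storage. Here `get_client` and `make_client` are illustrative names, not cloudpathlib APIs; with cloudpathlib, `make_client` might be something like `lambda d: S3Client(local_cache_dir=d)` (assumed wiring, not imported below):

```python
import tempfile
import threading

_state = threading.local()


def get_client(make_client):
    """Return this thread's client, creating it on first use with a
    cache directory that no other thread shares.

    `make_client` takes a cache-dir path and returns a client -- with
    cloudpathlib, perhaps `lambda d: S3Client(local_cache_dir=d)`.
    """
    if not hasattr(_state, "client"):
        # A private cache dir per worker means two threads can never
        # race on the mtime of the same cached file.
        _state.cache_dir = tempfile.mkdtemp(prefix="cloud-cache-")
        _state.client = make_client(_state.cache_dir)
    return _state.client
```

Within a thread, repeated calls return the same client; across threads, each worker ends up with a distinct cache directory.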
Finally, if someone can provide a minimal code snippet that reproduces this problem consistently, there may be additional mitigations that we can build into the cloudpathlib library for this use case.
I forgot to mention this here, but I tried to create a reproducible example with no luck. I might give it another shot later on.