cloudpathlib
cloudpathlib copied to clipboard
Add `dirs_exist_ok` to `CloudPath.copytree`
closes #144
Adds dirs_exist_ok argument to CloudPath.copytree
The original issue suggests adding the dirs_exist_ok functionality through mkdir. Using mkdir on S3Path doesn't actually create a directory, since we can't make empty directories in cloud storage. Adding functionality to a no-op mkdir seemed misleading, so the new argument doesn't use mkdir to check if the destination directory exists. Instead the functionality is added to copytree outside of mkdir with the more explicit if destination.exists() and not dirs_exist_ok.
Questions
-
Since
.mkdirdoesn't work on eitherGSPathorS3Path, should we raise an implementation error when it is called? Currently it runs without error, but doesn't actually create a directory.- For example, the
destination.mkdirline inCloudPath.copytreedoesn't actually create a cloud directory if it doesn't exist. The directory is created when the first file is copied withsubpath.copy(...). It seems like we need this line just for cases when the destination directory is local, is that right? - If we raise an implementation error instead of pass without doing anything, the new
S3Path.mkdirdefinition would be:
def mkdir(self, parents=False, exist_ok=False): raise CloudPathNotImplementedError( "Empties directories cannot be created in cloud storage" )- We can add something similar, but with a general
NotImplementedError, toGSPath.mkdir
- For example, the
-
There is a note in the comments that empty directories can't be created in other forms of cloud storage (
GSPath) either. It seems like we can handleGSPathandS3Pathsimilarly for addingdirs_exist_ok, is that right?
🚀 Deployed on https://deploy-preview-281--gallant-agnesi-5f7bb3.netlify.app
Codecov Report
Merging #281 (25c4ce7) into master (fa8d9fd) will decrease coverage by
0.3%. The diff coverage is100.0%.
Additional details and impacted files
@@ Coverage Diff @@
## master #281 +/- ##
========================================
- Coverage 94.8% 94.4% -0.4%
========================================
Files 20 20
Lines 1318 1320 +2
========================================
- Hits 1250 1247 -3
- Misses 68 73 +5
| Impacted Files | Coverage Δ | |
|---|---|---|
| cloudpathlib/cloudpath.py | 93.6% <100.0%> (+<0.1%) |
:arrow_up: |
| cloudpathlib/s3/s3client.py | 92.9% <0.0%> (-2.7%) |
:arrow_down: |
| cloudpathlib/gs/gsclient.py | 92.8% <0.0%> (-1.8%) |
:arrow_down: |
The original issue suggests adding the dirs_exist_ok functionality through mkdir. Using mkdir on S3Path doesn't actually create a directory, since we can't make empty directories in cloud storage. Adding functionality to a no-op mkdir seemed misleading, so the new argument doesn't use mkdir to check if the destination directory exists.
This actually surfaces a tricky issue that I hadn't thought of before. Currently, because mkdir is implemented as a no-op it doesn't match pathlib's behavior. By default, pathlib's mkdir has exists_ok=False. Accordingly, the matching behavior should be for mkdir to—by default—check if the "directory" "exists" in cloud storage, and error if it already does. It seems like there'd be some performance penalty for this. Thoughts, @pjbull?
Since .mkdir doesn't work on either GSPath or S3Path, should we raise an implementation error when it is called? Currently it runs without error, but doesn't actually create a directory.
Definitely not. This is intentionally a no-op so we can copy pathlib's implementation logic to our cloud implementations and not have to make changes to take out all of the mkdir calls.
Going through some stale PRs to get them moving.
I think that we shouldn't introduce dirs_exist_ok=False as the default without a deprecation cycle. There's too much of a chance people use copytree relying on the current behavior.
I think I'd go with an implementation that looks something like:
def copytree(..., dirs_exist_ok: bool = None) -> Union[Path, "CloudPath"]:
if dirs_exist_ok is None:
warnings.warn("In a future version, `CloudPath.copytree` will have a `dirs_exist_ok=False` kwarg. Please specify `dirs_exist_ok=True` in your code if you do not want that behavior (issue #144).", CloudPathDeprecationWarning)
dirs_exist_ok = True # match current behavior until dirs_exist_ok=False introduced
...