
Allow passing `local_cache_dir` to `CloudPath` init

Open pjbull opened this issue 5 years ago • 2 comments

In order to use a persistent local cache dir, we have to pass a Client instance:

from cloudpathlib import CloudPath, S3Client

ladi = CloudPath(
   "s3://ladi/Images/FEMA_CAP/2020/70349",
   client=S3Client(local_cache_dir="data"),
)

This works ok, but has two limitations:

  • You have to know to import and instantiate a Client, whereas most of our core functionality is available without having to think about the client at all, which is nice.
  • You have to know which Client you want ahead of time, since there is no generic Client.

Ideally, we'd have a happy path that could be used like this:

ladi = CloudPath(
   "s3://ladi/Images/FEMA_CAP/2020/70349",
   local_cache_dir="data"
)
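One way to read the proposal is that the missing "generic Client" amounts to choosing a client class from the URL scheme and then constructing it with the requested cache directory. The sketch below is a self-contained toy model of that idea, not cloudpathlib's code. `ToyClient`, `ToyS3Client`, `ToyGSClient`, `ToyAzureClient`, `ToyPath`, and `cloud_path` are hypothetical names, and the scheme table is an assumption chosen to mirror the `s3://`, `gs://`, and `az://` URL prefixes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyClient:
    """Hypothetical stand-in for a cloud client; it only records the cache dir."""
    local_cache_dir: Optional[str] = None


class ToyS3Client(ToyClient):
    pass


class ToyGSClient(ToyClient):
    pass


class ToyAzureClient(ToyClient):
    pass


# Assumed scheme -> client class table (the "generic Client" lookup).
CLIENTS_BY_SCHEME = {
    "s3": ToyS3Client,
    "gs": ToyGSClient,
    "az": ToyAzureClient,
}


@dataclass
class ToyPath:
    url: str
    client: ToyClient


def cloud_path(url: str, local_cache_dir: Optional[str] = None) -> ToyPath:
    """Hypothetical happy path: infer the client class from the scheme, then build it."""
    scheme, sep, _ = url.partition("://")
    if not sep or scheme not in CLIENTS_BY_SCHEME:
        raise ValueError(f"unsupported cloud URL: {url!r}")
    # A brand-new client is created on every call.
    client = CLIENTS_BY_SCHEME[scheme](local_cache_dir=local_cache_dir)
    return ToyPath(url, client)


ladi = cloud_path("s3://ladi/Images/FEMA_CAP/2020/70349", local_cache_dir="data")
print(type(ladi.client).__name__, ladi.client.local_cache_dir)  # ToyS3Client data
```

Because each call builds a fresh client, two calls for the same URL with different (or no) `local_cache_dir` values end up with independent caches. That is exactly the ambiguity raised in the reply.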

pjbull, Oct 04 '20 20:10

I'm unsure that this is what we'd want to do. I think local_cache_dir should be set at a client level. You want all cloud paths you create by default to refer to the same cache.

If we accept that, then it's confusing what it means to pass in a cache at the CloudPath level. What happens if you do this?

CloudPath(
   "s3://ladi/Images/FEMA_CAP/2020/70349",
   local_cache_dir="data"
)
CloudPath("s3://ladi/Images/FEMA_CAP/2020/70349")

Would the second invocation of the same path give you the same cache location?

jayqi, Oct 04 '20 21:10

> Would the second invocation of the same path give you the same cache location?

What would you expect in the example you gave?

To me, the happy path seems like this: When I create a CloudPath with an explicit cache dir, I expect that path and any paths that it is involved in creating (e.g., through .parent, /, glob, etc.) to have the same cache.

If I am instantiating a new object with different parameters (where I don't pass the cache), I wouldn't expect that object to have the same cache dir.

If you want the exact same thing to happen every time you call CloudPath(), I believe we already support this by explicitly calling set_as_default_client on a client.
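The semantics described in this reply can be sketched with a toy model, assuming (as the thread does) that the cache belongs to the client. Derived paths reuse their parent's client, a fresh construction without a client falls back to a shared default, and `set_as_default_client` swaps that default. `ToyClient`, `ToyPath`, and `get_default_client` are hypothetical and far simpler than cloudpathlib's real classes; only the name `set_as_default_client` comes from the thread.

```python
from typing import Optional


class ToyClient:
    """Hypothetical client: owns the cache dir shared by every path it creates."""

    _default: Optional["ToyClient"] = None

    def __init__(self, local_cache_dir: Optional[str] = None):
        self.local_cache_dir = local_cache_dir

    def set_as_default_client(self) -> None:
        # Paths built later without an explicit client fall back to this one.
        ToyClient._default = self

    @classmethod
    def get_default_client(cls) -> "ToyClient":
        if cls._default is None:
            cls._default = cls()  # no persistent cache dir unless one is set
        return cls._default


class ToyPath:
    def __init__(self, url: str, client: Optional[ToyClient] = None):
        self.url = url.rstrip("/")
        self.client = client if client is not None else ToyClient.get_default_client()

    def __truediv__(self, part: str) -> "ToyPath":
        # Joined paths reuse the same client, hence the same cache dir.
        return ToyPath(f"{self.url}/{part}", client=self.client)

    @property
    def parent(self) -> "ToyPath":
        return ToyPath(self.url.rsplit("/", 1)[0], client=self.client)


url = "s3://ladi/Images/FEMA_CAP/2020/70349"
cached = ToyPath(url, client=ToyClient(local_cache_dir="data"))
print((cached / "img.jpg").client.local_cache_dir)  # data
print(cached.parent.client.local_cache_dir)         # data
print(ToyPath(url).client.local_cache_dir)          # None
cached.client.set_as_default_client()
print(ToyPath(url).client.local_cache_dir)          # data
```

In this model, the answer to "would the second invocation give you the same cache?" is no until a client is promoted to the default, after which every bare construction shares it.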

I agree someone could confuse themselves, but I think it's more likely to be useful than confusing. E.g., I think that we'll use it all the time in notebooks in our projects.

pjbull, Oct 04 '20 22:10

I think that, given this discussion and the implementations discussed in #10, this can be closed as won't fix.

pjbull, Dec 19 '22 00:12