filesystem_spec

local caching with custom endpoint

Open ffalkenberg opened this issue 3 years ago • 4 comments

Hi, I am trying to use local caching with fsspec and a custom endpoint.

this works:

import os, fsspec
import numpy as np
from PIL import Image

os.environ['AWS_ACCESS_KEY_ID'] = '<...>'
os.environ['AWS_SECRET_ACCESS_KEY'] = '<...>'
os.environ['S3_ENDPOINT'] = 'http://<host>:<port>'

file = "s3://<bucket>/<file>.jpg"
with fsspec.open(file, "rb", client_kwargs={'endpoint_url': os.environ['S3_ENDPOINT']}) as f:
    img = np.array(Image.open(f))

this doesn't work:

file = "filecache::s3://<bucket>/<file>.jpg"
with fsspec.open(file, "rb", client_kwargs={'endpoint_url': os.environ['S3_ENDPOINT']}, filecache={"cache_storage": "/temp/cache"}) as f:
    img = np.array(Image.open(f))

EndpointConnectionError: Could not connect to the endpoint URL: "https://<bucket>.s3.amazonaws.com/<file>.jpg"

In the second case, fsspec seems to use the default endpoint together with a virtual-hosted-style URL. In the first case it uses my custom endpoint and a path-style URL, which is what I want.

How can I make this work with caching?

I am using fsspec-2021.11.1 s3fs-2021.11.1

ffalkenberg, Dec 14 '21 07:12

With that URL, you are essentially invoking two types of filesystem, so fsspec doesn't know which of them should receive the client_kwargs argument. Each filesystem's arguments need to go under a keyword named after its protocol. You already did this for the filecache part, but not for s3:

fsspec.open(file, "rb", 
    s3={"client_kwargs": {'endpoint_url': os.environ['S3_ENDPOINT']}}, 
    filecache={"cache_storage": "/temp/cache"})

martindurant, Dec 14 '21 14:12

@martindurant ahh, thank you a lot, that explains it :)

Another question, if I may: shouldn't the following work as well, then?

file = "s3://<bucket>/<file>.jpg"
fsspec.open(file, "rb",
    s3={"client_kwargs": {'endpoint_url': os.environ['S3_ENDPOINT']}})

__init__() got an unexpected keyword argument 's3'

(I am trying to find a generic solution, because calls may come with or without a caching URL.)

ffalkenberg, Dec 15 '21 07:12

Yes, it is reasonable to expect the second example to work as well. You can consider this an issue; the place to look is fsspec.core.get_fs_token_paths, where the single-filesystem case is treated differently from the multi-filesystem case. I don't know whether there would be any ambiguous situation in which a kwarg happens to have the same name as the protocol it relates to.
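
Until the single-filesystem case accepts protocol-keyed arguments, a possible workaround is to build the keyword arguments according to the shape of the URL. The following is only a minimal sketch: the helper name `s3_open_kwargs` is hypothetical (it is not part of fsspec), and it assumes that any URL containing `::` is a chained URL whose target filesystem is s3.

```python
def s3_open_kwargs(url, endpoint_url):
    """Build kwargs for fsspec.open() that route the endpoint to s3.

    Hypothetical helper, not an fsspec API. Assumes "::" only appears
    in the URL as the chaining separator.
    """
    s3_options = {"client_kwargs": {"endpoint_url": endpoint_url}}
    if "::" in url:
        # Chained URL (e.g. "filecache::s3://..."): key the options by protocol.
        return {"s3": s3_options}
    # Plain "s3://..." URL: options are passed at the top level.
    return s3_options


# Intended use (not executed here; needs fsspec, s3fs and a reachable endpoint):
#   fsspec.open(url, "rb", **s3_open_kwargs(url, os.environ["S3_ENDPOINT"]))
```

Options for the caching layer, such as `filecache={"cache_storage": ...}`, would still have to be added separately, and only when the URL is chained.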

martindurant, Dec 15 '21 14:12

Okay, thank you.

ffalkenberg, Dec 16 '21 08:12