filesystem_spec
local caching with custom endpoint
Hi, I am trying to use local caching with fsspec and a custom endpoint.
this works:
import os, fsspec
import numpy as np
from PIL import Image
os.environ['AWS_ACCESS_KEY_ID'] = '<...>'
os.environ['AWS_SECRET_ACCESS_KEY'] = '<...>'
os.environ['S3_ENDPOINT'] = 'http://<host>:<port>'
file = "s3://<bucket>/<file>.jpg"
with fsspec.open(file, "rb", client_kwargs={'endpoint_url': os.environ['S3_ENDPOINT']}) as f:
    img = np.array(Image.open(f))
this doesn't work:
file = "filecache::s3://<bucket>/<file>.jpg"
with fsspec.open(file, "rb", client_kwargs={'endpoint_url': os.environ['S3_ENDPOINT']}, filecache={"cache_storage": "/temp/cache"}) as f:
    img = np.array(Image.open(f))
EndpointConnectionError: Could not connect to the endpoint URL: "https://<bucket>.s3.amazonaws.com/<file>.jpg"
In the second case, fsspec seems to use a default endpoint in addition to a virtual-hosted-style URL. In the first case it uses my custom endpoint and a path-style URL, which is what I want.
How can I make this work with caching?
I am using fsspec-2021.11.1 and s3fs-2021.11.1
With that URL, you are essentially invoking two types of filesystem, so fsspec doesn't know which of them should take the client_kwargs argument. Each filesystem's arguments need to be passed under a keyword named after its protocol. You already did this for the filecache part, but not for s3:
fsspec.open(file, "rb",
            s3={"client_kwargs": {'endpoint_url': os.environ['S3_ENDPOINT']}},
            filecache={"cache_storage": "/temp/cache"})
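To make the "two types of filesystem" point concrete, here is a minimal sketch that lists the protocols named in a chained URL. The helper name chain_protocols is hypothetical and this is plain string handling for illustration, not fsspec's own parsing; each name it returns is a keyword that can carry that filesystem's arguments.

```python
def chain_protocols(url):
    """Return the protocol of each link in a '::'-chained URL.

    Illustrative string handling only (hypothetical helper, not part of fsspec).
    """
    # "filecache::s3://bucket/key" -> ["filecache", "s3://bucket/key"]
    links = url.split("::")
    # Keep only the part before "://"; a bare link such as "filecache" is its own protocol.
    return [link.split("://", 1)[0] for link in links]


print(chain_protocols("filecache::s3://<bucket>/<file>.jpg"))  # ['filecache', 's3']
print(chain_protocols("s3://<bucket>/<file>.jpg"))             # ['s3']
```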
@martindurant ahh, thank you a lot, that explains it :)
Another question, if I may: shouldn't the following work as well?
file = "s3://<bucket>/<file>.jpg"
fsspec.open(file, "rb",
            s3={"client_kwargs": {'endpoint_url': os.environ['S3_ENDPOINT']}})
__init__() got an unexpected keyword argument 's3'
(I am trying to find a generic solution, because calls can come with caching urls or without)
Yes, it is reasonable to expect that the second example should work as well. You can consider this an issue, and look in fsspec.core.get_fs_token_paths, where the one-fs case is treated differently from the multi-fs case. I don't know if there would be any situation of ambiguity, where a kwarg happens to have the same name as the protocol it relates to.
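Until the one-fs case accepts protocol-keyed arguments, a possible workaround for the generic case is to build the keyword arguments from the URL itself. This is a minimal sketch under the behaviour reported in this thread (fsspec 2021.11.1): the helper name open_kwargs and the CACHE_PROTOCOLS set are assumptions made for illustration, not part of fsspec.

```python
# Caching protocols this sketch recognises (an assumption; extend as needed).
CACHE_PROTOCOLS = {"filecache", "simplecache", "blockcache"}


def open_kwargs(url, endpoint_url, cache_dir=None):
    """Build keyword arguments for fsspec.open for plain or chained s3 URLs.

    Hypothetical helper, not part of fsspec.
    """
    s3_kwargs = {"client_kwargs": {"endpoint_url": endpoint_url}}
    if "::" not in url:
        # Single filesystem: the arguments are passed directly, not under "s3".
        return s3_kwargs
    kwargs = {}
    for link in url.split("::"):
        protocol = link.split("://", 1)[0]
        if protocol == "s3":
            # Chained URL: s3 arguments go under the protocol name.
            kwargs["s3"] = s3_kwargs
        elif protocol in CACHE_PROTOCOLS and cache_dir is not None:
            kwargs[protocol] = {"cache_storage": cache_dir}
    return kwargs


# Intended usage (not run here, needs a reachable endpoint):
# with fsspec.open(file, "rb", **open_kwargs(file, os.environ["S3_ENDPOINT"], "/temp/cache")) as f:
#     img = np.array(Image.open(f))
```

The same call then works whether file is "s3://<bucket>/<file>.jpg" or "filecache::s3://<bucket>/<file>.jpg".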
okay, thank you