
403 with pre-signed S3 URL

Open TomAugspurger opened this issue 10 months ago • 2 comments

S3 supports pre-signed URLs, a way to encode authorization into a URL so that it can be shared and used much like a public HTTP URL. Currently, it looks like kvikio does not support them. A pre-signed URL can be generated through the console, CLI, or SDKs:

In [1]: import boto3

In [2]: import boto3, httpx

In [3]: s3 = boto3.client("s3")

In [4]: url = s3.generate_presigned_url("get_object", Params={"Bucket": "kvikiobench-56481", "Key": "data/small/0000"}, ExpiresIn=600)

In [5]: httpx.get(url).status_code
Out[5]: 200

If we take that url and use it with kvikio, we get a 403 error:

>>> import kvikio
>>> kvikio.RemoteFile.open_http(url="https://kvikiobench-56481.s3.us-east-2.amazonaws.com/data/small/0000?response-content-disposition=inline&X-Amz-Content-Sha256=UNSIGNED-PAYLOAD&X-Amz-Security-Token=...")
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[3], line 1
----> 1 kvikio.RemoteFile.open_http(url="https://kvikiobench-56481.s3.us-east-2.amazonaws.com/data/small/0000?response-content-disposition=inline&X-Amz-Content-Sha256=UNSIGNED-PAYLOAD&X-Amz-Security-Token=...")

File /raid/toaugspurger/envs/kvikio-env/lib/python3.12/site-packages/kvikio/remote_file.py:69, in RemoteFile.open_http(cls, url, nbytes)
     53 @classmethod
     54 def open_http(
     55     cls,
     56     url: str,
     57     nbytes: Optional[int] = None,
     58 ) -> RemoteFile:
     59     """Open a http file.
     60
     61     Parameters
   (...)
     67         for the file size.
     68     """
---> 69     return RemoteFile(_get_remote_module().RemoteFile.open_http(url, nbytes))

File remote_handle.pyx:92, in kvikio._lib.remote_handle.RemoteFile.open_http()

File remote_handle.pyx:81, in kvikio._lib.remote_handle.RemoteFile._from_endpoint()

RuntimeError: curl_easy_perform() error near /opt/conda/conda-bld/work/cpp/src/remote_handle.cpp:47(The requested URL returned error: 403)

TomAugspurger avatar Jan 13 '25 13:01 TomAugspurger

The problem is that presigned URLs don't support HEAD (a URL signed for get_object is only valid for GET requests), so KvikIO fails when it tries to get the file size.

It should work when setting the file size manually:

import kvikio
kvikio.RemoteFile.open_http(url="presigned-aws-url", nbytes=100)

madsbk avatar Jan 21 '25 14:01 madsbk

Ah, thanks for tracking that down.

TomAugspurger avatar Jan 21 '25 14:01 TomAugspurger
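
The workaround above needs the object size up front, and HEAD is not available for a URL signed for GET. One possible way to look the size up is a one-byte ranged GET, reading the total from the Content-Range response header. The following is only a sketch under assumptions: the helper names size_from_content_range and presigned_size are hypothetical (they are not part of kvikio), httpx is used because it already appears earlier in this thread, and it assumes the presigned URL accepts a Range header and that the object is not empty.

```python
import re
from typing import Optional

# Matches e.g. "bytes 0-0/12345", "bytes 0-0/*" or "bytes */12345".
_CONTENT_RANGE = re.compile(r"^bytes\s+(?:\d+-\d+|\*)/(\d+|\*)$")


def size_from_content_range(header: str) -> Optional[int]:
    """Return the total size from a Content-Range header, or None if it is '*'.

    Hypothetical helper for illustration; not part of kvikio.
    """
    match = _CONTENT_RANGE.match(header.strip())
    if match is None:
        raise ValueError(f"unrecognized Content-Range header: {header!r}")
    total = match.group(1)
    return None if total == "*" else int(total)


def presigned_size(url: str) -> Optional[int]:
    """Ask for the first byte with GET and read the total size from the reply.

    Hypothetical helper; assumes the server honors the Range header.
    """
    import httpx  # imported here so the parser above has no third-party dependency

    response = httpx.get(url, headers={"Range": "bytes=0-0"})
    response.raise_for_status()
    if response.status_code != 206:
        # The server ignored the Range header, so there is no Content-Range.
        raise RuntimeError(f"expected 206 Partial Content, got {response.status_code}")
    return size_from_content_range(response.headers["Content-Range"])


# Possible usage with the workaround from this thread (url is a presigned GET URL):
#   import kvikio
#   kvikio.RemoteFile.open_http(url=url, nbytes=presigned_size(url))
```

This costs one extra request per file but keeps every request a GET, which is the method the URL was signed for.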