
Use Aliyun OSS as storage backend

Open Veiasai opened this issue 11 months ago • 2 comments

Description

I believe OSS is compatible with AWS S3, but when I tried it with the Python deltalake package I ran into some auth issues.

How can I turn on verbose logging?

(By the way, aws-cli works well once I configure the endpoint, region, and credentials. I made the same changes in deltalake's storage_options.)

Use Case

Related Issue(s)

Veiasai avatar Mar 29 '24 05:03 Veiasai

@Veiasai add this env variable: RUST_LOG='debug'
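
If you would rather set this from inside a Python script than in the shell, something like the sketch below should work. It assumes the variable is read when the deltalake package is first imported, so it is set before the import; `open_table` is just an illustrative helper name, not part of deltalake.

```python
import os

# Assumption: the Rust logger reads RUST_LOG when deltalake is first imported,
# so the variable has to be in the environment before that import happens.
os.environ["RUST_LOG"] = "debug"


def open_table(table_uri, storage_options=None):
    """Illustrative helper: import deltalake lazily, after RUST_LOG is set."""
    from deltalake import DeltaTable

    return DeltaTable(table_uri, storage_options=storage_options)
```

Calling `open_table("s3://<bucket>/<path>", storage_options={...})` should then print debug lines like the ones further down in this thread.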

ion-elgreco avatar Mar 29 '24 07:03 ion-elgreco

[2024-03-29T09:37:25Z DEBUG deltalake_aws] S3LogStoreFactory has been asked to create a LogStore without the dynamodb locking provider
[2024-03-29T09:37:25Z DEBUG reqwest::connect] starting new connection: https://oss-cn-hangzhou.aliyuncs.com/
[2024-03-29T09:37:25Z DEBUG hyper::client::connect::dns] resolving host="oss-cn-hangzhou.aliyuncs.com"
[2024-03-29T09:37:25Z DEBUG hyper::client::connect::http] connecting to 118.31.219.236:443
[2024-03-29T09:37:25Z DEBUG hyper::client::connect::http] connected to 118.31.219.236:443
[2024-03-29T09:37:25Z DEBUG rustls::client::hs] No cached session for DnsName("oss-cn-hangzhou.aliyuncs.com")
[2024-03-29T09:37:25Z DEBUG rustls::client::hs] Not resuming any session
[2024-03-29T09:37:25Z DEBUG rustls::client::hs] ALPN protocol is Some(b"http/1.1")
[2024-03-29T09:37:25Z DEBUG rustls::client::hs] Using ciphersuite TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
[2024-03-29T09:37:25Z DEBUG rustls::client::tls12::server_hello] Server supports tickets
[2024-03-29T09:37:25Z DEBUG rustls::client::tls12] ECDHE curve is ECParameters { curve_type: NamedCurve, named_group: X25519 }
[2024-03-29T09:37:25Z DEBUG rustls::client::tls12] Server DNS name is DnsName("oss-cn-hangzhou.aliyuncs.com")
[2024-03-29T09:37:25Z DEBUG hyper::proto::h1::io] flushed 509 bytes
[2024-03-29T09:37:25Z DEBUG hyper::proto::h1::io] parsed 7 headers
[2024-03-29T09:37:25Z DEBUG hyper::proto::h1::conn] incoming body is content-length (1672 bytes)
[2024-03-29T09:37:25Z DEBUG hyper::proto::h1::conn] incoming body completed
[2024-03-29T09:37:25Z DEBUG hyper::client::pool] pooling idle connection for ("https", oss-cn-hangzhou.aliyuncs.com)
[2024-03-29T09:37:25Z DEBUG hyper::client::pool] reuse idle connection for ("https", oss-cn-hangzhou.aliyuncs.com)
[2024-03-29T09:37:25Z DEBUG hyper::proto::h1::io] flushed 3924 bytes
[2024-03-29T09:37:25Z DEBUG hyper::proto::h1::io] parsed 8 headers
[2024-03-29T09:37:25Z DEBUG hyper::proto::h1::conn] incoming body is content-length (374 bytes)
[2024-03-29T09:37:25Z DEBUG hyper::proto::h1::conn] incoming body completed
[2024-03-29T09:37:25Z DEBUG hyper::client::pool] pooling idle connection for ("https", oss-cn-hangzhou.aliyuncs.com)
[2024-03-29T09:37:25Z DEBUG rustls::common_state] Sending warning alert CloseNotify

Hmm, it doesn't show the raw HTTP request.

Veiasai avatar Mar 29 '24 11:03 Veiasai

You can use the deltalake Python package with Aliyun OSS by setting the following environment variables (replace <region> with your bucket's region, e.g. cn-beijing):

export AWS_ACCESS_KEY_ID=<YOU FILL IT>
export AWS_SECRET_ACCESS_KEY=<YOU FILL IT>
export AWS_ENDPOINT_URL=https://<YOU_BUCKET>.oss-<region>.aliyuncs.com
export AWS_VIRTUAL_HOSTED_STYLE_REQUEST=true
export AWS_COPY_IF_NOT_EXISTS=header-with-status:x-oss-forbid-overwrite:true:409
export AWS_REGION=<region> # like cn-beijing
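
The same settings can probably be passed through storage_options instead of the process environment, which is what the original question was attempting. This is a minimal sketch, assuming storage_options accepts the same key names as the environment variables above; `oss_storage_options` and `load_table` are hypothetical helper names, and the `s3://` table URI is an assumption as well.

```python
def oss_storage_options(bucket, region, access_key_id, secret_access_key):
    """Build a storage_options dict mirroring the environment variables above."""
    return {
        "AWS_ACCESS_KEY_ID": access_key_id,
        "AWS_SECRET_ACCESS_KEY": secret_access_key,
        # Virtual-hosted style: the bucket name is part of the endpoint host.
        "AWS_ENDPOINT_URL": f"https://{bucket}.oss-{region}.aliyuncs.com",
        "AWS_VIRTUAL_HOSTED_STYLE_REQUEST": "true",
        "AWS_COPY_IF_NOT_EXISTS": "header-with-status:x-oss-forbid-overwrite:true:409",
        "AWS_REGION": region,
    }


def load_table(bucket, path, options):
    """Hypothetical helper: open a Delta table stored in an OSS bucket."""
    from deltalake import DeltaTable  # imported lazily; requires deltalake

    return DeltaTable(f"s3://{bucket}/{path}", storage_options=options)
```

All values are kept as strings, since that is how they would arrive from the environment.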

If you want to use plain HTTP to gain some extra performance:

export AWS_ALLOW_HTTP=1
export AWS_ENDPOINT_URL=http://<YOU_BUCKET>.oss-<region>.aliyuncs.com

You can also use the internal endpoint, <YOU_BUCKET>.oss-<region>-internal.aliyuncs.com, when your client runs in the same region as the bucket.
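
To keep the endpoint variants straight, here is a small sketch that builds the URL for each case. `oss_endpoint` and its `use_http` / `internal` flags are hypothetical names for illustration; only the hostname patterns come from the settings above.

```python
def oss_endpoint(bucket, region, use_http=False, internal=False):
    """Build the virtual-hosted OSS endpoint URL for a bucket.

    use_http: plain HTTP instead of HTTPS (AWS_ALLOW_HTTP=1 is then needed too).
    internal: the "-internal" host, for clients in the same region as the bucket.
    """
    scheme = "http" if use_http else "https"
    suffix = "-internal" if internal else ""
    return f"{scheme}://{bucket}.oss-{region}{suffix}.aliyuncs.com"


print(oss_endpoint("my-bucket", "cn-beijing"))
print(oss_endpoint("my-bucket", "cn-beijing", use_http=True, internal=True))
```

The returned string is what would go into AWS_ENDPOINT_URL.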

pandada8 avatar Jun 18 '24 17:06 pandada8