
S3 snapshots do not work on non-AWS S3?

Open nyxi opened this issue 1 year ago • 9 comments

Describe the bug

Dragonfly logs "InternalError" for every S3 operation.

Looking at the receiving end (my S3 service), it seems that Dragonfly makes HTTP requests to https://fqdn/https://fqdn/bucket/file, which of course does not work.
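For comparison, a correct path-style list request against the same endpoint names the host only once. A sketch using curl's built-in SigV4 signing (available in curl >= 7.75; bucket and endpoint as in my configuration below):

# ListObjectsV2 with the endpoint appearing once and the bucket in the path:
curl --user "$AWS_ACCESS_KEY_ID:$AWS_SECRET_ACCESS_KEY" \
     --aws-sigv4 "aws:amz:us-east-1:s3" \
     "https://swift.elastx.cloud/dragonfly-juicefs?list-type=2"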

Log excerpts from startup and after a BGSAVE:

I20240521 12:20:35.599390     1 dfly_main.cc:646] Starting dragonfly df-v1.18.1-6851a4c845625b0b14bb145177322dafbbc9858e
(...)
I20240521 12:20:35.727556     9 snapshot_storage.cc:185] Creating AWS S3 client; region=us-east-1; https=true; endpoint=swift.elastx.cloud
I20240521 12:20:35.727806     9 credentials_provider_chain.cc:28] aws: disabled EC2 metadata
I20240521 12:20:35.730230     9 credentials_provider_chain.cc:36] aws: loaded credentials; provider=environment
I20240521 12:20:35.738584    10 snapshot_storage.cc:242] Load snapshot: Searching for snapshot in S3 path: s3://dragonfly-juicefs/
E20240521 12:21:01.488096     1 server_family.cc:816] Failed to load snapshot: Failed list objects in S3 bucket: InternalError
(...)
E20240521 14:06:23.187770     9 s3_write_file.cc:137] aws: s3 write file: failed to create multipart upload: InternalError
E20240521 14:06:48.932502    10 s3_write_file.cc:137] aws: s3 write file: failed to create multipart upload: InternalError
E20240521 14:06:48.940094     9 s3_write_file.cc:137] aws: s3 write file: failed to create multipart upload: InternalError
E20240521 14:06:48.940392     8 s3_write_file.cc:137] aws: s3 write file: failed to create multipart upload: InternalError
E20240521 14:06:48.944156    11 s3_write_file.cc:137] aws: s3 write file: failed to create multipart upload: InternalError
I20240521 14:06:48.944537    10 server_family.cc:1720] Error in BgSaveFb: Input/output error: Failed to open write file

To Reproduce

S3 credentials in environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY.

S3 endpoint in environment variable DFLY_s3_endpoint (in my case swift.elastx.cloud).

Snapshot dir set in environment variable DFLY_dir (in my case s3://dragonfly-juicefs).
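A minimal way to reproduce this outside Kubernetes, as a sketch (same environment variables; the image tag matches the version in the logs above):

docker run --rm -p 6379:6379 \
  -e AWS_ACCESS_KEY_ID=<access_key> \
  -e AWS_SECRET_ACCESS_KEY=<secret_key> \
  -e DFLY_s3_endpoint=swift.elastx.cloud \
  -e DFLY_dir=s3://dragonfly-juicefs \
  docker.dragonflydb.io/dragonflydb/dragonfly:1.18.1 --logtostderr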

Expected behavior

S3 snapshots to work.

Environment (please complete the following information):

  • Kubernetes, Dragonfly Operator v1.1.2 with Dragonfly v1.18.1

Additional context

S3 API for OpenStack Swift, not AWS.

nyxi avatar May 22 '24 08:05 nyxi

Experiencing the same/similar issue with Cloudflare's R2:

I20240522 17:17:10.716820    12 snapshot_storage.cc:185] Creating AWS S3 client; region=us-east-1; https=true; endpoint=https://<account_id>.r2.cloudflarestorage.com
I20240522 17:17:10.716926    12 credentials_provider_chain.cc:28] aws: disabled EC2 metadata
I20240522 17:17:10.718613    12 credentials_provider_chain.cc:36] aws: loaded credentials; provider=environment
I20240522 17:17:10.723218    13 snapshot_storage.cc:242] Load snapshot: Searching for snapshot in S3 path: s3://dflydb-prod/
W20240522 17:17:10.743796    13 http_client.cc:261] aws: http client: failed to resolve host; host=https; error=generic:99
W20240522 17:17:10.745679    13 http_client.cc:261] aws: http client: failed to resolve host; host=https; error=generic:99
W20240522 17:17:10.800523    13 http_client.cc:261] aws: http client: failed to resolve host; host=https; error=generic:99
W20240522 17:17:10.906309    13 http_client.cc:261] aws: http client: failed to resolve host; host=https; error=generic:99
W20240522 17:17:11.112648    13 http_client.cc:261] aws: http client: failed to resolve host; host=https; error=generic:99
W20240522 17:17:11.518488    13 http_client.cc:261] aws: http client: failed to resolve host; host=https; error=generic:99
W20240522 17:17:12.327304    13 http_client.cc:261] aws: http client: failed to resolve host; host=https; error=generic:99
W20240522 17:17:13.933485    13 http_client.cc:261] aws: http client: failed to resolve host; host=https; error=generic:99
W20240522 17:17:17.150838    13 http_client.cc:261] aws: http client: failed to resolve host; host=https; error=generic:99
W20240522 17:17:23.564281    13 http_client.cc:261] aws: http client: failed to resolve host; host=https; error=generic:99
W20240522 17:17:36.376405    13 http_client.cc:261] aws: http client: failed to resolve host; host=https; error=generic:99
E20240522 17:17:36.379698     1 server_family.cc:816] Failed to load snapshot: Failed list objects in S3 bucket: 

coupled with the following when trying to SAVE:

E20240522 17:01:49.042889     9 s3_write_file.cc:137] aws: s3 write file: failed to create multipart upload: 

Same environment as OP.

Thanks,

  • 8x4

8times4 avatar May 22 '24 17:05 8times4

@8times4 what are the command-line flags you used to run dragonfly?

romange avatar May 22 '24 17:05 romange

@8times4 what are the command-line flags you used to run dragonfly?

Just the default ones set by the operator, plus --dir s3://dflydb-prod and --s3_endpoint=https://<account_id>.r2.cloudflarestorage.com.

Here's also a docker command to replicate w/o k8s (needs a cf account):

docker run --rm -p 6379:6379 \
  -e AWS_ACCESS_KEY_ID=<access_key> \
  -e AWS_SECRET_ACCESS_KEY=<secret_key> \
  -e AWS_REGION=us-east-1 \
  --ulimit memlock=-1 \
  docker.dragonflydb.io/dragonflydb/dragonfly:1.18.1 \
  --dir s3://dflydb-prod --logtostderr --requirepass=password \
  --s3_endpoint=https://<account_id>.r2.cloudflarestorage.com

8times4 avatar May 22 '24 22:05 8times4

@andydunstall should the endpoint flag be with the https prefix?

romange avatar May 23 '24 06:05 romange

@andydunstall should the endpoint flag be with the https prefix?

Yep, you can configure http/https using --s3_use_https

andydunstall avatar May 24 '24 05:05 andydunstall

To be clear, --s3_endpoint should be set without the scheme prefix, since the scheme is determined by the --s3_use_https flag.
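For example, the docker command above would become something like this (a sketch with the same placeholders; --s3_use_https is spelled out explicitly for clarity):

docker run --rm -p 6379:6379 \
  -e AWS_ACCESS_KEY_ID=<access_key> -e AWS_SECRET_ACCESS_KEY=<secret_key> \
  -e AWS_REGION=us-east-1 --ulimit memlock=-1 \
  docker.dragonflydb.io/dragonflydb/dragonfly:1.18.1 \
  --dir s3://dflydb-prod --logtostderr --requirepass=password \
  --s3_endpoint=<account_id>.r2.cloudflarestorage.com --s3_use_https=true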

This is the problem @8times4 is hitting, but not the problem I'm having as detailed in the first post.

nyxi avatar May 28 '24 07:05 nyxi

@nyxi is DSKY_s3_endpoint a typo in the issue, or is it in your real configuration? It should be DFLY_.

romange avatar May 28 '24 08:05 romange

@nyxi is DSKY_s3_endpoint a typo in the issue, or is it in your real configuration? It should be DFLY_.

Typo, sorry for the confusion. Updated the first post.

nyxi avatar May 28 '24 08:05 nyxi

I'm having a similar issue while using Backblaze's S3-compatible API.

The difference is that I am instead receiving InvalidArgument when trying to save.

I20240703 12:49:26.809077     1 init.cc:78] dragonfly running in opt mode.
I20240703 12:49:26.809190     1 dfly_main.cc:646] Starting dragonfly df-v1.19.2-2ff628203925b206c4a1031aa24916523dc5382e
I20240703 12:49:26.809424     1 dfly_main.cc:690] maxmemory has not been specified. Deciding myself....
I20240703 12:49:26.809434     1 dfly_main.cc:699] Found 3.13GiB available memory. Setting maxmemory to 2.50GiB
W20240703 12:49:26.809468     1 dfly_main.cc:373] Weird error 1 switching to epoll
I20240703 12:49:26.887785     1 proactor_pool.cc:147] Running 3 io threads
I20240703 12:49:26.890972     1 server_family.cc:721] Host OS: Linux 6.8.0-31-generic x86_64 with 3 threads
I20240703 12:49:26.902956     9 snapshot_storage.cc:185] Creating AWS S3 client; region=eu-central-003; https=true; endpoint=s3.eu-central-003.backblazeb2.com
I20240703 12:49:26.903054     9 credentials_provider_chain.cc:28] aws: disabled EC2 metadata
I20240703 12:49:26.907462     9 credentials_provider_chain.cc:36] aws: loaded credentials; provider=environment
I20240703 12:49:26.921793    10 snapshot_storage.cc:242] Load snapshot: Searching for snapshot in S3 path: s3://dg-df-backups/
W20240703 12:49:26.965176     1 server_family.cc:814] Load snapshot: No snapshot found
I20240703 12:49:26.979427     9 listener_interface.cc:101] sock[9] AcceptServer - listening on port 6379
E20240703 12:49:58.401154    10 s3_write_file.cc:137] aws: s3 write file: failed to create multipart upload: InvalidArgument
E20240703 12:49:58.438844     8 s3_write_file.cc:137] aws: s3 write file: failed to create multipart upload: InvalidArgument
E20240703 12:49:58.439571     9 s3_write_file.cc:137] aws: s3 write file: failed to create multipart upload: InvalidArgument
E20240703 12:49:58.439711    10 s3_write_file.cc:137] aws: s3 write file: failed to create multipart upload: InvalidArgument

Update:

I've been doing more digging, and according to Backblaze's docs there are certain limitations with their S3 implementation:

Requests that include the following checksum HTTP headers are rejected with a 400 Bad Request response: x-amz-checksum-crc32, x-amz-checksum-crc32c, x-amz-checksum-sha1, x-amz-checksum-sha256, x-amz-checksum-algorithm, x-amz-checksum-mode

  • https://www.backblaze.com/docs/cloud-storage-s3-compatible-api#unsupported-features

"Fair enough, I'll just set s3_sign_payload to false and it should be fixed," I think to myself... nope. It's still adding the following headers when set to false, and I'm assuming that's why Backblaze is rejecting the requests.

[screenshot: the outgoing request headers, still including the x-amz-checksum headers]
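As a sanity check, the rejection can be reproduced outside Dragonfly. A sketch assuming the aws CLI v2 and the same credentials; the bucket, region, and endpoint are taken from my logs above, and test-snapshot is an arbitrary key:

# Start a multipart upload with an explicit checksum algorithm header;
# per the B2 docs above this should be rejected with 400 / InvalidArgument.
aws s3api create-multipart-upload \
  --bucket dg-df-backups --key test-snapshot \
  --checksum-algorithm CRC32 \
  --region eu-central-003 \
  --endpoint-url https://s3.eu-central-003.backblazeb2.com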

I'll send them a question asking whether that is indeed why it's not working; I'll update when I hear back.

Update 2:

Currently, checksum headers in the HTTP request are unsupported, which is why it is being rejected. We can absolutely make a feature recommendation on your behalf in order to improve our compatibility with the S3 API.

pepzwee avatar Jul 03 '24 13:07 pepzwee