amazonka icon indicating copy to clipboard operation
amazonka copied to clipboard

S3: Setting endpoint interferes with vhost style URLs

Open akshaymankar opened this issue 2 years ago • 8 comments

All s3 requests call s3vhost after building the request. This moves the bucket name from the first segment in the path to the host name of the request URL. This only works as long as the S3 Service doesn't have the endpoint overridden. I tested this by using setEndpoint, I couldn't find any other ways of overriding this endpoint. When the endpoint is overridden, neither the path nor the hostname have the bucket name.

My usecase is to support S3-compatible storage services like minio.

Additionally, it would be great if there was a way to specify whether I want to use path style URLs or vhost style URLs.

akshaymankar avatar Feb 28 '22 10:02 akshaymankar

Yeah, it's a bit hacky. It would be good if we did something better and supported other S3-compatible object stores.

What does correct behaviour look like here? I'm not familiar with non-S3 stores. Would it be acceptable if we only did the rewriting when the endpoint matched s3.amazonaws.com or s3.#{region}.amazonaws.com?

Also, how does request signing work when you're talking to a non-AWS (non-IAM?) service?

endgame avatar Feb 28 '22 22:02 endgame

What does correct behaviour look like here? I'm not familiar with non-S3 stores. Would it be acceptable if we only did the rewriting when the endpoint matched s3.amazonaws.com or s3.#{region}.amazonaws.com?

Probably not, because I think there are other storages which already work with vhost style, like Google Cloud Storage, Scaleway Object Storage and even Minio (if configured). I think the current behavior is not too bad except that the bucket name shouldn't go missing in the final request made.

Additionally, though, I think it would be nice if this whole feature could be toggled on or off.

Also, how does request signing work when you're talking to a non-AWS (non-IAM?) service?

I don't understand how exactly the whole request signing and verification works. But I know it does work because my first attempt to make this work with minio was to add some hooks into the HTTP Manager to change the URL from vhost style to path style just before executing the requests. But the requests failed with 403 responses complaining about invalid signature.

akshaymankar avatar Mar 02 '22 08:03 akshaymankar

I would love to see better support for non-S3 support stores. Unfortunately, I don't have capacity to research and build something like that right now, and my priority is on fixing the generator.

I am happy to help out where I can, by reviewing and/or merging PRs or providing a sounding board for design.

There's probably enough variation between object store services that the best thing to do is provide some request-rewriting hooks in the Env or something (probably one at the AWSRequest level, and one at the lower http-client level, that users can override if the one in amazonka doesn't do what's required. See #737.

endgame avatar Mar 14 '22 03:03 endgame

Reclassifying as a bug and adding it to 2.0 since it's a regression that affects @basvandijk.

endgame avatar Jul 11 '22 11:07 endgame

Oh, I can chime in on this a bit.

I was just trying to use Amazonka to replace the minio-hs library. I was looking for the robustness of Amazonka's credential discovery and management options.

This took me an afternoon to figure out. I was so confused as to why my bucket wasn't in the request. I eventually tracked it down and a quick search in the issues brought me here.

It seems like some hosts may support vhosts for buckets and others might not. We'll never build a completely exhaustive list so we need a way to configure it. I think that makes the most sense at the service level and not the request level because it's more about the service endpoint than anything else. However, only the request actually knows the information that may need to go there.

What comes to mind is:

  1. A typeclass for requests to surface what part of their request should be used as a vhost, if any, e.g. ToVHost a => a -> Maybe ByteString
  2. Change Service to include _serviceEndpoint :: Maybe ByteString -> Region -> ByteString. Most existing services would just need a const prepended.

I'm going to evaluate some alternatives right now before I commit to writing a PR for something like that. We have some non-AWS deployments that use a Minio server to provide an object store while our cloud uses S3. I'll need to make sure my solution supports both options.

Other than this little issue and just having to spend a lot of time learning all the types, it's a great library! Keep it up!

periodic avatar Aug 25 '22 23:08 periodic

I just had some shower thoughts. Clearly it would have be more complicated that this because the endpoint and the path are sort of a pair. They can pass through unchanged or the VHost rewriting could end up rewriting both. So maybe it's that the request's instance needs a way to take in the service and return the path and the host. That's getting very close to a class that returns the URI instead of just path, query separately.

periodic avatar Aug 26 '22 00:08 periodic

This is the config that tells the generator to wire in vhost rewriting:

https://github.com/brendanhay/amazonka/blob/26466244b5a23b29ba8894ec89d922e6a6835a05/configs/services/s3.json#L7-L36

This is where the rewrite happens:

https://github.com/brendanhay/amazonka/blob/26466244b5a23b29ba8894ec89d922e6a6835a05/lib/amazonka-core/src/Amazonka/Request.hs#L169-L194

I'm not sure ToVHost is right, because the service value needs to know whether or not it can do vhost rewriting. And then the s3vhost function can check that before rewriting the request?

@brendanhay if you've got time, it would be great to have you weigh in.

endgame avatar Aug 27 '22 11:08 endgame

#797 also has other discussion, including a possible S3VHost setting to go in the Service record.

endgame avatar Aug 27 '22 12:08 endgame