amazonka icon indicating copy to clipboard operation
amazonka copied to clipboard

A repro for issue #796

Open basvandijk opened this issue 2 years ago • 10 comments

This adds a reproducible test case for issue #796 "PutObject request failing on MinIO because of malformed XML" by adding an integration test checking amazonka-s3 against MinIO. To execute the test just run the following:

$ nix-build tests/s3.nix
...
s3: must succeed:
    echo 'Hello World!' > ./some-file
    export AWS_ACCESS_KEY_ID="BKIKJAA5BMMU2RHO6IBB"
    export AWS_SECRET_ACCESS_KEY="V7f1CwQqAcwo80UEIJEjc5gVQUSSx5ohQ9GSrr12"
    amazonka-s3-test-app ./some-file http://s3:9000 my-bucket some-file

s3 # > > > > > [   11.524753] minio[818]: {
  "deploymentid": "0de4a2b3-eaf1-46a0-8ec7-2e706e68d3b4",
  "level": "ERROR",
  "errKind": "MINIO",
  "time": "2022-07-09T18:07:28.335404358Z",
  "api": {
    "name": "SYSTEM",
    "args": {}
  },
  "error": {
    "message": "EOF (*errors.errorString)",
    "source": [
      "cmd/handler-utils.go:55:cmd.parseLocationConstraint()",
      "cmd/auth-handler.go:344:cmd.checkRequestAuthTypeCredential()",
      "cmd/auth-handler.go:296:cmd.checkRequestAuthType()",
      "cmd/bucket-handlers.go:719:cmd.objectAPIHandlers.PutBucketHandler()",
      "net/http/server.go:2047:http.HandlerFunc.ServeHTTP()"
    ]
  }
}
s3 # amazonka-s3-test-app:   ServiceError
    ( ServiceError'
        { _serviceErrorAbbrev = Abbrev {fromAbbrev = "S3"},
          _serviceErrorStatus = Status {statusCode = 400, statusMessage = "Bad Request"},
          _serviceErrorHeaders =
            [ ("Accept-Ranges", "bytes"),
              ("Content-Length", "371"),
              ("Content-Security-Policy", "block-all-mixed-content"),
              ("Content-Type", "application/xml"),
              ("Server", "MinIO"),
              ("Strict-Transport-Security", "max-age=31536000; includeSubDomains"),
              ("Vary", "Origin"),
              ("Vary", "Accept-Encoding"),
              ("X-Amz-Bucket-Region", "us-east-1"),
              ("X-Amz-Request-Id", "17003B77BD220CF2"),
              ("X-Content-Type-Options", "nosniff"),
              ("X-Xss-Protection", "1; mode=block"),
              ("Date", "Sat, 09 Jul 2022 18:07:28 GMT")
            ],
          _serviceErrorCode = ErrorCode "MalformedXML",
          _serviceErrorMessage =
            Just
              ( ErrorMessage
                  { fromErrorMessage = "The XML you provided was not well-formed or did not validate against our published schema."
                  }
              ),
          _serviceErrorRequestId =
            Just
              ( RequestId
                  { fromRequestId = "17003B77BD220CF2"
                  }
              )
        }
    )

basvandijk avatar Jul 09 '22 18:07 basvandijk

That ^ commit extends the test with capturing the network packets using tcpdump to inspect the request amazonka generates. This is the dump I get:

10:20:48.571111 IP6 ::1.59144 > ::1.9000: Flags [S], seq 4201602683, win 65476, options [mss 65476,sackOK,TS val 2587980450 ecr 0,nop,wscale 7], length 0
`.b?.(.@..................................#(.o^{.........0.........
.Ar.........
10:20:48.571164 IP6 ::1.9000 > ::1.59144: Flags [S.], seq 1752750534, ack 4201602684, win 65464, options [mss 65476,sackOK,TS val 2587980450 ecr 2587980450,nop,wscale 7], length 0
`....(.@................................#(..hx...o^|.....0.........
.Ar..Ar.....
10:20:48.571226 IP6 ::1.59144 > ::1.9000: Flags [.], ack 1, win 512, options [nop,nop,TS val 2587980450 ecr 2587980450], length 0
`.b?. .@..................................#(.o^|hx.......(.....
.Ar..Ar.
10:20:48.572635 IP6 ::1.59144 > ::1.9000: Flags [P.], seq 1:461, ack 1, win 512, options [nop,nop,TS val 2587980451 ecr 2587980450], length 460
`.b?...@..................................#(.o^|hx.............
.Ar..Ar.PUT /some-file HTTP/1.1
Accept-Encoding: gzip
Content-Length: 13
Host: s3:9000
X-Amz-Date: 20220710T102048Z
X-Amz-Content-SHA256: 03ba204e50d126e4674c005e04d82e84c21366780af1f43bd54a37816b6ab340
Expect: 100-continue
Authorization: AWS4-HMAC-SHA256 Credential=BKIKJAA5BMMU2RHO6IBB/20220710/us-east-1/s3/aws4_request, SignedHeaders=expect;host;x-amz-content-sha256;x-amz-date, Signature=a8a6fa5ae7a99054f3b00e694b4413c9bee04bbefdd574af84a8987fb7a4807e


10:20:48.572657 IP6 ::1.9000 > ::1.59144: Flags [.], ack 461, win 508, options [nop,nop,TS val 2587980451 ecr 2587980451], length 0
`.... .@................................#(..hx...o`H.....(.....
.Ar..Ar.
10:20:48.573627 IP6 ::1.9000 > ::1.59144: Flags [P.], seq 1:26, ack 461, win 512, options [nop,nop,TS val 2587980452 ecr 2587980451], length 25
`....9.@................................#(..hx...o`H.....A.....
.Ar..Ar.HTTP/1.1 100 Continue


10:20:48.573656 IP6 ::1.59144 > ::1.9000: Flags [.], ack 26, win 512, options [nop,nop,TS val 2587980452 ecr 2587980452], length 0
`.b?. .@..................................#(.o`Hhx.......(.....
.Ar..Ar.
10:20:48.573789 IP6 ::1.59144 > ::1.9000: Flags [P.], seq 461:474, ack 26, win 512, options [nop,nop,TS val 2587980453 ecr 2587980452], length 13
`.b?.-.@..................................#(.o`Hhx.......5.....
.Ar..Ar.Hello World!

10:20:48.573807 IP6 ::1.9000 > ::1.59144: Flags [.], ack 474, win 512, options [nop,nop,TS val 2587980453 ecr 2587980453], length 0
`.... .@................................#(..hx...o`U.....(.....
.Ar..Ar.
10:20:48.581357 IP6 ::1.9000 > ::1.59144: Flags [P.], seq 26:836, ack 474, win 512, options [nop,nop,TS val 2587980460 ecr 2587980453], length 810
`....J.@................................#(..hx...o`U.....R.....
.Ar..Ar.HTTP/1.1 400 Bad Request
Accept-Ranges: bytes
Content-Length: 371
Content-Security-Policy: block-all-mixed-content
Content-Type: application/xml
Server: MinIO
Strict-Transport-Security: max-age=31536000; includeSubDomains
Vary: Origin
Vary: Accept-Encoding
X-Amz-Bucket-Region: us-east-1
X-Amz-Request-Id: 170070951ABB6B77
X-Content-Type-Options: nosniff
X-Xss-Protection: 1; mode=block
Date: Sun, 10 Jul 2022 10:20:48 GMT

<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>MalformedXML</Code><Message>The XML you provided was not well-formed or did not validate against our published schema.</Message><BucketName>some-file</BucketName><Resource>/some-file</Resource><Region>us-east-1</Region><RequestId>170070951ABB6B77</RequestId><HostId>b31e5451-e212-4657-9db6-a3d4f5f47922</HostId></Error>
10:20:48.581390 IP6 ::1.59144 > ::1.9000: Flags [.], ack 836, win 506, options [nop,nop,TS val 2587980460 ecr 2587980460], length 0
`.b?. .@..................................#(.o`Uhx.
.....(.....
.Ar..Ar.
10:20:48.605507 IP6 ::1.59144 > ::1.9000: Flags [F.], seq 474, ack 836, win 512, options [nop,nop,TS val 2587980484 ecr 2587980460], length 0
`.b?. .@..................................#(.o`Uhx.
.....(.....
.Ar..Ar.
10:20:48.605588 IP6 ::1.9000 > ::1.59144: Flags [F.], seq 836, ack 475, win 512, options [nop,nop,TS val 2587980484 ecr 2587980484], length 0
`.... .@................................#(..hx.
.o`V.....(.....
.Ar..Ar.
10:20:48.605613 IP6 ::1.59144 > ::1.9000: Flags [.], ack 837, win 512, options [nop,nop,TS val 2587980484 ecr 2587980484], length 0
`.b?. .@..................................#(.o`Vhx.......(.....
.Ar..Ar.

So you can see a packet containing the HTTP request PUT /some-file and later a packet containing the contetns Hello World!.

You can also see the MinIO response The XML you provided was not well-formed . Indeed we didn't provide any XML. We just PUT the contents Hello World! without wrapping it in any XML.

What's at fault here? Amazonka? MinIO? Or am I missing something?

basvandijk avatar Jul 10 '22 10:07 basvandijk

I'm suspicious of the fact that, even though the name of the bucket is specified in the PutObject request:

PutObject'
    { contentLength = Nothing
    , objectLockMode = Nothing
    , expires = Nothing
    , grantReadACP = Nothing
    , sSECustomerAlgorithm = Nothing
    , sSECustomerKey = Nothing
    , requestPayer = Nothing
    , grantWriteACP = Nothing
    , bucketKeyEnabled = Nothing
    , websiteRedirectLocation = Nothing
    , grantRead = Nothing
    , storageClass = Nothing
    , sSECustomerKeyMD5 = Nothing
    , sSEKMSKeyId = Nothing
    , grantFullControl = Nothing
    , contentEncoding = Nothing
    , tagging = Nothing
    , contentMD5 = Nothing
    , objectLockRetainUntilDate = Nothing
    , metadata = fromList []
    , sSEKMSEncryptionContext = Nothing
    , cacheControl = Nothing
    , contentLanguage = Nothing
    , objectLockLegalHoldStatus = Nothing
    , acl = Nothing
    , contentDisposition = Nothing
    , expectedBucketOwner = Nothing
    , serverSideEncryption = Nothing
    , contentType = Nothing
    , bucket = BucketName "my-bucket"
    , key = ObjectKey "some-file"
    , body = Hashed HashedStream
        { sha256 = 03 ba204e50d126e4674c005e04d82e84c21366780af1f43bd54a37816b6ab340
        , length = 13
        }
    }

that bucket name is not included in the HTTP request to MinIO:

PUT /some-file HTTP/1.1
Accept-Encoding: gzip
Content-Length: 13
Host: s3:9000
X-Amz-Date: 20220710T165128Z
X-Amz-Content-SHA256: 03ba204e50d126e4674c005e04d82e84c21366780af1f43bd54a37816b6ab340
Expect: 100-continue
Authorization: AWS4-HMAC-SHA256 Credential=BKIKJAA5BMMU2RHO6IBB/20220710/us-east-1/s3/aws4_request, SignedHeaders=expect;host;x-amz-content-sha256;x-amz-date, Signature=67a8e6a8f495e6bcad4fc8ea7c65180e113f8a0a4c5e44f2565ff365806c1626


Maybe that explains why MinIO is not expecting to receive an object at URL /some-file but is expecting XML instead because that URL is not for uploading objects.

basvandijk avatar Jul 10 '22 16:07 basvandijk

If I look at examples of how to put an object to MinIO I see the request path needs to begin with the bucket name. So in our example we need the request to be:

PUT /my-bucket/some-file 

I'll dive into the amazonka source code to see why the bucket name is omitted.

basvandijk avatar Jul 10 '22 17:07 basvandijk

I found the root cause: the default request of a PutObject contains the right path with bucketName/objectKey. However it's passed through the s3vhost function which strips of the bucket name from the path and moves it to the host.

The right solution is probably to only apply the s3vhost transformation if we're dealing with an AWS endpoint.

basvandijk avatar Jul 10 '22 18:07 basvandijk

@endgame, @akshaymankar https://github.com/basvandijk/amazonka/compare/put-to-minio-malformed-xml...put-to-minio-malformed-xml-fix-attempt-1 tries to fix #760 by adding the field _serviceRewriteS3VHost :: Bool to the Service record which is False for all services except for S3.

A user can then override this field in the environment by setting it to False:

  defEnv <- Amazonka.newEnv Amazonka.Auth.fromKeysEnv
  let env =
        flip Amazonka.override defEnv $
          (Amazonka.serviceEndpoint .~ endpoint)
            . (Amazonka.serviceRewriteS3VHost .~ False)

Then s3vhost will not rewrite the passed request rq if not $ rq ^. requestService . serviceRewriteS3VHost.

This doesn't work unfortunately since the AWSRequest instance for PutObject overrides the Service record in the passed Request with defaultService, thereby setting the _serviceRewriteS3VHost back to True.

Possible solution

To fix this I think the request method implementations should stop setting a defaultService in the Request like we do here. Instead we need to have a type class:

class AWSService a where
  service :: Proxy a -> Service

and have instances for all request types like for example:

instance AWSService PutObject where
  service _proxy = Amazonka.S3.Types.defaultService

then the request method needs to accept a Service argument:

class AWSRequest a where
  ...
  request :: Service -> a -> Request a

The instance for PutObject for example will then look like:

instance Core.AWSRequest PutObject where
  type AWSResponse PutObject = PutObjectResponse
-  request =
+  request srv =
    Request.expectHeader
      Prelude.. Request.s3vhost
-      Prelude.. Request.putBody defaultService
+      Prelude.. Request.putBody srv

configureRequest will then override the service with the user supplied overrides and apply the request method to it:

- configureRequest :: AWSRequest a => Env' withAuth -> a -> Request a
+ configureRequest :: forall a. (AWSRequest a, AWSService a) => Env' withAuth -> a -> Request a
configureRequest env x =
  let overrides = envOverride env
+     srv = appEndo (getDual overrides) $ service (Proxy :: Proxy a)
-    in request x & requestService %~ appEndo (getDual overrides)
+    in request srv x

We could then also consider dropping the _requestService field from Request but I'm not sure if that's used for other things.

basvandijk avatar Jul 12 '22 09:07 basvandijk

This seems overwrought; can you not just set the rewriteS3VHost in an override with the rest of the amazonka Env? The requestService appears to be modified after the default request is generated by the typeclass instance:

https://github.com/brendanhay/amazonka/blob/26466244b5a23b29ba8894ec89d922e6a6835a05/lib/amazonka/src/Amazonka/HTTP.hs#L157-L160

endgame avatar Aug 27 '22 11:08 endgame

... can you not just set the rewriteS3VHost in an override with the rest of the amazonka Env? ...

I tried that:

https://github.com/basvandijk/amazonka/compare/put-to-minio-malformed-xml...put-to-minio-malformed-xml-fix-attempt-1 tries to fix https://github.com/brendanhay/amazonka/issues/760 by adding the field _serviceRewriteS3VHost :: Bool to the Service record which is False for all services except for S3.

But:

This doesn't work unfortunately since the AWSRequest instance for PutObject overrides the Service record in the passed Request with defaultService, thereby setting the _serviceRewriteS3VHost back to True.

Hence my suggestion for that "possible solution".

basvandijk avatar Sep 06 '22 10:09 basvandijk

FYI @basvandijk your fix is now implemented over on our branch here: https://github.com/SupercedeTech/amazonka/commits/temp-vhost-fix

I've tested it for our use case (using Moto as an S3 mock) and it works as expected.

One particular issue of note was this line, which I believe is the cause for #760, in part. Because the service overrides are originally applied after the request is built, the call to setEndpoint resets the service host after the bucket name is prepended in s3vhost - so the URL which is presigned ends up being incorrect, missing the bucket in both the path and hostname.

The fix involving AWSService (or, at minimum, passing the Service in to request) seems necessary to resolve this, since Service overrides alone don't seem to accurately manage it.

ivb-supercede avatar Sep 14 '22 14:09 ivb-supercede

We're also experiencing the issue with s3vhost rewriting, as we use localstack instead of real AWS to run our test suites.

What are the next steps regarding wrt. the patch proposed by @ivb-supercede?

kfigiela avatar Oct 18 '22 15:10 kfigiela