
File Upload to S3 via Presigned URL

Open pwolfe1 opened this issue 3 years ago • 11 comments

I am trying to write a function to upload a file to Amazon S3 using a presigned URL. I have written two implementations to do so, one using CPR and one using libcurl. The libcurl one is based on this file_upload.c example from curl's documentation.

CPR:

#include <cpr/cpr.h>
#include <cstdio>
#include <filesystem>
#include <string>

bool uploadToPresignedURL(const std::string& url, const std::filesystem::path& filePath)
{
    auto r = cpr::Put(cpr::Url{url},
                      cpr::Multipart{{"file", cpr::File{filePath.string()}}});
    printf("Status code: %ld\n", r.status_code); /* status_code is a long, so %ld */

    return r.status_code == 200;
}

libcurl:

#include <cstdio>
#include <filesystem>
#include <string>
#include <sys/stat.h>
#include <curl/curl.h>

bool uploadToPresignedURL_libcurl(const std::string& url, const std::filesystem::path& filePath)
{
    CURL *curl;
    CURLcode res;
    struct stat file_info;
    curl_off_t speed_upload, total_time;
    FILE *fd;
    bool uploaded = false;

    fd = fopen(filePath.c_str(), "rb"); /* open file to upload */
    if (!fd)
        return uploaded; /* cannot continue */

    /* get the file size */
    if (fstat(fileno(fd), &file_info) != 0)
    {
        fclose(fd); /* don't leak the handle on failure */
        return uploaded; /* cannot continue */
    }

    curl = curl_easy_init();
    if (curl)
    {
        /* upload to this place */
        curl_easy_setopt(curl, CURLOPT_URL, url.c_str());

        /* tell it to "upload" to the URL */
        curl_easy_setopt(curl, CURLOPT_UPLOAD, 1L);

        /* set where to read from (on Windows you need to use READFUNCTION too) */
        curl_easy_setopt(curl, CURLOPT_READDATA, fd);

        /* and give the size of the upload (optional) */
        curl_easy_setopt(curl, CURLOPT_INFILESIZE_LARGE, (curl_off_t)file_info.st_size);

        /* enable verbose for easier tracing */
        curl_easy_setopt(curl, CURLOPT_VERBOSE, 1L);

        res = curl_easy_perform(curl);
        /* Check for errors */
        if (res != CURLE_OK)
        {
            fprintf(stderr, "curl_easy_perform() failed: %s\n", curl_easy_strerror(res));
        }
        else
        {
            uploaded = true;
            /* now extract transfer info */
            curl_easy_getinfo(curl, CURLINFO_SPEED_UPLOAD_T, &speed_upload);
            curl_easy_getinfo(curl, CURLINFO_TOTAL_TIME_T, &total_time);

            fprintf(
                stderr,
                "Speed: %lu bytes/sec during %lu.%06lu seconds\n",
                (unsigned long)speed_upload,
                (unsigned long)(total_time / 1000000),
                (unsigned long)(total_time % 1000000));
        }
        /* always cleanup */
        curl_easy_cleanup(curl);
    }
    fclose(fd);

    return uploaded;

}

The libcurl implementation uploads the file properly and returns a 200. With the CPR implementation, however, I am consistently getting a 403. The AWS credentials are encoded directly into the presigned URL's query string. Is there something special I need to do to get the Url constructor to handle them properly? When using the curl CLI, I need to surround the URL with single quotes or it also 403s.

pwolfe1 · Oct 25 '22 21:10

I've never worked with AWS S3 buckets before, but based on the documentation provided here it looks like you need to set additional headers, such as Content-MD5, for your request.

I don't know why your libcurl request works. Based on the AWS S3 docs it should not :D

Am I looking at the correct AWS S3 docs here? If not, please point me to the correct docs and I will try to translate them into a valid CPR request for you.

COM8 · Oct 26 '22 06:10

Found another example of uploading to S3 via libcurl: https://gist.github.com/tuxfight3r/7ccbd5abc4ded37ecdbc8fa46966b7e8

COM8 · Oct 26 '22 06:10

I think those documents pertain to raw S3 uploads, not uploads via presigned URLs. With presigned URLs, clients can request a URL for uploading to a specific location that has the necessary auth encoded into it (with an expiration). Because the client is just dealing with a URL, it should end up being a standard PUT upload.

Here is another example using the regular curl CLI: https://stackoverflow.com/a/9085141. It has the same minimal headers as the libcurl example I've shared and uploads fine, so I am not convinced this is a header issue right now.

I think at the moment I have two primary questions that I hope will let us resolve this issue.

  • Is there anything that stands out in the CPR example that would cause the request it makes to differ from the libcurl and curl CLI examples?
  • Is there a way to turn on verbose logging within CPR, even if it just raises the internal libcurl log level? I currently suspect that something in how the auth credentials in the URL are encoded is malforming one of them by the time the request reaches AWS (which would naturally lead to a 403). As I mentioned, the curl CLI request had to have the URL enclosed in quotes to parse properly.

I'm going to try to look through the CPR source code this path uses and will report back if I find anything that may be connected to the issue.

Thank you for the prompt response! CPR has been a joy to use so far and I look forward to better understanding it.

pwolfe1 · Oct 26 '22 13:10

Aha! Then your example makes sense. I think this might be related to #590. I was never able to reproduce #590, but I have seen some flaky behaviour when uploading files. It might be related to cpr::Multipart, but this needs further investigation.

Yes, you can pass a cpr::Verbose{true} object when performing a cpr::Put to get verbose curl logging.

COM8 · Oct 27 '22 06:10

Alright, so by turning on verbose logging I can see a difference between the libcurl and CPR requests:

libcurl:

*   Trying 52.217.168.73:443...
* Connected to <subdomain>.s3.amazonaws.com (52.217.168.73) port 443 (#0)
* ALPN, offering http/1.1
*  CAfile: /etc/ssl/certs/ca-certificates.crt
*  CApath: /etc/ssl/certs
* SSL connection using TLSv1.2 / ECDHE-RSA-AES128-GCM-SHA256
* ALPN, server accepted to use http/1.1
* Server certificate:
*  subject: CN=*.s3.amazonaws.com
*  start date: Sep 21 00:00:00 2022 GMT
*  expire date: Aug 26 23:59:59 2023 GMT
*  subjectAltName: host "<subdomain>.s3.amazonaws.com" matched cert's "*.s3.amazonaws.com"
*  issuer: C=US; O=Amazon; OU=Server CA 1B; CN=Amazon
*  SSL certificate verify ok.
> PUT /test.mp4?AWSAccessKeyId=<key>&Signature=<sig>%3D&Expires=1666886291 HTTP/1.1
Host: <subdomain>.s3.amazonaws.com
Accept: */*
Content-Length: 449508264
Expect: 100-continue

* Mark bundle as not supporting multiuse
< HTTP/1.1 100 Continue

CPR:

*   Trying 52.217.83.124:443...
* Connected to <subdomain>.s3.amazonaws.com (52.217.83.124) port 443 (#0)
* ALPN, offering http/1.1
*  CAfile: /etc/ssl/certs/ca-certificates.crt
*  CApath: /etc/ssl/certs
* SSL connection using TLSv1.2 / ECDHE-RSA-AES128-GCM-SHA256
* ALPN, server accepted to use http/1.1
* Server certificate:
*  subject: CN=*.s3.amazonaws.com
*  start date: Sep 21 00:00:00 2022 GMT
*  expire date: Aug 26 23:59:59 2023 GMT
*  subjectAltName: host "<subdomain>.s3.amazonaws.com" matched cert's "*.s3.amazonaws.com"
*  issuer: C=US; O=Amazon; OU=Server CA 1B; CN=Amazon
*  SSL certificate verify ok.
> PUT /test.mp4?AWSAccessKeyId=<access key>&Signature=<sig>%3D&Expires=1666886381 HTTP/1.1
Host: <subdomain>.s3.amazonaws.com
User-Agent: curl/7.80.0
Accept: */*
Accept-Encoding: deflate, gzip
Content-Length: 449508464
Content-Type: multipart/form-data; boundary=------------------------44cc6b2d0419b217
Expect: 100-continue

* Mark bundle as not supporting multiuse
< HTTP/1.1 403 Forbidden
< x-amz-request-id: 004JCMDG4FFF18J4
< x-amz-id-2: dtuk3FXMgFtTS3+O5R7l3QJC4Wk8i800PzhPuiLnZj07QiKXlyW+b5DyedH3iN+31KF8xSMEuDc=
< Content-Type: application/xml
< Transfer-Encoding: chunked
< Date: Thu, 27 Oct 2022 14:59:41 GMT
< Server: AmazonS3
< Connection: close
<

I am wondering if the extra headers in the CPR request, particularly Content-Type: multipart/form-data; boundary=------------------------44cc6b2d0419b217, might be causing an issue. Is there a way to do a PUT file upload that is not multipart?

This comment from that thread has the same speculation. I am going to try what they did and see if that produces a different result.

pwolfe1 · Oct 27 '22 15:10

Based on this comment, I tried the following request:

    auto r = cpr::Put(cpr::Url{url.c_str()},
                      cpr::Verbose{true},
                      cpr::Body(data.str().c_str(), size),
                      cpr::Header{{"content-type", "application/octet-stream"}});

and got the following result:

*   Trying 54.231.172.201:443...
* Connected to <subdomain>.s3.amazonaws.com (54.231.172.201) port 443 (#0)
* ALPN, offering http/1.1
*  CAfile: /etc/ssl/certs/ca-certificates.crt
*  CApath: /etc/ssl/certs
* SSL connection using TLSv1.2 / ECDHE-RSA-AES128-GCM-SHA256
* ALPN, server accepted to use http/1.1
* Server certificate:
*  subject: CN=*.s3.amazonaws.com
*  start date: Sep 21 00:00:00 2022 GMT
*  expire date: Aug 26 23:59:59 2023 GMT
*  subjectAltName: host "<subdomain>.s3.amazonaws.com" matched cert's "*.s3.amazonaws.com"
*  issuer: C=US; O=Amazon; OU=Server CA 1B; CN=Amazon
*  SSL certificate verify ok.
> PUT /test.mp4?AWSAccessKeyId=<key>&Signature=<sig>%3D&Expires=1666891618 HTTP/1.1
Host: <subdomain>.s3.amazonaws.com
User-Agent: curl/7.80.0
Accept: */*
Accept-Encoding: deflate, gzip
Content-Type: application/octet-stream
Content-Length: 449508264
Expect: 100-continue

* Mark bundle as not supporting multiuse
< HTTP/1.1 403 Forbidden
< x-amz-request-id: G3T17640YA1W6B1Y
< x-amz-id-2: x2XWTxkNN+JDnHLBiYmfxe2QNRO5+Gm6AlBp51p/K1/9nlvqNvkmvr11yjdWjqRuwSfp/COfSr4=
< Content-Type: application/xml
< Transfer-Encoding: chunked
< Date: Thu, 27 Oct 2022 16:26:58 GMT
< Server: AmazonS3
< Connection: close
<
* Closing connection 0
Status code: 403

So even without that boundary header, it is still 403-ing. I am now wondering if it has to do with Accept-Encoding: deflate, gzip. As far as I can tell that is the main difference remaining between the working libcurl request and the 403-ing CPR request.

pwolfe1 · Oct 27 '22 16:10

You can try changing the accepted encodings using the following option: cpr::AcceptEncoding{{"deflate", "gzip", "zlib"}}

Docs: https://docs.libcpr.org/advanced-usage.html#http-compression

COM8 · Oct 28 '22 09:10

I gave that a try:

CPR:

* Trying 52.217.96.124:443...
* Connected to <subdomain>.s3.amazonaws.com (52.217.96.124) port 443 (#0)
* ALPN, offering http/1.1
*  CAfile: /etc/ssl/certs/ca-certificates.crt
*  CApath: /etc/ssl/certs
* SSL connection using TLSv1.2 / ECDHE-RSA-AES128-GCM-SHA256
* ALPN, server accepted to use http/1.1
* Server certificate:
*  subject: CN=*.s3.amazonaws.com
*  start date: Sep 21 00:00:00 2022 GMT
*  expire date: Aug 26 23:59:59 2023 GMT
*  subjectAltName: host "<subdomain>.s3.amazonaws.com" matched cert's "*.s3.amazonaws.com"
*  issuer: C=US; O=Amazon; OU=Server CA 1B; CN=Amazon
*  SSL certificate verify ok.
> PUT /test.mp4?AWSAccessKeyId=<key>&Signature=<sig>&Expires=1666980958 HTTP/1.1
Host: <subdomain>.s3.amazonaws.com
User-Agent: curl/7.80.0
Accept: */*
Accept-Encoding: deflate, gzip, zlib
Content-Type: application/octet-stream
Content-Length: 449508264
Expect: 100-continue

* Mark bundle as not supporting multiuse
< HTTP/1.1 403 Forbidden
< x-amz-request-id: 2MT5GD9XZDYZ9BMG
< x-amz-id-2: FBJghlOnm9Lfgfmu/x/o1c4D+gZhEImsmm5jnX3vMcqo1nu6lIHCpmIv5gTIRdaU6HVFrHGbPfo=
< Content-Type: application/xml
< Transfer-Encoding: chunked
< Date: Fri, 28 Oct 2022 17:16:00 GMT
< Server: AmazonS3
< Connection: close
<
* Closing connection 0
Status code: 403

libcurl:

*   Trying 52.216.166.35:443...
* Connected to <subdomain>.s3.amazonaws.com (52.216.166.35) port 443 (#0)
* ALPN, offering http/1.1
*  CAfile: /etc/ssl/certs/ca-certificates.crt
*  CApath: /etc/ssl/certs
* SSL connection using TLSv1.2 / ECDHE-RSA-AES128-GCM-SHA256
* ALPN, server accepted to use http/1.1
* Server certificate:
*  subject: CN=*.s3.amazonaws.com
*  start date: Sep 21 00:00:00 2022 GMT
*  expire date: Aug 26 23:59:59 2023 GMT
*  subjectAltName: host "<subdomain>.s3.amazonaws.com" matched cert's "*.s3.amazonaws.com"
*  issuer: C=US; O=Amazon; OU=Server CA 1B; CN=Amazon
*  SSL certificate verify ok.
> PUT /test.mp4?AWSAccessKeyId=<key>&Signature=<sig>&Expires=1666981099 HTTP/1.1
Host: <subdomain>.s3.amazonaws.com
Accept: */*
Content-Length: 449508264
Expect: 100-continue

* Mark bundle as not supporting multiuse
< HTTP/1.1 100 Continue
* We are completely uploaded and fine
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< x-amz-id-2: zCn9ZPJ5dl9TXPFg3ipQ9vBMjNsAFI75C7uxdQyeuRZMaU9YvZz+/hiYawxDOBFr9C+0s8wrn5o=
< x-amz-request-id: NN8TNB4Y1H64FWTV
< Date: Fri, 28 Oct 2022 17:18:20 GMT
< ETag: "8c0c181548069a0ed832b38ff6956dcc"
< Server: AmazonS3
< Content-Length: 0
<
* Connection #0 to host <subdomain>.s3.amazonaws.com left intact
Speed: 2944372 bytes/sec during 152.666891 seconds

I am a bit stumped at this point. There does not seem to be a difference in how the credentials are passed via the URL in the two curl logs, so I'm having trouble coming up with reasons why the two produce different results (since it seems the credentials are making it to the libcurl code properly in both cases).

Is there a way to entirely remove headers? User-Agent, Content-Type, and Accept-Encoding are present in the CPR request but not in the libcurl one. While I doubt any of these would cause a problem, I am thinking the closer we can get to 1:1 (or what we think is 1:1), the better. If you have other ideas on how to get the two requests to match as closely as possible, let me know as well.

pwolfe1 · Oct 28 '22 17:10

Hmmm, the headers "should not" make a difference. For further debugging you can spin up a dummy server using Postman. Sorry, I don't have much time to look into this right now.

COM8 · Nov 02 '22 12:11

All good, I haven't had much time either. In the short term I am just using the libcurl version, which seems to work fine anyway (my long-term preference would be to use CPR for 100% of the HTTP requests in the project). I'll give that dummy server idea a try and post the results once I do. If you think of anything else worth trying, let me know. Thanks!

pwolfe1 · Nov 02 '22 13:11

@pwolfe1 I solved this issue by setting these headers: cpr::Header header{ { "Content-Type", "" }, { "Connection", "keep-alive" }, { "Expect", "" } };

iartyukh-sdc · Oct 18 '23 11:10
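Putting the thread's findings together — the non-multipart cpr::Body PUT from earlier and the header overrides above — a sketch of the combined request might look like the following. This is illustrative only and not verified against S3; it assumes `url` is a valid presigned URL and that the file contents have already been read into a `std::string`:

```cpp
#include <cpr/cpr.h>
#include <string>

// Sketch combining the thread's findings; names are illustrative.
bool uploadToPresignedURL(const std::string& url, const std::string& data)
{
    // Per the fix above: blank values for Content-Type and Expect
    // (the reporter found that blanking these values works), plus an
    // explicit keep-alive Connection header.
    cpr::Header header{{"Content-Type", ""},
                       {"Connection", "keep-alive"},
                       {"Expect", ""}};

    auto r = cpr::Put(cpr::Url{url}, cpr::Body{data}, header);
    return r.status_code == 200;
}
```

The intent is to reproduce libcurl's minimal header set, which the verbose logs earlier in the thread showed succeeding with a 200.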