enhancement(aws provider): Add ability to disable request signing

Open jszwedko opened this issue 1 year ago • 4 comments

This can be useful when sending anonymous requests to AWS S3. Potentially it could be useful in other situations as well (e.g. sending to AWS API compatible endpoints that don't support signing).

I'd really have liked to introduce this as a new "strategy" for AWS authentication configuration since it is mutually exclusive with the others, but given the way AWS authentication configuration is currently implemented as an untagged enum, adding it to the default config enum seemed like the best option.

If/when we refactor this to follow https://github.com/vectordotdev/vector/blob/master/docs/specs/configuration.md#polymorphism then we can move it.

Prompted by a user in discord: https://discord.com/channels/742820443487993987/1267892632319692802/1267892632319692802

jszwedko avatar Jul 31 '24 01:07 jszwedko

Datadog Report

Branch report: jszwedko/add-aws-none-option Commit report: 949d275 Test service: vector

:white_check_mark: 0 Failed, 443 Passed, 0 Skipped, 4m 5.35s Total Time

Adding this extra information here in case it helps anyone. Regarding anonymous access to S3 (via the S3 sink), which is the use case for this unsigned-request feature: AWS does not allow multipart uploads for anonymous requests (for reasons known only to AWS). S3 supports file uploads in chunks of 5 MB, so at most one 5 MB file can be uploaded at a time with an unsigned request. I could not find any official documentation on this. After my discussion in Discord (linked in the issue description), I was experimenting with batch uploads of files using the AWS CLI with anonymous access, instead of relying on Vector, until this feature lands in the main branch, and I happened to stumble on this error. The closest information I found is in the link below:

https://github.com/aws/aws-sdk-js/issues/512#issuecomment-77425261

rams3sh avatar Aug 03 '24 09:08 rams3sh

Hey @jszwedko

I started testing this even before it's merged to the main branch 😝, so kindly excuse my impatience.

For some reason, logs are not being delivered to my S3 bucket with these changes.

I have detailed my experiment below. Let me know if it's a bug or if I am going wrong somewhere in my steps.

Prerequisites

The target <MY_BUCKET> has the following policy attached:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "SourceIPBasedLogUploadRestriction",
            "Effect": "Allow",
            "Principal": "*",
            "Action": [
                "s3:PutObject",
                "s3:ListBucket",
                "s3:GetObject",
                "s3:GetBucketLocation",
                "s3:ListBucketMultipartUploads",
                "s3:ListMultipartUploadParts",
                "s3:AbortMultipartUpload"
            ],
            "Resource": [
                "arn:aws:s3:::<MY_BUCKET>",
                "arn:aws:s3:::<MY_BUCKET>/<KEY_PREFIX>/*"
            ],
            "Condition": {
                "IpAddress": {
                    "aws:SourceIp": [
                        "<MY_RESTRICTED_IP>/32"
                    ]
                }
            }
        }
    ]
}

The instance running Vector has the IP address <MY_RESTRICTED_IP>.

Steps to reproduce the issue

Step 1: Building from source

apt-get update && apt-get install git curl make libsasl2-dev protobuf-compiler -y && \
    git clone -b "jszwedko/add-aws-none-option" https://github.com/vectordotdev/vector.git && \
    cd vector && \
    make build

Step 2: Drafting a Vector config with a dummy log generator and an S3 sink that uses anonymous access

data_dir: "/var/lib/vector"
api:
  enabled: false
sources:
  wp_logs:
    type: "demo_logs"
    format: "json"
transforms:
  wp_logs_transformer:
    type: remap
    inputs:
      - wp_logs
    source: |
       . = parse_json!(.message)
sinks:
  wp_logs_s3_sink:
    inputs:
      - "wp_logs_transformer"
    auth:
      sign: false  # Disable request signing
    type: "aws_s3"
    region: "us-east-1"
    bucket: "<MY_BUCKET>"
    key_prefix: "<KEY_PREFIX>/vector/wp/"
    compression: "gzip"
    buffer:
      type: disk # Store the buffer on disk
      max_size: 268435488 # Maximum buffer size, about 256 MB
    batch:
      timeout_secs: 300 # Flush at least once every 5 minutes
      max_bytes: 4900000 # Flush once the batch reaches 4.9 MB (an anonymous request can upload at most 5 MB at a time, hence this number)
    encoding:
      codec: json
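
The byte values in the config above can be sanity-checked with shell arithmetic. This is a minimal sketch; it assumes the "5 MB" limit described earlier means 5 MiB (5 × 1024 × 1024 bytes), which the thread does not confirm. The variable names are illustrative only.

```shell
# Values copied from the sink config above.
buffer_max_size=268435488
batch_max_bytes=4900000

# One MiB in bytes (assumption: "MB" in the comments means MiB).
mib=$((1024 * 1024))

echo "256 MiB in bytes:          $((256 * mib))"
echo "buffer size above 256 MiB: $((buffer_max_size - 256 * mib))"
echo "5 MiB in bytes:            $((5 * mib))"
echo "batch headroom to 5 MiB:   $((5 * mib - batch_max_bytes))"
```

Under that assumption the buffer size is 32 bytes above 256 MiB, and the batch limit stays below 5 MiB. If the limit is instead a decimal 5,000,000 bytes, 4900000 is still below it.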

Step 3: Enable trace logging for debugging

export VECTOR_LOG=trace
export RUST_LOG=trace

Step 4: Running vector

vector -c <above_drafted_config>.yaml

When Vector runs, I get errors (not very clear ones) like the following:

  1. First, a topology healthcheck failure:
2024-08-06T10:36:46.347475Z ERROR vector::topology::builder: msg="Healthcheck failed." error=dispatch failure component_kind="sink" component_type="aws_s3" component_id=wp_logs_s3_sink
  2. Logs related to uploading objects to S3. (They do not clearly indicate a failure, but something about retrying is printed.)
2024-08-06T10:20:06.370350Z DEBUG sink{component_kind="sink" component_id=wp_logs_s3_sink component_type=aws_s3}:request{request_id=1}:invoke{service=s3 operation=PutObject sdk_invocation_id=5648678}:try_op:try_attempt: aws_smithy_runtime_api::client::interceptors::context: entering 'transmit' phase
2024-08-06T10:20:06.370380Z TRACE sink{component_kind="sink" component_id=wp_logs_s3_sink component_type=aws_s3}:request{request_id=1}:invoke{service=s3 operation=PutObject sdk_invocation_id=5648678}:try_op:try_attempt: aws_smithy_runtime::client::orchestrator: transmitting request request=Request { body: SdkBody { inner: BoxBody, retryable: true }, uri: Uri { as_string: "https://s3.us-east-1.amazonaws.com/<MY_BUCKET>/<KEY_PREFIX>/vector/wp/1722939340-f7f62251-d64e-46cd-91f0-fe0ab501f2bc.log.gz?x-id=PutObject", parsed: H0(https://s3.us-east-1.amazonaws.com/<MY_BUCKET>/<KEY_PREFIX>/vector/wp/1722939340-f7f62251-d64e-46cd-91f0-fe0ab501f2bc.log.gz?x-id=PutObject) }, method: PUT, extensions: Extensions { extensions_02x: Extensions, extensions_1x: Extensions }, headers: Headers { headers: {"content-encoding": HeaderValue { _private: H0("gzip") }, "content-md5": HeaderValue { _private: H0("pzHGODDnKfkj+o3UdNivZg==") }, "content-type": HeaderValue { _private: H0("text/x-log") }, "x-amz-storage-class": HeaderValue { _private: H0("STANDARD") }, "content-length": HeaderValue { _private: H0("9698") }, "user-agent": HeaderValue { _private: H0("aws-sdk-rust/1.3.3 os/linux lang/rust/1.79.0") }, "x-amz-user-agent": HeaderValue { _private: H0("aws-sdk-rust/1.3.3 api/s3/1.4.0 os/linux lang/rust/1.79.0") }, "amz-sdk-request": HeaderValue { _private: H0("attempt=1; max=1") }, "amz-sdk-invocation-id": HeaderValue { _private: H0("52344858-f818-4b28-8f7f-08bc1ca4a2a4") }} } }
2024-08-06T10:20:06.370488Z DEBUG sink{component_kind="sink" component_id=wp_logs_s3_sink component_type=aws_s3}:request{request_id=1}:invoke{service=s3 operation=PutObject sdk_invocation_id=5648678}:try_op:try_attempt: aws_smithy_runtime::client::http::body::minimum_throughput: applying minimum upload throughput check future options=MinimumThroughputBodyOptions { minimum_throughput: Throughput { bytes_read: 1, per_time_elapsed: 1s }, grace_period: 5s, check_window: 1s }
2024-08-06T10:20:06.370541Z TRACE sink{component_kind="sink" component_id=wp_logs_s3_sink component_type=aws_s3}:request{request_id=1}:invoke{service=s3 operation=PutObject sdk_invocation_id=5648678}:try_op:try_attempt: hyper::client::pool: checkout waiting for idle connection: ("https", s3.us-east-1.amazonaws.com)
2024-08-06T10:20:06.370639Z TRACE sink{component_kind="sink" component_id=wp_logs_s3_sink component_type=aws_s3}:request{request_id=1}:invoke{service=s3 operation=PutObject sdk_invocation_id=5648678}:try_op:try_attempt: hyper::client::connect::http: Http::connect; scheme=Some("https"), host=Some("s3.us-east-1.amazonaws.com"), port=None
2024-08-06T10:20:06.370753Z DEBUG hyper::client::connect::dns: resolving host="s3.us-east-1.amazonaws.com"
2024-08-06T10:20:06.383200Z DEBUG sink{component_kind="sink" component_id=wp_logs_s3_sink component_type=aws_s3}:request{request_id=1}:invoke{service=s3 operation=PutObject sdk_invocation_id=5648678}:try_op:try_attempt: hyper::client::connect::http: connecting to 52.216.145.221:443
2024-08-06T10:20:06.472344Z TRACE sink{component_kind="sink" component_id=wp_logs_s3_sink component_type=aws_s3}:request{request_id=1}:invoke{service=s3 operation=PutObject sdk_invocation_id=5648678}:try_op:try_attempt: aws_smithy_runtime::client::http::body::minimum_throughput: not enough data to decide if minimum throughput has been violated
2024-08-06T10:20:06.573302Z TRACE sink{component_kind="sink" component_id=wp_logs_s3_sink component_type=aws_s3}:request{request_id=1}:invoke{service=s3 operation=PutObject sdk_invocation_id=5648678}:try_op:try_attempt: aws_smithy_runtime::client::http::body::minimum_throughput: not enough data to decide if minimum throughput has been violated
2024-08-06T10:20:06.592039Z DEBUG sink{component_kind="sink" component_id=wp_logs_s3_sink component_type=aws_s3}:request{request_id=1}:invoke{service=s3 operation=PutObject sdk_invocation_id=5648678}:try_op:try_attempt: hyper::client::connect::http: connected to 52.216.145.221:443
2024-08-06T10:20:06.674614Z TRACE sink{component_kind="sink" component_id=wp_logs_s3_sink component_type=aws_s3}:request{request_id=1}:invoke{service=s3 operation=PutObject sdk_invocation_id=5648678}:try_op:try_attempt: aws_smithy_runtime::client::http::body::minimum_throughput: not enough data to decide if minimum throughput has been violated
2024-08-06T10:20:06.776308Z TRACE sink{component_kind="sink" component_id=wp_logs_s3_sink component_type=aws_s3}:request{request_id=1}:invoke{service=s3 operation=PutObject sdk_invocation_id=5648678}:try_op:try_attempt: aws_smithy_runtime::client::http::body::minimum_throughput: not enough data to decide if minimum throughput has been violated
2024-08-06T10:20:06.877325Z TRACE sink{component_kind="sink" component_id=wp_logs_s3_sink component_type=aws_s3}:request{request_id=1}:invoke{service=s3 operation=PutObject sdk_invocation_id=5648678}:try_op:try_attempt: aws_smithy_runtime::client::http::body::minimum_throughput: not enough data to decide if minimum throughput has been violated
2024-08-06T10:20:06.978343Z TRACE sink{component_kind="sink" component_id=wp_logs_s3_sink component_type=aws_s3}:request{request_id=1}:invoke{service=s3 operation=PutObject sdk_invocation_id=5648678}:try_op:try_attempt: aws_smithy_runtime::client::http::body::minimum_throughput: not enough data to decide if minimum throughput has been violated

I then unset both RUST_LOG and VECTOR_LOG to filter out the noise, and got the WARN-level logs below.

2024-08-06T10:41:46.391810Z  WARN sink{component_kind="sink" component_id=wp_logs_s3_sink component_type=aws_s3}:request{request_id=1}: vector::sinks::util::retries: Retrying after error. error=dispatch failure internal_log_rate_limit=true
2024-08-06T10:41:47.791739Z  WARN sink{component_kind="sink" component_id=wp_logs_s3_sink component_type=aws_s3}:request{request_id=1}: vector::sinks::util::retries: Internal log [Retrying after error.] is being suppressed to avoid flooding.
2024-08-06T10:41:59.340587Z  WARN sink{component_kind="sink" component_id=wp_logs_s3_sink component_type=aws_s3}:request{request_id=1}: vector::sinks::util::retries: Internal log [Retrying after error.] has been suppressed 5 times.
2024-08-06T10:41:59.340644Z  WARN sink{component_kind="sink" component_id=wp_logs_s3_sink component_type=aws_s3}:request{request_id=1}: vector::sinks::util::retries: Retrying after error. error=dispatch failure internal_log_rate_limit=true
2024-08-06T10:42:02.638699Z  WARN sink{component_kind="sink" component_id=wp_logs_s3_sink component_type=aws_s3}:request{request_id=1}: vector::sinks::util::retries: Internal log [Retrying after error.] is being suppressed to avoid flooding.
2024-08-06T10:42:20.896324Z  WARN sink{component_kind="sink" component_id=wp_logs_s3_sink component_type=aws_s3}:request{request_id=1}: vector::sinks::util::retries: Internal log [Retrying after error.] has been suppressed 1 times.
2024-08-06T10:42:20.896378Z  WARN sink{component_kind="sink" component_id=wp_logs_s3_sink component_type=aws_s3}:request{request_id=1}: vector::sinks::util::retries: Retrying after error. error=dispatch failure internal_log_rate_limit=true
2024-08-06T10:42:32.296972Z  WARN sink{component_kind="sink" component_id=wp_logs_s3_sink component_type=aws_s3}:request{request_id=1}: vector::sinks::util::retries: Retrying after error. error=dispatch failure internal_log_rate_limit=true
2024-08-06T10:42:50.259249Z  WARN sink{component_kind="sink" component_id=wp_logs_s3_sink component_type=aws_s3}:request{request_id=1}: vector::sinks::util::retries: Retrying after error. error=dispatch failure internal_log_rate_limit=true
2024-08-06T10:42:56.355776Z  WARN sink{component_kind="sink" component_id=wp_logs_s3_sink component_type=aws_s3}:request{request_id=1}: vector::sinks::util::retries: Internal log [Retrying after error.] is being suppressed to avoid flooding.
2024-08-06T10:43:11.253607Z  WARN sink{component_kind="sink" component_id=wp_logs_s3_sink component_type=aws_s3}:request{request_id=1}: vector::sinks::util::retries: Internal log [Retrying after error.] has been suppressed 1 times.
2024-08-06T10:43:11.253661Z  WARN sink{component_kind="sink" component_id=wp_logs_s3_sink component_type=aws_s3}:request{request_id=1}: vector::sinks::util::retries: Retrying after error. error=dispatch failure internal_log_rate_limit=true
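
Every retry line above, and the earlier healthcheck failure, carries the same error=dispatch failure field. For a longer capture, that can be confirmed by counting the distinct error= values. The sketch below assumes the output has been saved to a file; the function name and the file name vector.log are hypothetical.

```shell
# Count log lines per distinct "error=..." value, reading log text on stdin.
# The value can contain spaces, so it is cut off at the next " key=" field.
summarize_errors() {
    awk '
        match($0, /error=/) {
            value = substr($0, RSTART + 6)   # text after "error="
            sub(/ [a-z_]+=.*/, "", value)    # drop trailing key=value fields
            count[value]++
        }
        END { for (v in count) print count[v], v }
    ' | sort -rn
}
```

Running summarize_errors < vector.log prints one row per distinct error value with its count. The "suppressed" notices are skipped because they carry no error= field.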

Further, I tested with the AWS CLI and a sample file to make sure my permissions were correct. The sample file uploaded successfully. The command I used is below:

aws s3 cp <SAMPLE_FILE> s3://<MY_BUCKET>/<KEY_PREFIX>/ --no-sign-request
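
For anyone reproducing this without the AWS CLI, the same unsigned upload can be approximated with curl. This is a minimal sketch, not part of the original experiment. The function names, bucket, and key are hypothetical placeholders, and the URL shape simply mirrors the path-style URL shown in the trace log above.

```shell
# Build a path-style S3 URL: https://s3.<region>.amazonaws.com/<bucket>/<key>
# Arguments: region, bucket, key.
s3_path_style_url() {
    printf 'https://s3.%s.amazonaws.com/%s/%s\n' "$1" "$2" "$3"
}

# Upload a file with a plain PUT and print only the HTTP status code.
# No Authorization header is sent, so the request is unsigned.
# Arguments: local file, target URL.
unsigned_put() {
    curl -sS -o /dev/null -w '%{http_code}\n' -T "$1" "$2"
}

# Placeholder values (hypothetical); substitute your own bucket and key.
url="$(s3_path_style_url us-east-1 my-bucket my-prefix/sample.log)"
echo "$url"
```

Calling unsigned_put ./sample.log "$url" from the instance with the allowed IP prints the HTTP status of the upload. A 200 would suggest the bucket policy accepts anonymous PutObject requests, while a 403 would point at the policy rather than at Vector.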

I thought I'd post these now so that, if there is an issue, it can be corrected in the PR.

rams3sh avatar Aug 06 '24 10:08 rams3sh

@rams3sh Thanks for trying this out proactively! That's unfortunate to hear it doesn't seem to be working for you. I'll try to reproduce and see if I can figure it out this upcoming week.

jszwedko avatar Aug 09 '24 17:08 jszwedko

Hi @jszwedko is this still a valid PR or can it be closed? I see it is quite stale, thanks !

git-thuerk-done avatar Nov 07 '24 19:11 git-thuerk-done

Hi @jszwedko is this still a valid PR or can it be closed? I see it is quite stale, thanks !

I'd still like to circle back to this 😓 If anyone else is motivated, though, feel free to open a new PR and I can close this one.

jszwedko avatar Nov 07 '24 19:11 jszwedko

Hello, shall we make a ticket to track this enhancement and close the PR?

pront avatar Jul 24 '25 19:07 pront