S3/SQS source events fail sporadically
Problem
I have configured the S3 source and am getting the following error from time to time, roughly once every 30-60 minutes:
ERROR source{component_kind="source" component_id=s3 component_type=aws_s3}: vector::internal_events::aws_sqs::s3: Failed to process SQS message. message_id=8983e08a-46b8-48b8-a7f8-b06120861783 error=Failed to fetch s3://xxx/2024/04/16/1713311288-2e57bf4b-6612-4ab8-960e-517f01126830.log.gz: dispatch failure error_code="failed_processing_sqs_message" error_type="parser_failed" stage="processing" internal_log_rate_limit=true
However, after being in flight for 10 minutes or so, these messages (and the corresponding S3 objects) seem to be reprocessed successfully from SQS.
Configuration
Below is the S3 source config:
sources:
  s3:
    type: aws_s3
    compression: gzip
    region: eu-west-1
    sqs:
      queue_url: https://sqs.eu-west-1.amazonaws.com/xxx/xxx-xxx
      delete_message: true
      client_concurrency: 5
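For what it's worth, the reprocessing after several minutes looks like ordinary SQS behaviour: a message that isn't deleted becomes visible again once its visibility timeout lapses and is delivered to Vector a second time. A minimal sketch of the same source with that timeout spelled out, assuming the sqs.visibility_timeout_secs option of the aws_s3 source (the value shown is illustrative, not taken from this report):

sources:
  s3:
    type: aws_s3
    compression: gzip
    region: eu-west-1
    sqs:
      queue_url: https://sqs.eu-west-1.amazonaws.com/xxx/xxx-xxx
      delete_message: true           # delete messages whose objects were processed successfully
      visibility_timeout_secs: 300   # how long a received message stays hidden before SQS can redeliver it
      client_concurrency: 5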
I also have logLevel: aws_smithy_http=info,vector=info set; however, apart from the above-mentioned error message and the recurring vector: Beep entries, there are no other informative error messages that could explain the root cause of the issue.
Version
0.37.1
Hmm, yeah, that is odd. I think that sort of error would be coming from the AWS SDK (similar issue, but not for S3: https://github.com/awslabs/aws-sdk-rust/issues/844). You could try increasing the log level of aws_smithy_http to, say, debug or trace, but I realize it'd be pretty chatty and so might not be feasible.
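If it helps, and assuming the logLevel value quoted earlier is where these directives live (e.g. a Helm chart value), bumping only the SDK HTTP module keeps the rest of the output at info:

logLevel: "vector=info,aws_smithy_http=debug"   # or aws_smithy_http=trace for the most detail

The same directive string should also work via the VECTOR_LOG environment variable if the log level isn't managed through Helm.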
Same here, has anyone resolved this?
2024-05-04T17:07:15.594521Z INFO vector::topology::running: Running healthchecks.
2024-05-04T17:07:15.595648Z INFO vector: Vector has started. debug="false" version="0.37.1" arch="x86_64" revision="cb6635a 2024-04-09 13:45:06.561412437"
2024-05-04T17:07:15.595990Z INFO vector::app: API is disabled, enable by setting api.enabled to true and use commands like vector top.
2024-05-04T17:07:15.609041Z INFO source{component_kind="source" component_id=my_source_id component_type=aws_s3}:lazy_load_identity: aws_smithy_runtime::client::identity::cache::lazy: identity cache miss occurred; added new identity (took 185µs) new_expiration=2024-05-04T17:22:15.607636Z valid_for=899.998602s partition=IdentityCachePartition(2)
2024-05-04T17:07:15.826752Z INFO vector::topology::builder: Healthcheck passed.
2024-05-04T17:07:15.860011Z ERROR source{component_kind="source" component_id=my_source_id component_type=aws_s3}: vector::internal_events::aws_sqs: Failed to fetch SQS events. error=service error error_code="failed_fetching_sqs_events" error_type="request_failed" stage="receiving" internal_log_rate_limit=true
2024-05-04T17:07:15.907364Z ERROR source{component_kind="source" component_id=my_source_id component_type=aws_s3}: vector::internal_events::aws_sqs: Internal log [Failed to fetch SQS events.] is being suppressed to avoid flooding.
Also happens with S3 sink:
2024-05-22T09:40:24.026717Z WARN sink{component_kind="sink" component_id=out_kafka_access_s3 component_type=aws_s3}:request{request_id=23796}: vector::sinks::util::retries: Retrying after error. error=dispatch failure internal_log_rate_limit=true
I think all of these are related to how the AWS SDK handles retries and failures for AWS IAM credentials.
https://github.com/vectordotdev/vector/issues/20266
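Until the SDK-level behaviour changes, the sink side can at least be tuned to keep retrying dispatch failures for longer. A sketch using Vector's generic sink request options (the sink name comes from the log above; the values are illustrative, not a recommendation):

sinks:
  out_kafka_access_s3:
    type: aws_s3
    # ... bucket, key_prefix, encoding, etc. as in the real config
    request:
      retry_initial_backoff_secs: 1   # first backoff; grows on repeated failures
      retry_max_duration_secs: 300    # cap on the wait between retry attempts
      retry_attempts: 10              # illustrative; by default Vector already retries for a long time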
I've tried increasing the aws_smithy_http log level as suggested, but the messages weren't any more verbose in this specific part, even with aws_smithy_http=trace.
Hello, I'm seeing something similar with the S3 source and SQS. It happens every 6 hours or so most of the time, though sometimes there is a 12-hour gap between errors. Vector runs on an EC2 instance using an instance profile, and the queue and bucket are in a different account from the EC2 instance. At the bottom are some debug logs from when the error occurs; earlier errors, before debug logging was enabled, had different content, like:
error=Failed to fetch s3://<object url>: dispatch failure error_code="failed_processing_sqs_message" error_type="parser_failed" stage="processing"
aws_sqs: Failed to fetch SQS events. error=service error error_code="failed_fetching_sqs_events" error_type="request_failed" stage="receiving"
When this error happens, the queue metrics show messages living on the queue for 200+ seconds, but they do eventually get consumed (at least I think that is what is happening; they could also be getting dropped and events lost, though I do tell Vector not to delete failed messages). The source is configured like this:
type: aws_s3
region: eu-central-1
sqs:
  queue_url: "{{ queue_url }}"
  delete_failed_message: false
If there is any config advice that could alleviate this issue, please let me know, as well as any upcoming release that aims to fix it. If this continues, I'll have to switch to static credentials and see how things look using them.
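If you do end up testing static credentials, the source's auth block accepts them directly. A minimal sketch under that assumption (the component name is taken from the logs below, and the environment interpolation is just one way to inject the keys):

sources:
  filters_s3_source:
    type: aws_s3
    region: eu-central-1
    auth:
      access_key_id: "${AWS_ACCESS_KEY_ID}"           # assumed to be provided via the environment
      secret_access_key: "${AWS_SECRET_ACCESS_KEY}"
    sqs:
      queue_url: "{{ queue_url }}"
      delete_failed_message: false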
Debug logs:
"2024-10-22T10:48:48.918729Z DEBUG hyper::client::connect::dns: resolving host=""sqs.eu-central-1.amazonaws.com""
"2024-10-22T10:48:48.918694Z DEBUG source{component_kind=""source"" component_id=filters_s3_source component_type=aws_s3}:invoke{service=sqs operation=ReceiveMessage sdk_invocation_id=8349334}:try_op:try_attempt: aws_smithy_runtime::client::http::body::minimum_throughput: applying minimum upload throughput check future options=MinimumThroughputBodyOptions { minimum_throughput: Throughput { bytes_read: 1, per_time_elapsed: 1s }, grace_period: 5s, check_window: 1s }"
"2024-10-22T10:48:48.918682Z DEBUG source{component_kind=""source"" component_id=filters_s3_source component_type=aws_s3}:invoke{service=sqs operation=ReceiveMessage sdk_invocation_id=8349334}:try_op:try_attempt: aws_smithy_runtime_api::client::interceptors::context: entering 'transmit' phase"
"2024-10-22T10:48:48.918620Z DEBUG source{component_kind=""source"" component_id=filters_s3_source component_type=aws_s3}:invoke{service=sqs operation=ReceiveMessage sdk_invocation_id=8349334}:try_op:try_attempt: aws_smithy_runtime::client::identity::cache::lazy: loaded identity from cache buffer_time=10s cached_expiration=Some(SystemTime { tv_sec: 1729594138, tv_nsec: 0 }) now=SystemTime { tv_sec: 1729594128, tv_nsec: 918617969 }"
"2024-10-22T10:48:48.918602Z DEBUG source{component_kind=""source"" component_id=filters_s3_source component_type=aws_s3}:invoke{service=sqs operation=ReceiveMessage sdk_invocation_id=8349334}:try_op:try_attempt: aws_smithy_runtime::client::orchestrator::endpoints: will use endpoint Endpoint { url: ""https://sqs.eu-central-1.amazonaws.com"", headers: {}, properties: {} }"
"2024-10-22T10:48:48.918595Z DEBUG source{component_kind=""source"" component_id=filters_s3_source component_type=aws_s3}:invoke{service=sqs operation=ReceiveMessage sdk_invocation_id=8349334}:try_op:try_attempt: aws_smithy_runtime::client::orchestrator::endpoints: resolving endpoint endpoint_params=EndpointResolverParams(TypeErasedBox[!Clone]:Params { region: Some(""eu-central-1""), use_dual_stack: false, use_fips: false, endpoint: None }) endpoint_prefix=None"
"2024-10-22T10:48:48.918589Z DEBUG source{component_kind=""source"" component_id=filters_s3_source component_type=aws_s3}:invoke{service=sqs operation=ReceiveMessage sdk_invocation_id=8349334}:try_op: aws_smithy_runtime::client::orchestrator: beginning attempt #1"
"2024-10-22T10:48:48.918584Z DEBUG source{component_kind=""source"" component_id=filters_s3_source component_type=aws_s3}:invoke{service=sqs operation=ReceiveMessage sdk_invocation_id=8349334}:try_op: aws_smithy_runtime::client::orchestrator: retry strategy has OKed initial request"
"2024-10-22T10:48:48.918580Z DEBUG source{component_kind=""source"" component_id=filters_s3_source component_type=aws_s3}:invoke{service=sqs operation=ReceiveMessage sdk_invocation_id=8349334}:try_op: aws_smithy_runtime::client::retries::strategy::standard: no client rate limiter configured, so no token is required for the initial request."
"2024-10-22T10:48:48.918573Z DEBUG source{component_kind=""source"" component_id=filters_s3_source component_type=aws_s3}:invoke{service=sqs operation=ReceiveMessage sdk_invocation_id=8349334}:try_op: aws_smithy_runtime_api::client::interceptors::context: entering 'before transmit' phase"
"2024-10-22T10:48:48.918562Z DEBUG source{component_kind=""source"" component_id=filters_s3_source component_type=aws_s3}:invoke{service=sqs operation=ReceiveMessage sdk_invocation_id=8349334}:try_op: aws_smithy_runtime_api::client::interceptors::context: entering 'serialization' phase"
"2024-10-22T10:48:48.918551Z DEBUG source{component_kind=""source"" component_id=filters_s3_source component_type=aws_s3}:invoke{service=sqs operation=ReceiveMessage sdk_invocation_id=8349334}:apply_configuration: aws_smithy_runtime::client::orchestrator: timeout settings for this operation: TimeoutConfig { connect_timeout: Disabled, read_timeout: Disabled, operation_timeout: Disabled, operation_attempt_timeout: Unset }"
"2024-10-22T10:48:48.918519Z ERROR source{component_kind=""source"" component_id=filters_s3_source component_type=aws_s3}: vector::internal_events::aws_sqs: Failed to fetch SQS events. error=service error error_code=""failed_fetching_sqs_events"" error_type=""request_failed"" stage=""receiving"" internal_log_rate_limit=true"
"2024-10-22T10:48:48.918515Z ERROR source{component_kind=""source"" component_id=filters_s3_source component_type=aws_s3}: vector::internal_events::aws_sqs: Internal log [Failed to fetch SQS events.] has been suppressed 414 times."
"2024-10-22T10:48:48.918486Z DEBUG source{component_kind=""source"" component_id=filters_s3_source component_type=aws_s3}:invoke{service=sqs operation=ReceiveMessage sdk_invocation_id=8819242}:try_op: aws_smithy_runtime::client::orchestrator: a retry is either unnecessary or not possible, exiting attempt loop"
"2024-10-22T10:48:48.918481Z DEBUG source{component_kind=""source"" component_id=filters_s3_source component_type=aws_s3}:invoke{service=sqs operation=ReceiveMessage sdk_invocation_id=8819242}:try_op: aws_smithy_runtime::client::retries::strategy::standard: not retrying because we are out of attempts attempts=1 max_attempts=1"
"2024-10-22T10:48:48.918468Z DEBUG source{component_kind=""source"" component_id=filters_s3_source component_type=aws_s3}:invoke{service=sqs operation=ReceiveMessage sdk_invocation_id=8819242}:try_op:try_attempt: aws_smithy_runtime_api::client::interceptors::context: entering 'after deserialization' phase"
"2024-10-22T10:48:48.918452Z DEBUG source{component_kind=""source"" component_id=filters_s3_source component_type=aws_s3}:invoke{service=sqs operation=ReceiveMessage sdk_invocation_id=8819242}:try_op:try_attempt:deserialization:deserialize_nonstreaming: aws_sdk_sqs::operation::receive_message: request_id=Some(""2293170b-6dda-521e-8f04-489ae881545b"")"
"2024-10-22T10:48:48.918435Z DEBUG source{component_kind=""source"" component_id=filters_s3_source component_type=aws_s3}:invoke{service=sqs operation=ReceiveMessage sdk_invocation_id=8819242}:try_op:try_attempt: aws_smithy_runtime_api::client::interceptors::context: entering 'deserialization' phase"
"2024-10-22T10:48:48.918414Z DEBUG source{component_kind=""source"" component_id=filters_s3_source component_type=aws_s3}:invoke{service=sqs operation=ReceiveMessage sdk_invocation_id=8819242}:try_op:try_attempt: aws_smithy_runtime_api::client::interceptors::context: entering 'before deserialization' phase"
"2024-10-22T10:48:48.918287Z DEBUG hyper::proto::h1::conn: incoming body completed"
"2024-10-22T10:48:48.918260Z DEBUG hyper::proto::h1::conn: incoming body is content-length (125 bytes)"
"2024-10-22T10:48:48.918251Z DEBUG hyper::proto::h1::io: parsed 6 headers"
"2024-10-22T10:48:48.913970Z DEBUG hyper::proto::h1::io: flushed 2293 bytes"