Request Timeouts When Sending Logs to CrowdStrike NGSIEM Connector

Open rgarcio opened this issue 3 months ago • 2 comments

A note for the community

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Problem

Hello everyone,

I’m opening this issue to get help with a problem we’re experiencing when using Vector to send firewall logs to CrowdStrike’s NGSIEM connector.

We are intermittently receiving warning logs from Vector indicating request timeouts. These timeouts occur at random intervals and are not tied to any specific pattern or load condition.

Setup Details

  • Log Source: Firewall logs
  • Transport Method: Syslog container with redundancy
  • Transport Flow: Firewall -> Syslog-ng Container -> Vector Container
  • Destination: CrowdStrike NGSIEM connector
  • Issue: Random request timeouts logged by Vector
  • Impact: No log loss due to redundancy, but the warnings are concerning

Troubleshooting So Far

  • We’ve worked with CrowdStrike support, and they confirmed there are no issues on their end.
  • They recommended opening a case with the Vector team for further investigation.

Additional Notes

  • We are using a syslog container with redundancy, so logs are not being lost.
  • The issue seems to be isolated to Vector’s communication with the NGSIEM endpoint.

Any help or guidance would be greatly appreciated!

Configuration

sources:
  checkpoint:
    type: syslog
    address: 0.0.0.0:9514
    mode: tcp
    permit_origin:
      - <container IP>
## Expose Vector Metric Logs
  drop_metrics:
    type: internal_metrics

## Configure Sinks to NGSIEM
sinks:
  crowdstrike:
    type: humio_logs
    inputs:
      - checkpoint
    endpoint: ${CS_URL}
    token: ${TOKEN}
    encoding:
      codec: raw_message
    tls:
      crt_file: "<certificate>.pem"
      key_file: "<key>.pkcs8.key"
      verify_certificate: true
      verify_hostname: true
    request:
      concurrency: "adaptive"
      rate_limit_duration_secs: 1     # default value
      retry_initial_backoff_secs: 10  # increased interval (default is 1s)
      retry_max_duration_secs: 60     # increased interval (default is 30s; values tried: 10s, 20s, 30s)
      timeout_secs: 120               # increased from the default (30s)
    batch:
      max_bytes: 1048576              # matching LogScale shipper value
      max_events: 8388608             # matching LogScale shipper value
      timeout_secs: 1                 # default value
    buffer:
      type: "memory"                  # added for log retention
      max_events: 100000
      when_full: "block"
    compression: "gzip"               # added as testing (slight improvement noticed)
## Send Vector Metric Logs to a Blackhole
  blackhole:
    type: blackhole
    inputs:
      - drop_metrics
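
For observability while chasing this, one option would be to point the existing drop_metrics source at a prometheus_exporter sink instead of the blackhole, so the retry and timeout counters Vector emits become scrapeable. A minimal sketch; the vector_metrics name and the 9598 port are assumptions, not part of the running config:

## Hypothetical: expose Vector internal metrics for scraping
sinks:
  vector_metrics:
    type: prometheus_exporter
    inputs:
      - drop_metrics
    address: 0.0.0.0:9598             # assumed free port; adjust to your environment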

Version

vector 0.48.0 (x86_64-unknown-linux-musl a67e4e2 2025-06-30 18:25:45.272082383) -> 0.48.0-alpine

Debug Output

2025-08-28T05:20:28.711114Z  WARN sink{component_kind="sink" component_id=crowdstrike component_type=humio_logs}:request{request_id=12592544}: vector::sinks::util::retries: Request timed out. If this happens often while the events are actually reaching their destination, try decreasing `batch.max_bytes` and/or using `compression` if applicable. Alternatively `request.timeout_secs` can be increased. internal_log_rate_limit=true
2025-08-28T05:47:44.781778Z  WARN sink{component_kind="sink" component_id=crowdstrike component_type=humio_logs}:request{request_id=12601821}: vector::sinks::util::retries: Retrying after response. reason=502 Bad Gateway: {"meta":{"query_time":0.08215832,"powered_by":"crowdstrike-third-party-gateway","trace_id":"9905b0697b652e8ec2e48d038e8347fb"},"resources":[],"errors":[{"code":502,"message":"bad gateway"}]}
 internal_log_rate_limit=true
2025-08-28T05:47:47.414871Z  WARN sink{component_kind="sink" component_id=crowdstrike component_type=humio_logs}:request{request_id=12601832}: vector::sinks::util::retries: Internal log [Retrying after response.] is being suppressed to avoid flooding.
2025-08-28T07:50:42.406606Z  WARN sink{component_kind="sink" component_id=crowdstrike component_type=humio_logs}:request{request_id=12639926}: vector::sinks::util::retries: Request timed out. If this happens often while the events are actually reaching their destination, try decreasing `batch.max_bytes` and/or using `compression` if applicable. Alternatively `request.timeout_secs` can be increased. internal_log_rate_limit=true
2025-08-28T09:12:46.786205Z  WARN sink{component_kind="sink" component_id=crowdstrike component_type=humio_logs}:request{request_id=12666380}: vector::sinks::util::retries: Request timed out. If this happens often while the events are actually reaching their destination, try decreasing `batch.max_bytes` and/or using `compression` if applicable. Alternatively `request.timeout_secs` can be increased. internal_log_rate_limit=true
2025-08-28T11:05:59.918034Z  WARN sink{component_kind="sink" component_id=crowdstrike component_type=humio_logs}:request{request_id=12704498}: vector::sinks::util::retries: Internal log [Retrying after response.] has been suppressed 1 times.
2025-08-28T11:05:59.918052Z  WARN sink{component_kind="sink" component_id=crowdstrike component_type=humio_logs}:request{request_id=12704498}: vector::sinks::util::retries: Retrying after response. reason=502 Bad Gateway: {"meta":{"query_time":0.067887774,"powered_by":"crowdstrike-third-party-gateway","trace_id":"4a130783fa146326ff471fba05bad856"},"resources":[],"errors":[{"code":502,"message":"bad gateway"}]}
 internal_log_rate_limit=true
2025-08-28T11:06:01.766884Z  WARN sink{component_kind="sink" component_id=crowdstrike component_type=humio_logs}:request{request_id=12704508}: vector::sinks::util::retries: Internal log [Retrying after response.] is being suppressed to avoid flooding.
2025-08-28T11:26:01.046337Z  WARN sink{component_kind="sink" component_id=crowdstrike component_type=humio_logs}:request{request_id=12711002}: vector::sinks::util::retries: Request timed out. If this happens often while the events are actually reaching their destination, try decreasing `batch.max_bytes` and/or using `compression` if applicable. Alternatively `request.timeout_secs` can be increased. internal_log_rate_limit=true
2025-08-28T11:30:44.462695Z  WARN sink{component_kind="sink" component_id=crowdstrike component_type=humio_logs}:request{request_id=12712800}: vector::sinks::util::retries: Request timed out. If this happens often while the events are actually reaching their destination, try decreasing `batch.max_bytes` and/or using `compression` if applicable. Alternatively `request.timeout_secs` can be increased. internal_log_rate_limit=true
2025-08-28T11:55:26.201671Z  WARN sink{component_kind="sink" component_id=crowdstrike component_type=humio_logs}:request{request_id=12723853}:http: vector::internal_events::http_client: HTTP error. error=connection error: Connection reset by peer (os error 104) error_type="request_failed" stage="processing"
2025-08-28T11:55:26.201731Z  WARN sink{component_kind="sink" component_id=crowdstrike component_type=humio_logs}:request{request_id=12723853}: vector::sinks::util::retries: Retrying after error. error=Failed to make HTTP(S) request: connection error: Connection reset by peer (os error 104) internal_log_rate_limit=true
2025-08-28T12:35:15.862694Z  WARN sink{component_kind="sink" component_id=crowdstrike component_type=humio_logs}:request{request_id=12742606}: vector::sinks::util::retries: Request timed out. If this happens often while the events are actually reaching their destination, try decreasing `batch.max_bytes` and/or using `compression` if applicable. Alternatively `request.timeout_secs` can be increased. internal_log_rate_limit=true
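
The warning above points at three knobs: batch.max_bytes, compression, and request.timeout_secs. Compression (gzip) and a 120s timeout are already set in the configuration, which leaves the batch size. A sketch of that single change; the 256 KiB value is only an assumption to illustrate the direction, not a recommendation from Vector or CrowdStrike:

## Sketch: shrink batches on the existing sink
sinks:
  crowdstrike:
    batch:
      max_bytes: 262144               # hypothetical 256 KiB cap; smaller requests are likelier to finish inside timeout_secs
      max_events: 8388608
      timeout_secs: 1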

Example Data

<134>1 2025-08-28T14:37:15Z sjhcpmgmt1 CheckPoint 30105 - [action:"Accept"; conn_direction:"External"; flags:"2113536"; ifdir:"inbound"; ifname:"eth0"; logid:"320"; loguid:"{}"; origin:""; originsicname:""; sequencenum:"3"; time:"1756391835"; version:"5"; __policy_id_tag:"product=VPN-1 & FireWall-1[db_tag={6633D955-F90C-AC4F-887A-AE42F85F4824};mgmt=sjhcpmgmt1;date=1756331922;policy_name=AZURE_R82]"; aggregated_log_count:"22"; browse_time:"683"; bytes:"12867"; client_inbound_bytes:"4232"; client_inbound_packets:"10"; client_outbound_bytes:"8635"; client_outbound_packets:"23"; connection_count:"11"; creation_time:"1756385725"; duration:"6110"; hll_key:"13440326926202584730"; last_hit_time:"1756391335"; packets:"33"; product:"URL Filtering"; protocol:"HTTPS"; server_inbound_bytes:"8595"; server_inbound_packets:"11"; server_outbound_bytes:"3542"; server_outbound_packets:"22"; sig_id:"4"; update_count:"12"]

Additional Context

No response

References

No response

rgarcio · Aug 28 '25 14:08

Hmm, I think Vector is retrying correctly, but the gateway is consistently slow or failing. Are the timeouts/502s correlated with specific log patterns or sizes?
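
If it helps narrow that down, here is a rough sketch of one way to measure the size angle. The tag_size and size_histogram names and the checkpoint_message_bytes metric are made up for illustration, and the resulting histogram still needs a metrics sink (for example a prometheus_exporter sink) to be visible:

transforms:
  tag_size:
    type: remap
    inputs:
      - checkpoint
    source: |
      # record the size of the raw syslog line, only for this investigation
      .msg_bytes = length(string!(.message))
  size_histogram:
    type: log_to_metric
    inputs:
      - tag_size
    metrics:
      - type: histogram
        field: msg_bytes
        name: checkpoint_message_bytes

Comparing that histogram against the timestamps of the timeout warnings would show whether the slow requests line up with unusually large messages or batches.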

pront · Nov 14 '25 21:11

Not really; it happened randomly from time to time with no specific log type. After we spent some time with CrowdStrike's team, they concluded that this behavior was expected and that no logs were being lost during retransmission. According to them, this is how their backend works when ingesting logs from third-party apps.

rgarcio · Dec 09 '25 15:12