High CPU and memory consumption with buffer enabled - part 2
Vector Version
vector 0.15.0 (x86_64-unknown-linux-gnu 994d812 2021-07-16)
Vector Configuration File
Using a disk buffer causes CPU usage to keep growing over time. Removing the disk buffer makes this behavior go away.
sinks:
  http_output:
    type: http
    inputs:
      - some_transform
    compression: gzip
    uri: https://DESTINATION:PORT/URI
    tls:
      ca_file: /etc/vector/ssl_auth/ca-certificates.crt
      crt_file: /etc/vector/ssl_auth/client.crt
      key_file: /etc/vector/ssl_auth/client.key
      verify_hostname: true
    encoding:
      codec: ndjson
      timestamp_format: rfc3339
    buffer:
      max_size: 10737418240
      type: disk
      when_full: block
    request:
      concurrency: 3
Debug Output
N/A
Expected Behavior
CPU and memory consumption should stay fairly constant over time.
Actual Behavior
As discussed in a previous issue, resource usage goes up over time.
Example Data
N/A
Additional Context
References
Previous GitHub issue: https://github.com/timberio/vector/issues/7246
Other
As a workaround - I have installed a local instance of kafka to buffer output.
Configuring kafka is beyond the scope of this post, but if someone else is running into these issues - they may find it useful to try a configuration similar to what I have below.
api:
  enabled: true
log_schema:
  timestamp_key: timestamp
sources:
  http_in:
    type: http
    address: 0.0.0.0:80
    encoding: ndjson
    acknowledgements: true
  ##used later to buffer output
  buffer_from_localhost:
    type: kafka
    acknowledgements: true
    auto_offset_reset: beginning
    bootstrap_servers: localhost:9092
    group_id: buffer_from_kafka
    topics:
      - messages
transforms:
  transform:
    inputs:
      - http_in
    type: remap
    source: |2-
      #USE YOUR IMAGINATION
sinks:
  buffer_to_localhost:
    type: kafka
    inputs:
      - transform
    bootstrap_servers: localhost:9092
    compression: snappy
    encoding:
      codec: json
      timestamp_format: rfc3339
    healthcheck: false
    topic: messages
  output:
    type: http
    inputs:
      - buffer_from_localhost
    compression: gzip
    uri: https://DESTINATION:PORT/URI
    tls:
      ca_file: /etc/vector/ssl_auth/ca-certificates.crt
      crt_file: /etc/vector/ssl_auth/client.crt
      key_file: /etc/vector/ssl_auth/client.key
      verify_hostname: true
    encoding:
      codec: text
      only_fields:
        - message
    request:
      concurrency: 3
Do note - this example pushes events to kafka as json and then picks them back up. At that point the json arrives as text in the field 'message', but we leave it as is and just write it out with codec text. In other situations - you will likely want to parse_json in a remap transform.
Example:
from_kafka_decode:
  inputs:
    - buffer_from_localhost
  type: remap
  source: |2-
    . = parse_json!(del(.message))
@ryn9 just curious if you see this same behavior with the new disk buffer implementation (should be the default in 0.26.0).
@jszwedko I have added disk buffering back into the config that was causing the most issues. Circle back with me in a few days or weeks to see results ...
@jszwedko including the disk buffering in itself did not seem to cause any issues with memory and/or cpu over the past few days. As such - I have removed the kafka component and am now using only the disk buffering mechanism. I will report back a week from now on whether there was any cpu and/or memory growth.
@jszwedko this seems to continue holding steady - I will circle back again in 2 weeks
@jszwedko cpu and memory have been holding steady. For my use case at least - I would say the issue is resolved. Please feel free to close this case. Thank you!
Awesome, thanks for confirming @ryn9 ! I'll close this out.