Compression between Vector and Elasticsearch is not working for egress traffic
Vector Version
0.15.0
Vector Configuration File
[sinks.es]
type = "elasticsearch"
inputs = ["es_timestamp"]
healthcheck.enabled = true
auth.strategy = "basic"
auth.user = "---"
auth.password = "---"
endpoint = "---:9200"
mode = "data_stream"
data_stream.auto_routing = false
data_stream.sync_fields = false
data_stream.type = "---"
data_stream.dataset = "logs"
data_stream.namespace = "ds"
compression = "gzip"
batch.max_bytes = 50000000
batch.timeout_secs = 1
buffer.max_events = 10000000
buffer.type = "memory"
buffer.when_full = "block"
Expected Behavior
When we configure compression = "gzip" for the Elasticsearch sink, it is expected that compression is applied in both directions: ingress (requests sent to Elasticsearch) and egress (responses returned to Vector).
Actual Behavior
Compression is applied only to ingress traffic; responses from Elasticsearch come back uncompressed.
Workaround
A custom header applied on the Vector side fixes the situation, and the return traffic is also compressed.
[sinks.es]
compression = "gzip"
# response compression
request.headers.Accept-Encoding = "gzip"
Additional Context
ES version: 7.13.1. Also tested with Vector 0.12.2.
Compression is clearly enabled on all ES nodes.
It might be that Vector is not setting its request headers correctly when compression is enabled. Elasticsearch appears to expect a specific header (Accept-Encoding) that instructs it to compress the reply traffic (egress): https://discuss.elastic.co/t/http-compression-enabled-but-response-not-compressed/103513/7
Ingress Traffic Rates when we apply Compression
Egress Traffic Rates when we apply Compression
Egress Traffic Rates when we apply custom Headers and Compression
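For reference, a minimal sketch of the negotiation described above (independent of Vector; the es-node host is a placeholder and authentication is omitted for brevity): the same Elasticsearch endpoint replies uncompressed unless the client advertises gzip support via Accept-Encoding.

import urllib.request

def response_encoding(accept_gzip):
    # Hit a trivial endpoint and report the Content-Encoding of the reply.
    headers = {"Accept-Encoding": "gzip"} if accept_gzip else {}
    req = urllib.request.Request("http://es-node:9200/_cluster/health", headers=headers)
    with urllib.request.urlopen(req) as resp:
        return resp.headers.get("Content-Encoding")

print(response_encoding(False))  # None -> Elasticsearch replies uncompressed
print(response_encoding(True))   # gzip -> Elasticsearch compresses the reply

This matches the linked Elastic discussion and explains why the request.headers.Accept-Encoding workaround above is enough to get compressed responses.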
Thanks @okorolov. It does appear that we should be including that header when compression = "gzip".
@jszwedko Sorry to be a nuisance, but when you remove an issue from a milestone, does it mean the fix was included? Or does it mean it gets postponed until further notice? Thank you very much.
Hey @smlgbl! No worries. No, unfortunately in this case it meant that it was deprioritized. I could see us trying to get it into 0.19.0 though. I'll add it to that milestone.
That would be great, even though the workaround mentioned above is rather trivial. Thanks for clarifying though.
@jszwedko Just as a side note: even if you don't get around to fixing the bug, at least put it into the docs. We have our on-premise apps log to an OpenSearch cluster at AWS, and our outbound traffic costs were almost $200 per day. After setting the response header field, it's down to $6 per day!
Oh wow, that is substantial. Thanks for the note @smlgbl. We'll prioritize fixing this for the next release.
Hi @jszwedko
The workaround with a custom header has a significant downside.
Since Vector doesn't expect responses to be gzipped, it doesn't try to read the response body from Elasticsearch, so it stays silent even when the response contains errors (the ES "bulk" API can respond with HTTP 200 but report errors in the body):
2022-04-16T19:18:09.949786Z DEBUG sink{component_kind="sink" component_id=es component_type=elasticsearch component_name=es}:request{request_id=0}:http: vector::internal_events::http_client: HTTP response. status=200 OK version=HTTP/1.1 headers={"content-type": "application/json; charset=UTF-8", "content-encoding": "gzip", "content-length": "286"} body=[286 bytes]
I spent hours trying to figure out why I had only half of the logs. Once I removed the custom header, I saw that Elasticsearch had problems with dynamic mapping:
2022-04-16T19:20:52.998750Z ERROR sink{component_kind="sink" component_id=es component_type=elasticsearch component_name=es}:request{request_id=1}: vector::internal_events::elasticsearch: Response containerd errors. error_code=http_response_200 error_type="request_failed" stage="sending" response=Response { status: 200, version: HTTP/1.1, headers: {"content-type": "application/json; charset=UTF-8", "content-length": "375612"}, body: b"{\"took\":54,\"ingest_took\":50,\"errors\":true,\"items\":[{\"index\":{\"_index\":\"<index>\",\"_type\":\"_doc\",\"_id\":\"bAzSM4ABck9U0YcrKYIr\",\"status\":400,\"error\":{\"type\":\"mapper_parsing_exception\",\"reason\":\"Could not dynamically add mapping for field [app.kubernetes.io/managed-by]. Existing mapping for [kubernetes.pod_labels.app] must be of type object but found [text].\"}}}, <many similar errors>]}" }
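For illustration, a standalone sketch (placeholder host and index name, authentication omitted) of what reading such a response actually requires once the Accept-Encoding workaround is in place: the 200 status says nothing, the body has to be decompressed before the "errors" flag and the per-item "error" objects become visible.

import gzip
import json
import urllib.request

# Two NDJSON lines: one bulk action plus one document (placeholder index name).
ndjson = (
    '{"index":{"_index":"test-index"}}\n'
    '{"message":"hello","kubernetes":{"pod_labels":{"app":"demo"}}}\n'
)
req = urllib.request.Request(
    "http://es-node:9200/_bulk",
    data=ndjson.encode(),
    headers={"Content-Type": "application/x-ndjson", "Accept-Encoding": "gzip"},
)
with urllib.request.urlopen(req) as resp:
    body = resp.read()
    # With the Accept-Encoding workaround the body arrives gzipped and must be
    # decompressed before it can be parsed.
    if resp.headers.get("Content-Encoding") == "gzip":
        body = gzip.decompress(body)
    result = json.loads(body)

# The HTTP status is 200 either way; the real signal is the "errors" flag and
# the per-item "error" objects (e.g. the mapper_parsing_exception above).
if result.get("errors"):
    for item in result["items"]:
        error = next(iter(item.values())).get("error")
        if error:
            print(error["type"], "-", error["reason"])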
Thanks for bumping this @vpedosyuk. It fell off our radar. I've added it to our backlog.
PR #13571 was incomplete; we reverted it until we have support for reading compressed responses.
@neuronull By the way, are there plans to fix this any time in the near future?
Q4 priorities are still being finalized, but decently high on the list is dedicating a number of weeks to addressing technical debt that we've been unable to get to, such as this issue. :pray: