fluent-plugin-aws-elasticsearch-service
Unable to handle ElasticSearch "RequestEntityTooLarge" error correctly
Hi,
We receive the following error when pushing a "chunk" to ElasticSearch:
2017-07-18 10:11:32 +0000 [warn]: failed to flush the buffer. plugin_id="object:15aa644" retry_time=10 next_retry=2017-07-18 10:20:36 +0000 chunk="554949b8663b0bd8416988071dcd1bf3" error_class=Elasticsearch::Transport::Transport::Errors::RequestEntityTooLarge error="[413] {\"Message\":\"Request size exceeded 10485760 bytes\"}"
2017-07-18 10:11:32 +0000 [debug]: chunk taken back instance=22660880 chunk_id="554949b8663b0bd8416988071dcd1bf3" metadata=#<struct Fluent::Plugin::Buffer::Metadata timekey=nil, tag="kubernetes.var.lib.rkt.pods.run.410d6112-b970-40d7-8b71-c5ee25452c17.stage1.rootfs.opt.stage2.hyperkube.rootfs.var.log.containers.service-2066068252-ns9xw_int_service-9058641f27539563ca400a4c1507ef500c48a6a2daa60d0b9330f0fd9c91b63e.log", variables=nil>
What happens next is that the plugin retries delivering this chunk indefinitely and never moves past it to process subsequent chunks that are not too large.
This effectively blocks all progress: logs stop arriving in our ElasticSearch cluster, which is functionally no different from FluentD being down.
The plugin should not retry the chunk in this scenario, as there is almost no chance that the ElasticSearch cluster will lift its maximum payload limit.
I would expect the plugin to:
- Log this as an "error" not a "warning"?
- If the plugin receives this error back from ES, write the "chunk" to disk somewhere (dead letter queue?) and "move on"
We are also looking at filtering out very large log entries before they hit the plugin.
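As a sketch of that filtering idea (our own workaround, not part of the plugin): drop records whose serialized JSON form exceeds a byte budget before they reach the output. The `within_size_limit?` helper and the 1 MB threshold are illustrative assumptions; AWS ES caps a whole request at 10485760 bytes, so individual records must stay well below that.

```ruby
require "json"

# Hypothetical per-record size guard. The 1 MB limit is an assumption;
# tune it for your cluster and typical bulk batch size.
MAX_RECORD_BYTES = 1_000_000

def within_size_limit?(record, limit = MAX_RECORD_BYTES)
  # Measure the record as it would be serialized for the bulk request.
  record.to_json.bytesize <= limit
end

# Example: filtering a batch before it reaches the ES output plugin.
records = [
  { "log" => "short line" },
  { "log" => "x" * 2_000_000 }, # oversized entry, would trigger a 413
]
kept = records.select { |r| within_size_limit?(r) }
# kept contains only the first record
```

In a real deployment this check would live in a custom filter plugin (or a `record_transformer` with inline Ruby), so oversized entries are dropped or truncated instead of poisoning a whole chunk.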
Splitting huge chunk requests (> 20MB) is now supported on the ES plugin side: https://github.com/uken/fluent-plugin-elasticsearch/issues/535
Should we make this feature configurable?
* Log this as an "error" not a "warning"?
We can't change that in this plugin; the warning is generated by Fluentd core.
If the plugin receives this error back from ES, write the "chunk" to disk somewhere (dead letter queue?) and "move on"
Adding 413 HTTP status code handling in ES plugin can achieve your feature request. WDYT?
Hi, any updates? @cosmo0920, did you have a chance to fix the issue?
Adding 413 HTTP status code handling in ES plugin can achieve your feature request. WDYT?
Yep, sounds good to me! What are your thoughts as to how it should behave if it receives this message back? Should it just give up/delete that chunk? Is there any way to split it and resend or put it somewhere and move to the next chunk?
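One possible "split and resend" strategy, sketched below under assumptions (the function name and representation of bulk actions as strings are illustrative): recursively halve the list of bulk actions until each slice fits under the size limit, then send each slice as its own request. A single action that is itself over the limit cannot be split further and would still need dead-letter handling.

```ruby
# Byte limit taken from the 413 error message ("Request size exceeded
# 10485760 bytes"); other AWS ES instance types have different limits.
MAX_BULK_BYTES = 10_485_760

# Recursively split a list of serialized bulk actions into batches that
# each fit under `limit`. A batch of one action is returned as-is even
# if oversized, since it cannot be split further.
def split_to_fit(actions, limit = MAX_BULK_BYTES)
  size = actions.sum { |a| a.bytesize }
  return [actions] if size <= limit || actions.size <= 1

  mid = actions.size / 2
  split_to_fit(actions[0...mid], limit) + split_to_fit(actions[mid..-1], limit)
end
```

The ES plugin issue linked above tracks the real implementation of this idea on the plugin side.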
Can we make this configurable? The maximum HTTP request payload size varies with the instance type.