fluent-plugin-aws-elasticsearch-service
Unable to handle ElasticSearch "RequestEntityTooLarge" error correctly
Hi,
We receive the following error when pushing a "chunk" to ElasticSearch:
2017-07-18 10:11:32 +0000 [warn]: failed to flush the buffer. plugin_id="object:15aa644" retry_time=10 next_retry=2017-07-18 10:20:36 +0000 chunk="554949b8663b0bd8416988071dcd1bf3" error_class=Elasticsearch::Transport::Transport::Errors::RequestEntityTooLarge error="[413] {\"Message\":\"Request size exceeded 10485760 bytes\"}"
2017-07-18 10:11:32 +0000 [debug]: chunk taken back instance=22660880 chunk_id="554949b8663b0bd8416988071dcd1bf3" metadata=#<struct Fluent::Plugin::Buffer::Metadata timekey=nil, tag="kubernetes.var.lib.rkt.pods.run.410d6112-b970-40d7-8b71-c5ee25452c17.stage1.rootfs.opt.stage2.hyperkube.rootfs.var.log.containers.service-2066068252-ns9xw_int_service-9058641f27539563ca400a4c1507ef500c48a6a2daa60d0b9330f0fd9c91b63e.log", variables=nil>
What happens next is that the plugin retries delivering this chunk indefinitely and never moves past it to process subsequent chunks that are not too large.
This effectively blocks all progress: logs stop arriving in our ElasticSearch cluster, which is functionally no different from FluentD being down.
The plugin should not retry the chunk in this scenario, as there is almost no chance that the ElasticSearch cluster will lift its maximum payload limit.
I would expect the plugin to:
- Log this as an "error" not a "warning"?
- If the plugin receives this error back from ES, write the "chunk" to disk somewhere (dead letter queue?) and "move on"
We are also looking at filtering out very large log entries before they hit the plugin.
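As a sketch of that filtering idea (our own workaround, not part of the plugin): drop records whose serialized JSON form exceeds a byte budget before they reach the output. The `within_size_limit?` helper and the 1 MB threshold are illustrative assumptions; AWS ES caps a whole request at 10485760 bytes, so individual records must stay well below that.

```ruby
require "json"

# Hypothetical per-record size guard. The 1 MB limit is an assumption;
# tune it for your cluster and typical bulk batch size.
MAX_RECORD_BYTES = 1_000_000

def within_size_limit?(record, limit = MAX_RECORD_BYTES)
  # Measure the record as it would be serialized for the bulk request.
  record.to_json.bytesize <= limit
end

# Example: filtering a batch before it reaches the ES output plugin.
records = [
  { "log" => "short line" },
  { "log" => "x" * 2_000_000 }, # oversized entry, would trigger a 413
]
kept = records.select { |r| within_size_limit?(r) }
# kept contains only the first record
```

In a real deployment this check would live in a custom filter plugin (or a `record_transformer` with inline Ruby), so oversized entries are dropped or truncated instead of poisoning a whole chunk.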
Splitting huge chunk requests (> 20MB) is now supported on the ES plugin side: https://github.com/uken/fluent-plugin-elasticsearch/issues/535
Should we make this feature configurable?
* Log this as an "error" not a "warning"?
We can't change that in this plugin; the warning is generated by Fluentd core.
If the plugin receives this error back from ES, write the "chunk" to disk somewhere (dead letter queue?) and "move on"
Adding 413 HTTP status code handling in ES plugin can achieve your feature request. WDYT?
Hi, any updates? @cosmo0920, did you have a chance to fix the issue?
Adding 413 HTTP status code handling in ES plugin can achieve your feature request. WDYT?
Yep, sounds good to me! What are your thoughts as to how it should behave if it receives this message back? Should it just give up/delete that chunk? Is there any way to split it and resend or put it somewhere and move to the next chunk?
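One possible "split and resend" strategy, sketched below under assumptions (the function name and representation of bulk actions as strings are illustrative): recursively halve the list of bulk actions until each slice fits under the size limit, then send each slice as its own request. A single action that is itself over the limit cannot be split further and would still need dead-letter handling.

```ruby
# Byte limit taken from the 413 error message ("Request size exceeded
# 10485760 bytes"); other AWS ES instance types have different limits.
MAX_BULK_BYTES = 10_485_760

# Recursively split a list of serialized bulk actions into batches that
# each fit under `limit`. A batch of one action is returned as-is even
# if oversized, since it cannot be split further.
def split_to_fit(actions, limit = MAX_BULK_BYTES)
  size = actions.sum { |a| a.bytesize }
  return [actions] if size <= limit || actions.size <= 1

  mid = actions.size / 2
  split_to_fit(actions[0...mid], limit) + split_to_fit(actions[mid..-1], limit)
end
```

The ES plugin issue linked above tracks the real implementation of this idea on the plugin side.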
Can we make this configurable? The maximum HTTP request payload size varies with the instance type.