
Fluency - emit() failed due to buffer full. Flushing buffer. Please try again.

Open fly7632785 opened this issue 2 years ago • 5 comments

Hi, I ran into this problem. How should I deal with it? (screenshots attached) It keeps hitting these logs all the time.

Then I checked the source. Maybe it appends data whose size is larger than maxBufferSize? How can it append data larger than maxBufferSize (256M) in one call?

Looking forward to your reply.

Thank you

Here is the config:

```xml
<appender name="FLUENCY_SYNC" class="ch.qos.logback.more.appenders.FluencyLogbackAppender">
    <!-- Tag for Fluentd. Further information: http://docs.fluentd.org/articles/config-file -->
    <!-- Microservice name -->
    <tag>${applicationName}</tag>
    <!-- [Optional] Label for Fluentd. Further information: http://docs.fluentd.org/articles/config-file -->

    <!-- Host name/address and port number where Fluentd is placed -->
    <remoteHost>${fluentdAddr}</remoteHost>
    <port>24224</port>

    <!-- [Optional] Multiple names/addresses and port numbers where Fluentd is placed
    <remoteServers>
      <remoteServer>
        <host>primary</host>
        <port>24224</port>
      </remoteServer>
      <remoteServer>
        <host>secondary</host>
        <port>24224</port>
      </remoteServer>
    </remoteServers>
     -->

    <!-- [Optional] Additional fields (pairs of key: value) -->
    <!-- Environment -->
    <additionalField>
        <key>env</key>
        <value>${profile}</value>
    </additionalField>

    <!-- [Optional] Configurations to customize Fluency's behavior: https://github.com/komamitsu/fluency#usage -->
    <ackResponseMode>true</ackResponseMode>
    <!-- <fileBackupDir>/tmp</fileBackupDir> -->
    <bufferChunkInitialSize>2097152</bufferChunkInitialSize>
    <bufferChunkRetentionSize>16777216</bufferChunkRetentionSize>
    <maxBufferSize>268435456</maxBufferSize>
    <bufferChunkRetentionTimeMillis>1000</bufferChunkRetentionTimeMillis>
    <connectionTimeoutMilli>5000</connectionTimeoutMilli>
    <readTimeoutMilli>5000</readTimeoutMilli>
    <waitUntilBufferFlushed>30</waitUntilBufferFlushed>
    <waitUntilFlusherTerminated>40</waitUntilFlusherTerminated>
    <flushAttemptIntervalMillis>200</flushAttemptIntervalMillis>
    <senderMaxRetryCount>12</senderMaxRetryCount>
    <!-- [Optional] Enable/disable use of EventTime to get sub-second resolution of the log event date-time -->
    <useEventTime>true</useEventTime>
    <sslEnabled>false</sslEnabled>
    <!-- [Optional] Enable/disable use of the JVM heap for buffering -->
    <jvmHeapBufferMode>false</jvmHeapBufferMode>
    <!-- [Optional] If true, a Map Marker is expanded instead of nesting in the marker name -->
    <flattenMapMarker>false</flattenMapMarker>
    <!-- [Optional] Default: "marker" -->
    <markerPrefix></markerPrefix>

    <!-- [Optional] Message encoder if you want to customize the message -->
    <encoder>
        <pattern><![CDATA[%-5level %logger{50}#%line %message]]></pattern>
    </encoder>

    <!-- [Optional] Message field key name. Default: "message" -->
    <messageFieldKeyName>msg</messageFieldKeyName>

</appender>

<appender name="FLUENCY" class="ch.qos.logback.classic.AsyncAppender">
    <!-- Max queue size of logs waiting to be sent (when the queue reaches the max size, logs are discarded). -->
    <queueSize>999</queueSize>
    <!-- Never block when the queue becomes full. -->
    <neverBlock>true</neverBlock>
    <!-- Maximum queue flush time (millis) allowed during appender stop.
         If the worker takes longer than this, it exits, discarding any remaining items in the queue.
     -->
    <maxFlushTime>1000</maxFlushTime>
    <appender-ref ref="FLUENCY_SYNC"/>
</appender>
```

fly7632785 avatar Jun 11 '22 00:06 fly7632785

@xerial @komamitsu @jglick @mach-kernel please help

fly7632785 avatar Jun 13 '22 10:06 fly7632785

@fly7632785

> Then I checked the source. Maybe it appends data whose size is larger than maxBufferSize? How can it append data larger than maxBufferSize (256M) in one call?

Do you think you appended such a huge amount of log data at once?

If not, I guess the following happened:

  • Your Fluency instance tried to send log data to your Fluentd
  • It kept failing due to a network issue or something similar
  • The unsent data were buffered
  • Fluency's internal buffer pool was exhausted
  • The buffer full exception occurred

Some network error log messages were probably output before the buffer full error messages.

Also, it's hard to support you because you are using FluencyLogbackAppender and I'm not familiar with it. It would be great if you could create a minimal reproduction without FluencyLogbackAppender.

komamitsu avatar Jun 13 '22 12:06 komamitsu

@komamitsu thank you for your reply. I am sure there is no single log entry that big (more than 256M). I think the unsent data may accumulate into a large amount, so it keeps failing. Is that possible?

fly7632785 avatar Jun 14 '22 02:06 fly7632785

@komamitsu the code has a comment (see the attached screenshot); what does it mean? And is it possible that there is a problem with this code?

fly7632785 avatar Jun 14 '22 02:06 fly7632785

I think I figured it out. There is a problem with the flushInternal method.

I ran a test that reproduces the problem. Here is the config:

```xml
<bufferChunkInitialSize>200470</bufferChunkInitialSize>
<bufferChunkRetentionSize>2000470</bufferChunkRetentionSize>
<maxBufferSize>6000470</maxBufferSize>
<bufferChunkRetentionTimeMillis>1000</bufferChunkRetentionTimeMillis>
```

That is: bufferChunkInitialSize ~200K, bufferChunkRetentionSize ~2000K, maxBufferSize ~6000K.
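If we assume chunk sizes are normalized by doubling from bufferChunkInitialSize (my guess from the numbers observed in this test, not confirmed from Fluency's source), the possible chunk sizes under this config are easy to enumerate:

```java
public class ChunkSizes {
    public static void main(String[] args) {
        long initial = 200470L;  // bufferChunkInitialSize from the test config
        long max = 6000470L;     // maxBufferSize from the test config
        // Hypothetical normalization: a chunk grows by doubling from the
        // initial size until it can hold the pending record.
        for (long size = initial; size <= max; size *= 2) {
            System.out.println(size);
        }
        // prints 200470, 400940, 801880, 1603760, 3207520
    }
}
```

The second step, 400940 bytes, would match the ~400K normalized buffer size observed in the failure: once ~5613K is already allocated, even a 400K chunk pushes the total past the 6000K maxBufferSize.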

Then I pointed the appender at a wrong Fluentd port to mock a connection error:

```xml
<remoteHost>${fluentdAddr}</remoteHost>
<port>24225</port>
```

Then I printed logs in a loop:

```java
for (int i = 0; i < 10_000_000; i++) {
    log.info("logger,test:");
}
```

After running for a while, the error occurs.

Then I traced through the source to find the problem.

At that point, normalizedBufferSize is ~400K, allocatedSize is ~5613K, and maxBufferSize is ~6000K.

Since normalizedBufferSize + allocatedSize > maxBufferSize, Fluency calls releaseBuffers() and then throws an error. It stands to reason that the allocated buffers would be released before the next attempt, and then it would run normally again.
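A minimal sketch of that check (my own model of the logic described here, not Fluency's actual source; the class and method names are made up):

```java
import java.util.concurrent.atomic.AtomicLong;

// Toy model of the allocation check: a new chunk may only be allocated
// while allocatedSize + normalizedSize stays within maxBufferSize.
public class BufferPoolModel {
    static final long MAX_BUFFER_SIZE = 6000470L;   // maxBufferSize from the test config
    static final AtomicLong allocatedSize = new AtomicLong();

    static void allocate(long normalizedSize) {
        if (allocatedSize.get() + normalizedSize > MAX_BUFFER_SIZE) {
            // At this point Fluency releases only the *returned* (reusable)
            // buffers; chunks still holding unsent data remain counted
            // in allocatedSize.
            throw new IllegalStateException("emit() failed due to buffer full");
        }
        allocatedSize.addAndGet(normalizedSize);
    }

    public static void main(String[] args) {
        allocate(5613000L);    // chunks already holding unsent logs (~5613K)
        try {
            allocate(400940L); // next ~400K chunk: 5613000 + 400940 > 6000470
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```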


But allocatedSize is not released.

Then it prints the log: `[AsyncAppender-Worker-FLUENCY] ERROR org.komamitsu.fluency.Fluency - emit() failed due to buffer full. Flushing buffer. Please try again...`


So it goes into a cycle and keeps printing the log.

But why is allocatedSize not released?


Because ingester.ingest(tag, dataBuffer) throws (the Fluentd connection error), bufferPool.returnBuffer(flushableBuffer.getByteBuffer()) never runs. The buffer pool can't take the ByteBuffer back, so allocatedSize can't be reduced.
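That failure path can be sketched like this (again my own simplification, assuming a take/ingest/return cycle; not the actual flushInternal code):

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicLong;

// Toy model: returnBuffer() only runs after a successful ingest(), so a
// persistent Fluentd error leaves allocatedSize untouched indefinitely.
public class FlushModel {
    static final AtomicLong allocatedSize = new AtomicLong(5613000L);
    static boolean fluentdDown = true;

    static void ingest(byte[] chunk) throws IOException {
        if (fluentdDown) {
            throw new IOException("connection refused"); // wrong port 24225 in the repro
        }
    }

    static void returnBuffer(byte[] chunk) {
        allocatedSize.addAndGet(-chunk.length); // only freed chunks shrink allocatedSize
    }

    public static void main(String[] args) {
        byte[] chunk = new byte[400940];
        try {
            ingest(chunk);
            returnBuffer(chunk); // never reached while Fluentd is unreachable
        } catch (IOException e) {
            // chunk is kept so the data can be resent later;
            // allocatedSize stays the same
        }
        System.out.println(allocatedSize.get()); // still 5613000
    }
}
```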

If the Fluentd error never recovers, it keeps printing the error forever.

And I think it is a good design: it protects our memory, and it is also a compensation mechanism (when Fluentd recovers, the unsent logs will be resent).

That's the cause I found.

Thank you again.

fly7632785 avatar Jun 14 '22 09:06 fly7632785