fluent-plugin-scribe Severe performance issues

We deployed fluentd to production using this plugin along with the out_redshift plugin.

Even in our initial benchmarks, in_scribe performed far worse than other input methods: in_forward handled about 18k msg/sec versus about 1k msg/sec with in_scribe. When we pushed real production traffic with all the plugins set up (the benchmark used only in_scribe and out_file), it just couldn't handle the load, and we're talking about ~300 msg/sec.

The culprit appears to be that all message handling happens on the same thread that receives the Scribe messages, and Cool.io is not actually used. Whenever processing is delayed for any reason, the Scribe server times out and stops sending data until the retry period ends. Even then, it dies again after a minute or so.

We worked around this issue by having in_scribe enqueue all messages into a Queue, with another thread calling Engine.emit on the queued messages. But this is suboptimal and far from "production ready".
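For readers who want to picture the workaround, here is a minimal sketch of the idea, not the code we actually ran. It assumes a plain Ruby Queue and a single worker thread; QueuedEmitter and the emit block are hypothetical names, with the block standing in for Engine.emit.

```ruby
require 'thread'

# Hypothetical sketch: the receiving thread only enqueues, and a separate
# worker thread does the (potentially slow) emit. The emit block stands in
# for Fluentd's Engine.emit; none of this is the plugin's real API.
class QueuedEmitter
  def initialize(&emit)
    @emit = emit          # called as emit.call(tag, time, record)
    @queue = Queue.new    # thread-safe FIFO
    @worker = Thread.new { drain }
  end

  # Called from the receiving thread; returns immediately after enqueuing.
  def push(tag, time, record)
    @queue << [tag, time, record]
  end

  # Ask the worker to stop after it has processed everything queued so far.
  def shutdown
    @queue << :stop
    @worker.join
  end

  private

  def drain
    loop do
      item = @queue.pop   # blocks until an item is available
      break if item == :stop
      @emit.call(*item)
    end
  end
end

emitted = []
emitter = QueuedEmitter.new { |tag, time, record| emitted << [tag, time, record] }
emitter.push("scribe.app", 1, { "message" => "hello" })
emitter.push("scribe.app", 2, { "message" => "world" })
emitter.shutdown
```

Anything still sitting in the in-memory queue is lost if the process dies, which is one reason this is only a stopgap.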

Jul 17 '13 19:07 arikfr

Queuing objects in an input plugin outside of Fluentd's buffering is vulnerable to crashes, so the fix you mentioned is hard to merge. We may be able to fix it like this instead, reducing the number of Engine.emit() calls and also the processing time in the Thrift event handler:

# FluentScribeHandler
def Log(msgs)
  bucket = {} # tag -> events(array of [time,record])
  time_now = Engine.now
  begin
    msgs.each { |msg|
      record = create_record(msg)
      tag = @add_prefix ? @add_prefix + '.' + msg.category : msg.category
      bucket[tag] ||= []
      bucket[tag].push([time_now,record])
    }
    bucket.each { |tag,events|
      Engine.emit_array(tag, events)
    }
    return ResultCode::OK
  rescue => e
    $log.error "unexpected error", :error=>$!.to_s
    $log.error_backtrace
    return ResultCode::TRY_LATER
  end
end
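To see the effect outside of Fluentd, the grouping step can be tried standalone. This is only a sketch: Msg, CountingEngine and emit_grouped are hypothetical stand-ins, not part of Fluentd or the Scribe Thrift bindings, and the record layout is simplified.

```ruby
# Hypothetical stand-in for a Scribe log entry.
Msg = Struct.new(:category, :message)

# Hypothetical stand-in for the engine; it only records each call.
class CountingEngine
  attr_reader :calls

  def initialize
    @calls = []
  end

  # Same shape as the emit_array(tag, events) call in the handler above.
  def emit_array(tag, events)
    @calls << [tag, events]
  end
end

# Group messages by tag, then emit once per tag instead of once per message.
def emit_grouped(engine, msgs, time_now)
  bucket = Hash.new { |h, tag| h[tag] = [] }   # tag -> [[time, record], ...]
  msgs.each do |msg|
    bucket[msg.category] << [time_now, { "message" => msg.message }]
  end
  bucket.each { |tag, events| engine.emit_array(tag, events) }
end

engine = CountingEngine.new
msgs = [Msg.new("web", "a"), Msg.new("db", "b"), Msg.new("web", "c")]
emit_grouped(engine, msgs, 1_384_300_000)
puts engine.calls.size   # 2 calls for 3 messages
```

The saving grows with the number of messages per batch that share a category.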

Thoughts?

Sep 02 '13 04:09 tagomoris

As mentioned, what we did was only a workaround, not something that should be the solution.

From what I've seen, unless you make the Scribe/Thrift server work with Cool.io, any solution will be suboptimal.

Oct 04 '13 16:10 arikfr

@arikfr would you mind open sourcing your non-production-ready code? We've been running into similar issues.

We switched back to running scribe for input and are using fluentd tail to then move stuff across until we are done transitioning off scribe.

Nov 12 '14 00:11 hfwang

@hfwang we are no longer using Fluent and unfortunately I didn't keep that code.

Nov 12 '14 06:11 arikfr

@hfwang @arikfr so both of you continue to use Scribe? Any reason for not totally switching from Scribe to Fluentd? That would obviate the need for in_scribe altogether.

Nov 13 '14 03:11 kiyoto

Sounds like arikfr is no longer using fluentd.

We have numerous legacy systems that continue to emit scribe logs. We don't have the engineering capacity to update everything at once, and as long as our servers don't fall over, it isn't a priority. New development uses fluentd though.

Nov 13 '14 03:11 hfwang

Our situation is pretty much the same as @hfwang described.

Nov 13 '14 10:11 arikfr

I can fix in_scribe with the code I mentioned in https://github.com/fluent/fluent-plugin-scribe/issues/6#issuecomment-23640557. But I'm not using in_scribe now, so I cannot test its effects.

@hfwang Can you test the fixed code if I push a branch?

Nov 13 '14 11:11 tagomoris

Pushed https://github.com/fluent/fluent-plugin-scribe/tree/reduce_emit_times @hfwang Could you build, install and test this code?

git clone https://github.com/fluent/fluent-plugin-scribe.git
cd fluent-plugin-scribe
git checkout reduce_emit_times
bundle install
bundle exec rake build
gem install pkg/fluent-plugin-scribe-0.10.13.gem
# or fluent-gem install ... 
# or td-agent-gem install ...

Nov 13 '14 12:11 tagomoris

I'll take a look at this probably next week... but will do and thanks!

Nov 13 '14 20:11 hfwang
