logstash-input-mongodb

Infinite loop due to "invalid byte sequence in UTF-8"

Open mosheka opened this issue 9 years ago • 1 comments

When an input MongoDB document contains a byte sequence that is not valid UTF-8 (or something like that), the input plugin gets into an infinite loop: an exception is raised and the process is restarted, only to hit the same document again.

logstash.stdout:
D, [2016-09-17T21:14:03.445000 #22826] DEBUG -- : MONGODB | X.X.X.X:27017 | logs.find | STARTED | {"find"=>"logs", "filter"=>{"_id"=>{"$gt"=>BSON::ObjectId('57d948b1995d5258198b457c')}}, "limit"=>500}
D, [2016-09-17T21:14:03.458000 #22826] DEBUG -- : MONGODB | X.X.X.X:27017 | logs.find | SUCCEEDED | 0.013s

logstash. {:timestamp=>"2016-09-17T21:14:03.423000+0000", :message=>"MongoDB Input threw an exception, restarting", :exception=>#<ArgumentError: invalid byte sequence in UTF-8>, :level=>:warn}

Proposed solution: skip the failing record by adding a fake record to ES, so that when the max is requested, the failed record is not counted.
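The general idea of skipping a bad record can be sketched as below. This is only an illustration, not the plugin's code: `to_event` and `process_batch` are hypothetical names, `to_event` is a stand-in for whatever step actually raises, and instead of writing a fake record to ES it simply advances the last-seen `_id` past the failing document so the next query does not return it again.

```ruby
# Hypothetical stand-in for the step that raises on invalid bytes.
def to_event(doc)
  doc.each_value do |v|
    if v.is_a?(String) && !v.valid_encoding?
      raise ArgumentError, "invalid byte sequence in #{v.encoding}"
    end
  end
  doc
end

# Process one batch. The last-seen _id advances even when a document
# fails, so a restart would query for documents after the bad one.
def process_batch(docs, last_id)
  events = []
  skipped = []
  docs.each do |doc|
    begin
      events << to_event(doc)
    rescue ArgumentError
      skipped << doc["_id"] # remember which documents were dropped
    ensure
      last_id = doc["_id"]
    end
  end
  [events, skipped, last_id]
end

docs = [
  { "_id" => 1, "message" => "ok" },
  { "_id" => 2, "message" => "bad\xFF" }, # not valid UTF-8
  { "_id" => 3, "message" => "fine" }
]
events, skipped, last_id = process_batch(docs, 0)
```

The trade-off is that skipped documents never reach ES, so it probably makes sense to log the skipped `_id` values somewhere.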

mosheka, Sep 17 '16 21:09

Solved it in the meantime by transforming the MongoDB input to UTF-8 strings. Some ideas for handling it can be found here: https://robots.thoughtbot.com/fight-back-utf-8-invalid-byte-sequences
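For reference, a minimal sketch of that kind of transformation, assuming the documents arrive as nested hashes and arrays of Ruby values. `scrub_utf8` is a hypothetical helper name, not something the plugin provides; it relies on Ruby's built-in `String#scrub` (Ruby 2.1+) to replace invalid bytes.

```ruby
# Hypothetical helper: recursively replace invalid UTF-8 bytes in a
# document. Non-string values (numbers, symbols, nil, ...) pass through.
def scrub_utf8(value, replacement = "?")
  case value
  when String
    # Work on a copy, tagged as UTF-8, so the input is not mutated.
    s = value.dup.force_encoding(Encoding::UTF_8)
    s.valid_encoding? ? s : s.scrub(replacement)
  when Hash
    value.each_with_object({}) do |(k, v), out|
      out[scrub_utf8(k, replacement)] = scrub_utf8(v, replacement)
    end
  when Array
    value.map { |v| scrub_utf8(v, replacement) }
  else
    value
  end
end

doc = { "message" => "abc\xFFdef".b, "tags" => ["ok", "\xC3".b], "count" => 1 }
clean = scrub_utf8(doc)
```

Replacing bad bytes with `"?"` loses information. If the source encoding is actually known (for example ISO-8859-1), transcoding with `String#encode` would likely be a better choice than scrubbing.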

mosheka, Sep 18 '16 23:09