logstash-input-mongodb
Infinite loops due to "invalid byte sequence in UTF-8"
When an input MongoDB document contains data that is not valid UTF-8, the input plugin gets stuck in an infinite loop: an exception is raised, the input restarts, and it hits the same document again.
logstash.stdout:
D, [2016-09-17T21:14:03.445000 #22826] DEBUG -- : MONGODB | X.X.X.X:27017 | logs.find | STARTED | {"find"=>"logs", "filter"=>{"_id"=>{"$gt"=>BSON::ObjectId('57d948b1995d5258198b457c')}}, "limit"=>500}
D, [2016-09-17T21:14:03.458000 #22826] DEBUG -- : MONGODB | X.X.X.X:27017 | logs.find | SUCCEEDED | 0.013s
logstash. {:timestamp=>"2016-09-17T21:14:03.423000+0000", :message=>"MongoDB Input threw an exception, restarting", :exception=>#<ArgumentError: invalid byte sequence in UTF-8>, :level=>:warn}
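The loop can be illustrated with a simplified, hypothetical model of a cursor-based polling input. This is not the plugin's actual code: `process_message` and `poll_with_restarts` are made-up names, documents are reduced to `[id, message]` pairs, and the key assumption is that the cursor only advances after a document is processed successfully. A regex match stands in for whatever operation in the plugin actually raises on invalid UTF-8.

```ruby
# Hypothetical, simplified model of a cursor-based polling input (not the plugin's code).
# Each document is an [id, message] pair; last_id only advances after a successful event.
def process_message(message)
  # A regex match on a string holding invalid UTF-8 bytes raises
  # ArgumentError: invalid byte sequence in UTF-8.
  message =~ /byte/
  message
end

def poll_with_restarts(docs, last_id, max_restarts)
  processed = []
  restarts = 0
  loop do
    batch = docs.select { |id, _| id > last_id } # like filter {_id: {$gt: last_id}}
    break if batch.empty?
    begin
      batch.each do |id, message|
        processed << process_message(message)
        last_id = id
      end
    rescue ArgumentError
      restarts += 1
      # The real input has no such cap; it exists only so this demo terminates.
      break if restarts >= max_restarts
    end
  end
  [processed, last_id, restarts]
end

docs = [[1, "ok"], [2, "bad \xFF byte"], [3, "never reached"]]
p poll_with_restarts(docs, 0, 3) # prints [["ok"], 1, 3]
```

Because the same `$gt` filter re-fetches the failing document after every restart, the cursor never moves past it.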
Proposed solution: skip the failing record by adding a placeholder record to Elasticsearch, so that when the maximum _id is requested, the failed record is not counted again.
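A minimal sketch of the skip idea, under the same simplified `[id, message]` model. `poll_skipping` and the `_mongodb_decode_failure` tag are hypothetical names, not part of the plugin. The cursor advances for failed documents too, and a placeholder event carrying the failing `_id` is emitted so it would be indexed like any other event.

```ruby
# Hypothetical sketch of the proposed "skip" behaviour (not the plugin's code).
# Documents are [id, message] pairs; the cursor advances even when a document fails.
def poll_skipping(docs, last_id)
  events = []
  docs.select { |id, _| id > last_id }.each do |id, message|
    begin
      message =~ /byte/ # stand-in for processing that may raise on invalid UTF-8
      events << { "_id" => id, "message" => message }
    rescue ArgumentError => e
      # Placeholder event: indexed like any other, so the failing _id is
      # accounted for and is not fetched again.
      events << { "_id" => id, "tags" => ["_mongodb_decode_failure"], "error" => e.message }
    end
    last_id = id # advance past successes and failures alike
  end
  [events, last_id]
end
```

The trade-off is that the original content of the failing document is lost; only its `_id` and the error message survive.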
In the meantime, I solved it by transforming the MongoDB input into UTF-8 strings. Some ideas for handling it can be found here: https://robots.thoughtbot.com/fight-back-utf-8-invalid-byte-sequences
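One way to do the UTF-8 transformation, along the lines of the linked article, is to sanitize every string in a document before it becomes an event. This sketch assumes Ruby 2.1+ (for `String#scrub`) and that documents arrive as plain Ruby hashes, arrays and strings; `sanitize_utf8` is a hypothetical helper, not part of the plugin. Invalid sequences are replaced with U+FFFD rather than dropped, so the damage stays visible in the indexed data.

```ruby
# Hypothetical helper: recursively replace invalid or unconvertible bytes in
# every string of a document with U+FFFD, returning a new structure.
def sanitize_utf8(value)
  case value
  when String
    if value.encoding == Encoding::UTF_8
      # Already tagged as UTF-8: replace only the invalid byte sequences.
      value.scrub("\uFFFD")
    else
      # Other encodings (e.g. binary): convert, replacing what cannot be mapped.
      value.encode(Encoding::UTF_8, invalid: :replace, undef: :replace, replace: "\uFFFD")
    end
  when Hash
    value.each_with_object({}) { |(k, v), out| out[sanitize_utf8(k)] = sanitize_utf8(v) }
  when Array
    value.map { |v| sanitize_utf8(v) }
  else
    value # numbers, ObjectIds, times, etc. are left untouched
  end
end
```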