elasticsearch-river-mongodb
Failover recovery
We have a river running, and let's say we insert a record that marks the river as IMPORT_FAILED. In the logs we see:
{{{
[2015-02-10 17:10:01,315][ERROR][org.elasticsearch.river.mongodb.MongoDBRiverBulkProcessor] Bulk processor failed. failure in bulk execution:
[310]: index [products], type [products], id [54cf5b428bcaf055ba799a75], message [MapperParsingException[failed to parse [created_at]]; nested: MapperParsingException[failed to parse date field [Mon Feb 02 2015 12:17:07 GMT+0100 (CET)], tried both date format [dateOptionalTime], and timestamp number with locale []]; nested: IllegalArgumentException[Invalid format: "Mon Feb 02 2015 12:17:07 GMT+010..."]; ]
[2015-02-10 17:10:01,315][INFO ][river.mongodb.util ] setRiverStatus called with products - IMPORT_FAILED
}}}
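For reference, the root cause seems to be that created_at was written as a JavaScript Date.toString() string instead of a BSON Date, which the default dateOptionalTime mapping cannot parse. A minimal sketch of how such records could be found and repaired on the MongoDB side (the connection URL, database name, and exact date format are assumptions, not something confirmed by the logs):
{{{
# Sketch: repair documents whose created_at was stored as a string
# (e.g. "Mon Feb 02 2015 ...") instead of a BSON Date.
from datetime import datetime

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
products = client["mydb"]["products"]  # database name is a placeholder

# BSON type 2 == string; a correctly stored date would be type 9 (Date).
for doc in products.find({"created_at": {"$type": 2}}):
    # Drop the trailing "(CET)" suffix that Date.toString() appends,
    # then parse the remaining "Mon Feb 02 2015 12:17:07 GMT+0100" part.
    raw = doc["created_at"].split(" (")[0]
    fixed = datetime.strptime(raw, "%a %b %d %Y %H:%M:%S GMT%z")
    # Re-save the field as a proper Date so the river can index it.
    products.update_one({"_id": doc["_id"]}, {"$set": {"created_at": fixed}})
}}}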
When we found this record, deleted it, and restarted the river, it had "RUNNING" status, but the river was frozen and not working at all. In the logs we see:
{{{
[2015-02-10 17:11:01,096][DEBUG][action.index ] [Worm] [][0], node[gj_Z6EoMSNG6jAV_aPL0-A], [P], s[STARTED]: Failed to execute [index {[][products][_refresh], source[na]}]
org.elasticsearch.index.mapper.MapperParsingException: failed to parse, document is empty
    at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:559)
}}}
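For context, this is roughly how we imagine inspecting and resetting the status by hand. This is only a sketch: the "_riverstatus" document id and the {"mongodb": {"status": ...}} body shape are assumptions inferred from the setRiverStatus log line above and would need to be verified against the river source for the version in use:
{{{
# Sketch: read and force the river status document directly in the
# _river index (document id and body shape are assumptions, see above).
import requests

ES = "http://localhost:9200"

# Look at what the river currently thinks its status is.
r = requests.get(ES + "/_river/products/_riverstatus")
print(r.json())

# Force the status back to RUNNING after removing the bad record.
requests.put(
    ES + "/_river/products/_riverstatus",
    data='{"mongodb": {"status": "RUNNING"}}',
)
}}}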
Some questions:
- "failed to parse, document is empty" - can it be more verbose? With id of record? I check source of elasticsearch code, SourceToParse class have id, is it mongodb id?
- How can we recover in this case? Recreating the river every time a single record fails is not a good solution; it is not a real recovery path.
- Can the river be given an option to simply skip such records? It is much more important to us that the many subsequent "correct" updates are delivered to Elasticsearch than that the river stops at one "wrong" record and waits for us to fix it. We will find the wrong records later, fix them, and save them again, and they will then be delivered to ES. As it stands, one minor bug in the software that writes to MongoDB risks stalling the whole ES river. A sketch of what such an option could look like follows below.
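To make the request concrete, here is a sketch of how such an option could look at river creation time. The "skip_import_errors" name is hypothetical, it does not exist in the plugin today; the surrounding structure follows the documented _river/{name}/_meta registration format:
{{{
# Sketch: register the river with a hypothetical option that would log
# and skip documents that fail to index, instead of flipping the whole
# river to IMPORT_FAILED.
import json

import requests

river_meta = {
    "type": "mongodb",
    "mongodb": {
        "db": "mydb",  # placeholder database name
        "collection": "products",
    },
    "index": {
        "name": "products",
        "type": "products",
        # Hypothetical option, does not exist in the plugin today.
        "skip_import_errors": True,
    },
}

requests.put(
    "http://localhost:9200/_river/products/_meta",
    data=json.dumps(river_meta),
)
}}}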
Thanks!