elasticsearch-river-mongodb
Losing documents when updating more than 1k in MongoDB
Hi, I indexed 10,000 docs using mongodb-river. After some time the document count in ES starts decreasing (it is actually losing data). What should I do to keep my data?
ES version: 0.90.10, river version: 1.7.4, MongoDB version: 2.4.8
River configuration:
{ "index": { "name": "database", "type": "collection" }, "mongodb": { "db": "database", "servers": [ { "port": 5000, "host": "192.168.31.50" } ], "collection": "collection", "options": { "drop_collection": "collection" } }, "type": "mongodb" }
If I delete the river and add it again, it works fine. Only the first import works correctly (and sometimes even that gets stuck); after that there are always some documents missing. While the load is running, changes in the MongoDB collection are not reflected in ES.
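One way to quantify the loss is to compare the MongoDB collection count against the ES index count after each load. A minimal sketch, assuming pymongo and elasticsearch-py are installed; host, port, and names are taken from the river configuration above:

from pymongo import MongoClient
from elasticsearch import Elasticsearch

# Connection details follow the river configuration above; adjust as needed.
mongo = MongoClient("192.168.31.50", 5000)
es = Elasticsearch(["http://localhost:9200"])

mongo_count = mongo["database"]["collection"].count()
es_count = es.count(index="database", doc_type="collection")["count"]

print("mongo=%d es=%d missing=%d" % (mongo_count, es_count, mongo_count - es_count))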
Please help me!
Can you please provide the ES log file? The river has been tested with over 5,000,000 documents without issue.
Mapping of Documents
"ro": {
"properties": {
"_keywords": {
"type": "string"
},
"appId": {
"type": "string"
},
"children": {
"properties": {
"_id": {
"type": "string"
}
}
},
"code": {
"type": "string",
"index": "not_analyzed",
"norms": {
"enabled": false
},
"index_options": "docs"
},
"color": {
"type": "string",
"index": "not_analyzed",
"norms": {
"enabled": false
},
"index_options": "docs"
},
"configData": {
"type": "string",
"index": "not_analyzed",
"norms": {
"enabled": false
},
"index_options": "docs"
},
"connections": {
"type": "string",
"index": "not_analyzed",
"norms": {
"enabled": false
},
"index_options": "docs"
},
"destinationIp": {
"type": "string",
"index": "not_analyzed",
"norms": {
"enabled": false
},
"index_options": "docs"
},
"deviceId": {
"type": "string",
"index": "not_analyzed",
"norms": {
"enabled": false
},
"index_options": "docs"
},
"deviceName": {
"type": "string",
"analyzer": "custom_analyzer"
},
"deviceType": {
"type": "string",
"analyzer": "custom_analyzer"
},
"displayName": {
"type": "string",
"analyzer": "custom_analyzer"
},
"dns": {
"type": "string",
"analyzer": "custom_analyzer"
},
"ip": {
"type": "string",
"analyzer": "custom_analyzer"
},
"isOrphan": {
"type": "boolean"
},
"name": {
"type": "string",
"analyzer": "custom_analyzer"
},
"orphanSortLevel": {
"type": "long"
},
"parent": {
"properties": {
"_id": {
"type": "string"
},
"name": {
"type": "string"
},
"rootIds": {
"type": "string"
}
}
},
"parentStatus": {
"type": "string",
"index": "not_analyzed",
"norms": {
"enabled": false
},
"index_options": "docs"
},
"permission": {
"type": "string",
"index": "not_analyzed",
"norms": {
"enabled": false
},
"index_options": "docs"
},
"roles": {
"type": "string",
"index": "not_analyzed",
"norms": {
"enabled": false
},
"index_options": "docs"
},
"rootIds": {
"type": "string",
"index": "not_analyzed",
"norms": {
"enabled": false
},
"index_options": "docs"
},
"sortLevel": {
"type": "long"
},
"status": {
"type": "string",
"index": "not_analyzed",
"norms": {
"enabled": false
},
"index_options": "docs"
},
"statusCode": {
"type": "string",
"index": "not_analyzed",
"norms": {
"enabled": false
},
"index_options": "docs"
},
"tag": {
"type": "string",
"analyzer": "custom_analyzer"
},
"vendor": {
"type": "string",
"index": "not_analyzed",
"norms": {
"enabled": false
},
"index_options": "docs"
}
}
}
}
River plugin mapping (http://hostname:9200/_river/_search):

{
  "took": 5,
  "timed_out": false,
  "_shards": { "total": 1, "successful": 1, "failed": 0 },
  "hits": {
    "total": 4,
    "max_score": 1,
    "hits": [
      {
        "_index": "_river",
        "_type": "mongodb_ro",
        "_id": "_meta",
        "_score": 1,
        "_source": {
          "type": "mongodb",
          "mongodb": {
            "servers": [
              { "host": "192.168.31.100", "port": 5000 }
            ],
            "options": { "drop_collection": "ro" },
            "db": "test",
            "collection": "ro"
          },
          "index": { "name": "test", "type": "ro" }
        }
      },
      {
        "_index": "_river",
        "_type": "mongodb_ro",
        "_id": "test.ro",
        "_score": 1,
        "_source": {
          "mongodb": {
            "_last_ts": "{ \"$ts\" : 1398146397 , \"$inc\" : 22}"
          }
        }
      },
      {
        "_index": "_river",
        "_type": "mongodb_ro",
        "_id": "_riverstatus",
        "_score": 1,
        "_source": {
          "mongodb": { "status": "RUNNING" }
        }
      },
      {
        "_index": "_river",
        "_type": "mongodb_ro",
        "_id": "_status",
        "_score": 1,
        "_source": {
          "ok": true,
          "node": {
            "id": "yEiQOhYmTRO7OeJtvFGjtQ",
            "name": "Thor Girl",
            "transport_address": "inet[/192.168.31.100:9300]"
          }
        }
      }
    ]
  }
}
I changed the ES log level to error mode, so I don't find any logs in elasticsearch.log.
Hi, I found the scenario. If I update documents in batches of 100, there is no problem. But when I update more than 1000 objects at a time, I sometimes start losing documents. Each time it is a small number, but cumulatively I lose a large number. I have attached my log file.
Please help me!
I have a similar issue referenced here: #282
I think we need tests that try updating more than 1000 documents at once.
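A sketch of what such a test could look like, assuming pymongo (the 2.x-style update call, to match the versions in this thread), elasticsearch-py, and a river configured on a local test.ro collection; all names are placeholders:

import time
from pymongo import MongoClient
from elasticsearch import Elasticsearch

mongo = MongoClient("localhost", 27017)
coll = mongo["test"]["ro"]
es = Elasticsearch(["http://localhost:9200"])

for round_no in range(10):
    # Touch every document in one multi-update (more than 1000 docs per round).
    coll.update({}, {"$set": {"round": round_no}}, multi=True)
    time.sleep(30)  # crude wait for the river to drain the oplog
    es_count = es.count(index="test", doc_type="ro")["count"]
    mongo_count = coll.count()
    assert es_count == mongo_count, "round %d: ES=%d, MongoDB=%d" % (
        round_no, es_count, mongo_count)

If the bug is real, the assertion should start failing after a few rounds, with ES a handful of documents short.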
Hi, I have the same problem updating data through mongodb-river.
I'm using the following versions:
MongoDB 2.6.3, Elasticsearch 1.2.2, mongodb-river 2.0.1
It runs fine with bulk inserts or single updates; the problem only appears when updating data with bulk operations. It happens even when updating fewer than 1000 documents at once: I have tried bulks of size 100 (as @sidharthancr mentioned) and it still occurs.
For example: I run a bulk upsert of 200K documents in MongoDB over an empty collection, and every doc is synchronized to Elasticsearch. Then I repeat the same bulk upsert with the same document ids but some different data, and a couple of documents are lost by the end of the load. Instead of ending with 200K documents, it usually ends with 199,997, 199,998, or 199,999.
Is your MongoDB sharded?
No, it isn't. My MongoDB setup is a single mongod instance with the required replica set.
Hi @sidharthancr @danivzq, did you find any solution for this? I am seeing a similar scenario while performing bulk upserts; the difference is very small. Please suggest a fix if there is one.
Thanks in advance
I'm also experiencing issues with data loss. I have millions of documents in a collection, and sometimes random documents are not updated or deleted. I don't understand how to fix or avoid this problem, or what causes it.
There are a lot of issues reporting similar behaviour. Is there any understanding of why it happens? I know that rivers were deprecated and then removed in newer ES versions, but what is the alternative for replicating data from MongoDB to ES? Any thoughts?
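For what it's worth, the usual replacements (mongo-connector, for example) do the same thing the river did: tail the replica-set oplog and replay each operation into ES. A minimal sketch of that idea, assuming pymongo 3.x and elasticsearch-py, with JSON-serializable document fields; database, index, and type names are placeholders:

import time
import pymongo
from pymongo import MongoClient
from elasticsearch import Elasticsearch

mongo = MongoClient("localhost", 27017)
es = Elasticsearch(["http://localhost:9200"])
oplog = mongo.local["oplog.rs"]

# Start from the newest oplog entry; a real tool would persist the last
# processed timestamp so it can resume after a restart.
ts = next(oplog.find().sort("$natural", pymongo.DESCENDING).limit(1))["ts"]

cursor = oplog.find({"ts": {"$gt": ts}, "ns": "test.ro"},
                    cursor_type=pymongo.CursorType.TAILABLE_AWAIT)
while cursor.alive:
    for op in cursor:
        if op["op"] == "i":    # insert: index the full document
            doc = op["o"]
            es.index(index="test", doc_type="ro",
                     id=str(doc.pop("_id")), body=doc)
        elif op["op"] == "u":  # update: re-fetch and re-index the document
            doc = mongo["test"]["ro"].find_one({"_id": op["o2"]["_id"]})
            if doc is not None:
                es.index(index="test", doc_type="ro",
                         id=str(doc.pop("_id")), body=doc)
        elif op["op"] == "d":  # delete: remove by id
            es.delete(index="test", doc_type="ro",
                      id=str(op["o"]["_id"]), ignore=404)
    time.sleep(1)  # no new entries yet; poll again

Re-fetching on update sidesteps partial-update operators ($set, $inc, and so on) at the cost of an extra read per update.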