elasticsearch-river-mongodb
Impossible to import collection with binary _id
Hi
I have installed this river following the wiki; here is my config:
{
  "index": {
    "name": "testdb",
    "type": "torrents"
  },
  "mongodb": {
    "db": "testdb",
    "servers": [
      {
        "port": 27017,
        "host": "127.0.0.1"
      }
    ],
    "credentials": [
      {
        "db": "admin",
        "password": "password",
        "user": "username"
      }
    ],
    "collection": "torrents_data",
    "options": {
      "exclude_fields": [
        "files"
      ],
      "secondary_read_preference": true
    }
  },
  "type": "mongodb"
}
Here are some logs:
[2017-02-13 12:24:40,272][INFO ][river.mongodb ] [Nomad] Creating MongoClient for [[127.0.0.1:27017]]
[2017-02-13 12:24:41,793][INFO ][river.mongodb ] [Nomad] [mongodb][testdb] MongoDB version - 3.2.11
[2017-02-13 12:24:41,923][INFO ][river.mongodb ] [Nomad] [mongodb][testdb] MongoDBRiver is beginning initial import of btdht-crawler.torrents_data
[2017-02-13 12:24:42,649][DEBUG][action.bulk ] [Nomad] [testdb][2] failed to execute bulk item (index) index {[testdb][torrents][[B@4c438a69], source[{"seeds_peers":0,"file_nb":1,"added":1.486630897982914E9,"_id":"AMAiYk0SsXkBnCD9lxr55m6m/F0=","complete":0,"created":1486630897,"name":"Setup Terraria 1.3.0.3 GOG Version.exe","peers":0,"categories":["software"],"seeds":0,"last_scrape":1486630899,"size":137288792}]}
org.elasticsearch.index.mapper.MapperParsingException: failed to parse [_id]
at org.elasticsearch.index.mapper.core.AbstractFieldMapper.parse(AbstractFieldMapper.java:411)
at org.elasticsearch.index.mapper.internal.IdFieldMapper.parse(IdFieldMapper.java:295)
at org.elasticsearch.index.mapper.object.ObjectMapper.serializeValue(ObjectMapper.java:706)
at org.elasticsearch.index.mapper.object.ObjectMapper.parse(ObjectMapper.java:497)
at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:544)
at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:493)
at org.elasticsearch.index.shard.IndexShard.prepareIndex(IndexShard.java:493)
at org.elasticsearch.action.bulk.TransportShardBulkAction.shardIndexOperation(TransportShardBulkAction.java:409)
at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:148)
at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$PrimaryPhase.performOnPrimary(TransportShardReplicationOperationAction.java:574)
at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$PrimaryPhase$1.doRun(TransportShardReplicationOperationAction.java:440)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:36)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.index.mapper.MapperParsingException: Provided id [[B@4c438a69] does not match the content one [AMAiYk0SsXkBnCD9lxr55m6m/F0=]
at org.elasticsearch.index.mapper.internal.IdFieldMapper.parseCreateField(IdFieldMapper.java:310)
at org.elasticsearch.index.mapper.core.AbstractFieldMapper.parse(AbstractFieldMapper.java:401)
... 14 more
The import ends up with an IMPORT_FAILED status.
Here is the MongoDB document:
rs1:PRIMARY> db.torrents_data.find({_id: BinData(0,"AMAiYk0SsXkBnCD9lxr55m6m/F0=")})
{ "_id" : BinData(0,"AMAiYk0SsXkBnCD9lxr55m6m/F0="), "files" : null, "added" : 1486630897.982914, "name" : "Setup Terraria 1.3.0.3 GOG Version.exe", "created" : 1486630897, "file_nb" : 1, "size" : 137288792, "peers" : 0, "seeds" : 0, "last_scrape" : 1486630899, "complete" : 0, "seeds_peers" : 0, "categories" : [ "software" ] }
So I am unable to index my MongoDB collection: for every document, I get the error shown in the logs above.
I am guessing that this may be because my _id values are binary data (non-ASCII, 20-byte values), but I am not sure.
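The exception message also points that way: [B@4c438a69 looks like the default toString() of a Java byte[], while the _id in the document source is the Base64 encoding of the same bytes, so the two strings can never match. A minimal standalone Java sketch of the mismatch (not river code, just an illustration):

import java.util.Base64;

public class BinaryIdMismatch {
    public static void main(String[] args) {
        // The raw 20-byte _id, recovered from its Base64 form.
        byte[] id = Base64.getDecoder().decode("AMAiYk0SsXkBnCD9lxr55m6m/F0=");

        // A byte[] has no useful toString(); it prints as "[B@<hashcode>",
        // which matches the "provided id" in the exception.
        System.out.println(id);

        // Base64-encoding the same bytes gives the value that appears in the
        // document source, i.e. the "content one" from the exception.
        System.out.println(Base64.getEncoder().encodeToString(id));
    }
}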
Does anyone know how to solve this?
I have tested with a cloned collection where the _id values are hexadecimally encoded, and all of the documents were successfully indexed, so I think this confirms that there is an issue with binary _id values.
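For anyone else hitting this, the clone I tested with was built along these lines. This is only a rough sketch using the MongoDB Java driver; the torrents_data_hex target name is just an example, and authentication is omitted:

import com.mongodb.MongoClient;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoDatabase;
import org.bson.Document;
import org.bson.types.Binary;

public class CloneWithHexIds {
    public static void main(String[] args) {
        MongoClient client = new MongoClient("127.0.0.1", 27017);
        try {
            MongoDatabase db = client.getDatabase("testdb");
            MongoCollection<Document> source = db.getCollection("torrents_data");
            // Hypothetical target collection holding the hex-keyed copy.
            MongoCollection<Document> target = db.getCollection("torrents_data_hex");

            for (Document doc : source.find()) {
                Binary id = doc.get("_id", Binary.class);
                // Replace the BinData _id with its hex string so the river
                // hands Elasticsearch a plain-text id.
                doc.put("_id", toHex(id.getData()));
                target.insertOne(doc);
            }
        } finally {
            client.close();
        }
    }

    private static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }
}

Pointing the river's "collection" option at the hex-keyed copy then indexes every document without the MapperParsingException.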