elasticsearch-river-mongodb

mongo river not indexing data

Open Jay13 opened this issue 10 years ago • 7 comments

The current river version fails to do the initial import. We traced the issue to the sort option that was added to the initial import query in CollectionSlurper. The sort triggers a collection scan, which fails in our case because the collection has roughly 12 million docs (despite a proper index on the filter criteria). We fixed this by removing the sort option from the initial import query (the optimal solution would have been to add a limit option to the query).

Refer to https://jira.mongodb.org/browse/SERVER-12923

es-river-mongodb: 2.0.4, es: 1.4, mongo: 2.6

Jay13 avatar Dec 29 '14 07:12 Jay13

It's well known that 2.0.4 doesn't work properly. Have you tried 2.0.5?

tmatei avatar Dec 30 '14 18:12 tmatei

@Jay13 thanks for looking into the problem

I'm curious in what way it is failing. Is it taking longer than you'd expect because the query is still running? Is the query timing out? Is the query crashing your mongod instance?

Removing the sort option is unsafe because getFilterForInitialImport expects you to be iterating over the ids in order. If you remove sorting, then you should also remove getFilterForInitialImport and the retry logic.
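To illustrate the point about ordering, here is a minimal in-memory sketch in Python. It is not the river's actual code: `initial_import`, `use_sort`, and `fail_after` are hypothetical names, and the resume filter (`_id` greater than the last id seen) is an assumption about how a retry would pick up after a failed cursor.

```python
def initial_import(collection, use_sort, fail_after):
    """Import all docs, with one simulated cursor failure after `fail_after` docs.

    On failure the import is retried with a resume filter that only
    selects ids greater than the last id that was imported.
    """
    imported = []
    last_id = None
    failed_once = False
    while True:
        # Resume filter (assumed): skip everything up to the last id seen.
        batch = [d for d in collection if last_id is None or d["_id"] > last_id]
        if use_sort:
            batch.sort(key=lambda d: d["_id"])
        try:
            for n, doc in enumerate(batch):
                if not failed_once and n == fail_after:
                    failed_once = True
                    raise ConnectionError("simulated cursor failure")
                imported.append(doc["_id"])
                last_id = doc["_id"]
            return imported
        except ConnectionError:
            continue  # retry from last_id


# Documents stored in an order that does not follow _id.
docs = [{"_id": i} for i in (1, 5, 4, 2, 3)]

with_sort = initial_import(docs, use_sort=True, fail_after=2)
without_sort = initial_import(docs, use_sort=False, fail_after=2)
print(with_sort, without_sort)
```

In this sketch the sorted run recovers every document after the retry, while the unsorted run resumes from an id that is not a true high-water mark and silently skips documents. That is why dropping the sort without also dropping the resume filter and retry would be unsafe.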

benmccann avatar Dec 30 '14 19:12 benmccann

Hello,

I'm wondering why the mongodb river plugin installation is failing. I'm able to get the mapper attachment plugin from elasticsearch.org. I'd appreciate your help.

Cheers.

root@server:/usr/share/elasticsearch# cd /usr/share/elasticsearch && bin/plugin --verbose --install com.github.richardwilly98.elasticsearch/elasticsearch-river-mongodb/2.0.5
-> Installing com.github.richardwilly98.elasticsearch/elasticsearch-river-mongodb/2.0.5...
Trying http://download.elasticsearch.org/com.github.richardwilly98.elasticsearch/elasticsearch-river-mongodb/elasticsearch-river-mongodb-2.0.5.zip...
Failed: IOException[Can't get http://download.elasticsearch.org/com.github.richardwilly98.elasticsearch/elasticsearch-river-mongodb/elasticsearch-river-mongodb-2.0.5.zip to /usr/share/elasticsearch/plugins/river-mongodb.zip]; nested: FileNotFoundException[http://download.elasticsearch.org/com.github.richardwilly98.elasticsearch/elasticsearch-river-mongodb/elasticsearch-river-mongodb-2.0.5.zip]; nested: FileNotFoundException[http://download.elasticsearch.org/com.github.richardwilly98.elasticsearch/elasticsearch-river-mongodb/elasticsearch-river-mongodb-2.0.5.zip];
Trying http://search.maven.org/remotecontent?filepath=com/github/richardwilly98/elasticsearch/elasticsearch-river-mongodb/2.0.5/elasticsearch-river-mongodb-2.0.5.zip...
Failed: SocketTimeoutException[connect timed out]
Trying https://oss.sonatype.org/service/local/repositories/releases/content/com/github/richardwilly98/elasticsearch/elasticsearch-river-mongodb/2.0.5/elasticsearch-river-mongodb-2.0.5.zip...
Failed: SocketTimeoutException[connect timed out]

mygitrepo avatar Dec 31 '14 00:12 mygitrepo

@tmatei I have not tried 2.0.5 yet. I will look into it. Thank you.

Jay13 avatar Dec 31 '14 10:12 Jay13

@benmccann The problem is that the mongo query planner is selecting the "_id" index, which results in a full collection scan rather than using the indexed field. The problem query does not seem to return, as you can see from the mongo log excerpt below:

, $orderby: { _id: 1 } } planSummary: IXSCAN { _id: 1 } cursorid:1191792168547 ntoreturn:0 ntoskip:0 nscanned:8845011 nscannedObjects:8845010 keyUpdates:0 numYields:1370464 locks(micros) r:8493905138 nreturned:15 reslen:1060755 45350427ms
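For readers who want to put numbers on that log line, the following Python sketch pulls the counters out of the excerpt and derives the scan-to-return ratio and the elapsed time in hours. The function name `summarize` is made up for illustration, and the regular expressions are assumptions that fit this one excerpt, not a general parser for the mongod log format.

```python
import re

# Tail of the log line quoted above (the truncated query portion is omitted).
LOG_EXCERPT = (
    "planSummary: IXSCAN { _id: 1 } cursorid:1191792168547 ntoreturn:0 "
    "ntoskip:0 nscanned:8845011 nscannedObjects:8845010 keyUpdates:0 "
    "numYields:1370464 locks(micros) r:8493905138 nreturned:15 "
    "reslen:1060755 45350427ms"
)


def summarize(line):
    """Extract name:number counters and the trailing duration from one log line."""
    counters = {k: int(v) for k, v in re.findall(r"(\w+):(\d+)", line)}
    duration_ms = int(re.search(r"(\d+)ms$", line).group(1))
    return {
        "nscanned": counters["nscanned"],
        "nreturned": counters["nreturned"],
        # Index keys examined per document actually returned.
        "scanned_per_returned": counters["nscanned"] / counters["nreturned"],
        "hours": duration_ms / 3_600_000,
    }


summary = summarize(LOG_EXCERPT)
print(summary)
```

A very large ratio of scanned keys to returned documents, as in this excerpt, is consistent with the planner walking the `_id` index to satisfy the sort instead of using the index on the filter field.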

Jay13 avatar Dec 31 '14 11:12 Jay13

@mygitrepo it looks like you cannot access this link: https://oss.sonatype.org/service/local/repositories/releases/content/com/github/richardwilly98/elasticsearch/elasticsearch-river-mongodb/2.0.5/elasticsearch-river-mongodb-2.0.5.zip

This link is available - please check your environment.

richardwilly98 avatar Jan 02 '15 12:01 richardwilly98

Thanks! Using proxyHost and proxyPort did the trick.

cd /usr/share/elasticsearch && bin/plugin -DproxyHost=64.102.255.40 -DproxyPort=80 --verbose --install com.github.richardwilly98.elasticsearch/elasticsearch-river-mongodb/2.0.5
-> Installing com.github.richardwilly98.elasticsearch/elasticsearch-river-mongodb/2.0.5


mygitrepo avatar Jan 08 '15 18:01 mygitrepo