spark-solr
Some documents getting skipped
I am using Spark-Solr v3.4.5 with Spark 2.3.1. When I run my Solr query in the Solr dashboard, it matches 3128 documents, but my Spark code returns only 3108.
I read one field value from each document, write it to a text file, and then count the rows in that file. Below is the sample code:
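import org.apache.solr.client.solrj.SolrQuery
import com.lucidworks.spark.rdd.SelectSolrRDD

// solrURL, collectionName, sc and path are defined elsewhere
val solrQuery = new SolrQuery("id:5 AND (type:R OR type:V) AND p_type:O")
solrQuery.setFields("id", "o_id", "p_id", "t_id", "p_type")
solrQuery.setTimeAllowed(0)
val solrRDD = new SelectSolrRDD(solrURL, collectionName, sc)
val solrDataRDD = solrRDD.query(solrQuery)
// Write one p_id per line into a single output partition
solrDataRDD
  .map(row => row.getFieldValue("p_id").toString)
  .repartition(1)
  .saveAsTextFile(path)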
The saved text file has 20 fewer rows than expected. One of the missing p_ids is 129876.
Interestingly, if I explicitly pass the p_id in the query like this:
id:5 AND (type:R OR type:V) AND p_type:O AND p_id:129876
then it gets printed in the text file.
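To list every missing p_id rather than spotting them one at a time, the two result sets can be diffed. This is a minimal sketch, assuming the dashboard results have been exported to a file with one p_id per line. The object name `MissingIds`, its helpers, and both file paths passed on the command line are hypothetical and not part of spark-solr.

```scala
import scala.io.Source

object MissingIds {
  // Read one p_id per line, skipping blank lines.
  def readIds(path: String): Vector[String] = {
    val src = Source.fromFile(path)
    try src.getLines().map(_.trim).filter(_.nonEmpty).toVector
    finally src.close()
  }

  // IDs present in `expected` but absent from `actual`.
  def missing(expected: Seq[String], actual: Seq[String]): Set[String] =
    expected.toSet -- actual.toSet

  // IDs that occur more than once, with their occurrence counts.
  def duplicates(ids: Seq[String]): Map[String, Int] =
    ids.groupBy(identity)
      .map { case (id, occurrences) => id -> occurrences.size }
      .filter { case (_, count) => count > 1 }

  def main(args: Array[String]): Unit = {
    if (args.length != 2) {
      println("usage: MissingIds <dashboard-export-file> <spark-output-file>")
    } else {
      val expected = readIds(args(0)) // hypothetical export from the Solr dashboard
      val actual = readIds(args(1))   // e.g. the part file written by saveAsTextFile
      println(s"expected=${expected.size} actual=${actual.size}")
      println(s"missing from Spark output: ${missing(expected, actual).toSeq.sorted.mkString(", ")}")
      println(s"repeated in dashboard export: ${duplicates(expected)}")
    }
  }
}
```

The `duplicates` check is included because a plain row count cannot distinguish a dropped document from a p_id that simply appears on more than one document. Whether that applies to this dataset is not known from the report above.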
Can you upload a sample dataset for us to reproduce this scenario?