spark-solr

Some documents getting skipped

Open piyushshri opened this issue 5 years ago • 1 comment

I am using Spark-Solr v3.4.5 with Spark 2.3.1. When I run my Solr query in the Solr dashboard, it matches 3128 documents, but the same query in my Spark code returns only 3108 documents.

I read one field value from each document, write it to a text file, and then count the rows in that file. Below is the sample code:

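import com.lucidworks.spark.rdd.SelectSolrRDD
import org.apache.solr.client.solrj.SolrQuery

// solrURL, collectionName, sc and path are defined elsewhere
val solrQuery = new SolrQuery("id:5 AND (type:R OR type:V) AND p_type:O")
solrQuery.setFields("id", "o_id", "p_id", "t_id", "p_type")
solrQuery.setTimeAllowed(0)
val solrRDD = new SelectSolrRDD(solrURL, collectionName, sc)
val solrDataRDD = solrRDD.query(solrQuery)
// Write one p_id per line into a single output partition
solrDataRDD
  .map(row => row.getFieldValue("p_id").toString)
  .repartition(1)
  .saveAsTextFile(path)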

The saved text file has 20 fewer rows than expected. One of the missing p_ids is 129876.

Interestingly, if I explicitly add that p_id to the query like this:

id:5 AND (type:R OR type:V) AND p_type:O AND p_id:129876

then it does get written to the text file.
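To pin down all 20 missing values rather than just one, a small comparison helper may be useful. The following is only a sketch: `MissingIds` and its methods are hypothetical names, not part of spark-solr, and it assumes the expected p_ids (for example, exported from the Solr dashboard) and the p_ids written by the Spark job have each been loaded as a collection of strings.

```scala
// Hypothetical helper, not part of spark-solr: compares the p_id values
// expected from Solr with the values written by the Spark job.
object MissingIds {
  // Trim whitespace and drop blank lines so file-based input compares cleanly.
  private def clean(ids: Iterable[String]): Iterable[String] =
    ids.map(_.trim).filter(_.nonEmpty)

  /** Expected ids that never appear in `actual`, sorted lexicographically. */
  def missing(expected: Iterable[String], actual: Iterable[String]): Seq[String] =
    clean(expected).toSet.diff(clean(actual).toSet).toSeq.sorted

  /** Ids that appear more than once in `actual`, with their occurrence counts. */
  def duplicates(actual: Iterable[String]): Map[String, Int] =
    clean(actual)
      .groupBy(identity)
      .map { case (id, occurrences) => (id, occurrences.size) }
      .filter { case (_, count) => count > 1 }
}
```

For instance, `MissingIds.missing(expectedIds, savedIds)` would list every p_id that the dashboard query returns but the saved file lacks, and `MissingIds.duplicates(savedIds)` would show whether some documents were written more than once while others were dropped. The resulting list of missing p_ids could then be used to narrow the collection down to a small reproducible sample.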

piyushshri, Apr 05 '19 05:04

Can you upload a sample dataset so that we can reproduce this scenario?

kiranchitturi, Mar 15 '20 05:03