chronix.spark
Use Solr export handler
- Support the Solr /export interface to transfer result sets from Solr to Spark (see https://cwiki.apache.org/confluence/display/solr/Exporting+Result+Sets).
- Introduce DocValues for all relevant fields to be able to export them
- Align SolrTupleStreamingService with SolrStreamingService (async reads)
- Benchmark Solr export vs. Solr classical access vs. SolrRDD
- Evaluate Zebra export handler
package de.qaware.jax;

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.io.Tuple;
import org.apache.solr.client.solrj.io.stream.CloudSolrStream;

public class SolrStreaming {

    /**
     * Streams all documents of the collection through the /export handler and counts them.
     *
     * @param args the command line arguments
     * @throws java.io.IOException
     */
    public static void main(String[] args) throws IOException, SolrServerException, InterruptedException {
        try (CloudSolrClient client = new CloudSolrClient("localhost:9983")) {
            client.setDefaultCollection("jax2016");
            client.connect();

            // Query parameters; "qt" routes the request to the /export handler.
            Map<String, String> params = new HashMap<>();
            params.put("q", "*:*");
            params.put("fl", "id, Gender, Education, Income");
            params.put("sort", "id asc");
            params.put("shards", "shard1");
            params.put("qt", "/export");

            try (CloudSolrStream solrStream = new CloudSolrStream("localhost:9983", "jax2016", params)) {
                solrStream.open();
                Tuple tuple;
                int count = 0;
                // The stream ends with a marker tuple whose EOF flag is set.
                while (!(tuple = solrStream.read()).EOF) {
                    // Print a progress dot every 10,000 tuples.
                    if ((count % 10000) == 0) {
                        System.out.print(".");
                    }
                    count++;
                }
                System.out.println("\nCount: " + count);
            }
        }
    }
}
Check whether Zebra's custom export handler can be used
Using DocValues feature of Solr
Manual reproducer: http://localhost:8983/solr/chronix_shard1_replica1/export?q=*:*&fl=metric&sort=id+asc
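For scripted checks, the reproducer request could also be assembled in code. The following is a minimal sketch using only the JDK; the class name `ExportUrlSketch` and its `build` method are hypothetical helpers, not part of Chronix or SolrJ, and the query is assumed to be `*:*` as in the Java example above.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: builds an /export request URL for a Solr core.
final class ExportUrlSketch {

    /** Appends "/export" and the URL-encoded query parameters to the core URL. */
    static String build(String coreUrl, Map<String, String> params) {
        StringBuilder sb = new StringBuilder(coreUrl).append("/export");
        char separator = '?';
        for (Map.Entry<String, String> e : params.entrySet()) {
            sb.append(separator)
              .append(encode(e.getKey()))
              .append('=')
              .append(encode(e.getValue()));
            separator = '&';
        }
        return sb.toString();
    }

    private static String encode(String s) {
        try {
            // Form encoding: a space becomes '+', ':' becomes %3A.
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException ex) {
            throw new IllegalStateException("UTF-8 is always available", ex);
        }
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the parameters in insertion order.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "*:*");
        params.put("fl", "metric");
        params.put("sort", "id asc"); // the export handler needs an explicit sort
        System.out.println(build("http://localhost:8983/solr/chronix_shard1_replica1", params));
    }
}
```

The encoded form differs from the hand-written URL only in that the colon of `*:*` is percent-encoded.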
Align SolrTupleStreamingService with SolrStreamingService (async reads)
We can't use the Solr export handler as sketched in class ChronixSolrCloudStorage. The export handler requires every exported field to have DocValues enabled. That is not the case for the data field: it contains binary data, which isn't compatible with DocValues. We need to implement a custom export handler based on the Chronix format.
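The constraint can be made explicit with a small pre-flight check. This is only an illustrative sketch: `ExportPreconditionSketch` and `nonExportable` are hypothetical names, and the DocValues flags in `main` are assumptions for the example (only the `data` field is stated above to lack DocValues), not values read from the actual Chronix schema.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: finds requested fields that would block an /export request.
final class ExportPreconditionSketch {

    /** Returns the requested fields that are unknown or have DocValues disabled. */
    static List<String> nonExportable(Map<String, Boolean> docValuesByField, List<String> requested) {
        List<String> blocked = new ArrayList<>();
        for (String field : requested) {
            // A missing entry is treated like a field without DocValues.
            if (!Boolean.TRUE.equals(docValuesByField.get(field))) {
                blocked.add(field);
            }
        }
        return blocked;
    }

    public static void main(String[] args) {
        // Assumed flags for illustration only.
        Map<String, Boolean> schema = new LinkedHashMap<>();
        schema.put("id", true);
        schema.put("metric", true);
        schema.put("data", false); // binary payload, no DocValues

        List<String> blocked = nonExportable(schema, Arrays.asList("id", "metric", "data"));
        System.out.println("Fields blocking /export: " + blocked);
    }
}
```

Under these assumed flags, only `data` is reported, which is the field that motivates a custom export handler.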
See Johannes Weigend's solution (will be presented at Lucene/Solr Revolution)