chronix.spark Use Solr export handler

Support the Solr /export interface to transfer result sets from Solr to Spark (see https://cwiki.apache.org/confluence/display/solr/Exporting+Result+Sets).
Introduce DocValues for all relevant fields to be able to export them
Align SolrTupleStreamingService with SolrStreamingService (async reads)
Benchmark Solr export vs. Solr classical access vs. SolrRDD
Evaluate Zebra export handler

Apr 23 '16 22:04 adersberger

package de.qaware.jax;

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.io.Tuple;
import org.apache.solr.client.solrj.io.stream.CloudSolrStream;

public class SolrStreaming {

/**
 * @param args the command line arguments
 * @throws java.io.IOException
 */
public static void main(String[] args) throws IOException, SolrServerException, InterruptedException {
    try (CloudSolrClient client = new CloudSolrClient("localhost:9983")) {

        client.setDefaultCollection("jax2016");
        client.connect();

        // Typed map instead of a raw Map; all export parameters are plain strings.
        Map<String, String> params = new HashMap<>();
        params.put("q", "*:*");
        params.put("fl", "id, Gender, Education, Income");
        params.put("sort", "id asc");
        params.put("shards", "shard1");
        params.put("qt", "/export");

        try (CloudSolrStream solrStream = new CloudSolrStream("localhost:9983", "jax2016", params)) {

            solrStream.open();

            Tuple tuple;
            int count = 0;
            // Use the Tuple's EOF flag; the "EOF" field may be absent on regular tuples.
            while (!(tuple = solrStream.read()).EOF) {
                if ((count % 10000) == 0) {
                    System.out.print(".");
                }
                count++;
            }
            System.out.println("\nCount: " + count);
        }
    }

}

}

Apr 27 '16 00:04 adersberger

Check whether we can use Zebra's custom export handler

Apr 28 '16 22:04 adersberger

Using DocValues feature of Solr

May 01 '16 11:05 adersberger

Manual reproducer: http://localhost:8983/solr/chronix_shard1_replica1/export?q=*:*&fl=metric&sort=id+asc
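
For scripted checks, the reproducer URL can be assembled with proper encoding. The following is only a sketch: the class and method names (ExportUrlSketch, exportUrl) are made up for illustration, and the base URL, core name and parameters are the ones from the reproducer above.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class ExportUrlSketch {

    /** URL-encodes a single query parameter value as UTF-8. */
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a mandatory charset on every JVM.
            throw new IllegalStateException(e);
        }
    }

    /** Builds an /export request URL for one core (hypothetical helper). */
    static String exportUrl(String solrBase, String core, String query, String fields, String sort) {
        return solrBase + "/" + core + "/export"
                + "?q=" + encode(query)
                + "&fl=" + encode(fields)
                + "&sort=" + encode(sort);
    }

    public static void main(String[] args) {
        System.out.println(exportUrl(
                "http://localhost:8983/solr", "chronix_shard1_replica1",
                "*:*", "metric", "id asc"));
    }
}
```

The encoded form escapes the colon in the match-all query and turns the space in the sort clause into a plus sign, which is easy to get wrong when typing the URL by hand.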

May 01 '16 11:05 adersberger

Align SolrTupleStreamingService with SolrStreamingService (async reads)
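
To make "async reads" concrete: one common reading is read-ahead, where the next batch is fetched in the background while the caller still consumes the current one. The sketch below is generic and is not the chronix.spark implementation; PrefetchingIterator and the pageLoader callback are hypothetical names, and an empty page is assumed to signal the end of the result set.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.concurrent.CompletableFuture;
import java.util.function.IntFunction;

final class PrefetchingIterator<T> implements Iterator<T> {

    // Hypothetical callback: page index -> page content; an empty list means "no more data".
    private final IntFunction<List<T>> pageLoader;
    private CompletableFuture<List<T>> nextPage;
    private Iterator<T> current = Collections.emptyIterator();
    private int nextPageIndex = 0;
    private boolean exhausted = false;

    PrefetchingIterator(IntFunction<List<T>> pageLoader) {
        this.pageLoader = pageLoader;
        this.nextPage = fetch(); // start loading the first page right away
    }

    /** Schedules loading of the next page on a background thread. */
    private CompletableFuture<List<T>> fetch() {
        final int page = nextPageIndex++;
        return CompletableFuture.supplyAsync(() -> pageLoader.apply(page));
    }

    @Override
    public boolean hasNext() {
        while (!current.hasNext() && !exhausted) {
            List<T> page = nextPage.join(); // wait for the page requested earlier
            if (page.isEmpty()) {
                exhausted = true;
            } else {
                current = page.iterator();
                nextPage = fetch(); // overlap the next read with consumption of this page
            }
        }
        return current.hasNext();
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return current.next();
    }
}
```

With this shape the tuple-based service and the classical one could expose the same Iterator contract, differing only in the page loader they plug in.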

May 01 '16 11:05 adersberger

We can't use the Solr export handler as sketched in class ChronixSolrCloudStorage. The export handler requires every exported field to be indexed with DocValues. That is not the case for the data field: it contains binary data, which isn't compatible with DocValues. We need to implement a custom export handler based on the Chronix format.
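
The constraint can be expressed as a simple pre-flight check on the requested field list. This is a sketch only: ExportFieldCheck and its Field type are hypothetical, simplified stand-ins and not part of Solr or Chronix, and the DocValues flags below are assumed for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ExportFieldCheck {

    /** Hypothetical, simplified view of a schema field. */
    static final class Field {
        final String name;
        final boolean docValues;

        Field(String name, boolean docValues) {
            this.name = name;
            this.docValues = docValues;
        }
    }

    /** Returns the requested fields that cannot be exported because they lack DocValues. */
    static List<String> nonExportable(List<Field> schema, List<String> requested) {
        List<String> blocked = new ArrayList<>();
        for (String name : requested) {
            boolean exportable = false;
            for (Field field : schema) {
                if (field.name.equals(name) && field.docValues) {
                    exportable = true;
                    break;
                }
            }
            if (!exportable) {
                blocked.add(name); // unknown fields are treated as not exportable, too
            }
        }
        return blocked;
    }

    public static void main(String[] args) {
        // Assumed flags: id and metric have DocValues, the binary data field does not.
        List<Field> schema = Arrays.asList(
                new Field("id", true),
                new Field("metric", true),
                new Field("data", false));
        System.out.println(nonExportable(schema, Arrays.asList("id", "metric", "data")));
    }
}
```

Any request that includes the data field fails this check, which is why a custom export handler is needed.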

May 06 '16 18:05 adersberger

See Johannes Weigend's solution (will be presented at Lucene/Solr Revolution)

Oct 06 '16 18:10 adersberger
