Use Solr export handler

Open adersberger opened this issue 8 years ago • 7 comments

  • Support the Solr /export interface to transfer result sets from Solr to Spark (see https://cwiki.apache.org/confluence/display/solr/Exporting+Result+Sets).
  • Introduce DocValues for all relevant fields to be able to export them
  • Align SolrTupleStreamingService with SolrStreamingService (async reads)
  • Benchmark Solr export vs. Solr classical access vs. SolrRDD
  • Evaluate Zebra export handler

adersberger avatar Apr 23 '16 22:04 adersberger

package de.qaware.jax;

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.io.Tuple;
import org.apache.solr.client.solrj.io.stream.CloudSolrStream;

public class SolrStreaming {

/**
 * @param args the command line arguments
 * @throws java.io.IOException
 */
public static void main(String[] args) throws IOException, SolrServerException, InterruptedException {
    try (CloudSolrClient client = new CloudSolrClient("localhost:9983")) {

        client.setDefaultCollection("jax2016");
        client.connect();

        // Query parameters for the /export handler: a sort is mandatory,
        // and every field in fl and sort must be DocValues-enabled.
        Map<String, String> params = new HashMap<>();
        params.put("q", "*:*");
        params.put("fl", "id, Gender, Education, Income");
        params.put("sort", "id asc");
        params.put("shards", "shard1");
        params.put("qt", "/export");

        try (CloudSolrStream solrStream = new CloudSolrStream("localhost:9983", "jax2016", params)) {

            solrStream.open();

            Tuple tuple;
            int count = 0;
            // The stream ends with a marker tuple whose EOF flag is set.
            while (!(tuple = solrStream.read()).EOF) {
                // Print a progress dot every 10,000 tuples.
                if ((count % 10000) == 0) {
                    System.out.print(".");
                }
                count++;
            }
            System.out.println("\nCount: " + count);
        }
    }

}

}

adersberger avatar Apr 27 '16 00:04 adersberger

Check whether Zebra's custom export handler can be used

adersberger avatar Apr 28 '16 22:04 adersberger

Use Solr's DocValues feature

adersberger avatar May 01 '16 11:05 adersberger

Manual reproducer: http://localhost:8983/solr/chronix_shard1_replica1/export?q=*:*&fl=metric&sort=id+asc
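
For scripted runs, the same request URL can be assembled in code. This is only a minimal sketch: the class `ExportUrl` and its `build` helper are hypothetical names, not part of chronix.spark or SolrJ, and the host and core name are simply the ones from the reproducer above. Note that `URLEncoder` percent-encodes the colon in the match-all query.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

public class ExportUrl {

    // Hypothetical helper: appends URL-encoded query parameters to a base URL.
    static String build(String baseUrl, Map<String, String> params)
            throws UnsupportedEncodingException {
        StringBuilder sb = new StringBuilder(baseUrl);
        char separator = '?';
        for (Map.Entry<String, String> entry : params.entrySet()) {
            sb.append(separator)
              .append(URLEncoder.encode(entry.getKey(), "UTF-8"))
              .append('=')
              .append(URLEncoder.encode(entry.getValue(), "UTF-8"));
            separator = '&';
        }
        return sb.toString();
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // LinkedHashMap keeps the parameters in insertion order.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "*:*");
        params.put("fl", "metric");
        params.put("sort", "id asc");
        System.out.println(
                build("http://localhost:8983/solr/chronix_shard1_replica1/export", params));
    }
}
```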

adersberger avatar May 01 '16 11:05 adersberger

Align SolrTupleStreamingService with SolrStreamingService (async reads)

adersberger avatar May 01 '16 11:05 adersberger

We can't use the Solr export handler as sketched in class ChronixSolrCloudStorage. The export handler requires every exported field to be stored as DocValues. The data field does not meet this requirement: it contains binary data, which isn't compatible with DocValues. We need to implement a custom export handler based on the Chronix format.
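
The constraint can be illustrated with a small check that lists which requested fields would block an export. This is a sketch under assumptions: `ExportableFields` and `missingDocValues` are hypothetical names, the schema is modelled as a plain map rather than read from Solr, and the DocValues flags for `id` and `metric` are assumed for illustration. Only the `data` field's incompatibility comes from the comment above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ExportableFields {

    // Returns the requested fields that are not flagged as DocValues.
    // Fields missing from the map are treated as not exportable.
    static List<String> missingDocValues(Map<String, Boolean> docValuesByField,
                                         List<String> requestedFields) {
        List<String> missing = new ArrayList<>();
        for (String field : requestedFields) {
            if (!Boolean.TRUE.equals(docValuesByField.get(field))) {
                missing.add(field);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Assumed flags for illustration; only "data" is stated in the issue.
        Map<String, Boolean> schema = new HashMap<>();
        schema.put("id", true);
        schema.put("metric", true);
        schema.put("data", false);

        // prints "[data]"
        System.out.println(missingDocValues(schema, Arrays.asList("id", "metric", "data")));
    }
}
```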

adersberger avatar May 06 '16 18:05 adersberger

See Johannes Weigend's solution (will be presented at Lucene/Solr Revolution)

adersberger avatar Oct 06 '16 18:10 adersberger