mattweber/elasticsearch-mocksolrplugin: Use Solr clients/tools with ElasticSearch


h1. ElasticSearch Mock Solr Plugin

|_. Mock Solr Plugin|_. elasticsearch|_. Lucene/Solr|
|master|0.20.2 -> 0.20.X|3.6.2|
|1.1.4|0.20.2 -> 0.20.X|3.6.2|
|1.1.3|0.19.3 -> 0.20.1|3.6.0|
|1.1.2|0.19.0 -> 0.19.2|3.5.0|
|1.1.1|0.18.6 -> 0.18.7|3.5.0|
|1.1.0|0.18.0 -> 0.18.5|3.5.0|

h2. Use Solr clients/tools with ElasticSearch

This plugin lets you use tools that were built for Solr against ElasticSearch.

The idea for this plugin came when I wanted to use Nutch with ElasticSearch. Instead of extending Nutch itself, I thought it would be nicer to make any Solr client work with ElasticSearch. Some projects we can now use are Nutch, Apache ManifoldCF, and any tool using SolrJ. Non-Java tools that write to Solr through the XML update and request handlers should work as well.

h3. Supported Solr features

* Update handlers
** XML Update Handler (i.e. /update)
** JavaBin Update Handler (i.e. /update/javabin)
* Search handler (i.e. /select)
** Basic Lucene queries using the q parameter
** start, rows, and fl parameters
** sorting
** filter queries (fq parameters)
** hit highlighting (hl, hl.fl, hl.snippets, hl.fragsize, hl.simple.pre, hl.simple.post)
** faceting (facet, facet.field, facet.query, facet.sort, facet.limit)
* XML and JavaBin request and response formats
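
To show how the search parameters listed here combine, the following is a minimal sketch that assembles a /select request URL by hand using only JDK classes. The class name SelectUrlSketch and the field names are hypothetical. The sketch also assumes the search handler is reached by appending /select to the plugin's _solr endpoint, as a Solr client would do.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the plugin: builds a /select URL
// from Solr-style request parameters.
final class SelectUrlSketch {

    // Encode one name or value for use in a URL query string.
    static String enc(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    // Append the parameters as name=value pairs, in insertion order.
    static String selectUrl(String solrBase, Map<String, String> params) {
        StringBuilder url = new StringBuilder(solrBase).append("/select");
        char sep = '?';
        for (Map.Entry<String, String> p : params.entrySet()) {
            url.append(sep).append(enc(p.getKey())).append('=').append(enc(p.getValue()));
            sep = '&';
        }
        return url.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<String, String>();
        params.put("q", "name:doc1");          // basic Lucene query
        params.put("start", "0");              // paging offset
        params.put("rows", "10");              // page size
        params.put("fl", "id,name,price");     // fields to return
        params.put("fq", "price:[10 TO 20]");  // filter query
        params.put("hl", "true");              // hit highlighting
        params.put("hl.fl", "name");
        params.put("facet", "true");           // faceting
        params.put("facet.field", "price");
        // Assumed endpoint; adjust index and type to your setup.
        System.out.println(selectUrl("http://localhost:9200/testindex/testtype/_solr", params));
    }
}
```

A Map holds one value per name, so a request that repeats a parameter (several fq clauses, for example) would need a list of pairs instead.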

h3. How do you build this plugin?

Use Maven to build the package:

mvn package

Then install the plugin:

# if you've built it locally
$ES_HOME/bin/plugin -url file:./target/releases/elasticsearch-mocksolrplugin-*.zip -install mocksolrplugin

h3. How to use this plugin.

Just point your Solr client/tool to your ElasticSearch instance and append /_solr to the URL.

http://localhost:9200/${index}/${type}/_solr

${index} - the ES index you want to index/search against. Default "solr".

${type} - the ES type you want to index/search against. Default "docs".

Example paths:

// Will search/index against index "solr" and type "docs"
http://localhost:9200/_solr

// Will search/index against index "testindex" and type "docs"
http://localhost:9200/testindex/_solr

// Will search/index against index "testindex" and type "testtype"
http://localhost:9200/testindex/testtype/_solr

Use the client/tool as you would with Solr.
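
The path rules above can be sketched as a small helper that builds the endpoint URL and falls back to the documented defaults. The class and method names are hypothetical and only illustrate the URL scheme. This is not code from the plugin.

```java
// Hypothetical helper, not part of the plugin: builds the _solr endpoint URL
// for an optional index and type, using the documented defaults.
final class MockSolrEndpoint {
    static final String DEFAULT_INDEX = "solr";
    static final String DEFAULT_TYPE = "docs";

    // Pass null for index and/or type to rely on the defaults.
    static String url(String esBase, String index, String type) {
        if (index == null && type == null) {
            return esBase + "/_solr";
        }
        // A type without an index needs the default index spelled out in the path.
        String idx = (index == null) ? DEFAULT_INDEX : index;
        if (type == null) {
            return esBase + "/" + idx + "/_solr";
        }
        return esBase + "/" + idx + "/" + type + "/_solr";
    }

    // The index/type pair a given call ends up addressing.
    static String target(String index, String type) {
        return (index == null ? DEFAULT_INDEX : index) + "/" + (type == null ? DEFAULT_TYPE : type);
    }

    public static void main(String[] args) {
        String es = "http://localhost:9200";
        System.out.println(url(es, null, null) + " -> " + target(null, null));
        System.out.println(url(es, "testindex", null) + " -> " + target("testindex", null));
        System.out.println(url(es, "testindex", "testtype") + " -> " + target("testindex", "testtype"));
    }
}
```

The returned string is what you would hand to a Solr client as its base URL.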

h3. Example SolrJ Indexing

    CommonsHttpSolrServer server = new CommonsHttpSolrServer("http://localhost:9200/testindex/testtype/_solr");
    server.setRequestWriter(new BinaryRequestWriter());
    // both XML and JavaBin response formats are supported
    //server.setParser(new XMLResponseParser());
    
    SolrInputDocument doc1 = new SolrInputDocument();
    doc1.addField( "id", "id1", 1.0f );
    doc1.addField( "name", "doc1", 1.0f );
    doc1.addField( "price", 10 );

    SolrInputDocument doc2 = new SolrInputDocument();
    doc2.addField( "id", "id2", 1.0f );
    doc2.addField( "name", "doc2", 1.0f );
    doc2.addField( "price", 20 );
    
    Collection<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();
    docs.add( doc1 );
    docs.add( doc2 );
    
    server.add( docs );
    server.commit();

    // deletes work as well
    //server.deleteById("id2");
    //server.commit();

Perform a search and verify the documents were indexed.

h3. Example SolrJ Searching

    CommonsHttpSolrServer server = new CommonsHttpSolrServer("http://localhost:9200/testindex/testtype/_solr");

    String qstr = "id:[* TO *]";
    SolrQuery query = new SolrQuery();
    query.setQuery(qstr);

    QueryResponse response = server.query(query);
    for (SolrDocument doc : response.getResults()) {
        for (String field : doc.getFieldNames()) {
            System.out.println(field + " = " + doc.getFieldValue(field));
        }
        System.out.println();
    }

h3. Example using Nutch

At a minimum, use the following type mapping for ElasticSearch.

curl -XPUT 'http://localhost:9200/testindex'
curl -XPUT 'http://localhost:9200/testindex/testtype/_mapping' -d '{
    "testtype" : {
        "properties" : {
            "id" : {
                "type" : "string",
                "store": "yes"
            },
            "digest" : {
                "type" : "string",
                "store" : "yes",
                "index" : "no"
            },
            "boost" : {
                "type" : "float",
                "store" : "yes",
                "index" : "no"
            },
            "tstamp" : {
                "type" : "date",
                "store" : "yes",
                "index" : "no"
            }
        }
    }
}'

Follow the Nutch tutorial at http://wiki.apache.org/nutch/NutchTutorial

Follow steps 1 through 3.1.
For step 3.1 use:

bin/nutch crawl urls -solr http://localhost:9200/testindex/testtype/_solr -depth 3 -topN 5

h3. Notes

ElasticSearch does not require a schema, and all the data you send through the mock Solr handlers will be indexed by default. You can use the ElasticSearch PUT Mapping API to define your field types, what should be stored, analyzed, etc. All data that is indexed via the mock XML Update Handler will most likely be detected by ElasticSearch as strings, so it is a good idea to mimic your Solr schema with an ElasticSearch type mapping.
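
For tools that do not use SolrJ, here is a minimal sketch of building a Solr-style XML add message and posting it to the XML Update Handler. In this format every field value, including a number like a price, travels as element text, which is consistent with the note above about string detection. The class name, field names, and the /update URL appended to the _solr endpoint are assumptions for illustration.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the plugin.
final class XmlUpdateSketch {

    // Escape characters that are special in XML text and attribute values.
    static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;")
                   .replace(">", "&gt;").replace("\"", "&quot;");
    }

    // Build a single-document <add> message in the Solr XML update format.
    static String addMessage(Map<String, String> fields) {
        StringBuilder xml = new StringBuilder("<add><doc>");
        for (Map.Entry<String, String> f : fields.entrySet()) {
            xml.append("<field name=\"").append(escape(f.getKey())).append("\">")
               .append(escape(f.getValue())).append("</field>");
        }
        return xml.append("</doc></add>").toString();
    }

    // POST the message and return the HTTP status code.
    static int post(String updateUrl, String xml) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(updateUrl).toURL().openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "text/xml; charset=UTF-8");
        OutputStream out = conn.getOutputStream();
        try {
            out.write(xml.getBytes(StandardCharsets.UTF_8));
        } finally {
            out.close();
        }
        return conn.getResponseCode();
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> doc = new LinkedHashMap<String, String>();
        doc.put("id", "id1");
        doc.put("name", "doc1");
        doc.put("price", "10"); // sent as text, like every other field
        String xml = addMessage(doc);
        System.out.println(xml);
        // Only contact a server when an update URL is given, for example
        // http://localhost:9200/testindex/testtype/_solr/update (assumed path).
        if (args.length > 0) {
            System.out.println("HTTP " + post(args[0], xml));
        }
    }
}
```

Run without arguments, the sketch only prints the message, so it can be inspected without a running ElasticSearch instance.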
