
#Simple example

##Index documents into Solr

In the StackExchangeSolrIndexing folder:

  • Download a StackExchange dump of your choosing: http://www.clearbits.net/torrents/2076-aug-2012 (or start immediately with the posts.xml.gz file)
  • Unzip the data set you're interested in (7-zip format), or gunzip posts.xml.gz
  • Download Solr: http://lucene.apache.org/solr/
  • Start Solr:
cd apache-solr-x.x.x/example
java -jar -Dsolr.solr.home=<full_path_to_this_dir>/solr_home start.jar
  • Index documents (currently only works for posts):
python extractDocs.py "<full_path_to_stack_exchange_dump>/posts.xml"

(Configuration details can be found in solr_home/collection1/conf/schema.xml)

  • Search!
localhost:8983/solr/collection1/select?q=Tags:star-wars
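
The search above can also be issued from a script. The following is a minimal sketch, assuming Solr runs on localhost:8983 with the collection1 core as in the steps above. The helper names `build_select_url` and `fetch_docs` are hypothetical and not part of this repository; `wt=json` and `rows` are standard Solr query parameters.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_select_url(query, host="localhost:8983", core="collection1", **params):
    """Build a Solr select URL (hypothetical helper, not part of this repo)."""
    query_params = {"q": query, "wt": "json"}  # wt=json asks Solr for a JSON response
    query_params.update(params)
    return "http://{}/solr/{}/select?{}".format(host, core, urlencode(query_params))


def fetch_docs(url):
    """Fetch the URL and return the matching documents (requires a running Solr)."""
    with urlopen(url) as response:
        payload = json.load(response)
    # Solr's JSON response keeps the hits under response -> docs
    return payload["response"]["docs"]


url = build_select_url("Tags:star-wars", rows=5)
print(url)
```

With Solr running, `fetch_docs(url)` would return the first five matching posts as a list of dictionaries.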

##Auto-Generate Synonyms

In the SemanticExtraction folder:

  • Run:
python SemanticAnalyzer.py Body
  • Check the Solr log to see how many documents have been processed
  • Do some searches! http://localhost:8983/solr/select?q=Tags:harry-potter&fq=Body:*&fl=Body%20BodyBlurred%20Tags%20Id&facet=true&facet.field=Tags
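
The faceted query above can be assembled and its facet counts read programmatically. This is a sketch under assumptions: the helper names `build_facet_url` and `pair_facet_counts` are hypothetical, and the sample counts in the usage line are made up for illustration. It relies on Solr's default JSON layout, in which each facet field is returned as a flat list that alternates terms and counts.

```python
from urllib.parse import urlencode


def build_facet_url(tag, base="http://localhost:8983/solr/select"):
    """Build the faceted search URL from the step above (hypothetical helper)."""
    params = [
        ("q", "Tags:" + tag),
        ("fq", "Body:*"),                    # only documents that have a Body
        ("fl", "Body BodyBlurred Tags Id"),  # fields to return
        ("facet", "true"),
        ("facet.field", "Tags"),
        ("wt", "json"),                      # ask Solr for a JSON response
    ]
    return base + "?" + urlencode(params)


def pair_facet_counts(flat):
    """Turn Solr's flat [term, count, term, count, ...] list into (term, count) pairs."""
    return list(zip(flat[0::2], flat[1::2]))


url = build_facet_url("harry-potter")
print(url)

# Made-up sample in the shape of facet_counts -> facet_fields -> Tags
print(pair_facet_counts(["harry-potter", 12, "magic", 3]))
```

Comparing the `Body` and `BodyBlurred` fields in the returned documents is one way to see what the analyzer added.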