elasticsearch-carrot2
An Elasticsearch plugin that integrates Carrot2 to cluster your search results into topics.
License: Apache License 2.0
Version (plugin version | Elasticsearch version):
master | 0.90.0 -> master
1.2.0 | 0.90.0
1.1.1 | 0.20.2
The demo page is here: http://s.medcl.net/?query=Search+API++Search+Type
A detailed tutorial is here: http://log.medcl.net/item/2013/06/tutorial-clustering-search-result-with-plugin-tools-carrot2/
1. Download the lexical files (https://github.com/downloads/medcl/elasticsearch-carrot2/config.zip) and put them into Elasticsearch's config folder.
2. Install the plugin: bin/plugin install medcl/elasticsearch-carrot2/1.1.1
   Alternatively, download the plugin from the RTF project (https://github.com/medcl/elasticsearch-rtf): https://github.com/medcl/elasticsearch-rtf/tree/master/elasticsearch/plugins/tools.carrot2
Have fun.
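Step 1 can also be scripted. The following is a minimal Python sketch, assuming config.zip has already been downloaded (for example with urllib.request.urlretrieve) and that its entries belong directly under the Elasticsearch config folder. The archive layout is not confirmed here, so inspect it before extracting. install_lexical_files and es_home are hypothetical names.

```python
import zipfile
from pathlib import Path


def install_lexical_files(zip_path, es_home):
    """Extract the carrot2 lexical-resource archive into <es_home>/config.

    Assumption: the archive entries should land directly under config/.
    If the archive already contains a top-level config/ directory,
    extract into es_home instead.
    """
    config_dir = Path(es_home) / "config"
    config_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(config_dir)
        # Return the extracted entry names so the caller can check the layout.
        return sorted(archive.namelist())
```

Step 2 still runs through bin/plugin. This helper only places the lexical files.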
Example request. The URL is quoted so the shell does not treat each & as a background operator:
curl -XPOST 'http://localhost:9200/elasticsearch_resources/_carrot2?carrot2.language=ENGLISH&carrot2.title_fields=title&carrot2.summary_fields=snippet&carrot2.url_field=url&carrot2.attach_detail=true&carrot2.cluster_count_base=10&carrot2.cluster_phrase_label_boost=2.0' -d '
{
  "query": { "bool": { "should": [ { "match_all": {} } ] } },
  "from": 0,
  "size": 500
}'
Response sample: https://gist.github.com/2184894
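The same request can be sent from Python using only the standard library. This is a sketch, not a client shipped with the plugin. build_carrot2_url and carrot2_search are hypothetical helpers, and the Content-Type header is added defensively as an assumption. urlencode takes care of escaping and joining the carrot2.* parameters, so no shell quoting is involved.

```python
import json
import urllib.parse
import urllib.request


def build_carrot2_url(index, params, host="http://localhost:9200"):
    """Build the _carrot2 endpoint URL with URL-encoded carrot2.* parameters.

    Pass values as strings ("true", "2.0"); urlencode would render a
    Python True as "True".
    """
    query = urllib.parse.urlencode(params)
    return f"{host}/{urllib.parse.quote(index)}/_carrot2?{query}"


def carrot2_search(index, params, body, host="http://localhost:9200"):
    """POST a search body to the _carrot2 endpoint and decode the JSON reply."""
    request = urllib.request.Request(
        build_carrot2_url(index, params, host),
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Example (requires a running node with the plugin installed):
# result = carrot2_search(
#     "elasticsearch_resources",
#     {"carrot2.language": "ENGLISH", "carrot2.title_fields": "title",
#      "carrot2.summary_fields": "snippet", "carrot2.url_field": "url"},
#     {"query": {"match_all": {}}, "from": 0, "size": 500},
# )
```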
Parameters:
carrot2.language=ENGLISH [clustering language; see the appendix for supported values]
carrot2.title_fields [which field in the document's _source is used as the title for clustering]
carrot2.summary_fields [which field in the document's _source is used as the summary for clustering]
carrot2.url_field [which field in the document's _source is used as the URL for clustering]
carrot2.attach_hits=false [set to false to shrink the response; the original search hits are removed]
carrot2.attach_detail [set to false to return only document ids; title, summary and URL are not included in the response]
carrot2.max_cluster_size=100 [maximum number of clusters returned]
carrot2.max_doc_per_cluster=10 [maximum number of documents returned within each cluster]
carrot2.cluster_count_base=30 [Lingo desiredClusterCountBase attribute: http://download.carrot2.org/head/manual/index.html#section.attribute.LingoClusteringAlgorithm.desiredClusterCountBase]
carrot2.cluster_phrase_label_boost=1.5 [Lingo phraseLabelBoost attribute: http://download.carrot2.org/head/manual/index.html#section.attribute.LingoClusteringAlgorithm.phraseLabelBoost]
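A misspelled name such as carrot2.title_field is easy to miss in a long query string. Below is a small Python sketch that only accepts the parameter names listed above and renders booleans as lowercase true/false, matching the curl example. carrot2_params and DOCUMENTED_PARAMS are hypothetical names, and the plugin itself may accept names that this README does not list.

```python
# Parameter names documented in this README (assumed complete for this sketch).
DOCUMENTED_PARAMS = {
    "carrot2.language", "carrot2.title_fields", "carrot2.summary_fields",
    "carrot2.url_field", "carrot2.attach_hits", "carrot2.attach_detail",
    "carrot2.max_cluster_size", "carrot2.max_doc_per_cluster",
    "carrot2.cluster_count_base", "carrot2.cluster_phrase_label_boost",
}


def carrot2_params(**options):
    """Turn keyword options into carrot2.* query parameters.

    Booleans become lowercase "true"/"false"; every value is converted to a
    string. Unknown names raise ValueError so typos are caught before sending.
    """
    params = {}
    for name, value in options.items():
        key = f"carrot2.{name}"
        if key not in DOCUMENTED_PARAMS:
            raise ValueError(f"undocumented carrot2 parameter: {key}")
        if isinstance(value, bool):
            value = "true" if value else "false"
        params[key] = str(value)
    return params
```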
Supported algorithm: LingoClusteringAlgorithm
TODO (not yet supported): STCClusteringAlgorithm, BisectingKMeansClusteringAlgorithm, ByFieldClusteringAlgorithm, ByUrlClusteringAlgorithm
Appendix (supported carrot2.language values): ARABIC, BULGARIAN, CZECH, CHINESE_SIMPLIFIED, DANISH, DUTCH, ENGLISH, ESTONIAN, FINNISH, FRENCH, GERMAN, GREEK, HUNGARIAN, ITALIAN, IRISH, KOREAN, LATVIAN, LITHUANIAN, MALTESE, NORWEGIAN, POLISH, PORTUGUESE, ROMANIAN, RUSSIAN, SLOVAK, SLOVENE, SPANISH, SWEDISH, THAI, TURKISH
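Here is a Python sketch that checks a carrot2.language value against the list above before a request is sent. It normalizes input such as "chinese simplified" to the uppercase spelling used in the example request (ENGLISH). SUPPORTED_LANGUAGES and normalize_language are hypothetical names. The set mirrors this README only and may differ across plugin or Carrot2 versions.

```python
# Languages listed in this README's appendix (30 entries).
SUPPORTED_LANGUAGES = frozenset("""
ARABIC BULGARIAN CZECH CHINESE_SIMPLIFIED DANISH DUTCH ENGLISH ESTONIAN
FINNISH FRENCH GERMAN GREEK HUNGARIAN ITALIAN IRISH KOREAN LATVIAN
LITHUANIAN MALTESE NORWEGIAN POLISH PORTUGUESE ROMANIAN RUSSIAN SLOVAK
SLOVENE SPANISH SWEDISH THAI TURKISH
""".split())


def normalize_language(name):
    """Map a human-written name to a carrot2.language value, or raise ValueError."""
    # Uppercase and turn spaces/hyphens into underscores: "chinese simplified"
    # becomes "CHINESE_SIMPLIFIED".
    value = name.strip().upper().replace(" ", "_").replace("-", "_")
    if value not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported carrot2.language: {name!r}")
    return value
```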