Installing and using the elasticsearch-analysis-ansj analyzer
elasticsearch-analysis-ansj homepage
https://github.com/4onni/elasticsearch-analysis-ansj
ansj tokenizer
https://github.com/NLPchina/ansj_seg
Building and installing the plugin
1. Clone the source code
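For example, assuming git is available, clone the repository linked above and switch into it:

git clone https://github.com/4onni/elasticsearch-analysis-ansj.git
cd elasticsearch-analysis-ansj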
2. Update the pom.xml configuration
<elasticsearch.version>1.7.1</elasticsearch.version>

<dependency>
    <groupId>org.ansj</groupId>
    <artifactId>ansj_seg</artifactId>
    <classifier>min</classifier>
    <version>2.0.8</version>
    <scope>compile</scope>
</dependency>
3. Build the plugin
mvn assembly:assembly
4. Install the plugin
elasticsearch-1.7.1\bin>plugin -u file:///C:\Users\Administrator\Desktop\elasticsearch-analysis-ansj\target\releases\elasticsearch-analysis-ansj-1.x.1-release.zip -i ansj
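On Linux or macOS the same install would look roughly like this (the archive path is only illustrative; point it at whatever zip your build produced under target/releases):

elasticsearch-1.7.1/bin/plugin -u file:///path/to/elasticsearch-analysis-ansj/target/releases/elasticsearch-analysis-ansj-1.x.1-release.zip -i ansj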
5. Configure the ansj analyzers in elasticsearch.yml
index:
  analysis:
    analyzer:
      index_ansj:
        type: ansj_index
      query_ansj:
        type: ansj_query
      ik:
        alias: [news_analyzer_ik, ik_analyzer]
        type: org.elasticsearch.index.analysis.IkAnalyzerProvider
      mmseg:
        alias: [news_analyzer, mmseg_analyzer]
        type: org.elasticsearch.index.analysis.MMsegAnalyzerProvider

index.analysis.analyzer.default.type: "ansj_index"
See elasticsearch.yml.example for the full configuration reference.
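Besides setting a default analyzer, the analyzers can also be assigned per field in a mapping. The sketch below is only an illustration, assuming a hypothetical articles index with an article type and a title field; index_analyzer and search_analyzer are the Elasticsearch 1.x mapping options for index-time and query-time analysis:

curl -XPUT 'http://127.0.0.1:9200/articles' -d '{
  "mappings": {
    "article": {
      "properties": {
        "title": {
          "type": "string",
          "index_analyzer": "index_ansj",
          "search_analyzer": "query_ansj"
        }
      }
    }
  }
}'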
6. Testing and usage
- Index-time analysis
http://127.0.0.1:9200/articles/_analyze?analyzer=ansj_index&text=我们是中国人
Note: articles is the index name; apart from articles, the rest of the request URL is fixed. analyzer=ansj_index selects the index analyzer, and text is the content to be analyzed for indexing.
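The same request can also be issued from the command line, for example with curl (quote the URL so the shell does not interpret the &):

curl 'http://127.0.0.1:9200/articles/_analyze?analyzer=ansj_index&text=我们是中国人'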
Output:
{
  "tokens": [
    {
      "token": "我们",
      "start_offset": 0,
      "end_offset": 2,
      "type": "word",
      "position": 1
    },
    {
      "token": "是",
      "start_offset": 2,
      "end_offset": 3,
      "type": "word",
      "position": 2
    },
    {
      "token": "中国",
      "start_offset": 3,
      "end_offset": 5,
      "type": "word",
      "position": 3
    },
    {
      "token": "人",
      "start_offset": 5,
      "end_offset": 6,
      "type": "word",
      "position": 4
    }
  ]
}
- Query-time analysis
http://127.0.0.1:9200/articles/_analyze?analyzer=ansj_query&text=我们是中国人
Note: articles is the index name; apart from articles, the rest of the request URL is fixed. analyzer=ansj_query selects the query analyzer, and text is the content to be analyzed as a query.
Output:
{
  "tokens": [
    {
      "token": "我们",
      "start_offset": 0,
      "end_offset": 2,
      "type": "word",
      "position": 1
    },
    {
      "token": "是",
      "start_offset": 2,
      "end_offset": 3,
      "type": "word",
      "position": 2
    },
    {
      "token": "中国",
      "start_offset": 3,
      "end_offset": 5,
      "type": "word",
      "position": 3
    },
    {
      "token": "人",
      "start_offset": 5,
      "end_offset": 6,
      "type": "word",
      "position": 4
    }
  ]
}
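Once documents are indexed, an ordinary search request analyzes the query string with the configured query analyzer. A minimal sketch, assuming the hypothetical title field from the mapping example above:

curl 'http://127.0.0.1:9200/articles/_search' -d '{
  "query": {
    "match": {
      "title": "中国人"
    }
  }
}'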
Comment: Hi, why does it feel like there is no difference between index analysis and query analysis? The results are exactly the same!