elasticsearch-analysis-dynamic-synonym
Repeated requests using synonyms frequently return no data
Problem description
- Index configuration
{
"test": {
"aliases": {},
"mappings": {
"test": {
"properties": {
"text_1": {
"type": "string",
"analyzer": "synonym"
}
}
}
},
"settings": {
"index": {
"creation_date": "1482891562524",
"analysis": {
"filter": {
"remote_synonym": {
"type": "dynamic_synonym",
"synonyms_path": "http://IP:PORT/waf_file/files/sw",
"interval": "30"
}
},
"analyzer": {
"synonym": {
"filter": [
"remote_synonym"
],
"tokenizer": "ik"
}
}
},
"number_of_shards": "5",
"number_of_replicas": "1",
"uuid": "NMZ4fUryRXyoZ057lQrhDA",
"version": {
"created": "2030299"
}
}
},
"warmers": {}
}
}
- Create a document
PUT /test/test/1?pretty=1
{
"text_1" : "水的密度很大"
}
- Run the following query several times
GET /test/_search
{
"query": {
"query_string": {
"default_field": "text_1",
"analyzer": "synonym",
"query": "density"
}
}
}
- Add a synonym to the synonym file:
密度, density
- Run the same query again
GET /test/_search
{
"query": {
"query_string": {
"default_field": "text_1",
"analyzer": "synonym",
"query": "density"
}
}
}
- The document is found
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0.16609077,
"hits": [
{
"_index": "test",
"_type": "test",
"_id": "15",
"_score": 0.16609077,
"_source": {
"text_1": "水的密度很大"
}
}
]
}
}
- On repeated requests, the document is frequently not found
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 0,
"max_score": null,
"hits": []
}
}
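Things I have tried:
- shards = 1, replicas = 1: the problem does not occur
- shards = 5, replicas = 1, with two ES instances on one machine forming a cluster: the problem still occurs
- Restarting ES: the problem does not occur
I don't know where the problem lies and would appreciate everyone's help.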
@xiaoheike, first make sure the plugin is installed on every node in the cluster.
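@bells Sorry, I didn't describe the problem clearly. To add some details: the problem already occurs on a single machine (only one ES instance, shards = 5, replicas = 1), so I don't think it is caused by other nodes in the cluster missing the plugin. Do you have any other suggestions?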
On a single machine, replicas = 1 doesn't really make sense. Does the same thing happen with replicas = 0?
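@davidcai19840412 Yes, the problem also occurs with replicas = 0. I have been struggling with this all week and have tried every experiment I could think of, but none of them worked. Have you run into this? Could I be using the plugin incorrectly?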
I found the cause:
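DynamicSynonymTokenFilterFactory.create() is called concurrently, but the field DynamicSynonymTokenFilterFactory.dynamicSynonymFilters does not support concurrent insertion, so some DynamicSynonymFilter objects are never stored in dynamicSynonymFilters and therefore never receive the reloaded synonym map.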
Solution: change two places.
- In the field declaration, replace
private Map<DynamicSynonymFilter, Integer> dynamicSynonymFilters = new WeakHashMap();
with
private List<DynamicSynonymFilter> dynamicSynonymFilters = Collections.synchronizedList(new ArrayList<DynamicSynonymFilter>());
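- In run():
public void run() {
    if (synonymFile.isNeedReloadSynonymMap()) {
        synonymMap = synonymFile.reloadSynonymMap();
        // a synchronized collection must be locked manually while iterating over it
        synchronized (dynamicSynonymFilters) {
            for (DynamicSynonymFilter dynamicSynonymFilter : dynamicSynonymFilters) {
                dynamicSynonymFilter.update(synonymMap);
            }
        }
        // log once per reload instead of once per filter
        logger.info("{} success reload synonym", indexName);
    }
}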
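The two changes above can be sketched as a small, self-contained Java program. SynonymFilterRegistry and FakeFilter are hypothetical stand-ins for DynamicSynonymTokenFilterFactory and DynamicSynonymFilter, not plugin classes; create() and reload() only mimic the roles of the factory's create() and run(). Many threads register filters at once, and reload() then counts how many filters it reached.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SynonymFilterRegistry {

    /** Hypothetical stand-in for DynamicSynonymFilter: it only remembers the last map it received. */
    static final class FakeFilter {
        volatile String lastMap;

        void update(String newMap) {
            this.lastMap = newMap;
        }
    }

    // Mirrors the proposed fix: add() is thread-safe and a List never merges "equal" elements.
    private final List<FakeFilter> filters =
            Collections.synchronizedList(new ArrayList<FakeFilter>());

    /** Plays the role of the factory's create(): may be called from many threads at once. */
    FakeFilter create() {
        FakeFilter filter = new FakeFilter();
        filters.add(filter);
        return filter;
    }

    /** Plays the role of run(): pushes a reloaded map to every registered filter. */
    int reload(String newMap) {
        int updated = 0;
        // Iterating a synchronizedList is only safe while holding the list's own lock.
        synchronized (filters) {
            for (FakeFilter filter : filters) {
                filter.update(newMap);
                updated++;
            }
        }
        return updated;
    }

    /** Registers threads * perThread filters concurrently, then returns how many reload() reached. */
    static int registerConcurrently(int threads, int perThread) {
        SynonymFilterRegistry registry = new SynonymFilterRegistry();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                for (int i = 0; i < perThread; i++) {
                    registry.create();
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("registration did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return registry.reload("reloaded-map");
    }

    public static void main(String[] args) {
        System.out.println(registerConcurrently(8, 1000)); // prints 8000
    }
}
```

Because the synchronized list never drops an add(), the count always equals threads × perThread. One trade-off compared with the original WeakHashMap: the list holds strong references, so filters belonging to closed readers are never released unless something removes them.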
I also tried the following change:
private Map<DynamicSynonymFilter, Integer> dynamicSynonymFilters = new WeakHashMap();
changed to
private Map<DynamicSynonymFilter, Integer> dynamicSynonymFilters = new ConcurrentHashMap<>();
but objects were still lost when create() was called. I did not dig into the exact reason.
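One possible explanation for why ConcurrentHashMap still lost objects, not confirmed in this thread: hash-based maps (WeakHashMap and ConcurrentHashMap alike) keep a single entry per *equal* key, and Lucene's AttributeSource, the base class of TokenStream and therefore of token filters, defines equals()/hashCode() in terms of attribute state rather than object identity. If that holds for the plugin and Lucene versions in use, freshly created filters can compare equal and collapse into one map entry, whereas List.add() keeps every instance. The sketch below models that assumption with a hypothetical ValueEqualFilter class; it is not plugin code.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class EqualKeysDemo {

    /**
     * Hypothetical filter whose equals()/hashCode() depend on its state, not on object identity.
     * It only models the assumption that real filter instances can compare equal.
     */
    static final class ValueEqualFilter {
        private final String state;

        ValueEqualFilter(String state) {
            this.state = state;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof ValueEqualFilter && ((ValueEqualFilter) o).state.equals(state);
        }

        @Override
        public int hashCode() {
            return state.hashCode();
        }
    }

    /** Mimics the ConcurrentHashMap attempt: n distinct instances that all compare equal. */
    static int concurrentHashMapSize(int n) {
        Map<ValueEqualFilter, Integer> map = new ConcurrentHashMap<>();
        for (int i = 0; i < n; i++) {
            // An equal key already exists after the first put, so the new instance is not stored.
            map.put(new ValueEqualFilter("fresh"), 1);
        }
        return map.size();
    }

    /** Same instances in a thread-safe, identity-based set: every one is kept. */
    static int identitySetSize(int n) {
        Set<ValueEqualFilter> set = Collections.synchronizedSet(
                Collections.newSetFromMap(new IdentityHashMap<ValueEqualFilter, Boolean>()));
        for (int i = 0; i < n; i++) {
            set.add(new ValueEqualFilter("fresh"));
        }
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println(concurrentHashMapSize(5)); // prints 1
        System.out.println(identitySetSize(5));       // prints 5
    }
}
```

An identity-based set has the same strong-reference drawback as the List fix, but unlike a map keyed by equals() it cannot silently merge two live filters.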
@bells Could you please verify this change?
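I have run into the same problem: with the same query, the number of returned records (the total count) varies considerably between requests.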
I am having the same issue. After changing synonym.txt (and adjusting the query accordingly and waiting for the synonym refresh), the same search that returned N results before the change now gives inconsistent responses: no hits, some of the expected hits, or all of the expected hits.
UPDATE: I see this is fixed in the new version. I am using an older version for Elasticsearch 5.1.1, so I took the fix from @xiaoheike's pull request. Thanks!
@xiaoheike @bells Has this been fixed in master?
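It should have been fixed; others have asked about this before. Try changing the code according to my branch or the fix I described on this page. @xinlmain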
Why do repeated synonym requests sometimes give different tokenization results?
Here "三次方" is a custom dictionary word. Occasionally the result is:
{ "tokens": [ { "token": "三", "start_offset": 0, "end_offset": 1, "type": "en", "position": 0 }, { "token": "次方", "start_offset": 1, "end_offset": 9, "type": "m", "position": 1 } ] }
What I want is:
{ "tokens": [ { "token": "三次方", "start_offset": 0, "end_offset": 9, "type": "userDefine", "position": 0 } ] }