elasticsearch-analysis-ansj

Overwriting a document fails when a field is indexed but not stored

Open buaanie opened this issue 7 years ago • 4 comments

[2017-08-14 20:41:02,052][WARN ][cluster.action.shard     ] [bd34] [nnewsindex][2] received shard failed for target shard [[nnewsindex][2], node[XrPm_SElSz2V262ysBmmAA], relocating [I0inuDUeR_OXHL4auxvQKQ], [R], v[10718], s[INITIALIZING], a[id=z8ta7k7HSP2nECAdZ2QClg, rId=tDL5rPFzS5KtSSztdBWW5w], expected_shard_size[410490]], indexUUID [VeqS8xejSAyfHVOUibm6Gw], message [failed to update mappings], failure [MapperParsingException[analyzer [index_ansj] not found for field [content]]]
MapperParsingException[analyzer [index_ansj] not found for field [content]]
	at org.elasticsearch.index.mapper.core.TypeParsers.parseAnalyzersAndTermVectors(TypeParsers.java:213)
	at org.elasticsearch.index.mapper.core.TypeParsers.parseTextField(TypeParsers.java:250)
	at org.elasticsearch.index.mapper.core.StringFieldMapper$TypeParser.parse(StringFieldMapper.java:161)
	at org.elasticsearch.index.mapper.object.ObjectMapper$TypeParser.parseProperties(ObjectMapper.java:305)
	at org.elasticsearch.index.mapper.object.ObjectMapper$TypeParser.parseObjectOrDocumentTypeProperties(ObjectMapper.java:218)
	at org.elasticsearch.index.mapper.object.RootObjectMapper$TypeParser.parse(RootObjectMapper.java:139)
	at org.elasticsearch.index.mapper.DocumentMapperParser.parse(DocumentMapperParser.java:118)
	at org.elasticsearch.index.mapper.DocumentMapperParser.parse(DocumentMapperParser.java:99)
	at org.elasticsearch.index.mapper.MapperService.parse(MapperService.java:498)
	at org.elasticsearch.index.mapper.MapperService.merge(MapperService.java:288)
	at org.elasticsearch.indices.cluster.IndicesClusterStateService.processMapping(IndicesClusterStateService.java:387)
	at org.elasticsearch.indices.cluster.IndicesClusterStateService.applyMappings(IndicesClusterStateService.java:348)
	at org.elasticsearch.indices.cluster.IndicesClusterStateService.clusterChanged(IndicesClusterStateService.java:164)
	at org.elasticsearch.cluster.service.InternalClusterService.runTasksForExecutor(InternalClusterService.java:610)
	at org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:772)
	at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:231)
	at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:194)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)

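When I first inserted documents into the index, I did not check whether they had already been inserted. The content field is indexed but not stored (it is excluded), so indexing the same document again raises this error. Could this be a bug?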

buaanie • Aug 14 '17 12:08

Which version shows the problem? Is it a cluster or a single node? And what does your configuration file look like?

shi-yuan • Aug 14 '17 13:08

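@shi-yuan I compiled ansj 5.1.1 myself and am using it with ES 2.3.1. After checking carefully, the problem is not what I described above: ES simply could not find index_ansj. I did not specify index_ansj in elasticsearch.yml, and in the index settings I added two custom analyzers of my own. The mapping, however, uses both my two custom analyzers and the ansj ones. I wonder whether the ansj analyzers cannot be found because they are not declared in the settings.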

Settings:
                "{" +
                "    \"index\": {" +
                "        \"analysis\": {" +
                "            \"analyzer\": {" +
                "                \"word_analyzer\": {" +
                "                    \"type\": \"custom\"," +
                "                    \"tokenizer\": \"word_tokenizer\"" +
                "                }," +
                "                \"id_analyzer\": {" +
                "                    \"type\": \"custom\"," +
                "                    \"tokenizer\": \"id_tokenizer\"" +
                "                }," +
// At first I assumed the index and query analyzers were already registered with ES and did not need to be declared again,
// so I left index_ansj and query_ansj out of the settings. As a result, the analyzer could not be found.
                "                \"index_ansj\": {" +
                "                    \"type\": \"index_ansj\"" +
                "                }," +
                "                \"query_ansj\": {" +
                "                    \"type\": \"query_ansj\"" +
                "                }" +
                "            }," +
                "            \"tokenizer\": {" +
                "                \"word_tokenizer\": {" +
                "                    \"pattern\": \"\\\\s|,|,\"," +
                "                    \"type\": \"pattern\"" +
                "                }," +
                "                \"id_tokenizer\": {" +
                "                    \"pattern\": \"-\"," +
                "                    \"type\": \"pattern\"" +
                "                }" +
                "            }" +
                "        }" +
                "    }" +
                "}"

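My guess is this: if I declare custom analyzers in the settings myself, they take the highest priority, so other analyzers that are already registered with ES but are not written into my own settings can no longer be found.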

.startObject("content").field("type", "string").field("index", "analyzed").field("analyzer", "index_ansj").field("search_analyzer","query_ansj").endObject()
                    .startObject("newsid").field("type", "string").field("index", "analyzed").field("analyzer", "id_analyzer").endObject()
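
To make the mismatch between the settings and the mapping easier to see, here is a minimal sketch in plain Java. It does not call Elasticsearch and says nothing about how ES actually resolves analyzers; it only compares the analyzer names declared in the settings with the names the mapping refers to. The class `AnalyzerReferenceCheck` and the method `findUndeclared` are hypothetical names made up for this illustration. Analyzers that are built in or registered by a plugin would also show up as "not declared", so the output is just the list of names that have to come from somewhere other than the index settings.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper for illustration; not part of Elasticsearch or the ansj plugin.
class AnalyzerReferenceCheck {

    // Returns, in sorted order, every analyzer name used by a field
    // that does not appear in the set declared in the index settings.
    static Set<String> findUndeclared(Set<String> declared,
                                      Map<String, List<String>> fieldAnalyzers) {
        Set<String> missing = new TreeSet<>();
        for (List<String> names : fieldAnalyzers.values()) {
            for (String name : names) {
                if (!declared.contains(name)) {
                    missing.add(name);
                }
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Names declared under index.analysis.analyzer in the first version of the settings.
        Set<String> declared = new HashSet<>(Arrays.asList("word_analyzer", "id_analyzer"));

        // Field name -> analyzers the mapping refers to (analyzer and search_analyzer).
        Map<String, List<String>> fieldAnalyzers = new LinkedHashMap<>();
        fieldAnalyzers.put("content", Arrays.asList("index_ansj", "query_ansj"));
        fieldAnalyzers.put("newsid", Arrays.asList("id_analyzer"));

        System.out.println("Not declared in settings: "
                + findUndeclared(declared, fieldAnalyzers));
    }
}
```

With the inputs above, the two names reported are index_ansj and query_ansj, which are exactly the ones the mapping expects the plugin to supply. Once both are added to the declared set, the result is empty.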

buaanie • Aug 15 '17 11:08

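One more question. Your es_ansj plugin for ES 2.3.1 uses ansj_seg 3.X, which has an array index out-of-bounds error, so I compiled ansj 5.1.1 myself and used it with the 2.3.1 es_ansj plugin. However, ansj 5.1 does not seem to read StopLibrary: even though it is configured in library.properties, the stop-word dictionary has no effect. Did I configure something wrong?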

buaanie • Aug 16 '17 00:08

The configuration is probably the problem. How did you configure it?

shi-yuan • Aug 16 '17 04:08