elasticsearch-minhash
How can I perform a search query against a minhash field?
Is there any way to run a search query against a minhash field to find similar documents?
I created a minhash analyzer and added a mapping that stores the minhash value, using the following requests:
```
PUT /my_index
{
  "index": {
    "analysis": {
      "analyzer": {
        "minhash_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["minhash"]
        }
      }
    }
  }
}

PUT /my_index/_doc/_mapping
{
  "_doc": {
    "properties": {
      "message": {
        "type": "text",
        "copy_to": "minhash_value"
      },
      "minhash_value": {
        "type": "minhash",
        "minhash_analyzer": "minhash_analyzer"
      }
    }
  }
}
```
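The `minhash` filter in the analyzer above replaces the token stream with a compact signature. The plugin's exact hashing is not shown in this thread, so the following is only a conceptual Python sketch of 1-bit (b-bit) MinHash, not the plugin's implementation: the SHA-1-based hash family, the whitespace tokenization and the `one_bit_minhash` helper are all assumptions. It does reproduce the shape of the values seen in this thread: 128 one-bit hashes pack into 16 bytes, which base64-encode to 24 characters ending in `==`.

```python
import base64
import hashlib


def _hash(token: str, seed: int) -> int:
    # Hypothetical hash family (assumption): SHA-1 of "seed:token", first 8 bytes.
    digest = hashlib.sha1(f"{seed}:{token}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def one_bit_minhash(text: str, num_hashes: int = 128) -> str:
    """Return a base64 string holding one MinHash bit per hash function."""
    if num_hashes % 8 != 0:
        raise ValueError("num_hashes must be a multiple of 8")
    tokens = set(text.split())  # crude stand-in for the standard tokenizer
    if not tokens:
        raise ValueError("text must contain at least one token")
    packed = bytearray(num_hashes // 8)
    for seed in range(num_hashes):
        # MinHash: the minimum hash over all tokens for this hash function...
        smallest = min(_hash(token, seed) for token in tokens)
        # ...of which only the lowest bit is kept (b-bit MinHash with b = 1).
        if smallest & 1:
            packed[seed // 8] |= 0x80 >> (seed % 8)  # most significant bit first
    return base64.b64encode(bytes(packed)).decode("ascii")


print(one_bit_minhash("Sample text"))
```

Because the signature is computed over the token set, word order does not change it, and documents sharing most of their tokens agree on most bits.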
```
PUT /my_index/_doc/1
{
  "message": "Sample text"
}

GET /my_index/_doc/1?pretty&stored_fields=minhash_value,_source
```
Here I can see that the "minhash_value" is properly calculated and stored.
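Since the stored value is base64, it can be decoded to inspect the signature. A minimal Python sketch, assuming the value is just the raw signature bytes with no extra framing (the `signature_bits` helper is hypothetical); it decodes the value used in the `more_like_this` query below:

```python
import base64


def signature_bits(value: str) -> str:
    """Decode a base64 minhash value into a string of '0'/'1' characters."""
    raw = base64.b64decode(value)
    return "".join(f"{byte:08b}" for byte in raw)


bits = signature_bits("7MCNkXlsr8O9pYZs6eSnig==")
print(len(bits))  # 128
```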
I tried to query for similar documents following this advice:
```
GET /_search
{
  "query": {
    "more_like_this" : {
      "fields" : ["minhash_value"],
      "like" : "7MCNkXlsr8O9pYZs6eSnig==",
      "min_term_freq" : 1,
      "max_query_terms" : 12
    }
  }
}
```
and got the following error:
```
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 6,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 5,
    "failures" : [
      {
        "shard" : 0,
        "index" : "my_index",
        "node" : "T25wlUSYSUimAg8ppcStGw",
        "reason" : {
          "type" : "query_shard_exception",
          "reason" : """
failed to create query: {
  "more_like_this" : {
    "fields" : [
      "minhash_value"
    ],
    "like" : [
      "KV5rsUfZpcZdVojpG8mHLA=="
    ],
    "max_query_terms" : 12,
    "min_term_freq" : 1,
    "min_doc_freq" : 5,
    "max_doc_freq" : 2147483647,
    "min_word_length" : 0,
    "max_word_length" : 0,
    "minimum_should_match" : "30%",
    "boost_terms" : 0.0,
    "include" : false,
    "fail_on_unsupported_field" : true,
    "boost" : 1.0
  }
}
""",
          "index_uuid" : "GX2WM-oMTUqQ3hgKZPk28Q",
          "index" : "my_index",
          "caused_by" : {
            "type" : "illegal_argument_exception",
            "reason" : "more_like_this only supports text/keyword fields: [minhash_value]"
          }
        }
      }
    ]
  },
  "hits" : {
    "total" : 0,
    "max_score" : null,
    "hits" : [ ]
  }
}
```
Any advice on how to perform the search?
Elasticsearch version: 6.5.1, plugin version: 6.5.0
I've also tried Elasticsearch 5.6.14 with plugin version 5.6.1 and got the same error.
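The error itself is explicit: `more_like_this` only accepts `text`/`keyword` fields, and `minhash_value` is neither. One possible workaround, sketched here under the assumption that each bit of the signature is an independent 1-bit MinHash, is to fetch candidate signatures (for example with `stored_fields=minhash_value`, as in the GET request above) and rank them client-side by the fraction of matching bits. The helper names are hypothetical, and scanning candidates like this does not scale to a whole index.

```python
import base64


def bit_similarity(a: str, b: str) -> float:
    """Fraction of identical bits between two base64 minhash values."""
    x, y = base64.b64decode(a), base64.b64decode(b)
    if len(x) != len(y):
        raise ValueError("signatures must have the same length")
    total_bits = 8 * len(x)
    differing = sum(bin(p ^ q).count("1") for p, q in zip(x, y))
    return (total_bits - differing) / total_bits


def estimated_jaccard(a: str, b: str) -> float:
    # Assumption: for 1-bit MinHash on sparse data, P(bits match) ~= 1/2 + J/2,
    # so J ~= 2p - 1, clamped at 0 for unrelated documents.
    return max(0.0, 2.0 * bit_similarity(a, b) - 1.0)


def rank_similar(query_value, candidates, top_k=10):
    """Rank (doc_id, minhash_value) pairs by bit similarity to query_value."""
    scored = [(doc_id, bit_similarity(query_value, value)) for doc_id, value in candidates]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


print(rank_similar("7MCNkXlsr8O9pYZs6eSnig==",
                   [("2", "KV5rsUfZpcZdVojpG8mHLA=="), ("1", "7MCNkXlsr8O9pYZs6eSnig==")]))
```

With one bit per hash, two unrelated documents still agree on roughly half the bits, which is why `estimated_jaccard` rescales the raw match rate instead of using it directly.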
In Fess, Field Collapsing is used.
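Field collapsing (the `collapse` option of the search API) returns only the top hit for each distinct value of a field, so it folds together documents whose signatures are exactly identical rather than scoring graded similarity; it also requires a single-valued `keyword` or numeric field with doc values. Whether the `minhash` field can be collapsed on directly is not established in this thread, so this Python sketch builds a request body against a hypothetical `keyword` field named `minhash_bits` that would hold the same signature. How Fess populates its own field is not covered here.

```python
import json


def collapse_search_body(query_text: str, signature_field: str = "minhash_bits") -> dict:
    """Build a search body that keeps one hit per distinct signature value.

    `minhash_bits` is a hypothetical keyword field holding the signature.
    """
    return {
        "query": {"match": {"message": query_text}},
        # Hits sharing an identical signature value collapse into one result.
        "collapse": {"field": signature_field},
    }


print(json.dumps(collapse_search_body("Sample text"), indent=2))
```

The printed body would be sent as `GET /my_index/_search`.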
Could you please elaborate?
I couldn't resolve this issue either. How is one supposed to query by minhash value?