elasticsearch-minhash

How can I perform a search query against a minhash field?

Open sladonia opened this issue 6 years ago • 3 comments

Is there any way to run a search query against a minhash field to find similar documents?

I created a minhash analyzer, added a mapping that stores the minhash, and indexed a document with the following requests:

PUT /my_index
{
  "index":{
    "analysis":{
      "analyzer":{
        "minhash_analyzer":{
          "type":"custom",
          "tokenizer":"standard",
          "filter":["minhash"]
        }
      }
    }
  }
}

PUT /my_index/_doc/_mapping
{
  "_doc":{
    "properties":{
      "message":{
        "type":"text",
        "copy_to":"minhash_value"
      },
      "minhash_value":{
        "type":"minhash",
        "minhash_analyzer":"minhash_analyzer"
      }
    }
  }
}

PUT /my_index/_doc/1
{
  "message":"Sample text"
}

GET /my_index/_doc/1?pretty&stored_fields=minhash_value,_source

Here I can see that "minhash_value" is properly calculated and stored.

I then tried to query for similar documents using this advice:

GET /_search
{
    "query": {
        "more_like_this" : {
            "fields" : ["minhash_value"],
            "like" : "7MCNkXlsr8O9pYZs6eSnig==",
            "min_term_freq" : 1,
            "max_query_terms" : 12
        }
    }
}

but got the following error:

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 6,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 5,
    "failures" : [
      {
        "shard" : 0,
        "index" : "my_index",
        "node" : "T25wlUSYSUimAg8ppcStGw",
        "reason" : {
          "type" : "query_shard_exception",
          "reason" : """
failed to create query: {
  "more_like_this" : {
    "fields" : [
      "minhash_value"
    ],
    "like" : [
      "KV5rsUfZpcZdVojpG8mHLA=="
    ],
    "max_query_terms" : 12,
    "min_term_freq" : 1,
    "min_doc_freq" : 5,
    "max_doc_freq" : 2147483647,
    "min_word_length" : 0,
    "max_word_length" : 0,
    "minimum_should_match" : "30%",
    "boost_terms" : 0.0,
    "include" : false,
    "fail_on_unsupported_field" : true,
    "boost" : 1.0
  }
}
""",
          "index_uuid" : "GX2WM-oMTUqQ3hgKZPk28Q",
          "index" : "my_index",
          "caused_by" : {
            "type" : "illegal_argument_exception",
            "reason" : "more_like_this only supports text/keyword fields: [minhash_value]"
          }
        }
      }
    ]
  },
  "hits" : {
    "total" : 0,
    "max_score" : null,
    "hits" : [ ]
  }
}

Any advice on how to perform this search?

Elasticsearch version: 6.5.1, plugin version: 6.5.0

I also tried Elasticsearch 5.6.14 with plugin version 5.6.1 and got the same error.

sladonia, Jan 08 '19 09:01

In Fess, Field Collapsing is used.

marevol, Jan 10 '19 06:01

Could you please elaborate?

marawanokasha, Feb 26 '19 19:02
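
For readers following the Fess pointer: field collapsing is the `collapse` option of the Elasticsearch search API, which returns only the top hit for each distinct value of a field. A minimal sketch against the index from the question might look like the request below. It assumes that `minhash_value` can be used as a collapse key, which nothing in this thread confirms, and the `match` query on `message` is only a placeholder.

```
# Assumption: the minhash_value field is usable as a collapse key
# (not confirmed in this thread). The match query is a placeholder.
GET /my_index/_search
{
  "query": {
    "match": { "message": "Sample text" }
  },
  "collapse": {
    "field": "minhash_value"
  }
}
```

Collapsing groups hits whose field values are identical, so a request like this would deduplicate documents that share the same minhash. It would not rank documents by how close their hashes are.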

I couldn't resolve this issue either. How is one supposed to query by min_hash value?

Fred12, Mar 14 '19 13:03