
Highlight function confuses boundary_max_scan and boundary_scan_max

Open · emmerg opened this issue 3 years ago · 1 comment

ZomboDB version: tag v3000.0.3
Postgres version: 13.4
Elasticsearch version: 7.15.1

Problem Description:

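Setting the highlighter's boundary scan limit through zdb.highlight() fails with either spelling of the option. Elasticsearch calls the setting boundary_max_scan (https://www.elastic.co/guide/en/elasticsearch/reference/current/highlighting.html#highlighting-settings), but ZomboDB's documentation calls it boundary_scan_max (https://github.com/zombodb/zombodb/blob/master/SCORING-HIGHLIGHTING.md?plain=1#L100). Passing boundary_max_scan is rejected by Postgres, because zdb.highlight() has no parameter with that name. Passing boundary_scan_max is accepted by Postgres, but the name is forwarded to Elasticsearch unchanged, and Elasticsearch rejects it as an unknown field.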
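A minimal sketch of the kind of fix this report points to, written in Python purely for illustration: keep accepting the documented name boundary_scan_max, but translate it to Elasticsearch's boundary_max_scan when building the highlight request. The ZDB_TO_ES table and the build_highlight_field / build_highlight_body helpers are hypothetical; they are not ZomboDB's actual code and may not match how it serializes highlight options.

```python
import json

# Hypothetical translation table: ZomboDB-side option name -> Elasticsearch
# highlight setting name. Only the mismatch reported in this issue is listed;
# every other option is assumed to already use the Elasticsearch spelling.
ZDB_TO_ES = {
    "boundary_scan_max": "boundary_max_scan",
}


def build_highlight_field(options: dict) -> dict:
    """Build a per-field highlight object, renaming known mismatched keys.

    Options whose value is None are treated as "not set" and omitted.
    """
    field = {}
    for name, value in options.items():
        if value is None:
            continue
        field[ZDB_TO_ES.get(name, name)] = value
    return field


def build_highlight_body(field_name: str, options: dict) -> dict:
    """Wrap the field settings under highlight.fields, as Elasticsearch expects."""
    return {"highlight": {"fields": {field_name: build_highlight_field(options)}}}


if __name__ == "__main__":
    body = build_highlight_body(
        "long_description", {"boundary_scan_max": 20, "fragment_size": None}
    )
    # Prints {"highlight": {"fields": {"long_description": {"boundary_max_scan": 20}}}}
    print(json.dumps(body))
```

Renaming at serialization time keeps existing SQL that uses boundary_scan_max working, while Elasticsearch only ever sees the spelling it accepts.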

Error Message (if any):

tutorial=# SELECT zdb.highlight(ctid, 'long_description') from products where products ==> 'wooden or person';
                                            highlight
--------------------------------------------------------------------------------------------------
 {"Throw it at a <em>person</em> with a big <em>wooden</em> stick and hope they don't hit it"}
 {"A <em>wooden</em> container that will eventually rot away.  Put stuff it in (but not a cat)."}
(2 rows)

tutorial=# SELECT zdb.highlight(ctid, 'long_description', zdb.highlight(boundary_max_scan=>20)) from products where products ==> 'wooden or person';
ERROR:  function zdb.highlight(boundary_max_scan => integer) does not exist
LINE 1: SELECT zdb.highlight(ctid, 'long_description', zdb.highlight...
                                                       ^
HINT:  No function matches the given name and argument types. You might need to add explicit type casts.
tutorial=# SELECT zdb.highlight(ctid, 'long_description', zdb.highlight(boundary_scan_max=>20)) from products where products ==> 'wooden or person';
ERROR:  HTTP 400 {
  "error": {
    "root_cause": [
      {
        "type": "x_content_parse_exception",
        "reason": "[1:446] [highlight_field] unknown field [boundary_scan_max] did you mean any of [boundary_scanner, boundary_scanner_locale, boundary_max_scan, boundary_chars]?"
      }
    ],
    "type": "x_content_parse_exception",
    "reason": "[1:466] [highlight] failed to parse field [fields]",
    "caused_by": {
      "type": "x_content_parse_exception",
      "reason": "[1:466] [fields] failed to parse field [long_description]",
      "caused_by": {
        "type": "x_content_parse_exception",
        "reason": "[1:446] [highlight_field] unknown field [boundary_scan_max] did you mean any of [boundary_scanner, boundary_scanner_locale, boundary_max_scan, boundary_chars]?"
      }
    }
  },
  "status": 400
}
CONTEXT:  /root/.cargo/registry/src/github.com-1ecc6299db9ec823/pgx-pg-sys-0.1.21/src/pg13.rs:43160:1
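The "did you mean" list in the HTTP 400 response already names the intended spelling. A similar check could run client-side before the request is sent. The sketch below is a hypothetical helper built on Python's difflib, whose similarity scoring is not necessarily what Elasticsearch uses. It knows only the four setting names quoted in the error above, not the full set Elasticsearch accepts, so any other valid setting would be flagged.

```python
import difflib

# Only the highlight settings suggested in the error message above.
# This is an illustrative subset, NOT the complete list Elasticsearch accepts.
KNOWN_HIGHLIGHT_KEYS = {
    "boundary_scanner",
    "boundary_scanner_locale",
    "boundary_max_scan",
    "boundary_chars",
}


def check_highlight_keys(options: dict) -> list:
    """Return one message per unknown option key, with close-match suggestions."""
    problems = []
    for key in options:
        if key in KNOWN_HIGHLIGHT_KEYS:
            continue
        # difflib ranks candidates by SequenceMatcher ratio; cutoff=0.6 is its default.
        suggestions = difflib.get_close_matches(
            key, sorted(KNOWN_HIGHLIGHT_KEYS), n=4, cutoff=0.6
        )
        problems.append(f"unknown field [{key}] did you mean any of {suggestions}?")
    return problems


if __name__ == "__main__":
    for message in check_highlight_keys({"boundary_scan_max": 20}):
        print(message)
```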

Table Schema/Index Definition:

Database configured as described in the ZomboDB tutorial (https://github.com/zombodb/zombodb/blob/master/TUTORIAL.md).

Output from select zdb.index_mapping('index_name');:


{"43536.2200.44267.44279": {"mappings": {
  "properties": {
    "id": {"type": "long"},
    "name": {"type": "text", "copy_to": ["zdb_all"], "analyzer": "zdb_standard", "fielddata": true, "index_prefixes": {"max_chars": 5, "min_chars": 2}},
    "price": {"type": "long"},
    "zdb_all": {"type": "text", "analyzer": "zdb_all_analyzer"},
    "keywords": {"type": "keyword", "copy_to": ["zdb_all"], "normalizer": "lowercase", "ignore_above": 10922},
    "zdb_cmax": {"type": "integer"},
    "zdb_cmin": {"type": "integer"},
    "zdb_ctid": {"type": "long"},
    "zdb_xmax": {"type": "long"},
    "zdb_xmin": {"type": "long"},
    "discontinued": {"type": "boolean"},
    "short_summary": {"type": "text", "copy_to": ["zdb_all"], "analyzer": "zdb_standard", "fielddata": true, "index_prefixes": {"max_chars": 5, "min_chars": 2}},
    "inventory_count": {"type": "integer"},
    "long_description": {"type": "text", "analyzer": "fulltext", "fielddata": true},
    "zdb_aborted_xids": {"type": "long"},
    "availability_date": {"type": "keyword", "fields": {"date": {"type": "date"}}, "copy_to": ["zdb_all"]}
  },
  "date_detection": false,
  "dynamic_templates": [
    {"strings": {"mapping": {"type": "keyword", "copy_to": "zdb_all", "normalizer": "lowercase", "ignore_above": 10922}, "match_mapping_type": "string"}},
    {"dates_times": {"mapping": {"type": "keyword", "fields": {"date": {"type": "date", "format": "strict_date_optional_time||epoch_millis||HH:mm:ss.S||HH:mm:ss.SX||HH:mm:ss.SS||HH:mm:ss.SSX||HH:mm:ss.SSS||HH:mm:ss.SSSX||HH:mm:ss.SSSS||HH:mm:ss.SSSSX||HH:mm:ss.SSSSS||HH:mm:ss.SSSSSX||HH:mm:ss.SSSSSS||HH:mm:ss.SSSSSSX"}}, "copy_to": "zdb_all"}, "match_mapping_type": "date"}},
    {"objects": {"mapping": {"type": "nested", "include_in_parent": true}, "match_mapping_type": "object"}}
  ],
  "numeric_detection": false
}}}

Other Discussion:

emmerg · Nov 10 '21 20:11

I realize this issue is nearly a year old now, but is the complaint here simply that we accidentally spelled boundary_max_scan (what ES actually wants) as boundary_scan_max?

If so, I'll fix it.

eeeebbbbrrrr · Oct 13 '22 16:10