fulltextsearch_elasticsearch icon indicating copy to clipboard operation
fulltextsearch_elasticsearch copied to clipboard

expected value but got [START_ARRAY]

Open chikamichi opened this issue 5 years ago • 10 comments

Hi,

Running v1.5.1 I get the following error when attempting a research from within Nextcloud (indexing is fine, all conditions :heavy_check_mark:):

{"request":{"providers":["all"],"author":"admin","search":"test","page":1,"size":10,"parts":[],"queries":[],"options":{"files_local":"0","files_extension":""},"metatags":[],"subtags":[],"tags":[]},"version":"1.4.1","status":-1,"exception":"Elasticsearch\\Common\\Exceptions\\BadRequest400Exception","message":"{\"error\":{\"root_cause\":[{\"type\":\"x_content_parse_exception\",\"reason\":\"[1:63] [bool] failed to parse field [should]\"}],\"type\":\"x_content_parse_exception\",\"reason\":\"[1:63] [bool] failed to parse field [must]\",\"caused_by\":{\"type\":\"x_content_parse_exception\",\"reason\":\"[1:63] [bool] failed to parse field [should]\",\"caused_by\":{\"type\":\"x_content_parse_exception\",\"reason\":\"[1:63] [bool] failed to parse field [should]\",\"caused_by\":{\"type\":\"illegal_state_exception\",\"reason\":\"expected value but got [START_ARRAY]\"}}}},\"status\":400}"}

It seems to be related to a recent change in Elasticsearch API with must_not composed queries: https://discuss.elastic.co/t/must-not-boolean-query-in-7-7/233269

The error mentions should and must, though.

Could it be caused by this line where OCP\FullTextSearch\Model\ISearchRequest $request would eventually be casted a double-nested array?

chikamichi avatar May 19 '20 23:05 chikamichi

May be related to #105.

chikamichi avatar May 19 '20 23:05 chikamichi

Here's what my simple "test" query looks like internally:

array(2) {
  ["must"]=>
  array(1) {
    ["bool"]=>
    array(1) {
      ["should"]=>
      array(1) {
        ["bool"]=>
        array(1) {
          ["should"]=>
          array(1) {
            [0]=>
            array(1) {
              [0]=>
              array(1) {
                ["bool"]=>
                array(1) {
                  ["should"]=>
                  array(6) {
                    [0]=>
                    array(1) {
                      ["match_phrase_prefix"]=>
                      array(1) {
                        ["content"]=>
                        string(4) "test"
                      }
                    }
                    [1]=>
                    array(1) {
                      ["match_phrase_prefix"]=>
                      array(1) {
                        ["title"]=>
                        string(4) "test"
                      }
                    }
                    [2]=>
                    array(1) {
                      ["match_phrase_prefix"]=>
                      array(1) {
                        ["share_names.admin"]=>
                        string(4) "test"
                      }
                    }
                    [3]=>
                    array(1) {
                      ["wildcard"]=>
                      array(1) {
                        ["title"]=>
                        string(6) "*test*"
                      }
                    }
                    [4]=>
                    array(1) {
                      ["wildcard"]=>
                      array(1) {
                        ["share_names.admin"]=>
                        string(6) "*test*"
                      }
                    }
                    [5]=>
                    array(1) {
                      ["query_string"]=>
                      array(2) {
                        ["fields"]=>
                        array(1) {
                          [0]=>
                          string(14) "parts.comments"
                        }
                        ["query"]=>
                        string(4) "test"
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  }
  ["filter"]=>
  array(5) {
    [0]=>
    array(1) {
      ["bool"]=>
      array(1) {
        ["must"]=>
        array(1) {
          ["term"]=>
          array(1) {
            ["provider"]=>
            string(5) "files"
          }
        }
      }
    }
    [1]=>
    array(1) {
      ["bool"]=>
      array(1) {
        ["should"]=>
        array(6) {
          [0]=>
          array(1) {
            ["term"]=>
            array(1) {
              ["owner"]=>
              string(5) "admin"
            }
          }
          [1]=>
          array(1) {
            ["term"]=>
            array(1) {
              ["users"]=>
              string(5) "admin"
            }
          }
          [2]=>
          array(1) {
            ["term"]=>
            array(1) {
              ["users"]=>
              string(5) "__all"
            }
          }
          [3]=>
          array(1) {
            ["term"]=>
            array(1) {
              ["groups"]=>
              string(4) "Nous"
            }
          }
          [4]=>
          array(1) {
            ["term"]=>
            array(1) {
              ["groups"]=>
              string(5) "admin"
            }
          }
          [5]=>
          array(1) {
            ["term"]=>
            array(1) {
              ["circles"]=>
              string(14) "99a1994e6f67f8"
            }
          }
        }
      }
    }
    [2]=>
    array(1) {
      ["bool"]=>
      array(1) {
        ["should"]=>
        array(0) {
        }
      }
    }
    [3]=>
    array(1) {
      ["bool"]=>
      array(1) {
        ["must"]=>
        array(0) {
        }
      }
    }
    [4]=>
    array(1) {
      ["bool"]=>
      array(1) {
        ["must"]=>
        array(0) {
        }
      }
    }
  }
}

chikamichi avatar May 20 '20 00:05 chikamichi

I have no clue why it's so deeply nested :see_no_evil: but I noticed this chunk:

        ["should"]=>
          array(1) {
            [0]=>
            array(1) {

chikamichi avatar May 20 '20 00:05 chikamichi

Ok, that's fixed :+1:

        /**
	 * @param ISearchRequest $request
	 * @param QueryContent[] $queryContents
	 *
	 * @return array
	 */
	private function generateSearchQueryFromQueryContent(
		ISearchRequest $request, array $queryContents
	): array {

		$query = $queryWords = [];
		foreach ($queryContents as $queryContent) {
-			$queryWords[$queryContent->getShould()][] =
+			$queryWords[$queryContent->getShould()] =
				$this->generateQueryContentFields($request, $queryContent);
		}

		$listShould = array_keys($queryWords);
		foreach ($listShould as $itemShould) {
-			$query[$itemShould][] = $queryWords[$itemShould];
+			$query[$itemShould] = $queryWords[$itemShould];
		}

		return ['bool' => $query];
	}

Not sure about how to come up with a PR though, as it needs to be retro-compatible. And I don't know exactly which Elasticsearch version introduced the change.

chikamichi avatar May 20 '20 00:05 chikamichi

@daita you seem to be maintaining this repo up to date, matching upstream Elasticsearch releases.

Could you give me some insights about how to write a PR for this fix? I don't know whether it should/could be applied without second-thoughts, or would retro-compatibility with ES < v7.x be an issue?

chikamichi avatar May 21 '20 00:05 chikamichi

Not sure about how to come up with a PR though, as it needs to be retro-compatible. And I don't know exactly which Elasticsearch version introduced the change.

I can only assume that the the issue is raised with Elasticsearch version 7.7.0. I keep my Ubuntu 18.04.4 LTS server alltimes up-to-date and it has worked fine with the previously used Elasticsearch version.

WHRoeder avatar May 21 '20 10:05 WHRoeder

I can only assume that the the issue is raised with Elasticsearch version 7.7.0.

I encountered the same problem when setting up a new system with version 7.7.0. I then tried again with 7.6.2 and that one works fine.

fzieris avatar May 22 '20 16:05 fzieris

Can confirm this issue with the update to 7.7.0, worked fine with 7.6.2

thomasgg23 avatar May 23 '20 21:05 thomasgg23

Ok, I'll dive into the changelog and commits again, trying to isolate the actual change, just to be sure. Then I can come up with a PR constrainted by a version flag.

chikamichi avatar May 24 '20 09:05 chikamichi

I'm surprised, that the old JSON query worked at all, seem they have removed some fallback-code with the latest release. The patch provided in a above worked only as long as only a single word or phrase was queried; it returned the match for the last phrase/word in the query string.

My proposed change should also work with older version of EL, please test.

weberhofer avatar May 26 '20 11:05 weberhofer