expected value but got [START_ARRAY]
Hi,
Running v1.5.1 I get the following error when attempting a research from within Nextcloud (indexing is fine, all conditions :heavy_check_mark:):
{"request":{"providers":["all"],"author":"admin","search":"test","page":1,"size":10,"parts":[],"queries":[],"options":{"files_local":"0","files_extension":""},"metatags":[],"subtags":[],"tags":[]},"version":"1.4.1","status":-1,"exception":"Elasticsearch\\Common\\Exceptions\\BadRequest400Exception","message":"{\"error\":{\"root_cause\":[{\"type\":\"x_content_parse_exception\",\"reason\":\"[1:63] [bool] failed to parse field [should]\"}],\"type\":\"x_content_parse_exception\",\"reason\":\"[1:63] [bool] failed to parse field [must]\",\"caused_by\":{\"type\":\"x_content_parse_exception\",\"reason\":\"[1:63] [bool] failed to parse field [should]\",\"caused_by\":{\"type\":\"x_content_parse_exception\",\"reason\":\"[1:63] [bool] failed to parse field [should]\",\"caused_by\":{\"type\":\"illegal_state_exception\",\"reason\":\"expected value but got [START_ARRAY]\"}}}},\"status\":400}"}
It seems to be related to a recent change in Elasticsearch API with must_not composed queries: https://discuss.elastic.co/t/must-not-boolean-query-in-7-7/233269
The error mentions should and must, though.
Could it be caused by this line where OCP\FullTextSearch\Model\ISearchRequest $request would eventually be casted a double-nested array?
May be related to #105.
Here's what my simple "test" query looks like internally:
array(2) {
["must"]=>
array(1) {
["bool"]=>
array(1) {
["should"]=>
array(1) {
["bool"]=>
array(1) {
["should"]=>
array(1) {
[0]=>
array(1) {
[0]=>
array(1) {
["bool"]=>
array(1) {
["should"]=>
array(6) {
[0]=>
array(1) {
["match_phrase_prefix"]=>
array(1) {
["content"]=>
string(4) "test"
}
}
[1]=>
array(1) {
["match_phrase_prefix"]=>
array(1) {
["title"]=>
string(4) "test"
}
}
[2]=>
array(1) {
["match_phrase_prefix"]=>
array(1) {
["share_names.admin"]=>
string(4) "test"
}
}
[3]=>
array(1) {
["wildcard"]=>
array(1) {
["title"]=>
string(6) "*test*"
}
}
[4]=>
array(1) {
["wildcard"]=>
array(1) {
["share_names.admin"]=>
string(6) "*test*"
}
}
[5]=>
array(1) {
["query_string"]=>
array(2) {
["fields"]=>
array(1) {
[0]=>
string(14) "parts.comments"
}
["query"]=>
string(4) "test"
}
}
}
}
}
}
}
}
}
}
}
["filter"]=>
array(5) {
[0]=>
array(1) {
["bool"]=>
array(1) {
["must"]=>
array(1) {
["term"]=>
array(1) {
["provider"]=>
string(5) "files"
}
}
}
}
[1]=>
array(1) {
["bool"]=>
array(1) {
["should"]=>
array(6) {
[0]=>
array(1) {
["term"]=>
array(1) {
["owner"]=>
string(5) "admin"
}
}
[1]=>
array(1) {
["term"]=>
array(1) {
["users"]=>
string(5) "admin"
}
}
[2]=>
array(1) {
["term"]=>
array(1) {
["users"]=>
string(5) "__all"
}
}
[3]=>
array(1) {
["term"]=>
array(1) {
["groups"]=>
string(4) "Nous"
}
}
[4]=>
array(1) {
["term"]=>
array(1) {
["groups"]=>
string(5) "admin"
}
}
[5]=>
array(1) {
["term"]=>
array(1) {
["circles"]=>
string(14) "99a1994e6f67f8"
}
}
}
}
}
[2]=>
array(1) {
["bool"]=>
array(1) {
["should"]=>
array(0) {
}
}
}
[3]=>
array(1) {
["bool"]=>
array(1) {
["must"]=>
array(0) {
}
}
}
[4]=>
array(1) {
["bool"]=>
array(1) {
["must"]=>
array(0) {
}
}
}
}
}
I have no clue why it's so deeply nested :see_no_evil: but I noticed this chunk:
["should"]=>
array(1) {
[0]=>
array(1) {
Ok, that's fixed :+1:
/**
* @param ISearchRequest $request
* @param QueryContent[] $queryContents
*
* @return array
*/
private function generateSearchQueryFromQueryContent(
ISearchRequest $request, array $queryContents
): array {
$query = $queryWords = [];
foreach ($queryContents as $queryContent) {
- $queryWords[$queryContent->getShould()][] =
+ $queryWords[$queryContent->getShould()] =
$this->generateQueryContentFields($request, $queryContent);
}
$listShould = array_keys($queryWords);
foreach ($listShould as $itemShould) {
- $query[$itemShould][] = $queryWords[$itemShould];
+ $query[$itemShould] = $queryWords[$itemShould];
}
return ['bool' => $query];
}
Not sure about how to come up with a PR though, as it needs to be retro-compatible. And I don't know exactly which Elasticsearch version introduced the change.
@daita you seem to be maintaining this repo up to date, matching upstream Elasticsearch releases.
Could you give me some insights about how to write a PR for this fix? I don't know whether it should/could be applied without second-thoughts, or would retro-compatibility with ES < v7.x be an issue?
Not sure about how to come up with a PR though, as it needs to be retro-compatible. And I don't know exactly which Elasticsearch version introduced the change.
I can only assume that the the issue is raised with Elasticsearch version 7.7.0. I keep my Ubuntu 18.04.4 LTS server alltimes up-to-date and it has worked fine with the previously used Elasticsearch version.
I can only assume that the the issue is raised with Elasticsearch version 7.7.0.
I encountered the same problem when setting up a new system with version 7.7.0. I then tried again with 7.6.2 and that one works fine.
Can confirm this issue with the update to 7.7.0, worked fine with 7.6.2
Ok, I'll dive into the changelog and commits again, trying to isolate the actual change, just to be sure. Then I can come up with a PR constrainted by a version flag.
I'm surprised, that the old JSON query worked at all, seem they have removed some fallback-code with the latest release. The patch provided in a above worked only as long as only a single word or phrase was queried; it returned the match for the last phrase/word in the query string.
My proposed change should also work with older version of EL, please test.