Possible BUG: HTTP (JSON) offset and limit affects facet results
Describe the bug
Using the HTTP (JSON) interface, the request limit and offset (doc) affects the 'aggregations' (facet) results.
SQL interface behaves as expected.
To Reproduce
Create and fill the index
CREATE TABLE la8_test (
title text indexed,
description text indexed,
str_val string,
int_val int,
bigint_val bigint,
float_val float,
bool_val bool,
time_val timestamp,
json_val json,
mva_int_val multi
);
REPLACE INTO `la8_test` (id, title, description, str_val, int_val, bigint_val, float_val, bool_val, time_val, json_val, mva_int_val) VALUES (1, 'A great title 1', 'A long description 1', 'tagone', 11, 10000001, 1001.5, 1, 1634549851, '{\"a\":1,\"b\":2}', (3,5,7,9))
REPLACE INTO `la8_test` (id, title, description, str_val, int_val, bigint_val, float_val, bool_val, time_val, json_val, mva_int_val) VALUES (2, 'A great title 2', 'A long description 2', 'tagone', 12, 10000002, 1002.5, 1, 1634549850, '{\"a\":2,\"b\":3}', (4,6,8,10))
REPLACE INTO `la8_test` (id, title, description, str_val, int_val, bigint_val, float_val, bool_val, time_val, json_val, mva_int_val) VALUES (3, 'A great title 3', 'A long description 3', 'tagone', 13, 10000003, 1003.5, 1, 1634549849, '{\"a\":3,\"b\":4}', (5,7,9,11))
REPLACE INTO `la8_test` (id, title, description, str_val, int_val, bigint_val, float_val, bool_val, time_val, json_val, mva_int_val) VALUES (4, 'A great title 4', 'A long description 4', 'tagtwo', 14, 10000004, 1004.5, 1, 1634549848, '{\"a\":4,\"b\":5}', (6,8,10,12))
REPLACE INTO `la8_test` (id, title, description, str_val, int_val, bigint_val, float_val, bool_val, time_val, json_val, mva_int_val) VALUES (5, 'A great title 5', 'A long description 5', 'tagtwo', 15, 10000005, 1005.5, 1, 1634549848, '{\"a\":5,\"b\":6}', (7,9,11,13))
REPLACE INTO `la8_test` (id, title, description, str_val, int_val, bigint_val, float_val, bool_val, time_val, json_val, mva_int_val) VALUES (6, 'A great title 6', 'A long description 6', 'tagtwo', 16, 10000006, 1006.5, 1, 1634549847, '{\"a\":6,\"b\":7}', (8,10,12,14))
REPLACE INTO `la8_test` (id, title, description, str_val, int_val, bigint_val, float_val, bool_val, time_val, json_val, mva_int_val) VALUES (7, 'A great title 7', 'A long description 7', 'tagtwo', 17, 10000007, 1007.5, 1, 1634549846, '{\"a\":7,\"b\":8}', (9,11,13,15))
REPLACE INTO `la8_test` (id, title, description, str_val, int_val, bigint_val, float_val, bool_val, time_val, json_val, mva_int_val) VALUES (8, 'A great title 8', 'A long description 8', 'tagtwo', 18, 10000008, 1008.5, 1, 1634549845, '{\"a\":8,\"b\":9}', (10,12,14,16))
REPLACE INTO `la8_test` (id, title, description, str_val, int_val, bigint_val, float_val, bool_val, time_val, json_val, mva_int_val) VALUES (9, 'A great title 9', 'A long description 9', 'tagtwo', 19, 10000009, 1009.5, 1, 1634549844, '{\"a\":9,\"b\":10}', (11,13,15,17))
REPLACE INTO `la8_test` (id, title, description, str_val, int_val, bigint_val, float_val, bool_val, time_val, json_val, mva_int_val) VALUES (10, 'A great title 10', 'A long description 10', 'tagtwo', 20, 10000010, 1010.5, 1, 1634549843, '{\"a\":10,\"b\":11}', (12,14,16,18))
HTTP
request with offset and limit:
{
"index": "la8_test",
"offset": 3,
"limit": 3,
"aggs": {
"str_val": {
"terms": {
"field": "str_val"
}
}
},
"query": {
"bool": {
"must": [{
"query_string": "great long description"
}]
}
}
}
response (hits are removed for clarity) aggregations.str_val.buckets is empty, but there should be 2 elements:
{
"took": 0,
"timed_out": false,
"hits": {
"total": 10,
"hits": [ ... removed from this post ... ]
},
"aggregations": {
"str_val": {
"buckets": []
}
}
}
SQL
The same query in SQL returns the expcted 2 rows in the facet
$ mysql -h 0 -P 9306
mysql > SELECT * FROM la8_test WHERE MATCH('great long description') LIMIT 3,3 FACET str_val;
+------+---------+------------+------------+----------+--------------------+-------------+---------+-----------------+-------------+
| id | int_val | bigint_val | time_val | bool_val | count_optimization | float_val | str_val | json_val | mva_int_val |
+------+---------+------------+------------+----------+--------------------+-------------+---------+-----------------+-------------+
| 5 | 15 | 10000005 | 1634734135 | 1 | 1 | 1005.500000 | tagtwo | {"a":5,"b":6} | 7,9,11,13 |
| 10 | 20 | 10000010 | 1634734130 | 1 | 1 | 1010.500000 | tagtwo | {"a":10,"b":11} | 12,14,16,18 |
| 9 | 19 | 10000009 | 1634734131 | 1 | 1 | 1009.500000 | tagtwo | {"a":9,"b":10} | 11,13,15,17 |
+------+---------+------------+------------+----------+--------------------+-------------+---------+-----------------+-------------+
3 rows in set (0,00 sec)
+---------+----------+
| str_val | count(*) |
+---------+----------+
| tagtwo | 7 |
| tagone | 3 |
+---------+----------+
2 rows in set (0,00 sec)
PHP
The official php api also produces the wrong results: when ->offset() and ->limit() are used, the ->facet('str_val') resultset is empty (but should have 2 elements).
You can examine the facet result as you change the ->offset() and ->limit() values.
$client = new \Manticoresearch\Client(['host'=>'127.0.0.1','port'=>9308]);
$search = new \Manticoresearch\Search($client);
$res = $search
->setIndex('la8_test')
->search('great long description')
->offset(3)->limit(3) // change these to see the facet results change
->facet('str_val')
->get();
print_r($res->getResponse()->getResponse()['hits']['hits']);
print_r($res->getFacets()['str_val']['buckets']);
Expected behavior
HTTP response aggregations.str_val.buckets should have 2 elements, just like the SQL version.
Describe the environment:
- Manticore 4.0.2 af497f245@210921 release (columnar 1.11.2 69d3780@210921)
- Kubuntu 20: Linux PCNAME 5.11.0-37-generic #41-Ubuntu SMP Mon Sep 20 16:39:20 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
I'm seeing the same thing on 3.5.2 so it doesn't appear to be a new issue
@sanikolaev Do you know if there is any movement on this?
Not many of our projects have search pagination (with an emphasis on relevance) but in the rare circumstances we do I've had to write an SQL only version because of this facet issue and it would be nice to know if i should ditch the HTTP JSON API entirely.
Hi @jlawrence-yellostudio I'll check with the team on how difficult it's to change the behaviour and what we can do in general.
seems like a bug as each aggs query copy offset, limit from the main query when rewrite limit from the size property of the aggs part that is wrong and aggs should not copy offset from the main
I have the same issue with pagination using HTTP API.
Also this bug is not related the PHP lib, or Python lib (which one Im using).
I made cURL requests directly to the Manticore API and can confirm that offset affects aggregation buckets
Fixed in 60401eb1
Note though that when size is not set for a facet it gets inherited from the main query's limit.