quickwit
Double sorting with aggregation not working
Describe the bug
When we run a terms aggregation sorted by two fields, we see two bugs:
- Missing sorting
- Different results for each call
Steps to reproduce (if applicable)
- Download dataset and index config from https://dev2.manticoresearch.com/index-settings-and-data.zip
- Run Quickwit in Docker (image quickwit/quickwit:0.8.1)
- Create index (config provided in the attached archive):
export HOST='http://localhost:7280'
curl -s -XPOST "${HOST}/api/v1/indexes" \
--header "content-type: application/yaml" \
--data-binary @./index-config.yaml
- Upload data (Dataset is pretty big, so we split it into chunks):
split -l 10000 ./data.jsonl ./data_splitted.
echo "Starting loading"
for f in ./data_splitted.*; do
echo "Upload chunk $f"
curl -s -XPOST "${HOST}/api/v1/hn_small/ingest?commit=force" --data-binary "@$f"
rm "$f"
done
echo "Finished"
- Perform query:
curl --location "${HOST}/api/v1/hn_small/search" \
--header 'Content-Type: application/json' \
--data '{"query":"*","max_hits":0,"aggs":{"comment_ranking_avg":{"terms":{"field":"comment_ranking","size":20,"order":{"avg_field":"desc","_key":"desc"}},"aggs":{"avg_field":{"avg":{"field":"author_comment_count"}}}}}}'
- The results come back in the wrong order:
{
"num_hits": 1165439,
"hits": [],
"elapsed_time_micros": 6665,
"errors": [],
"aggregations": {
"comment_ranking_avg": {
"buckets": [
{
"avg_field": {
"value": 3504.0
},
"doc_count": 1,
"key": 928.0 # Should be 2nd
},
{
"avg_field": {
"value": 3504.0
},
"doc_count": 1,
"key": 961.0 # Should be 1st
},
{
"avg_field": {
"value": 3504.0
},
"doc_count": 1,
"key": 730.0
},
.....
- If you repeat the same request several times, it can return different results:
{
"num_hits": 1165439,
"hits": [],
"elapsed_time_micros": 9610,
"errors": [],
"aggregations": {
"comment_ranking_avg": {
"buckets": [
{
"avg_field": {
"value": 64.0
},
"doc_count": 1,
"key": 1305.0
},
{
"avg_field": {
"value": 117.0
},
"doc_count": 1,
"key": 1296.0
},
{
"avg_field": {
"value": 40.0
},
"doc_count": 1,
"key": 1289.0
},
{
"avg_field": {
"value": 87.0
},
"doc_count": 1,
"key": 1287.0
},
......
PS: Sometimes it returns results without grouping (in that case you should reindex your dataset):
"buckets": [
{
"avg_field": {
"value": 3504.0
},
"doc_count": 1,
"key": 961.0
},
{
"avg_field": {
"value": 3080.0
},
"doc_count": 1,
"key": 980.0
},
{
"avg_field": {
"value": 3077.0
},
"doc_count": 1,
"key": 1176.0
},
So in general we can get 3 different results for the same query.
PS: The Elasticsearch-compatible endpoint shows the same behaviour.
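To count how many distinct results repeated calls produce, it helps to compare only the aggregations, since elapsed_time_micros changes on every call. A minimal sketch, assuming the responses have already been parsed into Python dicts; the helper names aggregation_fingerprint and distinct_results are hypothetical and not part of Quickwit:

```python
import json


def aggregation_fingerprint(response: dict) -> str:
    """Serialize only the aggregations so timing fields are ignored.

    sort_keys normalizes object key order; bucket (list) order is kept,
    because bucket order is exactly what is being compared.
    """
    return json.dumps(response.get("aggregations", {}), sort_keys=True)


def distinct_results(responses: list) -> int:
    """Number of different aggregation results among the responses."""
    return len({aggregation_fingerprint(r) for r in responses})


def make_response(elapsed: int, keys: list) -> dict:
    """Build a trimmed-down response shaped like the ones in this report."""
    buckets = [{"doc_count": 1, "key": k} for k in keys]
    return {
        "num_hits": 1165439,
        "elapsed_time_micros": elapsed,
        "aggregations": {"comment_ranking_avg": {"buckets": buckets}},
    }


# Same buckets, different timing: counted as one result.
same = [make_response(6665, [928.0, 961.0]), make_response(9610, [928.0, 961.0])]
# Different bucket order: counted as two results.
diff = [make_response(6665, [928.0, 961.0]), make_response(9610, [961.0, 928.0])]

print(distinct_results(same), distinct_results(diff))
```

Collecting the responses of several identical curl calls and passing them to distinct_results gives a quick yes/no answer on whether the query is deterministic.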
Expected behavior
It should return the results shown below:
{
"num_hits": 1165439,
"hits": [],
"elapsed_time_micros": 6665,
"errors": [],
"aggregations": {
"comment_ranking_avg": {
"buckets": [
{
"avg_field": {
"value": 3504.0
},
"doc_count": 1,
"key": 961.0
},
{
"avg_field": {
"value": 3504.0
},
"doc_count": 1,
"key": 928.0
},
{
"avg_field": {
"value": 3504.0
},
"doc_count": 1,
"key": 730.0
},
.....
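The expected ordering can be stated precisely as a sort on the pair (avg_field, _key), both descending. A small sketch using the three buckets quoted in this report; the function name expected_order is hypothetical and this is a client-side illustration, not how Quickwit sorts internally:

```python
# First three buckets as returned in the report (wrong order).
buckets = [
    {"avg_field": {"value": 3504.0}, "doc_count": 1, "key": 928.0},
    {"avg_field": {"value": 3504.0}, "doc_count": 1, "key": 961.0},
    {"avg_field": {"value": 3504.0}, "doc_count": 1, "key": 730.0},
]


def expected_order(buckets: list) -> list:
    """Sort by avg_field descending, breaking ties by key descending."""
    return sorted(
        buckets,
        key=lambda b: (b["avg_field"]["value"], b["key"]),
        reverse=True,
    )


print([b["key"] for b in expected_order(buckets)])  # [961.0, 928.0, 730.0]
```

Note that re-sorting on the client only fixes the order of the buckets that were returned; if the server cuts off at size 20 in the middle of a group of ties, the set of buckets itself may still differ from the expected one.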
Configuration:
- Output of quickwit --version:
Quickwit 0.8.1 (aarch64-unknown-linux-gnu 2024-03-29T14:09:41Z e6c5396)
- The index_config.yaml: provided in the attached archive
{
"query": "*",
"max_hits": 0,
"aggs": {
"comment_ranking_avg": {
"terms": {
"field": "comment_ranking",
"size": 20,
"order": {
"avg_field": "desc",
"_key": "desc"
}
},
"aggs": {
"avg_field": {
"avg": {
"field": "author_comment_count"
}
}
}
}
}
}
This is not a correct way to define the order. It should be:
"order": [ { "avg_field": "desc" }, { "_key":"desc" } ]
However, this form is not supported yet; currently only sorting by a single field is supported.
The suggested order syntax does not work either, since it is not implemented yet:
curl --location 'http://127.0.0.1:7280/api/v1/hn_small/search' \
--header 'Content-Type: application/json' \
--data '{
"query": "*",
"max_hits": 0,
"aggs": {
"comment_ranking_avg": {
"terms": {
"field": "comment_ranking",
"size": 20,
"order": [
{
"avg_field": "desc"
},
{
"_key": "desc"
}
]
},
"aggs": {
"avg_field": {
"avg": {
"field": "author_comment_count"
}
}
}
}
}
}'
{
"message": "invalid aggregation request: invalid type: sequence, expected a map at line 1 column 180"
}
So with an order by a single key, it works fine and gives the same results on each call.
You should probably note somewhere in the docs that only one sort argument is currently supported.
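Until multi-key ordering is supported, a client could refuse to send an order with more than one key instead of getting unsorted or unstable results back. A minimal sketch that builds the request body from this issue; build_search_body is a hypothetical helper, and the single-key restriction it enforces is taken from the comments above rather than from the Quickwit docs:

```python
import json


def build_search_body(order: dict) -> str:
    """Build the search body from this issue with a single-key order.

    Raises ValueError for multi-key orders, which the thread above
    reports as unsupported.
    """
    if len(order) != 1:
        raise ValueError("only a single sort key is supported in 'order'")
    body = {
        "query": "*",
        "max_hits": 0,
        "aggs": {
            "comment_ranking_avg": {
                "terms": {
                    "field": "comment_ranking",
                    "size": 20,
                    "order": order,
                },
                "aggs": {
                    "avg_field": {"avg": {"field": "author_comment_count"}}
                },
            }
        },
    }
    return json.dumps(body)


# Accepted: one sort key.
print(build_search_body({"avg_field": "desc"}))

# Rejected: the two-key order from the original report.
try:
    build_search_body({"avg_field": "desc", "_key": "desc"})
except ValueError as err:
    print("rejected:", err)
```

The returned string can be passed to curl via --data or sent with any HTTP client.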
@PSeitz will the issue be closed with the merged PR https://github.com/quickwit-oss/quickwit/pull/5121 ?
There's also https://github.com/quickwit-oss/tantivy/pull/2451
But it's just covering error handling, not implementing order by multiple fields