elasticsearch
Sliced scroll returning more hits than normal (without slice) search
Elasticsearch Version
7.16.3
Installed Plugins
No response
Java Version
openjdk version "1.8.0_342"
OS Version
Oracle Linux 7.9
Problem Description
In short, normal search hits < scroll slice 1 hits + scroll slice 2 hits
When I add up the sliced scroll hits, the sum is greater than the total number of documents returned by a single search.
Normal search query
GET index*/_search
{
"track_total_hits": true,
"sort": [
{
"@timestamp": {
"order": "asc",
"unmapped_type": "boolean"
}
}
],
"_source": false,
"query": {
"bool": {
"must": [],
"filter": [
{
"range": {
"@timestamp": {
"format": "strict_date_optional_time",
"gte": "2022-07-31T18:30:00.000Z",
"lte": "2022-08-30T18:30:00.000Z"
}
}
}
.....
Normal Search Response:
{
"took" : 1055,
"timed_out" : false,
"_shards" : {
"total" : 455,
"successful" : 455,
"skipped" : 290,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 435743,
"relation" : "eq"
},
"max_score" : null,
Slice 1
GET index*/_search
{
"slice": {
"id": 0,
"max": 2
},
"track_total_hits": true,
"sort": [
{
"@timestamp": {
"order": "asc",
"unmapped_type": "boolean"
}
}
],
"_source": false,
"query": {
"bool": {
"must": [],
"filter": [
{
"range": {
"@timestamp": {
"format": "strict_date_optional_time",
"gte": "2022-07-31T18:30:00.000Z",
"lte": "2022-08-30T18:30:00.000Z"
}
}
}
.....
Slice 1 Response:
"_shards" : {
"total" : 455,
"successful" : 455,
"skipped" : 290,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 213954,
"relation" : "eq"
},
"max_score" : null,
Slice 2
GET index*/_search
{
"slice": {
"id": 1,
"max": 2
},
"track_total_hits": true,
"sort": [
{
"@timestamp": {
"order": "asc",
"unmapped_type": "boolean"
}
}
],
"_source": false,
"query": {
"bool": {
"must": [],
"filter": [
{
"range": {
"@timestamp": {
"format": "strict_date_optional_time",
"gte": "2022-07-31T18:30:00.000Z",
"lte": "2022-08-30T18:30:00.000Z"
}
}
}
.....
Slice 2 Response:
"_shards" : {
"total" : 455,
"successful" : 455,
"skipped" : 292,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 221884,
"relation" : "eq"
},
"max_score" : null,
So as we can see, total hits for slices = 213954 + 221884 = 435838, which is greater than 435743 (hits for normal search).
Can someone explain why it is behaving like this?
FYI, data is not being inserted/deleted.
Could this be due to the date range?
Steps to Reproduce
It does not happen all the time. With small indexes it seems to work fine.
Logs (if relevant)
No response
Pinging @elastic/es-search (Team:Search)
@divit00 there are a couple of issues that could be causing the discrepancy:
- Looking at your searches, I don't see a scroll defined. Are your searches actually using scroll? It's a bit confusing, but searches are allowed to have a slice without a scroll.
- It looks like the data might be changing between searches. You can see this because the first slice has skipped: 290 but the second one has skipped: 292. This means that the first search ran over more shards than the second.
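The shard comparison in the second point can be checked mechanically. The sketch below is only an illustration: it assumes the responses are available as parsed JSON dictionaries, and the helper names `shards_queried` and `same_shard_coverage` are hypothetical, not part of any Elasticsearch client. The `_shards` values are the ones posted above.

```python
from typing import Iterable


def shards_queried(response: dict) -> int:
    """Shards that actually ran the query: successful minus skipped."""
    shards = response["_shards"]
    return shards["successful"] - shards["skipped"]


def same_shard_coverage(responses: Iterable[dict]) -> bool:
    """True when every response queried the same number of shards."""
    return len({shards_queried(r) for r in responses}) == 1


# "_shards" sections copied from the two slice responses in this thread.
slice_1 = {"_shards": {"total": 455, "successful": 455, "skipped": 290, "failed": 0}}
slice_2 = {"_shards": {"total": 455, "successful": 455, "skipped": 292, "failed": 0}}

if __name__ == "__main__":
    print(shards_queried(slice_1))  # 165
    print(shards_queried(slice_2))  # 163
    print(same_shard_coverage([slice_1, slice_2]))  # False
```

A mismatch here only shows that the two requests did not cover the same set of shards; it does not by itself explain why.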
Hello @jtibshirani
Sorry, scroll was omitted when I was replacing the index name for posting it. Running it again:
Normal Search
GET index*/_search
{
"track_total_hits": true,
"sort": [
{
"@timestamp": {
"order": "asc",
"unmapped_type": "boolean"
}
}
],
"_source": false,
"query": {
"bool": {
"filter": [
{
"range": {
"@timestamp": {
"format": "strict_date_optional_time",
"gte": "2022-07-31T18:30:00.000Z",
"lte": "2022-08-30T18:30:00.000Z"
}
}
}
]
}
}
}
Normal search response
{
"took" : 1923,
"timed_out" : false,
"_shards" : {
"total" : 455,
"successful" : 455,
"skipped" : 290,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 435743,
"relation" : "eq"
},
Sliced Scroll 1
GET index*/_search?scroll=1m
{
"slice": {
"id": 0,
"max": 2
},
"track_total_hits": true,
"sort": [
{
"@timestamp": {
"order": "asc",
"unmapped_type": "boolean"
}
}
],
"_source": false,
"query": {
"bool": {
"must": [],
"filter": [
{
"range": {
"@timestamp": {
"format": "strict_date_optional_time",
"gte": "2022-07-31T18:30:00.000Z",
"lte": "2022-08-30T18:30:00.000Z"
}
}
}
]
}
}
}
Sliced Scroll 1 Response
"took" : 1014,
"timed_out" : false,
"_shards" : {
"total" : 455,
"successful" : 455,
"skipped" : 290,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 213954,
"relation" : "eq"
},
Sliced Scroll 2
GET index*/_search?scroll=1m
{
"slice": {
"id": 1,
"max": 2
},
"track_total_hits": true,
"sort": [
{
"@timestamp": {
"order": "asc",
"unmapped_type": "boolean"
}
}
],
"_source": false,
"query": {
"bool": {
"must": [],
"filter": [
{
"range": {
"@timestamp": {
"format": "strict_date_optional_time",
"gte": "2022-07-31T18:30:00.000Z",
"lte": "2022-08-30T18:30:00.000Z"
}
}
}
]
}
}
}
Sliced Scroll 2 Response:
"took" : 1173,
"timed_out" : false,
"_shards" : {
"total" : 455,
"successful" : 455,
"skipped" : 293,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 229153,
"relation" : "eq"
},
Today the counts are different, but they still don't add up to the normal search total. I also noticed that, across multiple runs, the slice 1 count stayed the same but the slice 2 count changed. Sometimes it skipped 293 shards, sometimes 295.
For 3 slices, the count was 140573 + 146027 + 143114 = 429714
For 4 slices, the count was 106694 + 110628 + 106709 + 107141 = 431172
For 3 slices too, the first slice skipped 290 shards, but the 2nd and 3rd slices skipped 292 and 294 shards. Is it necessary that all slices hit the same number of shards?
FYI, I am querying multiple indexes. For a single index, sliced scroll is working fine. Also, the data has not changed: as you can see, the normal search is still returning the same count it returned a few days ago.
I think it has something to do with the sort I have used.
FYI, removing the sort also didn't help.
For 2 slices, 213954 + 220974 = 434928
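To put the runs side by side, here is a small tally of the slice sums reported in this thread against the normal search total. It is only a sketch: the run labels and the `discrepancies` helper are made up for illustration, and the counts are copied from the posts above.

```python
NORMAL_TOTAL = 435743  # hits.total.value of the unsliced search

# Per-slice hit counts as reported in this thread (labels are illustrative).
RUNS = {
    "2 slices, first post": [213954, 221884],
    "2 slices, with scroll": [213954, 229153],
    "3 slices": [140573, 146027, 143114],
    "4 slices": [106694, 110628, 106709, 107141],
    "2 slices, no sort": [213954, 220974],
}


def discrepancies(normal_total: int, runs: dict) -> dict:
    """Sum of slice hits minus the normal total, per run (0 means consistent)."""
    return {label: sum(counts) - normal_total for label, counts in runs.items()}


if __name__ == "__main__":
    for label, diff in discrepancies(NORMAL_TOTAL, RUNS).items():
        print(f"{label}: {diff:+d}")
    # 2 slices, first post: +95
    # 2 slices, with scroll: +7364
    # 3 slices: -6029
    # 4 slices: -4571
    # 2 slices, no sort: -815
```

The sums land on both sides of the normal total, so the slices are not simply overlapping; some runs return fewer hits than the unsliced search.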
@divit00 this behavior indeed seems off, given the data is not changing. The fact that the two searches show different skipped values is surprising to me. It might be challenging for us to debug, since it only reproduces with certain index configurations.
I'm also curious if you've tried using point-in-time views instead of scroll. This is now our recommended way to paginate through large datasets (described here: https://www.elastic.co/guide/en/elasticsearch/reference/8.4/paginate-search-results.html). Point-in-time might run into the same issue on your data, but it could be good to try.
Yes, PIT with search_after works fine. In fact I came to know about the issue while comparing the performance between the two.
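For readers comparing the two approaches, a minimal sketch of the PIT with search_after loop follows. It makes several assumptions: the PIT id has already been obtained (for example via POST index*/_pit?keep_alive=1m), the `search` argument is a hypothetical callable that sends a body to the _search endpoint (without an index in the path) and returns the parsed JSON response, and the helper names are made up for illustration. It is not the exact code used in this thread.

```python
from typing import Callable, Iterator, Optional


def build_page_body(pit_id: str, query: dict, size: int,
                    search_after: Optional[list] = None) -> dict:
    """Build one search request body that reads from a point-in-time view."""
    body = {
        "size": size,
        "query": query,
        "pit": {"id": pit_id, "keep_alive": "1m"},
        "sort": [{"@timestamp": {"order": "asc"}}],
    }
    if search_after is not None:
        # Sort values of the last hit on the previous page.
        body["search_after"] = search_after
    return body


def iterate_hits(search: Callable[[dict], dict], pit_id: str, query: dict,
                 size: int = 1000) -> Iterator[dict]:
    """Yield every hit, page by page, until an empty page is returned."""
    search_after = None
    while True:
        response = search(build_page_body(pit_id, query, size, search_after))
        hits = response["hits"]["hits"]
        if not hits:
            return
        yield from hits
        search_after = hits[-1]["sort"]
        # The response may carry an updated PIT id; use it for the next page.
        pit_id = response.get("pit_id", pit_id)
```

Counting the hits yielded by `iterate_hits` gives a number that can be compared directly with the normal search total.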
I see! To set expectations, we are hoping to deprecate scrolls in favor of pit with search_after. So we're not likely to spend a lot of time debugging this issue.
Sure, I will close it then.
Thanks for your understanding, and best of luck with testing.