Sliced scroll returning more hits than normal (without slice) search

Open divit00 opened this issue 2 years ago • 1 comments

Elasticsearch Version

7.16.3

Installed Plugins

No response

Java Version

openjdk version "1.8.0_342"

OS Version

Oracle Linux 7.9

Problem Description

In short, normal search hits < scroll slice 1 hits + scroll slice 2 hits

When I add up the sliced scroll hits, the sum is greater than the total number of documents returned by a single search.

Normal search query

GET index*/_search
{
  "track_total_hits": true,
  "sort": [
    {
      "@timestamp": {
        "order": "asc",
        "unmapped_type": "boolean"
      }
    }
  ],
  "_source": false,
  "query": {
    "bool": {
      "must": [],
      "filter": [
        {
          "range": {
            "@timestamp": {
              "format": "strict_date_optional_time",
              "gte": "2022-07-31T18:30:00.000Z",
              "lte": "2022-08-30T18:30:00.000Z"
            }
          }
        }
.....

Normal Search Response:

{
  "took" : 1055,
  "timed_out" : false,
  "_shards" : {
    "total" : 455,
    "successful" : 455,
    "skipped" : 290,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 435743,
      "relation" : "eq"
    },
    "max_score" : null,

Slice 1

GET index*/_search
{
  "slice": {
    "id": 0,
    "max": 2
  },
  "track_total_hits": true,
  "sort": [
    {
      "@timestamp": {
        "order": "asc",
        "unmapped_type": "boolean"
      }
    }
  ],
  "_source": false,
  "query": {
    "bool": {
      "must": [],
      "filter": [
        {
          "range": {
            "@timestamp": {
              "format": "strict_date_optional_time",
              "gte": "2022-07-31T18:30:00.000Z",
              "lte": "2022-08-30T18:30:00.000Z"
            }
          }
        }
.....

Slice 1 Response:

  "_shards" : {
    "total" : 455,
    "successful" : 455,
    "skipped" : 290,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 213954,
      "relation" : "eq"
    },
    "max_score" : null,

Slice 2

GET index*/_search
{
  "slice": {
    "id": 1,
    "max": 2
  },
  "track_total_hits": true,
  "sort": [
    {
      "@timestamp": {
        "order": "asc",
        "unmapped_type": "boolean"
      }
    }
  ],
  "_source": false,
  "query": {
    "bool": {
      "must": [],
      "filter": [
        {
          "range": {
            "@timestamp": {
              "format": "strict_date_optional_time",
              "gte": "2022-07-31T18:30:00.000Z",
              "lte": "2022-08-30T18:30:00.000Z"
            }
          }
        }
.....

Slice 2 Response:

  "_shards" : {
    "total" : 455,
    "successful" : 455,
    "skipped" : 292,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 221884,
      "relation" : "eq"
    },
    "max_score" : null,

So as we can see,

total hits for slices = 213954 + 221884 = 435838 which is greater than 435743 (hits for normal search).

Can someone explain why it is behaving like this?

FYI, data is not being inserted or deleted.

Could this be due to the date range?

Steps to Reproduce

It does not happen all the time. With small indexes it seems to work fine.

Logs (if relevant)

No response

divit00 avatar Sep 20 '22 19:09 divit00

Pinging @elastic/es-search (Team:Search)

elasticsearchmachine avatar Sep 21 '22 10:09 elasticsearchmachine

@divit00 there are a couple of issues that could be causing the discrepancy:

  • Looking at your searches, I don't see a scroll defined. Are your searches actually using scroll? It's a bit confusing, but searches are allowed to have a slice without a scroll.
  • It looks like the data might be changing between searches. You can see this because the first slice has skipped: 290 but the second one has skipped: 292. This means that the first search ran over more shards than the second.
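The second point can be checked directly from the `_shards` sections quoted above. This is a minimal Python sketch, assuming that skipped shards are counted within `successful` (the responses suggest this, since `successful` equals `total` while `skipped` is nonzero); the helper name `shards_searched` is made up for illustration.

```python
def shards_searched(shards: dict) -> int:
    """Number of shards that actually executed the query.

    Assumption: skipped shards are included in `successful`.
    """
    return shards["successful"] - shards["skipped"]


# `_shards` sections copied from the two slice responses above.
slice_0 = {"total": 455, "successful": 455, "skipped": 290, "failed": 0}
slice_1 = {"total": 455, "successful": 455, "skipped": 292, "failed": 0}

if __name__ == "__main__":
    print("slice 0 searched:", shards_searched(slice_0))
    print("slice 1 searched:", shards_searched(slice_1))
```

Under that assumption, the first slice ran over 165 shards and the second over 163.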

jtibshirani avatar Sep 23 '22 20:09 jtibshirani

Hello @jtibshirani

Sorry, the scroll parameter was omitted when I was replacing the index name for posting. Running it again:

Normal Search

GET index*/_search
{
  "track_total_hits": true,
  "sort": [
    {
      "@timestamp": {
        "order": "asc",
        "unmapped_type": "boolean"
      }
    }
  ],
  "_source": false,
  "query": {
    "bool": {
      "filter": [
        {
          "range": {
            "@timestamp": {
              "format": "strict_date_optional_time",
              "gte": "2022-07-31T18:30:00.000Z",
              "lte": "2022-08-30T18:30:00.000Z"
            }
          }
        }
      ]
    }
  }
}

Normal search response

{
  "took" : 1923,
  "timed_out" : false,
  "_shards" : {
    "total" : 455,
    "successful" : 455,
    "skipped" : 290,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 435743,
      "relation" : "eq"
    },

Sliced Scroll 1

GET index*/_search?scroll=1m
{
  "slice": {
    "id": 0,
    "max": 2
  },
  "track_total_hits": true,
  "sort": [
    {
      "@timestamp": {
        "order": "asc",
        "unmapped_type": "boolean"
      }
    }
  ],
  "_source": false,
  "query": {
    "bool": {
      "must": [],
      "filter": [
        {
          "range": {
            "@timestamp": {
              "format": "strict_date_optional_time",
              "gte": "2022-07-31T18:30:00.000Z",
              "lte": "2022-08-30T18:30:00.000Z"
            }
          }
        }
      ]
    }
  }
}

Sliced Scroll 1 Response

  "took" : 1014,
  "timed_out" : false,
  "_shards" : {
    "total" : 455,
    "successful" : 455,
    "skipped" : 290,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 213954,
      "relation" : "eq"
    },

Sliced Scroll 2

GET index*/_search?scroll=1m
{
  "slice": {
    "id": 1,
    "max": 2
  },
  "track_total_hits": true,
  "sort": [
    {
      "@timestamp": {
        "order": "asc",
        "unmapped_type": "boolean"
      }
    }
  ],
  "_source": false,
  "query": {
    "bool": {
      "must": [],
      "filter": [
        {
          "range": {
            "@timestamp": {
              "format": "strict_date_optional_time",
              "gte": "2022-07-31T18:30:00.000Z",
              "lte": "2022-08-30T18:30:00.000Z"
            }
          }
        }
      ]
    }
  }
}

Sliced Scroll 2 Response:

  "took" : 1173,
  "timed_out" : false,
  "_shards" : {
    "total" : 455,
    "successful" : 455,
    "skipped" : 293,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 229153,
      "relation" : "eq"
    },

Today the counts are different, but they still don't add up to the normal search count. I also noticed that, across multiple runs, the slice 1 count stayed the same while the slice 2 count changed. Sometimes it skipped 293 shards, sometimes 295.

For 3 slices, the count was 140573 + 146027 + 143114 = 429714.

For 4 slices, the count was 106694 + 110628 + 106709 + 107141 = 431172.

For 3 slices too, the first slice skipped 290 shards, but the 2nd and 3rd slices skipped 292 and 294 shards. Do all slices have to hit the same number of shards?

FYI, I am querying multiple indexes. For a single index, sliced scroll is working fine. Also, the data has not changed: as you can see, the normal search still returns the same count it returned a few days ago.

I think it has something to do with the sort I have used.

divit00 avatar Sep 23 '22 20:09 divit00

FYI, removing the sort also didn't help.

For 2 slices, 213954 + 220974 = 434928
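Putting the reported scroll runs side by side may help. The following Python sketch uses only the totals quoted in this thread (the run labels and the helper name `signed_differences` are made up for illustration) and computes, for each run, how far the sliced sum is from the normal search total.

```python
# Total reported by the normal (unsliced) search.
UNSLICED_TOTAL = 435743

# Per-slice totals as quoted in the comments above.
RUNS = {
    "2 slices (scroll)": [213954, 229153],
    "3 slices": [140573, 146027, 143114],
    "4 slices": [106694, 110628, 106709, 107141],
    "2 slices (no sort)": [213954, 220974],
}


def signed_differences(unsliced_total: int, runs: dict) -> dict:
    """Map each run label to (sum of slice totals) - (unsliced total)."""
    return {label: sum(totals) - unsliced_total for label, totals in runs.items()}


if __name__ == "__main__":
    for label, diff in signed_differences(UNSLICED_TOTAL, RUNS).items():
        print(f"{label}: sliced sum = {UNSLICED_TOTAL + diff}, difference = {diff:+d}")
```

With these numbers, the 2-slice scroll run overcounts, while the 3-slice, 4-slice, and no-sort runs undercount, so the discrepancy does not always go in one direction.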

divit00 avatar Sep 23 '22 21:09 divit00

@divit00 this behavior indeed seems off, given the data is not changing. The fact that the two searches show different skipped values is surprising to me. It might be challenging for us to debug, since it only reproduces with certain index configurations.

I'm also curious if you've tried using point-in-time views instead of scroll. This is now our recommended way to paginate through large datasets (described here: https://www.elastic.co/guide/en/elasticsearch/reference/8.4/paginate-search-results.html). Point-in-time might run into the same issue on your data, but it could be good to try.

jtibshirani avatar Sep 23 '22 22:09 jtibshirani

Yes, PIT with search_after works fine. In fact I came to know about the issue while comparing the performance between the two.
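For anyone comparing the two approaches, here is a rough Python sketch of the PIT with search_after loop. It is not tied to any client library: `search` is a hypothetical callable that sends a body to `POST /_search` and returns the parsed JSON response, and the PIT id is assumed to come from an earlier `POST /index*/_pit?keep_alive=1m` call. Each page passes the `sort` values of the last hit as `search_after` for the next request.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional

# Hypothetical transport: takes a search body, returns the parsed response.
SearchFn = Callable[[Dict[str, Any]], Dict[str, Any]]


def iter_hits_with_pit(
    search: SearchFn,
    pit_id: str,
    query: Dict[str, Any],
    sort: List[Dict[str, Any]],
    size: int = 1000,
    keep_alive: str = "1m",
) -> Iterator[Dict[str, Any]]:
    """Yield every hit by paging through a point-in-time with search_after."""
    search_after: Optional[List[Any]] = None
    while True:
        body: Dict[str, Any] = {
            "size": size,
            "query": query,
            "sort": sort,
            "pit": {"id": pit_id, "keep_alive": keep_alive},
        }
        if search_after is not None:
            body["search_after"] = search_after
        response = search(body)
        # The response may carry an updated PIT id; use it for the next page.
        pit_id = response.get("pit_id", pit_id)
        hits = response["hits"]["hits"]
        if not hits:
            return
        yield from hits
        # Resume after the sort values of the last hit on this page.
        search_after = hits[-1]["sort"]
```

Counting the hits this generator yields gives a total that can be compared against the `track_total_hits` value of the normal search.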

divit00 avatar Sep 24 '22 06:09 divit00

I see! To set expectations, we are hoping to deprecate scrolls in favor of PIT with search_after, so we're not likely to spend a lot of time debugging this issue.

jtibshirani avatar Sep 26 '22 16:09 jtibshirani

Sure, I will close it then.

divit00 avatar Sep 26 '22 17:09 divit00

Thanks for your understanding, and best of luck with testing.

jtibshirani avatar Sep 26 '22 17:09 jtibshirani