elasticsearch icon indicating copy to clipboard operation
elasticsearch copied to clipboard

Investigate severity of upgrading to 8.0.1 despite terminal deprecation of old features

Open grcevski opened this issue 2 years ago • 10 comments

Elasticsearch Version

upgrade from 7.x -> 8.0.0

Installed Plugins

No response

Java Version

bundled

OS Version

Linux

Problem Description

Recently we encountered a user issue report where upgrades to 8.0.1 from 7.17.1 caused an unhealthy cluster because the option index.indexing.slowlog.level wasn't removed prior to upgrading.

We'll use this issue to understand how this happened and ways we can prevent the upgrade before irreversible changes are done to the Elasticsearch metadata or data.

Steps to Reproduce

  • Get Elasticsearch 7.9 cluster with the index.indexing.slowlog.level option set.
  • Upgrade to 7.17.1 ignore the deprecation warning
  • Upgrade to 8.0.1

Expected behaviour: the 8.0.1 upgrade should fail to start but not upgrade any metadata or data Current behaviour: the upgrade to 8.0.1 proceeds irreversibly modifying the metadata before options are validated and start is unsuccessful.

Logs (if relevant)

No response

grcevski avatar Mar 15 '22 15:03 grcevski

Pinging @elastic/es-core-infra (Team:Core/Infra)

elasticmachine avatar Mar 15 '22 15:03 elasticmachine

looks like indeed there is a bug.

I reproduced this on 7.17.1

PUT test2
{"settings": {

	"index.search.slowlog.threshold.query.warn": "100ms",
	"index.search.slowlog.threshold.query.info": "100ms",
		"index.search.slowlog.threshold.query.debug": "0ms",
	"index.search.slowlog.threshold.query.trace": 0,
	"index.indexing.slowlog.threshold.index.trace": 0,
	  "index.indexing.slowlog.source": 5,
	  "index.indexing.slowlog.level": "INFO"
}
}
#! [index.indexing.slowlog.level] setting was deprecated in Elasticsearch and will be removed in a future release! See the breaking changes documentation for the next major version.
{
  "acknowledged" : true,
  "shards_acknowledged" : true,
  "index" : "test2"
}

Deprecation logs do show this warning as CRITICAL. Screenshot 2022-03-17 at 12 50 59 Screenshot 2022-03-17 at 12 51 36

I can see in codebase it is correctly set as Property.Deprecated - not DepreactedWarning https://github.com/elastic/elasticsearch/blob/bcfbf000748b538884087623113682165abcb92d/server/src/main/java/org/elasticsearch/index/IndexingSlowLog.java#L76

In my view the problem with deprecation logs in upgrade assistant is that it does not look too encouraging to check them. There is a link and I am not sure many people will click it. Most likely people will just hit upgrade when there is nothing critical on the first page. Screenshot 2022-03-17 at 12 57 15

However the deprecation info api do not return the setting as critical

  GET /_migration/deprecations

  "index_settings" : {
    "test2" : [
      {
        "level" : "warning",
        "message" : "Setting [index.indexing.slowlog.level] is deprecated",
        "url" : "https://ela.st/es-deprecation-7-slowlog-settings",
        "details" : "Remove the [index.indexing.slowlog.level] setting. Use the [index.*.slowlog.threshold] settings to set the log levels.",
        "resolve_during_rolling_upgrade" : false,
        "_meta" : {
          "actions" : [
            {
              "action_type" : "remove_settings",
              "objects" : [
                "index.indexing.slowlog.level"
              ]
            }
          ]
        }
      }
    ]
  },
Screenshot 2022-03-17 at 12 48 15

the problem is that the check uses incorrect level https://github.com/elastic/elasticsearch/blob/eed16a2886b9d3e0f5c21978a8459bb8defbf606/x-pack/plugin/deprecation/src/main/java/org/elasticsearch/xpack/deprecation/IndexDeprecationChecks.java#L603

will look into a fix

pgomulka avatar Mar 17 '22 12:03 pgomulka

have we written an SKI for this, if not, can we please do that?

pcsanwald avatar Mar 18 '22 14:03 pcsanwald

just a follow up. In 7.17.2-SNAPSHOT it works fine and upgrade assistant presents critical deprecation: Screenshot 2022-03-28 at 10 41 23 Screenshot 2022-03-28 at 10 41 31

pgomulka avatar Mar 28 '22 08:03 pgomulka

What surprises me is that when I try reproducing this on my 1node cluster, after upgrade index has status yellow and the level setting is archived (archived.index.indexing.slowlog.level) However when I try this on cloud from 7.17.1, after upgrade index is red and the level is not archived.. @grcevski do you have any ideas?

pgomulka avatar Mar 28 '22 10:03 pgomulka

Oh, very interesting. Looking at the current code it looks like we should be archiving and it's what happens for me locally too. I wonder if we have fixed a bug since 8.0.1 that resolved this issue?

grcevski avatar Mar 28 '22 18:03 grcevski

I think this got resolved, I'll close this for now as we haven't had this happen since.

grcevski avatar Jul 04 '22 14:07 grcevski

Reopening this issue, it looks like customers still experience this issue when upgrading to 8.x from versions <7.17. See https://github.com/elastic/sdh-control-plane/issues/1836

arteam avatar Sep 20 '22 08:09 arteam

However when I try this on cloud from 7.17.1, after upgrade index is red and the level is not archived..

I expect that's because we only run the settings-archival process in a full cluster restart upgrade, and on Cloud it'll be a rolling upgrade. But that's the wrong fix here, we should not be archiving these settings. BwC on index metadata is supposed to be a function of the index created version and not the cluster version, and these settings were acceptable in 7.x-created indices, so we should be accepting them as-is on these indices even in an 8.x cluster. We can reasonably ignore them I think, but we mustn't reject them as we are doing today.

DaveCTurner avatar Sep 20 '22 08:09 DaveCTurner

We can reuse this issue to fix this bug.

grcevski avatar Sep 20 '22 12:09 grcevski