scylla-operator icon indicating copy to clipboard operation
scylla-operator copied to clipboard

Allow overwriting non-essential scylla arguments

Open vponomaryov opened this issue 2 years ago • 6 comments

Describe the bug If we set scyllaArgs option of the scyllacluster CR like following:

spec:
    ...
    scyllaArgs: --blocked-reactor-notify-ms 100 --abort-on-lsa-bad-alloc 1 --abort-on-internal-error
      1 --abort-on-ebadf 1 --enable-sstable-key-validation 1
    ...

Then we get following error:

Scylla version 4.6.3-0.20220414.8bf149fdd with build-id 8d16d8972498cc769071ff25309b009eb77bf77a starting ...
command used: "/usr/bin/scylla \
	--log-to-syslog 0 --log-to-stdout 1 --default-log-level info --network-stack posix \
	--io-properties-file=/etc/scylla.d/io_properties.yaml --smp 1 --listen-address 0.0.0.0 \
	--rpc-address 0.0.0.0 --seed-provider-parameters seeds=10.96.198.61 \
	--broadcast-address 10.96.198.61 --broadcast-rpc-address 10.96.198.61 --alternator-address 0.0.0.0 \
	--blocked-reactor-notify-ms 999999999 --blocked-reactor-notify-ms=100 --abort-on-internal-error=1 \
	--prometheus-address=0.0.0.0 --abort-on-ebadf=1 --abort-on-lsa-bad-alloc=1 --enable-sstable-key-validation=1"

parsed command line options: [\
	log-to-syslog, (positional) 0, log-to-stdout, (positional) 1, default-log-level, (positional) info, \
	network-stack, (positional) posix, io-properties-file: /etc/scylla.d/io_properties.yaml, \
	smp, (positional) 1, listen-address: 0.0.0.0, rpc-address: 0.0.0.0, \
	seed-provider-parameters: seeds=10.96.198.61, broadcast-address: 10.96.198.61, \
	broadcast-rpc-address: 10.96.198.61, alternator-address: 0.0.0.0, \
	blocked-reactor-notify-ms, (positional) 999999999, blocked-reactor-notify-ms: 100, \
	abort-on-internal-error: 1, prometheus-address: 0.0.0.0, abort-on-ebadf: 1, \
	abort-on-lsa-bad-alloc: 1, enable-sstable-key-validation: 1]
error: option '--blocked-reactor-notify-ms' cannot be specified more than once

Try --help.
2022-06-20 13:45:52,473 INFO exited: scylla (exit status 2; not expected)

So, the --blocked-reactor-notify-ms option is set by operator all the time and if a user wants to redefine it then such an error appears.

To Reproduce Steps to reproduce the behavior:

  1. Deploy operator 1.7.2
  2. Create a Scylla cluster with the above mentioned scyllaArgs option value.
  3. See error

Expected behavior Operator should redefine it's own value.

Environment:

  • Platform: any
  • Kubernetes version: any
  • Scylla version: 4.6.3
  • Scylla-operator version: e.g.: 1.7.2

vponomaryov avatar Jun 20 '22 13:06 vponomaryov

currently and historically you can't override scylla args set by the operator. this is coupled a lot to the startup scripts in the scylla image. we may get a better behaviour with https://github.com/scylladb/scylla-operator/pull/964 but this has never worked

tnozicka avatar Jun 21 '22 06:06 tnozicka

maybe some (not this one) even shouldn't be let to be overwritten as they are essential to how we set up scylla and its relation to other kube objects

tnozicka avatar Jun 21 '22 06:06 tnozicka

maybe some (not this one) even shouldn't be let to be overwritten as they are essential to how we set up scylla and its relation to other kube objects

In such a case I would expect fast denial on the admission controller check not wasting resources and time.

vponomaryov avatar Jun 21 '22 07:06 vponomaryov

admission can't be restricted retrospectively not to break backwards compatibility (that's why it is important to get it right the first time)

tnozicka avatar Jun 21 '22 08:06 tnozicka

admission can't be restricted retrospectively not to break backwards compatibility (that's why it is important to get it right the first time)

I understand your point, but it is not a problem in our case because there cannot be any user that will be affected by such new restriction due to the bug that always existed. So, since none of the users may be affected I consider such an update to the admission controller as safe enough change to fix the bug.

vponomaryov avatar Jun 21 '22 08:06 vponomaryov

This is not allowed. Objects valid in the past have to be able to be created again. Also storage migration must keep working so you must be able to go thought GET+UPDATE with the same content.

tnozicka avatar Jun 22 '22 08:06 tnozicka

The Scylla Operator project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 30d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out

/lifecycle stale