
Provide a workaround for GKE reserving 10% of disk space on local SSD nodes by default

gdubicki opened this issue 1 year ago • 8 comments

What should the feature do?

The default kubelet hardEviction setting for nodefs.available is 10%.

For Scylla running on GKE nodes, this applies to the local SSDs.

There is a feature request to make this value configurable in GKE, but it has not been implemented yet: https://issuetracker.google.com/issues/185760232
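
For reference, the setting in question is the kubelet's hard-eviction threshold; in KubeletConfiguration terms it looks like this (excerpt, value shown is the default):

```yaml
# Default kubelet hard-eviction threshold (excerpt). nodefs.available: 10%
# is what effectively reserves ~10% of the local SSDs on Scylla nodes.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
evictionHard:
  nodefs.available: "10%"
```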

The workaround is to deploy your own DaemonSet that will update the kubelet settings on Scylla nodes.

We did this ourselves, but it would be great if the Scylla Operator did it, perhaps alongside / as part of the node tuning.
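
To illustrate, a minimal sketch of the kind of DaemonSet we mean (not our exact manifest; the kubelet config path, the grep/sed pattern, the target threshold and the node label are assumptions for COS-based GKE nodes and need to be verified before use):

```yaml
# Illustrative sketch only. Paths, patterns and labels are assumptions.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: kubelet-eviction-tuner
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: kubelet-eviction-tuner
  template:
    metadata:
      labels:
        app: kubelet-eviction-tuner
    spec:
      # Assumption: the Scylla node pool carries this label.
      nodeSelector:
        scylla.scylladb.com/node-type: scylla
      hostPID: true
      tolerations:
      - operator: Exists
      containers:
      - name: tuner
        image: ubuntu:22.04
        securityContext:
          privileged: true
        command:
        - /bin/bash
        - -c
        - |
          set -euo pipefail
          # Assumption: kubelet config location on GKE/COS nodes.
          cfg=/host/home/kubernetes/kubelet-config.yaml
          # Naive text edit; verify the exact key and formatting on your nodes first.
          if grep -q 'nodefs.available: 10%' "$cfg"; then
            sed -i 's/nodefs.available: 10%/nodefs.available: 1%/' "$cfg"
            # Restart kubelet in the host namespaces so it reloads the config.
            nsenter --target 1 --mount --uts --ipc --net -- systemctl restart kubelet
          fi
          # Keep the pod running so the DaemonSet stays healthy.
          sleep infinity
        volumeMounts:
        - name: host-root
          mountPath: /host
      volumes:
      - name: host-root
        hostPath:
          path: /
```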

What is the use case behind this feature?

Everyone running Scylla on GKE with local SSDs, so that they don't waste 10% of their disk space.

Anything else we need to know?

No response

gdubicki · Aug 05 '24 11:08

The Scylla Operator project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 30d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out

/lifecycle stale

The Scylla Operator project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 30d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out

/lifecycle rotten

> The workaround is to deploy your own DaemonSet that will update the kubelet settings on Scylla nodes.
>
> We did this ourselves, but it would be great if the Scylla Operator did it, perhaps alongside / as part of the node tuning.

If you would be interested in this, I could provide a PR with this feature.

gdubicki · Oct 11 '24 16:10

/remove-lifecycle rotten
/remove-lifecycle stale

gdubicki · Oct 11 '24 16:10

> If you would be interested in this, I could provide a PR with this feature.

Hi, you are welcome to open a PR; please make sure you follow the Contributing Guide.

Thanks for contributing.

ylebi · Oct 13 '24 07:10

From what I recall, changing the kubelet config in GKE was supposed to work by editing a node pool, which is out of reach for our automation, so IMO this is more of a docs mention when it gets there.

> The workaround is to deploy your own DaemonSet that will update the kubelet settings on Scylla nodes.

This likely gets you into unsupported territory and a bunch of initialization races. It may be better to wait for GKE to allow adjusting it before recommending that to others.

We have also migrated our DaemonSets into a NodeConfig and we don't add new DaemonSets anymore, in favour of the API.
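
For reference, a trimmed-down NodeConfig selecting the Scylla nodes looks roughly like this (minimal sketch; see the API reference for the full schema):

```yaml
apiVersion: scylla.scylladb.com/v1alpha1
kind: NodeConfig
metadata:
  name: cluster
spec:
  placement:
    # Selects the nodes the operator should tune; the label is an example.
    nodeSelector:
      scylla.scylladb.com/node-type: scylla
```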

This also isn't a ScyllaDB issue. I know we try to tune some things where we can't avoid it, but each of them comes with a burden, and we have to balance the benefits, stability, cross-platform support and how hacky it is.

tnozicka · Oct 14 '24 07:10

I think you are right, @tnozicka. It's a bad idea to provide an unsupported solution for GKE in the Scylla Operator repo.

I would at least like the next person affected by this to learn about it the easy way, though.

Would you accept a PR to https://operator.docs.scylladb.com/stable/gke.html to document this potential issue?

gdubicki · Oct 14 '24 07:10

I think a docs mention is fitting, and referencing the GKE issue is also helpful, so I'd welcome a :::{note} in our docs somewhere around where we set the kubelet config in GKE. Thanks, @gdubicki.

tnozicka · Oct 14 '24 07:10

The Scylla Operator project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 30d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out

/lifecycle stale

The Scylla Operator project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 30d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out

/lifecycle rotten

The Scylla Operator project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 30d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out

/close not-planned

@scylla-operator-bot[bot]: Closing this issue, marking it as "Not Planned".

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.