
Provide a workaround for GKE reserving 10% of disk space on local SSD nodes by default

gdubicki opened this issue 1 year ago • 8 comments

What should the feature do?

The default kubelet hardEviction setting for nodefs.available is 10%.

For Scylla running on GKE nodes, this applies to the local SSDs.

There is a feature request to make this value configurable in GKE, but it has not been implemented yet: https://issuetracker.google.com/issues/185760232
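
For reference, the setting in question is the kubelet's hard-eviction threshold; in KubeletConfiguration terms it looks like this (excerpt, value shown is the default):

```yaml
# Default kubelet hard-eviction threshold (excerpt). nodefs.available: 10%
# is what effectively reserves ~10% of the local SSDs on Scylla nodes.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
evictionHard:
  nodefs.available: "10%"
```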

The workaround is to deploy your own DaemonSet that will update the kubelet settings on Scylla nodes.

We did this ourselves, but it would be great if the Scylla Operator did it, perhaps alongside / as part of the node tuning.
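
To illustrate, a minimal sketch of the kind of DaemonSet we mean (not our exact manifest; the kubelet config path, the grep/sed pattern, the target threshold and the node label are assumptions for COS-based GKE nodes and need to be verified before use):

```yaml
# Illustrative sketch only. Paths, patterns and labels are assumptions.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: kubelet-eviction-tuner
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: kubelet-eviction-tuner
  template:
    metadata:
      labels:
        app: kubelet-eviction-tuner
    spec:
      # Assumption: the Scylla node pool carries this label.
      nodeSelector:
        scylla.scylladb.com/node-type: scylla
      hostPID: true
      tolerations:
      - operator: Exists
      containers:
      - name: tuner
        image: ubuntu:22.04
        securityContext:
          privileged: true
        command:
        - /bin/bash
        - -c
        - |
          set -euo pipefail
          # Assumption: kubelet config location on GKE/COS nodes.
          cfg=/host/home/kubernetes/kubelet-config.yaml
          # Naive text edit; verify the exact key and formatting on your nodes first.
          if grep -q 'nodefs.available: 10%' "$cfg"; then
            sed -i 's/nodefs.available: 10%/nodefs.available: 1%/' "$cfg"
            # Restart kubelet in the host namespaces so it reloads the config.
            nsenter --target 1 --mount --uts --ipc --net -- systemctl restart kubelet
          fi
          # Keep the pod running so the DaemonSet stays healthy.
          sleep infinity
        volumeMounts:
        - name: host-root
          mountPath: /host
      volumes:
      - name: host-root
        hostPath:
          path: /
```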

What is the use case behind this feature?

Everyone running Scylla on GKE with local SSDs, so that they don't waste 10% of their disk space.

Anything else we need to know?

No response

gdubicki · Aug 05 '24 11:08

The Scylla Operator project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 30d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out

/lifecycle stale

The Scylla Operator project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 30d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out

/lifecycle rotten

> The workaround is to deploy your own DaemonSet that will update the kubelet settings on Scylla nodes.
>
> We did this ourselves, but it would be great if the Scylla Operator did it, perhaps alongside / as part of the node tuning.

If you would be interested in this, I could provide a PR with this feature.

gdubicki · Oct 11 '24 16:10

/remove-lifecycle rotten
/remove-lifecycle stale

gdubicki · Oct 11 '24 16:10

> If you would be interested in this, I could provide a PR with this feature.

Hi, you are welcome to open a PR; please make sure you follow the Contributing Guide.

Thanks for contributing.

ylebi · Oct 13 '24 07:10

From what I recall, changing the kubelet config in GKE was supposed to work by editing a node pool, which is out of reach for our automation, so IMO this is more of a docs mention when it gets there.

> The workaround is to deploy your own DaemonSet that will update the kubelet settings on Scylla nodes.

This likely gets you into unsupported territory and a bunch of initialization races. It may be better to wait for GKE to allow adjusting it before recommending that to others.

We have also migrated our DaemonSets into a NodeConfig and we don't add new DaemonSets anymore, in favour of the API.
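
For reference, a trimmed-down NodeConfig selecting the Scylla nodes looks roughly like this (minimal sketch; see the API reference for the full schema):

```yaml
apiVersion: scylla.scylladb.com/v1alpha1
kind: NodeConfig
metadata:
  name: cluster
spec:
  placement:
    # Selects the nodes the operator should tune; the label is an example.
    nodeSelector:
      scylla.scylladb.com/node-type: scylla
```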

This also isn't a ScyllaDB issue. I know we try to tune some things where we can't avoid it, but each of them comes with a burden, and we have to balance the benefits, stability, cross-platform support and how hacky it is.

tnozicka · Oct 14 '24 07:10

I think you are right, @tnozicka. It's a bad idea to provide an unsupported solution for GKE in the Scylla Operator repo.

I would at least like the next person affected by this to learn about it the easy way, though.

Would you accept a PR to https://operator.docs.scylladb.com/stable/gke.html to document this potential issue?

gdubicki · Oct 14 '24 07:10

I think a docs mention is fitting, and referencing the GKE issue is also helpful, so I'd welcome a :::{note} in our docs somewhere around where we set the kubelet config in GKE. Thanks, @gdubicki.

tnozicka · Oct 14 '24 07:10

The Scylla Operator project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 30d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out

/lifecycle stale

The Scylla Operator project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 30d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out

/lifecycle rotten

The Scylla Operator project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 30d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out

/close not-planned

@scylla-operator-bot[bot]: Closing this issue, marking it as "Not Planned".

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.