NO-ISSUE: add RestartSec=10s for systemd service

Open lance5890 opened this issue 6 months ago • 13 comments

Which issue(s) this PR addresses:

The default RestartSec is 100ms, so MicroShift may fail to start within the default StartLimitBurst (5 attempts). We should increase the systemd RestartSec to 10s, as the etcd service does: https://github.com/openshift/microshift/blob/032d259f3c000e68758e7f2e207ddfce2397a19e/ansible/roles/install-microshift/files/etcd.service#L20
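To make the timing argument concrete, here is a rough Python sketch of how systemd's start rate limit interacts with RestartSec. It is a simplified model, not systemd's actual code: it assumes the stock defaults of a 10s StartLimitIntervalSec and a StartLimitBurst of 5, and the `fail_after` value (how long the service runs before failing) is a hypothetical number chosen for illustration.

```python
def starts_allowed(restart_sec, fail_after, interval=10.0, burst=5, horizon=600.0):
    """Count start attempts before the start rate limit trips.

    Simplified model: a counting window opens at the first start and is
    reset once more than `interval` seconds have passed since it opened.
    Returns the number of starts allowed before the limit is hit, or None
    if the limit is never hit within `horizon` seconds.
    """
    window_begin = None
    count = 0      # starts inside the current window
    allowed = 0    # total starts permitted so far
    t = 0.0
    while t <= horizon:
        # Open a new window if none exists or the old one has expired.
        if window_begin is None or t - window_begin > interval:
            window_begin, count = t, 0
        # The window already holds `burst` starts: this one is refused.
        if count >= burst:
            return allowed
        count += 1
        allowed += 1
        # The service fails after `fail_after`, then waits RestartSec.
        t += fail_after + restart_sec
    return None


if __name__ == "__main__":
    # Hypothetical service that fails 0.5s after each start.
    print(starts_allowed(restart_sec=0.1, fail_after=0.5))   # default RestartSec
    print(starts_allowed(restart_sec=10.0, fail_after=0.5))  # proposed RestartSec
```

Under these assumptions, the 100ms default uses up all five attempts within about three seconds, while a 10s delay spaces the attempts further apart than the rate-limit window, so the limit is never reached and the service keeps retrying.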

lance5890 avatar May 28 '25 08:05 lance5890

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: lance5890

Once this PR has been reviewed and has the lgtm label, please assign jerpeter1 for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment.

Approvers can cancel approval by writing /approve cancel in a comment.

openshift-ci[bot] avatar May 28 '25 08:05 openshift-ci[bot]

Hi @lance5890. Thanks for your PR.

I'm waiting for an openshift member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

openshift-ci[bot] avatar May 28 '25 08:05 openshift-ci[bot]

/cc @pacevedom

lance5890 avatar May 28 '25 08:05 lance5890

@lance5890: This pull request explicitly references no jira issue.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

openshift-ci-robot avatar May 28 '25 08:05 openshift-ci-robot

Hey @lance5890, thanks for your contribution.

Could you explain how increasing time between restarts will help you? What problem are you observing?

/label ok-to-test

pmtk avatar Jun 05 '25 06:06 pmtk

@pmtk: The label(s) /label ok-to-test cannot be applied. These labels are supported: acknowledge-critical-fixes-only, platform/aws, platform/azure, platform/baremetal, platform/google, platform/libvirt, platform/openstack, ga, tide/merge-method-merge, tide/merge-method-rebase, tide/merge-method-squash, px-approved, docs-approved, qe-approved, ux-approved, no-qe, downstream-change-needed, rebase/manual, cluster-config-api-changed, approved, backport-risk-assessed, bugzilla/valid-bug, cherry-pick-approved, jira/valid-bug, stability-fix-approved, staff-eng-approved. Is this label configured under labels -> additional_labels or labels -> restricted_labels in plugin.yaml?

openshift-ci[bot] avatar Jun 05 '25 06:06 openshift-ci[bot]

/ok-to-test

pmtk avatar Jun 05 '25 06:06 pmtk

We deployed MicroShift on an underpowered host. During startup, the network was not fully configured before the systemd service started, which caused MicroShift to fail to start. After more than five failed attempts, systemd stopped trying to start it. The issue was resolved after we manually set RestartSec=10s.

lance5890 avatar Jun 05 '25 08:06 lance5890

/retest-required

lance5890 avatar Jun 06 '25 01:06 lance5890

I noticed you've taken steps to address the problem with StartLimitIntervalSec. However, I'm afraid that this change needlessly increases the timeout that some features depend on, and I'm hesitant to apply it by default.

I would suggest instead that you use a systemd drop-in to configure required settings to match your environment.
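As an illustration of the drop-in approach, the following Python sketch renders and writes a minimal override file. The unit name `microshift.service`, the file name `10-restart.conf`, and the helper names are assumptions made for this example and are not taken from the PR. RestartSec belongs in the `[Service]` section; settings such as StartLimitIntervalSec and StartLimitBurst would go in `[Unit]`.

```python
from pathlib import Path

# Assumptions for illustration only: adjust to the unit actually installed.
UNIT_NAME = "microshift.service"
DROPIN_NAME = "10-restart.conf"


def render_dropin(restart_sec="10s"):
    """Return the text of a drop-in that overrides only RestartSec."""
    return f"[Service]\nRestartSec={restart_sec}\n"


def write_dropin(root, restart_sec="10s"):
    """Write the drop-in under <root>/<unit>.d/ and return its path.

    On a real host `root` would typically be /etc/systemd/system,
    and writing there requires root privileges.
    """
    dropin_dir = Path(root) / f"{UNIT_NAME}.d"
    dropin_dir.mkdir(parents=True, exist_ok=True)
    path = dropin_dir / DROPIN_NAME
    path.write_text(render_dropin(restart_sec))
    return path
```

After writing the file, for example with `write_dropin("/etc/systemd/system")`, systemd needs to reload its configuration (`systemctl daemon-reload`) before the override takes effect. Keeping the change in a drop-in leaves the packaged unit file untouched, so the tweak stays local to the host that needs it.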

pmtk avatar Jun 06 '25 06:06 pmtk

Directly extending RestartSec to 10s might be a bit risky, but I still suggest raising it to 1s (compared to the default of 100ms), which would resolve most issues.

lance5890 avatar Jun 06 '25 06:06 lance5890

@lance5890: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

openshift-ci[bot] avatar Jun 06 '25 07:06 openshift-ci[bot]

PR needs rebase.

openshift-merge-robot avatar Jun 12 '25 22:06 openshift-merge-robot

Since the problem is affecting neither users' setups nor our tests, except for one place that we accounted for with a systemd drop-in (which we think is a valid way of tweaking the behavior depending on the hardware), I think we will not include this change at this time.

/close

pmtk avatar Jul 16 '25 09:07 pmtk

@pmtk: Closed this PR.

openshift-ci[bot] avatar Jul 16 '25 09:07 openshift-ci[bot]