cloud-init icon indicating copy to clipboard operation
cloud-init copied to clipboard

feat: Warn user of unexpected run mode

Open holmanb opened this issue 9 months ago • 10 comments

Proposed Commit Message

feat: Warn user of unexpected run mode
  
On systemd, services are started by PID 1. When this doesn't happen, cloud-init
is in an unknown run state and should warn the user.

- Reorder pid log to be able to reuse Distro information.
- Add docstring deprecating util.is_Linux().

Merge type

  • [x] Squash merge using "Proposed Commit Message"
  • [ ] Rebase and merge unique commits. Requires commit messages per-commit each referencing the pull request number (#<PR_NUM>)

holmanb avatar Apr 24 '24 21:04 holmanb

Because we patched out the exit 2 behavior on old releases, there's no backwards incompatibility here, correct? Just an extra warning in the logs?

We would see an exit 2 when we SRU to Noble though...

TheRealFalcon avatar Apr 30 '24 22:04 TheRealFalcon

Because we patched out the exit 2 behavior on old releases, there's no backwards incompatibility here, correct? Just an extra warning in the logs?

We would see an exit 2 when we SRU to Noble though...

yep and yep. I think it's worth a followup quilt patch in ubuntu/noble:debian/patches/retain-ppid-unsupported-config-as-info-level.patch to ensure we don't set warning level on the logs in noble.

blackboxsw avatar May 02 '24 20:05 blackboxsw

yep and yep. I think it's worth a followup quilt patch in ubuntu/noble:debian/patches/retain-ppid-unsupported-config-as-info-level.patch to ensure we don't set warning level on the logs in noble.

we could always just make this a deprecation log and deal with this in the same way that we choose to deal with other new deprecations on older series - a patch per log like this seems really expensive to maintain in the long run

holmanb avatar May 02 '24 22:05 holmanb

we could always just make this a deprecation log and deal with this in the same way that we choose to deal with other new deprecations on older series - a patch per log like this seems really expensive to maintain in the long run

That is a much better idea. Good suggestion, let's go with deprecation.

blackboxsw avatar May 07 '24 15:05 blackboxsw

I think this is just waiting on two integration test fixes. But, I'd like this in 24.3 as we are introducing a "breaking change" as far as warnings being emitted where there were not warnings before-hand and I'd like us to avoid trying to quickly stuff quilt patches into /noble for this across the SRU boundary for 24.2

blackboxsw avatar Jun 03 '24 23:06 blackboxsw

I think this is just waiting on two integration test fixes. But, I'd like this in 24.3 as we are introducing a "breaking change" as far as warnings being emitted where there were not warnings before-hand and I'd like us to avoid trying to quickly stuff quilt patches into /noble for this across the SRU boundary for 24.2

Resolving this involves either:

  1. ignore this warning always
  2. do this somehow without using verify_clean_log()

I don't think that we want 1) (it involves muddying verify_clean_log() with even more single-use ignores aka tech debt), but 2) requires implementing what was described here:

One idea that I envision for this helper is a matrix of platforms and series that allows us to specify exactly when an expected warning message that will be seen. This would be more precise than the current "ignore this pattern everywhere" paradigm that verify_clean_log() uses. That said, I'm not sure that I want to include that refactor in this PR, which has already grown sizeable.

In order to make verify_clean_log() truly be a general-purpose drop-in replacement for verify_clean_boot(), I think we need to grow the ability to track tracebacks in cloud-init status --format json (which would be useful for users anyways). Unfortunately, that is not something that will happen quickly, but also I don't think that this is a strong requirement today for implementing this test, since ignoring tracebacks is an oracle-only requirement.

I'll share what I've been working on shortly.

holmanb avatar Jun 04 '24 00:06 holmanb

@blackboxsw Last commit contains what I had in mind.

With this commit I can successfully run tests/integration_tests/test_upgrade.py::test_clean_boot_of_upgraded_package and get an instance with this log:

# grep WARN /var/log/cloud-init.log 
2024-06-04 01:07:26,672 - main.py[WARNING]: PID [2082] started cloud-init 'init-local'. Unsupported configuration: boot stage called outside of systemd
2024-06-04 01:07:27,299 - main.py[WARNING]: PID [2082] started cloud-init 'init'. Unsupported configuration: boot stage called outside of systemd
2024-06-04 01:07:27,956 - main.py[WARNING]: PID [2082] started cloud-init 'modules:config'. Unsupported configuration: boot stage called outside of systemd
2024-06-04 01:07:28,548 - main.py[WARNING]: PID [2082] started cloud-init 'modules:final'. Unsupported configuration: boot stage called outside of systemd

And it passes.

Thoughts?

holmanb avatar Jun 04 '24 01:06 holmanb

Thanks @holmanb . I've added two supplemental commits to your branch for reviiew

  • move from LOG.warning to util.deprecate() so we can better prompt any consumers of this feature that it will go away. It also allows us to reduce this deprectaion from WARNING level to INFO level on stable releases to avoid non-zero exit codes in the future once the feature spec'd at WIP brach #5411 lands
  • fixup verify_clean_boot to support ignore_(warnings|errrors|tracebacks)=True params just like verify_clean_log supports. Allowing individual tracebacks to be ignored or all tracebacks if ignore_tracebacks=True.

blackboxsw avatar Jun 19 '24 18:06 blackboxsw

@holmanb I think I resolved your last pass of review comments. Please take a look and make sure I didn't miss anything.

blackboxsw avatar Jun 24 '24 21:06 blackboxsw

@holmanb I think we want this #5209 in milestone 24.2 only if #5411 merged because #5491will give us an easy downstream/stable release mechanism to avoid warnings on "new" deprecation messages introduced across SRU boundary. If we think 5491 will be reviewable for milestone 24.2 today/tomorrow, I'll spend some time on that PR now and could provide some feedback here to get #5411 across the line.

blackboxsw avatar Jun 25 '24 17:06 blackboxsw

I think now that milestone 24.2 has been released. We can merge this pending your approval @holmanb

blackboxsw avatar Jul 04 '24 11:07 blackboxsw

I think now that milestone 24.2 has been released. We can merge this pending your approval @holmanb

Thanks @blackboxsw taking one last look at this now, I think that I'd like to see two minor changes before merge:

  1. Use the new util deprecation helper rather than ubuntu series to conditionally check for depreciation.

  2. Squash git history. I think that the modifications to verify_clean_boot probably deserve to be a separate (initial) commit, and the rest can probably be squashed into a single coauthored commit.

WDYT?

holmanb avatar Jul 04 '24 12:07 holmanb

I think now that milestone 24.2 has been released. We can merge this pending your approval @holmanb

Thanks @blackboxsw taking one last look at this now, I think that I'd like to see two minor changes before merge:

  1. Use the new util deprecation helper rather than ubuntu series to conditionally check for depreciation.
  2. Squash git history. I think that the modifications to verify_clean_boot probably deserve to be a separate (initial) commit, and the rest can probably be squashed into a single coauthored commit.

WDYT? @holmanb +1 on using deprecation helper in integration tests and agreed on squash merge intent.

blackboxsw avatar Jul 04 '24 13:07 blackboxsw