
SSH not working when not running in superpowered mode

Open markusboehme opened this issue 2 years ago • 5 comments

Image I'm using: The official Bottlerocket v0.9.3 admin container image on a metal-dev variant on aarch64.

Issue or Feature Request: When not running the admin container in superpowered mode (user data superpowered = false), I cannot connect to it via SSH.

Initial Notes:

  • SSH access without being superpowered used to work in v0.8.0, before the move to systemd.
  • Looking at the host via the serial console, [email protected] is reported as active/running and the unit's journal contains no errors. However, sshd is nowhere to be found in the list of processes.
  • Attaching strace to systemd shows it to be comatose: it sits in ppoll(NULL, 0, NULL, NULL, 0), waiting on an empty set of file descriptors without a timeout. Unless this is an idiom I'm not familiar with, it seems unhealthy.
  • I can enter the admin container via the control container and start sshd manually in a session there. I can then connect via SSH as I expected to.

markusboehme avatar Feb 02 '23 10:02 markusboehme

The specific issue of sshd not running since the move to systemd aside, the usefulness of an admin container that's not superpowered appears to be very limited. There is no way to interact with the host filesystem or host processes, i.e. there is not much "admin" left in the admin container. What do others think about documenting and codifying the requirement for the admin container to be superpowered? The current Bottlerocket documentation only generically talks about the ability of host containers to be either superpowered or not.

markusboehme avatar Feb 08 '23 14:02 markusboehme

> What do others think about documenting and codifying the requirement for the admin container to be superpowered?

That makes sense to me. We could also have the entrypoint script sleep forever if we detect lack of superpowers, for example if /.bottlerocket/rootfs doesn't exist.
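
A minimal sketch of the detection step described above, assuming that the presence of /.bottlerocket/rootfs is a sufficient signal for superpowers. Python is used only for illustration, and the names `has_superpowers` and `startup_action` are hypothetical, not taken from the admin container's entrypoint:

```python
import os

# Path mentioned in the comment above; its absence is taken to mean
# the container is not superpowered (an assumption of this sketch).
ROOTFS_MARKER = "/.bottlerocket/rootfs"


def has_superpowers(marker: str = ROOTFS_MARKER) -> bool:
    """Return True if the host root filesystem marker exists."""
    return os.path.exists(marker)


def startup_action(marker: str = ROOTFS_MARKER) -> str:
    """Decide what the entrypoint should do (hypothetical action names)."""
    if has_superpowers(marker):
        return "start-services"
    # Without superpowers, idle instead of starting services that cannot work.
    return "sleep-forever"
```

An entrypoint would call `startup_action()` once at boot and either continue with its normal startup or block indefinitely, so that the container stays up but does nothing.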

bcressey avatar Feb 08 '23 17:02 bcressey

I could see someone using a not-superpowered admin container in non-AWS variants. The admin container could be used in lieu of the control container to interact with the API over SSH (instead of hybrid-activated SSM).

jpculp avatar Feb 08 '23 17:02 jpculp

> I could see someone using a not-superpowered admin container in non-AWS variants. The admin container could be used in lieu of the control container to interact with the API over SSH (instead of hybrid-activated SSM).

I see, this is one way the admin container might be used in a non-superpowered way. Thanks for bringing it up! Incidentally, mentioning non-AWS variants made me notice serial console access for metal variants also won't work without access to the host filesystem.

Given that non-superpowered admin containers haven't worked properly for some time already without any bug reports, and are of limited usefulness when they do, would you accept a PR restricting the admin container to superpowered mode only? This would unblock work for cgroup v2 in #76 (which is how I noticed this issue to begin with). Should there be requests for using the admin container without superpowers in the future, this decision could be revisited when upgrading the base image to a distro that natively supports cgroup v2.

markusboehme avatar Feb 09 '23 15:02 markusboehme

If the use-case has been broken since the introduction of serial/systemd, I don't think we necessarily have to block #76. If you wanted to add an escape hatch for non-superpowered folks, you could leverage the host container user data to trigger conditional logic where systemd, cgroupsv2, and serial console are disabled and sshd is started the old-fashioned way. The user data could be something like: {"systemd":{"enabled":false}}. Absence of user-data should be treated as "enabled": true.
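
The defaulting rule proposed above could be sketched roughly as follows. This assumes the user data reaches the container as a JSON string; the function name `systemd_enabled` is hypothetical, and treating malformed sections as "enabled" is a choice of this sketch rather than something the proposal specifies. Python is used only for illustration:

```python
import json
from typing import Optional


def systemd_enabled(user_data: Optional[str]) -> bool:
    """Return False only when user data explicitly sets systemd.enabled to false.

    Absent or empty user data is treated as "enabled": true.
    """
    if user_data is None or not user_data.strip():
        return True
    data = json.loads(user_data)
    if not isinstance(data, dict):
        return True  # assumption: unexpected shapes fall back to the default
    section = data.get("systemd")
    if not isinstance(section, dict):
        return True  # no "systemd" section, so keep the default
    return bool(section.get("enabled", True))
```

With this rule, `systemd_enabled('{"systemd":{"enabled":false}}')` selects the escape hatch, while missing user data or an unrelated key leaves systemd enabled.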

jpculp avatar Feb 13 '23 22:02 jpculp