
[Epic] Provide an AMI image of the VM used by CRC

Open gbraad opened this issue 5 years ago • 8 comments

Assigned: @gbraad, @praveenkumar

  • [ ] disk layout (can conversion work, or is an install from the current base image necessary?)
  • [ ] handle cluster name change
    • Note: internal routes are static
  • [ ] cloud-init settings provisioning (see the sketch below)

[Spike] [Stretch goal] [important]
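To illustrate the cloud-init item above: a minimal sketch of first-boot provisioning, assuming the AMI ends up with cloud-init (or an equivalent first-boot mechanism) that can run shell user-data. The `crc-cloud-setup` helper is purely illustrative and does not exist today; only the AWS instance-metadata endpoint is a real interface.

```sh
#!/bin/bash
# Hypothetical user-data for a CRC AMI instance. `crc-cloud-setup` is an
# illustrative name for whatever ends up doing the in-guest equivalent of
# `crc start`.
set -euo pipefail

# The external hostname changes per instance; internal routes are static,
# so only the externally visible cluster name needs rewriting.
PUBLIC_HOSTNAME="$(curl -s http://169.254.169.254/latest/meta-data/public-hostname)"

/usr/local/bin/crc-cloud-setup --external-hostname "${PUBLIC_HOSTNAME}"
```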

gbraad avatar Jul 15 '20 07:07 gbraad

As also pointed out by @bbrowning, a straight conversion is not possible because AWS expects a simpler disk layout than the partitioned one the current libvirt-based images use.
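The mismatch is easy to see by listing the partitions in the libvirt image. A sketch, assuming libguestfs-tools is installed; the image path is illustrative:

```sh
# Show the multi-partition GPT layout (BIOS boot, EFI, boot, root) that the
# EC2 image-import path does not accept.
virt-filesystems --long --parts --all -a crc.qcow2

# Basic image metadata (format, virtual size).
qemu-img info crc.qcow2
```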

gbraad avatar Jul 21 '20 03:07 gbraad

Some of the challenges we will face on the AMI side, as pointed out by @bbrowning:

  • The partitioning layout (among other things) used by RHCOS isn't supported for import by AWS. Both the EFI bootloader partition and the GPT partitioning scheme (https://coreos.com/os/docs/latest/sdk-disk-partitions.html, inspired by the x86 EFI layout at http://www.chromium.org/chromium-os/chromiumos-design-docs/disk-format) are integral to how CoreOS works.

  • If running the single node directly on AWS, there are several places where OCP expects to continue controlling resources in the AWS account, and these will be hard to decouple in order to create a generic AMI image.

  • A viable path forward here is a UPI install on AWS of a single-node cluster where no cloud credentials are given to any pods on the bootstrap or master nodes. Once that single-node cluster is up without any AWS-specific integrations enabled inside it, it should be possible to save it off as a reusable AMI, in a spirit similar to what the createdisk.sh script in code-ready/snc does for libvirt (see the first sketch after this list).

  • Something will still need to do the equivalent of crc start to configure everything inside the AMI when a user starts an instance of it. And, because these are not running on a local laptop, additional steps will need to be taken to rotate the SSH keys, the kubeadmin password, and all certificates / certificate authorities (or at least to block kubeadmin client-cert auth) so that the clusters spun up are not easily "rooted" (see the second sketch after this list).

  • Never expose a CRC instance to the internet at large without a reverse proxy in front of its API server: all CRC installs share the same set of certificate authorities, so an API server client cert valid for one is valid for all. TLS client certs therefore must not be used and must be explicitly disallowed, which means putting a TLS-terminating reverse proxy in front of the API server (see the third sketch after this list).
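A rough sketch of the "createdisk.sh for AWS" step from the third bullet, assuming the de-integrated single-node UPI instance is already up; the instance ID and names are placeholders:

```sh
# Stop the single-node instance and snapshot it as a reusable AMI.
aws ec2 stop-instances --instance-ids i-0123456789abcdef0
aws ec2 wait instance-stopped --instance-ids i-0123456789abcdef0
aws ec2 create-image \
  --instance-id i-0123456789abcdef0 \
  --name "crc-snc-ami" \
  --description "Reusable single-node OpenShift AMI for CRC"
```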
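A minimal first-boot rotation sketch for the crc start bullet. The kubeconfig path and the location of the kubeadmin secret are assumptions based on how SNC and OpenShift normally lay things out; certificate rotation (the hard part) is deliberately left out:

```sh
#!/bin/bash
# First-boot credential rotation sketch; needs httpd-tools (htpasswd) and oc.
set -euo pipefail

# Fresh SSH host keys so instances don't share the keys baked into the image.
rm -f /etc/ssh/ssh_host_*_key*
ssh-keygen -A

# Replace the shared kubeadmin password hash with a per-instance bcrypt hash;
# OpenShift stores it in the `kubeadmin` secret in `kube-system`.
NEW_PASS="$(openssl rand -base64 18)"
HASH="$(htpasswd -niB kubeadmin <<<"${NEW_PASS}" | cut -d: -f2)"
oc --kubeconfig /opt/kubeconfig -n kube-system patch secret kubeadmin \
  --type merge -p "{\"data\":{\"kubeadmin\":\"$(printf %s "${HASH}" | base64 -w0)\"}}"
echo "new kubeadmin password: ${NEW_PASS}"
```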
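And one way the fronting proxy from the last bullet could look, written as an haproxy config from a shell heredoc; the port, certificate path, and backend address are placeholders. Because the proxy terminates TLS with its own certificate, a client certificate presented by a caller never reaches the API server:

```sh
# Terminate client TLS at the proxy and re-encrypt to the API server, so the
# shared-CA client certs present in every CRC image cannot be used against it.
cat > /etc/haproxy/haproxy.cfg <<'EOF'
defaults
    mode http
    timeout connect 5s
    timeout client  30s
    timeout server  30s

frontend api
    bind :443 ssl crt /etc/haproxy/proxy.pem   # the proxy's own cert
    default_backend apiserver

backend apiserver
    server sno 127.0.0.1:6443 ssl verify none  # local single-node API server
EOF
systemctl restart haproxy
```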

praveenkumar avatar Aug 04 '20 04:08 praveenkumar

I think that nicely sums up all the challenges I'm aware of - thanks @praveenkumar. One thing I have not looked into is how the existing RHCOS AMIs get uploaded to begin with. How are they building the AMI to get around the EC2 limitations on importing VM images?

bbrowning avatar Aug 04 '20 18:08 bbrowning

@ashcrow Can you perhaps explain how the AMI is generated to deal with the EC2 limitations with regard to MBR/GPT?

gbraad avatar Aug 05 '20 08:08 gbraad

I think there may be some misunderstanding. The link referenced is for Container Linux, which is not FCOS or RHCOS.

For RHCOS, the AMI is generated from the qemu image and uploaded via cosa's internal ore. Before we upload it, we convert the qcow2 into a VMDK with specific options.
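For reference, that conversion step looks roughly like the following; the option string matches the usual EC2 VM-import requirements, but treat it as an approximation of what cosa actually runs rather than a verbatim copy:

```sh
# Convert the qcow2 build artifact to a stream-optimized VMDK that EC2's
# import machinery accepts, then hand it to the upload tooling (ore).
qemu-img convert -f qcow2 -O vmdk \
  -o adapter_type=lsilogic,subformat=streamOptimized,compat6 \
  rhcos.qcow2 rhcos.vmdk
```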

ashcrow avatar Aug 05 '20 16:08 ashcrow

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Oct 04 '20 17:10 stale[bot]

@guillaumerose these comments detail the needed changes.

gbraad avatar Oct 19 '20 06:10 gbraad

@anjannath Please have a look at some of those issues.

gbraad avatar Apr 19 '22 02:04 gbraad

I think this is now in the scope of the crc-cloud project, so I am closing it here.

praveenkumar avatar Mar 16 '23 10:03 praveenkumar