[launch] allow cloud-init file injection

Open Saviq opened this issue 6 years ago • 18 comments

Some cloud-init modules (e.g. scripts) require files to be placed into the data source.

We need a --cloud-init-files option or similar that will allow the user to specify a file/directory tree to be included in the data source.

Saviq avatar Sep 11 '19 11:09 Saviq

Would love to see this implemented. Where does this currently fall on the release to-do list, @Saviq? I can see this being extremely useful for some of our use cases.

v1k0d3n avatar Dec 11 '19 17:12 v1k0d3n

Hey @v1k0d3n I don't have good news about this, either ;)

It's not something on our immediate roadmap, but it should be relatively easy to add if you wanted to contribute.

The cloud-init ISO is created here, in the daemon.cpp code - it would be easy to feed more files in.

Saviq avatar Dec 12 '19 09:12 Saviq

@Saviq more bad news!? agh. :) j/k

When I have some free time, I'll see if it's something I can knock out. It's at least helpful to know that it's not on the product's immediate radar, but I'm surprised more users aren't asking for this feature. It seems like a great way to differentiate Multipass vs. containerized use cases.

v1k0d3n avatar Dec 19 '19 20:12 v1k0d3n

EDIT: It appears that make_cloud_init_image may have been deprecated in the newer code base; is this issue resolved?

I am trying to set up a plan to resolve this; here is what I have:

  1. Add a --cloud-init-files option to the launch command.
  2. Update the help message.
  3. Add an argument with the scripts path (default empty string) value to: https://github.com/canonical/multipass/blob/b22ff56651c225d2e9cb219703a99f8ac6a21d0c/src/daemon/daemon.cpp#L147-L148

What should the record structure for the scripts directory look like in the cloud-init YAML file? And in what structure/how should I add the scripts to the ISO?

surahman avatar Jun 05 '21 16:06 surahman

Hi @surahman, my current idea for this would be a --cloud-init-tree argument that would take a path as the argument and that would be the entire file structure injected into the ISO.

It would be a double-edged sword, as I think we wouldn't override network-config or vendor-data if they existed, so users could potentially break some of Multipass's assumptions. I was thinking we'd have a way to generate a "pristine" tree for users to modify, with comments explaining what each of the entries is responsible for.

Saviq avatar Jun 08 '21 13:06 Saviq

my current idea for this would be a --cloud-init-tree argument that would take a path as the argument and that would be the entire file structure injected into the ISO.

You would want to read in the cloud-init YAML file from disk?

It would be a double-edged sword, as I think we wouldn't override network-config or vendor-data if they existed, so users could potentially break some of Multipass's assumptions.

In this case, would you want to flag the YAML file as invalid with an error indicating it is overriding preset values? Or, assuming the interface to update these data structures is exposed, would you want to update them with a warning indicating preset values are being updated?

I was thinking we'd have a way to generate a "pristine" tree for users to modify, with comments explaining what each of the entries is responsible for.

Would you like to provide a template YAML file with the possible valid entries for initialization? Then the end-users would visit the documentation and grab the template. I am still very new to using cloud-init and am learning it now, but I do have experience with AWS instances and the user-data used to configure them - this is pretty similar.

surahman avatar Jun 08 '21 17:06 surahman

You would want to read in the cloud-init YAML file from disk?

We wouldn't even read it as YAML, just build the ISO with files from the folder given.

In this case, would you want to flag the YAML file as invalid with an error indicating it is overriding preset values? Or, assuming the interface to update these data structures is exposed, would you want to update them with a warning indicating preset values are being updated?

I don't want to go too deep, but rather give users the tool to do with as they please. So if they provide vendor-data, we would accept it as-is.

Would you like to provide a template YAML file with the possible valid entries for initialization? Then the end-users would visit the documentation and grab the template. I am still very new to using cloud-init and am learning it now, but I do have experience with AWS instances and the user-data used to configure them - this is pretty similar.

Our config is dynamic depending on how the instance is launched. So we'd need a special multipass command that would take the same arguments launch does, but rather than launch an instance, it would populate a folder with the metadata as generated by Multipass and allow for editing. Another option would be to add a --edit-cloud-init option or so to launch, which would generate the tree, ask the user to modify it, and then let the launch continue.

Saviq avatar Jun 08 '21 18:06 Saviq

Cool, let's build it 😉. Need to do some research and learning but I am game.

surahman avatar Jun 08 '21 19:06 surahman

--cloud-init-tree

I think this would need to be a PR on its own, but it seems simple enough. It should be possible to recursively map a directory using QDir and generate a navigable list of files. I would start by mocking the QDir routines needed for these operations. I found the ISO generation code in this routine: https://github.com/canonical/multipass/blob/877f5454c13fd343f8642cfbdf94752687fd8897/src/platform/backends/shared/base_virtual_machine_factory.cpp#L27
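Something along these lines should work for the directory walk - just a sketch, and the helper name is made up:

```cpp
#include <QDir>
#include <QDirIterator>
#include <QString>
#include <QStringList>

// Hypothetical helper: collect every regular file under tree_root, returning
// paths relative to the root so the same layout can be recreated on the ISO.
QStringList collect_tree_files(const QString& tree_root)
{
    QStringList relative_paths;
    const QDir root_dir{tree_root};
    QDirIterator it{tree_root, QDir::Files, QDirIterator::Subdirectories};
    while (it.hasNext())
        relative_paths << root_dir.relativeFilePath(it.next());
    return relative_paths;
}
```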

The only problem is that I do not know what the actual file structure of the ISO would be; an example would be helpful. I have found the VM description, which I assume contains the only items that need to be populated from the file structure: https://github.com/canonical/multipass/blob/877f5454c13fd343f8642cfbdf94752687fd8897/include/multipass/virtual_machine_description.h#L35-L53

I do not think I will need to edit anything in https://github.com/canonical/multipass/blob/877f5454c13fd343f8642cfbdf94752687fd8897/src/iso/cloud_init_iso.cpp


Customize cloud-init

I think this should be a separate PR. Once the above command is completed, I feel I would have a better understanding of the expected ISO file structure. I do not think it will be too difficult to generate the files and directories using QFile and QDir at that point. I feel it would be better to populate a folder and let the end-user use an editor (GUI or command line) to edit it - let them pick their poison.

surahman avatar Jun 09 '21 03:06 surahman

--cloud-init-tree

I think this would need to be a PR on its own, but it seems simple enough.

Totally a separate PR, yes. There is a caveat, though: we've been taking shortcuts by assuming the client and daemon sit on the same machine and have access to the same files. Ideally we'd not do that here and instead stream the tree from the client to the daemon through our RPC. This would ensure that a client and daemon sitting on different machines are catered for.

It should be possible to recursively map a directory using QDir and generate a navigable list of files. I would start by mocking the QDir routines needed for these operations. I found the ISO generation code in this routine:

The only problem is that I do not know what the actual file structure of the ISO would be; an example would be helpful. I have found the VM description, which I assume contains the only items that need to be populated from the file structure:

You can mount /dev/cdrom in a Multipass instance to see what we generate at the moment. The cloud-init docs have more details.

I do not think I will need to edit anything in https://github.com/canonical/multipass/blob/877f5454c13fd343f8642cfbdf94752687fd8897/src/iso/cloud_init_iso.cpp

Indeed, that's generic enough. Probably shouldn't even say "cloud-init" there.

Customize cloud-init: I think this should be a separate PR. Once the above command is completed, I feel I would have a better understanding of the expected ISO file structure. I do not think it will be too difficult to generate the files and directories using QFile and QDir at that point. I feel it would be better to populate a folder and let the end-user use an editor (GUI or command line) to edit it - let them pick their poison.

Indeed, separate PR, with a similar caveat to the above - it would ideally be streamed to the client over RPC. And yeah, I envision a "You can now edit the cloud-init data under <path>. Press [Enter] when ready" prompt. After that it would behave exactly the same as if they had passed --cloud-init-tree in the first place.

Saviq avatar Jun 09 '21 05:06 Saviq

I am on it :construction_worker_man:. I am going to run some tests with Qt to figure out the file ops before I start to flesh things out on the daemon. I will figure out getting the data to the client via RPC.

surahman avatar Jun 09 '21 13:06 surahman

@surahman that's awesome, you deserve a 🥇.

Will you please write up a summary (a spec of sorts) of the work you're planning to do? This way we can iron out any kinks as early as possible.

Saviq avatar Jun 09 '21 13:06 Saviq

@surahman that's awesome, you deserve a 🥇.

Happy to contribute :wink:

Will you please write up a summary (a spec of sorts) of the work you're planning to do? This way we can iron out any kinks as early as possible.

I will get this done as soon as possible and before I write any actual code.

surahman avatar Jun 09 '21 13:06 surahman

RFC: Expected plan of action:

Feature: --cloud-init-tree

An enhancement of the launch command on the client, taking a path as a parameter. The path points to a directory containing a nested directory/file structure that will make up the ISO image used for cloud-init.

CLI

Add the --cloud-init-tree option to the parse_args routine in launch. Ensure a value is passed for the path and that --cloud-init is not set.
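A rough sketch of the option, assuming the Qt QCommandLineParser machinery that launch already wraps in its ArgParser - names and help text here are placeholders:

```cpp
#include <QCommandLineOption>
#include <QCommandLineParser>

// Illustrative only: the real code goes through multipass's ArgParser wrapper,
// but the option itself would look roughly like this.
void add_cloud_init_tree_option(QCommandLineParser& parser)
{
    const QCommandLineOption cloud_init_tree_option{
        "cloud-init-tree",
        "Path to a directory whose contents are injected into the cloud-init ISO as-is.",
        "path"};
    parser.addOption(cloud_init_tree_option);
    // After parsing: reject the request if both --cloud-init and --cloud-init-tree are
    // given, and check that parser.value("cloud-init-tree") names an existing directory.
}
```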

  • [x] user-data: YAML file, in ISO root. RPC LaunchRequest.cloud_init_user_data currently exists for IPC.
  • [x] meta-data: YAML file, in ISO root. No RPC LaunchRequest data member.
  • [x] vendor-data: YAML file, in ISO root. No RPC LaunchRequest data member.
  • [x] network-config: YAML file, in ISO root. No RPC LaunchRequest data member.
  • [ ] user-data script: Flat file, bash script, no standardised name. Currently not supported.
  • [ ] include: Flat file, no standardised name. Currently not supported.
  • [ ] cloud config data: YAML file, no standardised name. Currently not supported.
  • [ ] upstart job: Flat file, no standardised name, located in/destined for /etc/init. Currently not supported.
  • [ ] boothook: Flat file, no standardised name, located in/destined for /var/lib/cloud. Currently not supported.
  • [ ] part handler: Flat file, Python script, no standardised name, located in/destined for /var/lib/cloud/data. Currently not supported.

The checked items above should be able to reuse the current setup code in platform, using mp::BaseVirtualMachineFactory::configure as a template to generate the ISO. The unchecked items are a little tricky, and it might be better to get the currently supported items working first. Most of the unsupported items are flat files and can be transferred without conversion as byte arrays (max size 2^32 bytes).

RPC LaunchRequest

Transferring data for some of the checked items above to the daemon will require extending the LaunchRequest RPC. The unchecked items above will require extending the ISO generation in BaseVirtualMachineFactory.

We could use a map<string, bytes> in gRPC to store the directory/path/to/filename as the key and the file contents as the value. This data would then be unpacked on the platform backend and populated into the ISO. This would be dynamic with regard to the files available.
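A minimal sketch of the daemon-side unpacking, assuming the map arrives as the equivalent of a std::map<std::string, std::string> and that the ISO builder exposes an add_file(name, contents)-style call (both assumptions):

```cpp
#include <map>
#include <string>

// Hypothetical daemon-side helper: unpack the RPC-provided map of
// relative path -> raw file contents straight into the cloud-init ISO.
// `Iso` stands in for the builder in src/iso/cloud_init_iso.cpp.
template <typename Iso>
void populate_iso_from_tree(Iso& iso, const std::map<std::string, std::string>& tree_files)
{
    for (const auto& [relative_path, contents] : tree_files)
        iso.add_file(relative_path, contents); // copied verbatim, no YAML parsing
}
```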

Mocking fileops

If we are sticking to the currently supported user-data, meta-data, vendor-data, and network-config, I do not think we will need to extend the mocking framework for file ops. The above file names would be placed in a vector and cycled through with attempts to open. Upon open failure (not found or cannot open) they will be skipped.
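A small sketch of that loop, using plain Qt file classes (the helper names are made up):

```cpp
#include <QByteArray>
#include <QDir>
#include <QFile>
#include <QString>
#include <QStringList>
#include <optional>

// Try one of the known cloud-init file names under the tree root; skip anything
// that is missing or unreadable.
std::optional<QByteArray> read_if_present(const QDir& tree_root, const QString& name)
{
    QFile file{tree_root.filePath(name)};
    if (!file.exists() || !file.open(QIODevice::ReadOnly))
        return std::nullopt; // not found or cannot open: skip it
    return file.readAll();
}

void collect_known_files(const QDir& tree_root)
{
    const QStringList known_names{"user-data", "meta-data", "vendor-data", "network-config"};
    for (const auto& name : known_names)
        if (auto contents = read_if_present(tree_root, name))
        {
            // hand *contents to the launch request here
        }
}
```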

If there is a need to recursively explore the directory tree we will need mocks for QDirIterator methods to list the contents and check for files.

Questions

Why convert any of these files to YAML, etc. after reading? Why not just transport the file contents as a byte array, which will retain the formatting of the file, and deliver it via RPC to the backend? Just dump the file contents to the ISO as they are provided; the onus of providing a correctly formatted file is on the end-user.

surahman avatar Jun 10 '21 15:06 surahman

Hi @surahman, sorry for the late reply, needed to digest what you wrote :)

All in all, I don't think you should go into that much detail about the contents of the ISO. I mean, it doesn't matter for the implementation. We should just take the contents of the tree as-is and only add the meta, network and vendor files if they were not provided.

I'd look into gRPC streaming mode for the file transfer, as some of those may well be large, so we need to transfer them in chunks.
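A minimal sketch of the chunking side, with the gRPC writer hidden behind a callback - the chunk message and the streaming RPC itself are hypothetical and would still need to be added to the proto:

```cpp
#include <cstddef>
#include <fstream>
#include <functional>
#include <string>

// The `send` callback stands in for Write() on a gRPC client-streaming writer
// (e.g. grpc::ClientWriter<SomeChunkMessage>); it receives the relative path and
// one chunk of raw data per call.
bool stream_file_in_chunks(const std::string& local_path, const std::string& relative_path,
                           const std::function<bool(const std::string&, const std::string&)>& send,
                           std::size_t chunk_size = 64 * 1024)
{
    std::ifstream input{local_path, std::ios::binary};
    if (!input)
        return false;

    std::string buffer(chunk_size, '\0');
    while (input)
    {
        input.read(buffer.data(), static_cast<std::streamsize>(buffer.size()));
        const auto bytes_read = static_cast<std::size_t>(input.gcount());
        if (bytes_read == 0)
            break;
        if (!send(relative_path, buffer.substr(0, bytes_read))) // one chunk per Write()
            return false;
    }
    return true;
}
```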

Why convert any of these files to YAML, etc. after reading?

We did that so that we could inject extra bits necessary for Multipass to function. We shouldn't do that in this case.

Saviq avatar Jun 22 '21 15:06 Saviq

@Saviq thanks, I will get started on this as time permits.

surahman avatar Jun 22 '21 16:06 surahman

@Saviq in the event that there is no meta, network, or vendor data provided, you would like to provide defaults - would you like this data hardcoded or shipped as bundled files? If the defaults are provided as files, the end-user could customize them once and reuse them, but if hardcoded this would not be possible. Where should I be placing the code/files for the defaults? Is there an appropriate header or location for these?

surahman avatar Jun 22 '21 18:06 surahman

@Saviq in the event that there is no meta, network, or vendor data provided, you would like to provide defaults

Those exist today, and are generated dynamically for each instance. The same needs to happen when those don't exist in --cloud-init-tree. So if the files are missing from the tree, we need to add them on the fly.
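A rough sketch of that fallback, assuming an add_file-style ISO builder and a per-instance generator (all names here are illustrative):

```cpp
#include <set>
#include <string>

// Only generate meta-data / vendor-data / network-config when the user's
// --cloud-init-tree did not already provide them; `generate` stands in for
// whatever per-instance generation Multipass does today.
template <typename Iso, typename Generator>
void add_missing_defaults(Iso& iso, const std::set<std::string>& tree_files, Generator&& generate)
{
    for (const std::string name : {"meta-data", "vendor-data", "network-config"})
        if (tree_files.count(name) == 0)        // not supplied by the user
            iso.add_file(name, generate(name)); // add the dynamically generated default
}
```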

Saviq avatar Jun 22 '21 18:06 Saviq