NoCloud seedfrom http/s will not be processed before other datasources

Open kuwv opened this issue 1 month ago • 9 comments

Bug report

NoCloud using seedfrom is skipped even when prioritized first.

Steps to reproduce the problem

VMware (or other providers) work with the following user-data:

#cloud-config
datasource:
  NoCloud: {}

^^^ Will correctly load cloud-config from VMware customization specification dashboard.

datasource_list: [NoCloud, VMware, None]
datasource:
  NoCloud:
    seedfrom: https://example.com/cloud-init/

^^^ Will skip NoCloud, ignoring the valid cloud-config, and kick off VMware without any cloud-config.

Environment details

  • Cloud-init version:
  • Operating System Distribution:
  • Cloud provider, platform or installer type:

cloud-init logs

2025-12-04 21:58:55,802 - sources[DEBUG]: Searching for local data source in: ['DataSourceNoCloud', 'DataSourceVMware']
2025-12-04 21:58:55,802 - handlers.py[DEBUG]: start: init-local/search-NoCloud: searching for local data from DataSourceNoCloud
2025-12-04 21:58:55,802 - sources[DEBUG]: Seeing if we can get any data from <class 'cloudinit.sources.DataSourceNoCloud.DataSourceNoCloud'>
2025-12-04 21:58:55,802 - sources[DEBUG]: Update datasource metadata and network config due to events: boot-new-instance
2025-12-04 21:58:55,803 - sources[DEBUG]: Detected DataSourceNoCloud
2025-12-04 21:58:55,916 - DataSourceNoCloud.py[INFO]: DataSourceNoCloud  only uses seeds starting with ('/', 'file://') - will try to use https://example.com/cloud-init/ in the network stage.
2025-12-04 21:58:55,916 - performance.py[DEBUG]: Getting metadata took 0.114 seconds
2025-12-04 21:58:55,916 - sources[DEBUG]: Datasource DataSourceNoCloud  not updated for events: boot-new-instance
2025-12-04 21:58:55,916 - handlers.py[DEBUG]: finish: init-local/search-NoCloud: SUCCESS: no local data found from DataSourceNoCloud
2025-12-04 21:58:55,916 - handlers.py[DEBUG]: start: init-local/search-VMware: searching for local data from DataSourceVMware
2025-12-04 21:58:55,916 - sources[DEBUG]: Seeing if we can get any data from <class 'cloudinit.sources.DataSourceVMware.DataSourceVMware'>
2025-12-04 21:58:55,917 - sources[DEBUG]: Update datasource metadata and network config due to events: boot-new-instance
2025-12-04 21:58:55,917 - sources[DEBUG]: Detected DataSourceVMware [seed=None]
2025-12-04 21:58:55,917 - dmi.py[DEBUG]: querying dmi data /sys/class/dmi/id/product_name
2025-12-04 21:58:55,918 - DataSourceVMware.py[DEBUG]: discovered vmware-rpctool: /usr/bin/vmware-rpctool
2025-12-04 21:58:55,918 - DataSourceVMware.py[DEBUG]: discovered vmtoolsd: /usr/bin/vmtoolsd
2025-12-04 21:58:55,918 - DataSourceVMware.py[INFO]: query guestinfo with /usr/bin/vmware-rpctool
2025-12-04 21:58:55,918 - DataSourceVMware.py[DEBUG]: Getting guestinfo value for key metadata
2025-12-04 21:58:55,918 - subp.py[DEBUG]: Running command ['/usr/bin/vmware-rpctool', 'info-get guestinfo.metadata'] with allowed return codes [0] (shell=False, capture=True)
2025-12-04 21:58:55,921 - DataSourceVMware.py[DEBUG]: No value found for key metadata
2025-12-04 21:58:55,922 - DataSourceVMware.py[DEBUG]: Getting guestinfo value for key userdata
2025-12-04 21:58:55,922 - subp.py[DEBUG]: Running command ['/usr/bin/vmware-rpctool', 'info-get guestinfo.userdata'] with allowed return codes [0] (shell=False, capture=True)
2025-12-04 21:58:55,925 - DataSourceVMware.py[DEBUG]: No value found for key userdata
2025-12-04 21:58:55,926 - DataSourceVMware.py[DEBUG]: Getting guestinfo value for key vendordata
2025-12-04 21:58:55,926 - subp.py[DEBUG]: Running command ['/usr/bin/vmware-rpctool', 'info-get guestinfo.vendordata'] with allowed return codes [0] (shell=False, capture=True)
2025-12-04 21:58:55,929 - DataSourceVMware.py[DEBUG]: No value found for key vendordata

kuwv avatar Dec 04 '25 22:12 kuwv
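
Editorial sketch: the log above suggests that, in the local stage, NoCloud only accepts seeds starting with '/' or 'file://' and defers an https seed to the network stage, after which the search moves on to the next datasource in the list. The toy model below illustrates that ordering only. It is not cloud-init's source code; the function names are hypothetical, and the VMware probe returning True is an assumption made for illustration.

```python
from typing import Callable, Dict, Optional

# Prefixes copied from the DataSourceNoCloud log message above.
LOCAL_SEED_PREFIXES = ("/", "file://")


def usable_in_local_stage(seedfrom: str) -> bool:
    """Return True if the seed can be read before networking is up."""
    return seedfrom.startswith(LOCAL_SEED_PREFIXES)


def nocloud_local_probe(seedfrom: Optional[str]) -> bool:
    """Hypothetical NoCloud probe: only local seeds count in the local stage."""
    return seedfrom is not None and usable_in_local_stage(seedfrom)


def search_local_stage(probes: Dict[str, Callable[[], bool]]) -> Optional[str]:
    """Walk the datasources in list order and return the first one detected."""
    for name, probe in probes.items():
        if probe():
            return name
    return None


# An https seed is deferred, so a later datasource can win the local stage.
winner = search_local_stage(
    {
        "NoCloud": lambda: nocloud_local_probe("https://example.com/cloud-init/"),
        "VMware": lambda: True,  # assumption: VMware is detected locally
    }
)
```

Under these assumptions `winner` is the VMware entry, which matches the ordering reported in the issue: being first in the list does not help a seed that cannot be read until the network stage.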

NoCloud using seedfrom is skipped even when prioritized first.

Runtime detection logic is used when multiple items are in the list. Force a single datasource by limiting the list to a single item.

holmanb avatar Dec 05 '25 02:12 holmanb
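
Editorial sketch: one way to read the suggestion above is to bake a single-item datasource_list into the image. The helper below only renders the text of such a configuration; the function name is hypothetical, and where the file lives in the image (for example a drop-in under /etc/cloud/cloud.cfg.d/) is an assumption rather than something stated in this thread.

```python
def render_nocloud_only_cfg(seedfrom: str) -> str:
    """Render an image-level config that lists NoCloud as the only datasource."""
    return (
        "datasource_list: [NoCloud]\n"
        "datasource:\n"
        "  NoCloud:\n"
        f"    seedfrom: {seedfrom}\n"
    )


# The seed URL is the placeholder used in the bug report.
cfg_text = render_nocloud_only_cfg("https://example.com/cloud-init/")
```

This trades flexibility for predictability: an image built this way no longer falls back to VMware, which is the limitation discussed in the replies that follow.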

NoCloud using seedfrom is skipped even when prioritized first.

Runtime detection logic is used when multiple items are in the list. Force a single datasource by limiting the list to a single item.

Howdy @holmanb, I currently have two separate images that I'm attempting to combine into one.

kuwv avatar Dec 05 '25 16:12 kuwv

NoCloud using seedfrom is skipped even when prioritized first.

Runtime detection logic is used when multiple items are in the list. Force a single datasource by limiting the list to a single item.

Howdy @holmanb, I currently have two separate images that I'm attempting to combine into one.

I don't follow.

holmanb avatar Dec 05 '25 22:12 holmanb

NoCloud using seedfrom is skipped even when prioritized first.

Runtime detection logic is used when multiple items are in the list. Force a single datasource by limiting the list to a single item.

Howdy @holmanb, I currently have two separate images that I'm attempting to combine into one.

I don't follow.

In my understanding datasource can be configured from user-data but datasource_list cannot. The only way I can use a single datasource so far is through two separate images.

I want to standardize around one image for multiple teams.

kuwv avatar Dec 08 '25 01:12 kuwv

In my understanding datasource can be configured from user-data

Can it? What do you expect to happen?

holmanb avatar Dec 08 '25 07:12 holmanb

In my understanding datasource can be configured from user-data

Can it? What do you expect to happen?

Apologies, but I can't tell how much experience you have with cloud-init: https://cloudinit.readthedocs.io/en/latest/reference/examples.html#configure-data-sources

I have already tested this, reviewed the cloud-init source, and validated datasource_list / datasource availability.

Previously, cloud-init supported NoCloud-Net to pull from HTTP/S sources instead of disk. This was configurable via user-data. Now it is not. IMO this is a regression unless there is a consistent approach to the previous logic used.

NOTE: VMware doesn't use metadata_urls because it uses vmtoolsd instead. By contrast, the configuration below should work when using two valid datasources:

#cloud-config
datasource:
  Ec2: {}
  NoCloud:
    seedfrom: https://example.com/cloud-init

This is inconsistent behavior within cloud-init.

kuwv avatar Dec 08 '25 17:12 kuwv

This was configurable via user-data. Now it is not.

What is the most recent version that this worked with? And which distro are you using?

holmanb avatar Dec 08 '25 19:12 holmanb

This was configurable via user-data. Now it is not.

What is the most recent version that this worked with? And which distro are you using?

I'm on RHEL9 and Ubuntu 22.

This was available in 21.1: https://cloudinit.readthedocs.io/en/21.1/topics/datasources/nocloud.html https://github.com/canonical/cloud-init/blob/d7bc16295eadec031a513439729f7e91c0b9f336/cloudinit/importer.py#L38

However, I haven't tested it. It may well wait until the network stage to run, just as seen above.

Regardless of how NoCloud is functioning, VMware behaves differently from the other datasources. You can zero out some of the other datasources but not VMware.

If I understand correctly (not sure about all the modules behavior though)

datasource:
  Ec2: {}  # <- will zero out
  VMware: {}  # <- will still pull from `vmtoolsd` anyway

kuwv avatar Dec 08 '25 19:12 kuwv

Your goal is to have the IMDS provide user-data which tells cloud-init a URL where it can find user-data at a different location, right?

An #include file accomplishes this goal without messing with datasource detection. Have you tried that?

holmanb avatar Dec 08 '25 19:12 holmanb
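
Editorial sketch: an include file is user-data whose first line is #include, followed by one URL per line; cloud-init fetches each URL and processes its content as user-data. The helper below just builds that text. The function name and the URL path are hypothetical.

```python
from typing import Iterable


def render_include_user_data(urls: Iterable[str]) -> str:
    """Build #include user-data: a header line, then one URL per line."""
    lines = ["#include", *urls]
    return "\n".join(lines) + "\n"


# Hypothetical URL; the host is the placeholder used earlier in the thread.
include_text = render_include_user_data(
    ["https://example.com/cloud-init/user-data"]
)
```

Because the provisioning system still supplies this small document through its normal channel, datasource detection is left untouched; only the bulk of the configuration moves to the URL.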

@holmanb I looked into this and while this is convenient for consolidating a few pieces this won't work unfortunately. This is mainly because we have two approaches for provisioning but both dynamically generate their content.

kuwv avatar Dec 16 '25 16:12 kuwv

@holmanb I looked into this and while this is convenient for consolidating a few pieces this won't work unfortunately. This is mainly because we have two approaches for provisioning but both dynamically generate their content.

How does dynamic generation prevent using #include?

holmanb avatar Dec 16 '25 17:12 holmanb

As I said #include is fine for some pieces but it's not a swiss-army knife and doesn't solve every problem. It works for commonalities like ca_certs but would be extremely unreliable if attempting to template to an additional server or host from the provisioning system itself.

Nutanix for example. https://www.nutanix.dev/lab_content/cloud-init-lab/contents/lab.html

If we replace #cloud-config with #include here, we would still have to generate the #include files and publish them to that URL prior to boot according to our target end state. Would this be a separate web server or Nutanix itself?

Satellite 6 as another example: https://docs.redhat.com/en/documentation/red_hat_satellite/6.6/html/provisioning_guide/provisioning_virtual_machines_in_vmware_vsphere

Satellite6 uses ERB templates to assemble the user-data snippets. This would need to be exported to files hosted by the system for #include.

Neither system provides an endpoint for #include provisioning and both use separate templating capabilities to prep what would be used accordingly through #include. Nutanix files and ERB templates would need to be processed first.

Trying to solve this with #include is just flaky.

kuwv avatar Dec 16 '25 18:12 kuwv

Hey @kuwv, thanks for continuing the conversation. It seems like you understand your target environment and what you are attempting to accomplish, but I still don't. Cloud-init is used in many different ways, so I'd like to try to distill a clearer picture of what your specific cloud-init use case is.

It works for commonalities like ca_certs but would be extremely flaky if attempting to template to an additional server or host from the provisioning system itself.

What do you mean by flaky?

Nutanix for example. https://www.nutanix.dev/lab_content/cloud-init-lab/contents/lab.html

If we replace #cloud-config with #include here, we would still have to generate the #include files and publish them to that URL prior to boot according to our target end state. Would this be a separate web server or Nutanix itself?

That falls outside of cloud-init's scope, but either would probably work. Does Nutanix use a web server to host the configurations?

Satellite 6 as another example: https://docs.redhat.com/en/documentation/red_hat_satellite/6.6/html/provisioning_guide/provisioning_virtual_machines_in_vmware_vsphere

Satellite6 uses ERB templates to assemble the end state user-data. This would need to be exported to files hosted by the system.

I'm not familiar with Satellite 6, but based on a quick skim of that page it seems like it requires modifying an image prior to booting an instance? I'm not sure whether this relates to your question.

Neither system provides an endpoint for #include provisioning and both use separate templating capabilities to prep what would be used accordingly through #include. Nutanix files and ERB templates would need to be processed first.

Trying to solve this with #include is just flaky.

Again, I'm not sure what you mean by flaky. It sounds like you don't think that this will solve your problem, but I'm still trying to decipher what that problem is. It is clear that you want to support multiple operating systems and use multiple different platforms, but beyond that I'm lost trying to figure out the specifics of your desired state. It sounds like you could host an internal webserver for your instances to get this config from. For testing I use something like python3 -m http.server from a directory with configurations, and for production grade I would suggest something like nginx or apache.

Going back to the title and your initial request, I see a fundamental problem with what I think you are asking for.

There is a chicken / egg problem with attempting to support the following configuration as runtime configuration:

datasource_list: [NoCloud, VMware, None]

In order to get this configuration from the metadata server, cloud-init must first decide which platform it is running on. The problem is that deciding which platform it is running on is done using datasource_list. You seem to want to provide a configuration to set datasource_list to a system that is already using datasource_list. Normally this setting is used to configure the image before it is booted.

holmanb avatar Dec 16 '25 20:12 holmanb
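
Editorial sketch: the comment above mentions python3 -m http.server as a quick way to host configurations for testing. The snippet below is a programmatic equivalent using the same standard-library module. The function name is hypothetical, and binding to 127.0.0.1 by default is an assumption chosen for safety; instances on other hosts would need a reachable address.

```python
import functools
import threading
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def start_config_server(
    directory: str, host: str = "127.0.0.1", port: int = 0
) -> ThreadingHTTPServer:
    """Serve files from `directory` over HTTP in a background thread.

    port=0 lets the OS pick a free port; read it from server.server_address.
    """
    handler = functools.partial(SimpleHTTPRequestHandler, directory=directory)
    server = ThreadingHTTPServer((host, port), handler)
    # Daemon thread so the server does not keep the interpreter alive.
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Call server.shutdown() followed by server.server_close() when finished. As the comment above notes, this is for testing only; a production setup would use something like nginx or apache.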

Flaky just means fragile and easily broken. This isn't specific to cloud-init. Implementing #include with Satellite6 or Nutanix would be flaky because they don't support it. One would have to build out a custom solution to support this workflow and maintain it.

The problem is:

  • VMware cannot be disabled like other datasources.
  • There is no configuration in which NoCloud can successfully boot from the network when VMware is enabled.

This would be ideal:

# example image default
datasource_list: [VMware, NoCloud, None]
datasource:
  NoCloud: {}

#cloud-config
datasource:
  VMware: {}
  NoCloud:
    seedfrom: https://example.com/cloud-init/

OR

#cloud-config
datasource:
  VMware:
    enabled: false
  NoCloud:
    seedfrom: https://example.com/cloud-init/

kuwv avatar Dec 17 '25 14:12 kuwv

We need more information to get a clear understanding of the desired solution. Please address the following statements, questions, and requests.

Here are the facts that I understand:

  1. Your goal is to be able to build one image (per distro) that works on both Nutanix and Satellite 6.
  2. You also want to "zero out", or disable the VMware datasource.
  3. You currently build two separate images, one for Nutanix and one for Satellite 6.
  4. You dynamically generate configuration for provisioning on both Nutanix and Satellite 6.
  5. Something like this seemed to work before, but you are unable to confirm a specific version.

Let me know if any of the above are incorrect.

Here are some of my outstanding questions:

Regarding 4: How is a configuration unique between one instance and another?

Regarding 2: On which platform are you attempting to disable configuration from VMware (Nutanix or Satellite 6)?

Regarding 2: What configuration is being received from the VMware datasource that you do not want?

Regarding 2: Doesn't VMware provide a way to disable providing user-data on the server side?

Please provide a sample of user-data configuration for both Nutanix and Satellite 6. We also need logs (at a minimum /run/cloud-init/ds-identify.log) from both systems to understand cloud-init's detection.

holmanb avatar Dec 17 '25 19:12 holmanb