[docs]: Incomplete and misleading guidance on merging vendor-data
Issue #2979 described a problem where vendor-data and user-data are not merged. I've run into the same confusion, but I didn't want to necro a closed, two-year-old issue, and I have an additional request for clarification that wasn't in that issue.
Misleading documentation
In the Vendor-data page, it states:
User-supplied cloud-config is merged over cloud-config from vendor-data.
To me, the phrase "merged over" here implies a recursive merge, with user-data taking precedence over vendor-data. For example, given:
# user-data
map_a:
  key_a: scalar from user-data
  key_b:
    subkey_u: hello user
list_a:
  - item from user-data
top_level_scalar: foobar
# vendor-data
map_a:
  key_a: scalar from vendor-data
  key_b:
    subkey_v: hello vendor
list_a:
  - item from vendor-data
top_level_scalar: raboof
The combined config I expected would be
map_a:
  key_a: scalar from user-data
  key_b:
    subkey_u: hello user
    subkey_v: hello vendor
list_a:
  - item from user-data
  - item from vendor-data
top_level_scalar: foobar
It would be helpful if the documentation was more explicit that this is not how it works:
User-supplied cloud-config is merged, non-recursively, over cloud-config from vendor-data. When a key contains a collection (map or list) and is present in both, only the user-data collection is retained.
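To make the difference concrete, here is a small Python model of the two interpretations, applied to the example documents above. It only illustrates the behaviour described in this issue and is not cloud-init's merging code. The function names `recursive_merge` and `top_level_merge` are made up for this sketch, and it assumes `list_a` and `top_level_scalar` are top-level keys.

```python
from copy import deepcopy

user_data = {
    "map_a": {
        "key_a": "scalar from user-data",
        "key_b": {"subkey_u": "hello user"},
    },
    "list_a": ["item from user-data"],
    "top_level_scalar": "foobar",
}

vendor_data = {
    "map_a": {
        "key_a": "scalar from vendor-data",
        "key_b": {"subkey_v": "hello vendor"},
    },
    "list_a": ["item from vendor-data"],
    "top_level_scalar": "raboof",
}


def recursive_merge(over, under):
    """What I expected: maps merge key by key, lists concatenate,
    and `over` wins whenever two scalars conflict."""
    if isinstance(over, dict) and isinstance(under, dict):
        merged = deepcopy(under)
        for key, value in over.items():
            if key in under:
                merged[key] = recursive_merge(value, under[key])
            else:
                merged[key] = deepcopy(value)
        return merged
    if isinstance(over, list) and isinstance(under, list):
        return deepcopy(over) + deepcopy(under)
    return deepcopy(over)


def top_level_merge(over, under):
    """What I observed: for each top-level key present in both,
    the whole value from `over` replaces the one from `under`."""
    merged = deepcopy(under)
    merged.update(deepcopy(over))
    return merged


expected = recursive_merge(user_data, vendor_data)
observed = top_level_merge(user_data, vendor_data)
```

With these inputs, `expected` contains both `subkey_u` and `subkey_v` and both list items, while `observed` is identical to `user_data` because every top-level key exists in both documents.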
It might be helpful to further clarify that vendor-data is not suitable for platform-supplied defaults, but only for a platform-defined "bare" setup. The hint, "if the use of vendor-data is required for the instance to run, then vendor-data should not be used," doesn't quite land if the use case involves configuration that the instance can run without, e.g. using write_files to create an /etc/motd with the vendor's ASCII art logo.
I can imagine a good reason for it to work the way that it does so I wouldn't call this a bug, but that only came after losing several hours wondering why it wasn't working the way I expected.
Incomplete information
While searching on the topic, I came across this post: https://discuss.linuxcontainers.org/t/how-to-merge-profiles-user-vendor-data/774/17:
Still don’t know in detail how all magic works, but when I add the merge_how section (https://cloudinit.readthedocs.io/en/latest/topics/merging.html) it works – without the merge_how, one file wins!
Notably, the Merging page makes no mention at all that merge_how acts on vendor-data.
I took a leap of faith that an old comment on an LXD forum might be onto something. So, now I'm testing merge_how, following the information in the "How to activate" section.
What I've observed is that some setting combinations have no effect, and (at least) one actually inverts the behavior of user-data overriding vendor-data, so the final, combined config contains only vendor-data. At least that proves that merge_how does apply to vendor-data 🤷
This configuration causes the inversion:
# user-data
merge_how:
  - name: list
    settings: [append]
apt: ...
write_files:
  - {file_u1: {...}}
  - ...
# vendor-data
merge_how:
  - name: list
    settings: [append]
# There doesn't seem to be any difference if merge_how is given in both, one, or the other.
write_files:
  - {file_v1: {...}}
# combined-config
write_files:
  - {file_v1: {...}}
# The apt block and all write_files items from user-data are gone
Since there's no documentation saying whether merge_how applies to vendor-data in the first place, I can't say if this is expected behavior or a bug. It feels like a bug to me, though.
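For comparing runs, one way to label an outcome like the one above is to check which side's list items survive in the combined config. The sketch below is a hypothetical helper, not part of cloud-init; the name `classify_outcome` and the string labels are my own, and the items are stand-ins for the elided write_files entries.

```python
def classify_outcome(user_items, vendor_items, combined_items):
    """Label a combined list by which source's items survived in it."""
    has_user = any(item in combined_items for item in user_items)
    has_vendor = any(item in combined_items for item in vendor_items)
    if has_user and has_vendor:
        return "merged"
    if has_user:
        return "user-data only"
    if has_vendor:
        return "vendor-data only (inverted)"
    return "neither"


# Stand-ins for the write_files entries in the example above.
inverted = classify_outcome(["file_u1"], ["file_v1"], ["file_v1"])
```

For the configuration above this yields the "vendor-data only (inverted)" label, and a working list merge would be reported as "merged".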
This configuration does absolutely nothing; the end result is identical to not setting merge_how at all:
# user-data
merge_how:
  - name: list
    settings: [append, recurse_list, recurse_dict]
  - name: dict
    settings: [no_replace, recurse_list, recurse_dict]
apt: ...
write_files:
  - <file_u1>
  - ...
# vendor-data
merge_how:
  - name: list
    settings: [append, recurse_list, recurse_dict]
  - name: dict
    settings: [no_replace, recurse_list, recurse_dict]
# Again, it doesn't seem to matter if this is in either one or both
write_files:
  - <file_v1>
# combined-config
apt: ...
write_files:
  - <file_u1>
  - ... (but NOT file_v1)
I haven't gone through the whole matrix of possible merge setting combinations, but if I do happen to brute-force one that does what I want (or causes more weird behavior, like inversion) I'll be sure to update. I plan next to try varying the replace/no_replace options, since it's not clear if they mean replacing the entire collection or replacing conflicting items within the collection.
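If I do end up brute-forcing it, the matrix could be enumerated with something like the Python sketch below. It only covers the setting names that appear in this issue (the Merging page may list more), and it assumes that `replace` and `no_replace` are mutually exclusive, which I have not confirmed. `subsets` and `merge_how_candidates` are names invented for this sketch.

```python
from itertools import product

# Only the settings named in this issue; an assumption, not a full list.
LIST_SETTINGS = ["append", "recurse_list", "recurse_dict"]
DICT_SETTINGS = ["no_replace", "replace", "recurse_list", "recurse_dict"]


def subsets(options):
    """Yield every subset of `options`, keeping the original order."""
    for mask in product([False, True], repeat=len(options)):
        yield [opt for opt, keep in zip(options, mask) if keep]


def merge_how_candidates():
    """Yield merge_how values in the list-of-maps form used above."""
    for list_opts, dict_opts in product(
        subsets(LIST_SETTINGS), subsets(DICT_SETTINGS)
    ):
        # Assumption: replace and no_replace cannot be combined.
        if "replace" in dict_opts and "no_replace" in dict_opts:
            continue
        yield [
            {"name": "list", "settings": list_opts},
            {"name": "dict", "settings": dict_opts},
        ]


candidates = list(merge_how_candidates())
```

Each candidate can be dumped into the `merge_how` key of the user-data, the vendor-data, or both, which is where the number of test runs grows quickly.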
Any tips on getting the write_files list to merge would be appreciated! (Especially if they get added to the docs!)
I don't think that merging is very well-designed or implemented, and I'm not convinced that documentation is really the best way to fix it. See also https://github.com/canonical/cloud-init/issues/5866.
@holmanb, poorly designed and/or poorly implemented features are exactly the kinds of features that require the most documentation. We're not just going to rip out merging. Even if we rolled out something better tomorrow, it's still going to be present in LTS releases for several more years. Better documentation for merging is sorely needed.
User-supplied cloud-config is merged over cloud-config from vendor-data.
I agree with the point made at the beginning of this issue, that the above statement can be interpreted multiple ways and can (and should) be better documented.
We're not just going to rip out merging.
@TheRealFalcon I didn't suggest ripping it out.
Having better documentation for merging is sorely needed.
This is a repeatedly filed user issue with no progress. We agree that something is needed. When I previously tried to improve the merging documentation, I struggled to find a way to make it coherent. Everything I came up with was effectively an English representation of the code, which was pretty garbled and incoherent to read. Given this personal experience, I still think that caution/construction tape might serve users better than both the current state of documentation and what I tried to come up with in the past. Maybe I'm wrong.