ansible-role-proxmox
feature: add debian 12 bookworm and proxmox 8 compatibility
resolves: #226
Have you actually tested it so that nothing using the API or the cluster management breaks? I'll do it in a bit myself, and try to review it with those changes.
I use the modified role on my 3 node cluster. Until now I have not found any issue managing lxc or VMs
I'm testing in Vagrant (the Vagrantfile could be upgraded a bit, but maybe we want to test on older versions ;)), and will deploy a few scenarios... First thing, you might want to have the Ceph version depend on Proxmox's version. Because it's getting annoying, I'd suggest having the Ceph release name as a dictionary, with the keys being Debian ansible_distribution_release values and the values being Ceph release names. Something along the lines of:
pve_ceph_version:
  buster: nautilus
  bullseye: pacific
  bookworm: quincy
pve_ceph_repository_line: "deb http://download.proxmox.com/debian/ceph-{{ pve_ceph_version[ansible_distribution_release] }} {{ ansible_distribution_release }} no-subscription"
in defaults/main.yaml
It has another benefit: one can decide they want another Ceph version and override it without overriding the whole line, though I doubt it's actually useful.
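For example, with that dictionary in place a user could pin a different Ceph release from their inventory without touching the repository line. A minimal sketch (the group_vars path and the reef value are purely illustrative):
# group_vars/all.yml (illustrative override, assuming the pve_ceph_version dictionary above)
pve_ceph_version:
  bookworm: reef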
Ahh, let me clarify, I use btrfs and zfs for storage. I do not have experience with all the other storage providers.
Ah, there's more (I fixed the ceph- prefix in my comment): it seems Proxmox switched from using "main" as the channel to no-subscription… I fixed it, but I don't know if it would work as is for 6.x/7.x… Also, the docs mention pveceph install to set up the repos.
I'm not deploying ceph using ansible myself, but I'm just playing the default test playbook that enables it. It might be that nobody cares…
Have you actually tested it so that nothing using the API or the cluster management breaks?
Yeah, I was personally waiting to test it out but have been busy of late. The Vagrant testing suite also needs to be updated, but I figured I'd do that myself.
I'm not deploying ceph using ansible myself, but I'm just playing the default test playbook that enables it. It might be that nobody cares…
People definitely do care. Ceph still needs to be functional on update. #215 is supposed to be an update to the Ceph repos, but I'd left a comment there regarding behaviour on existing setups. Also, the role does what pveceph install does under the hood, except in a more controlled fashion afaik.
Hey, glad to have you around!
Yeah, I'm getting lazy about the channel name switching from main to no-subscription, and I'm not sure when that happened either (7? 8?).
And as for Vagrant, I'll probably start another conversation. I like how you set it up, it's probably not too hard to update, and at some point we could automate that ;)
I've tested the proposed PR on Vagrant (Debian bookworm from bento as the base) and on a single-node cluster (deployed as Proxmox), and it seems to be working fine. I haven't tested ZFS or Ceph, just the basics, but it looks OK.
I'm not experiencing this on a "real setup", mostly because I don't provision Ceph using the role, but here's what a test in VMs using the role in Vagrant does:
TASK [lae.proxmox : Validate and compress new crushmap] ************************
fatal: [pve-1]: FAILED! => {"changed": true, "cmd": ["crushtool", "-c", "crush_map_decompressed", "-o", "new_crush_map_compressed"], "delta": "0:00:00.021260", "end": "2023-08-29 21:25:06.169639", "msg": "non-zero return code", "rc": 1, "start": "2023-08-29 21:25:06.148379", "stderr": "crush_map_decompressed:44 error: parse error at ''", "stderr_lines": ["crush_map_decompressed:44 error: parse error at ''"], "stdout": "", "stdout_lines": []}
I'm not sure I can debug that easily, given the complexity of the regex, but I may try…
Also, and that's probably a change in the API, the tests fail using Vagrant, as it seems to be forbidden to have a "no-content" dir, that is, one that doesn't define a valid content attribute. This is the error I get:
create storage failed: unknown vtype 'none'
This is a non-issue, I'm pretty sure it makes no sense to try and create that anymore.
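(For anyone hitting this with their own configuration: giving a dir entry in pve_storages a real content list avoids the 'none' vtype. A rough sketch, with values that are purely illustrative:)
pve_storages:
  - name: local-templates
    type: dir
    path: /var/lib/vz/local-templates
    content: [ "iso", "vztmpl" ]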
FYI for anyone coming to this issue and wanting to use this immediately, I'll suggest what I've suggested a few times over the years: you can install this PR as a drop-in replacement for lae.proxmox via ansible-galaxy:
ansible-galaxy install https://github.com/lexxxel/ansible-role-proxmox/archive/feature/add_bookworm_and_debian_12_compatibility.tar.gz,pr-230,lae.proxmox --force
(which you should upgrade from once this is merged and a new release has been cut)
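Equivalently, if you track roles in a requirements file, something along these lines should work (untested sketch; the file name is just a convention):
# requirements.yml
- src: https://github.com/lexxxel/ansible-role-proxmox/archive/feature/add_bookworm_and_debian_12_compatibility.tar.gz
  name: lae.proxmox
then install it with ansible-galaxy install -r requirements.yml --force.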
What I usually do is symlink ~/.ansible/roles/lae.proxmox to my working dir, so I can decide what branch / git magic I'm operating on.
Yeah, that's what I do in my environment too, but that method's really only useful for contributors of the role lol (well, not to say others can't do it either, but for one-off situations installing the git-archive tarball like above is a bit less maintenance).
I'm not experiencing this on a "real setup", mostly because I don't provision Ceph using the role, but here's what a test in VMs using the role in Vagrant does:
TASK [lae.proxmox : Validate and compress new crushmap] ************************ fatal: [pve-1]: FAILED! => {"changed": true, "cmd": ["crushtool", "-c", "crush_map_decompressed", "-o", "new_crush_map_compressed"], "delta": "0:00:00.021260", "end": "2023-08-29 21:25:06.169639", "msg": "non-zero return code", "rc": 1, "start": "2023-08-29 21:25:06.148379", "stderr": "crush_map_decompressed:44 error: parse error at ''", "stderr_lines": ["crush_map_decompressed:44 error: parse error at ''"], "stdout": "", "stdout_lines": []}
OK, found the culprit: min_size and max_size are no longer valid in Quincy crush maps... So if we want to keep that available in tasks/ceph.yml for former versions, we'll need to add a condition there so the big regex replace only edits the crush rules on older releases…
EDIT: fixing that is easy, as we can simply not write the min/max attributes anymore, but maybe we want an upgrade path there…
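For illustration, a rough sketch of what such a guard could look like (task, path, and variable names are assumptions, not the role's actual ones):
# Hypothetical guard: only rewrite min_size on pre-Quincy releases,
# since Quincy's crushtool rejects the min_size/max_size attributes.
- name: Set min_size in decompressed crushmap
  ansible.builtin.replace:
    path: crush_map_decompressed
    regexp: '^(\s*)min_size \d+$'
    replace: '\g<1>min_size {{ pve_ceph_crush_min_size | default(2) }}'
  when: pve_ceph_version[ansible_distribution_release] not in ['quincy', 'reef']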
Something along the lines of:
pve_ceph_version:
  buster: nautilus
  bullseye: pacific
  bookworm: quincy
pve_ceph_repository_line: "deb http://download.proxmox.com/debian/ceph-{{ pve_ceph_version[ansible_distribution_release] }} {{ ansible_distribution_release }} no-subscription"
Actually, I didn't notice we already have that logic in load_variables.yml, so we can simply set the appropriate ceph version in vars/version.yml, and let the user overwrite it if needed…
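For illustration only, such an entry might look roughly like this (the file name and variable names are placeholders, not the role's actual layout):
# vars/debian-bookworm.yml (hypothetical)
pve_ceph_release: quincy
pve_ceph_repository_line: "deb http://download.proxmox.com/debian/ceph-{{ pve_ceph_release }} {{ ansible_distribution_release }} no-subscription"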
A note for users with NFS storage:
I am using a fresh PVE 8 installation (Vagrant based) with NFS storage. My playbook fails when executing the task 'Configure Proxmox Storage'. Debugging the issue showed that the creation of the first NFS mount via pvesh on the shell led to the following result:
$ pvesh create storage --storage nas-downloads --output=json --content=iso,vztmpl --export=/volume1/homeserver-shared --server=10.10.10.10 --type=nfs
Created symlink /run/systemd/system/remote-fs.target.wants/rpc-statd.service → /lib/systemd/system/rpc-statd.service.
{"storage":"nas-data","type":"nfs"}
echo $? gives 0, but the output is interpreted as an error by lae.proxmox/module_utils/pvesh.py#run_command(). I "fixed" this by starting the rpc-statd service prior to executing lae.proxmox:
- name: Start rpc-statd.service
  ansible.builtin.service:
    name: rpc-statd
    enabled: true
    state: started
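For anyone wanting to keep this in the same playbook, it can be wired as a pre_task before the role runs; a minimal sketch (the host group name is just an example):
- hosts: pve
  become: true
  pre_tasks:
    # Work around the rpc-statd activation message being treated as an error
    # when the first NFS storage is created via pvesh.
    - name: Start rpc-statd.service
      ansible.builtin.service:
        name: rpc-statd
        enabled: true
        state: started
  roles:
    - lae.proxmox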
Something along the lines of:
pve_ceph_version:
  buster: nautilus
  bullseye: pacific
  bookworm: quincy
pve_ceph_repository_line: "deb http://download.proxmox.com/debian/ceph-{{ pve_ceph_version[ansible_distribution_release] }} {{ ansible_distribution_release }} no-subscription"
Actually, I didn't notice we already have that logic in load_variables.yml, so we can simply set the appropriate ceph version in vars/version.yml, and let the user overwrite it if needed…
That's a nice proposal to update the defaults with pve_ceph_version. AFAIK, it's not possible for a user to replace a variable defined in vars/, so I would stick with a sane default and allow changing the Ceph release. Our team is currently testing the upgrade from 7.x to 8.x, and we had to upgrade Ceph to Quincy first. We defined a new value for pve_ceph_repository_line, but it would have been more elegant to define pve_ceph_version['bullseye'] = 'quincy'.
@lexxxel do you think you could add this to the PR?
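In other words, with such a default in place, that preparation step would boil down to a one-key override in the inventory (sketch, assuming the proposed dictionary):
# group_vars for the cluster (illustrative) - switch bullseye nodes to Quincy
# ahead of the 7.x -> 8.x upgrade, without redefining pve_ceph_repository_line
pve_ceph_version:
  bullseye: quincy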
With Proxmox VE 7 reaching EOL in 2024-07, is there any plan to merge this PR and release a new version of the role?
FWIW, our team has been able to use this PR to upgrade our lab cluster (5 nodes) from PVE 7.4 to 8.1 and Ceph to 17.2 with the role.
sorry for taking so long on this.
You're here now, that's all that matters! ❤️