ansible-role-proxmox
feature: add debian 12 bookworm and proxmox 8 compatibility
resolves: #226
Have you actually tested it so that nothing using the API or the cluster management breaks? I'll do it in a bit myself, and try to review it with those changes.
I use the modified role on my 3 node cluster. Until now I have not found any issue managing lxc or VMs
I'm testing in Vagrant (the Vagrantfile could be upgraded a bit, but maybe we want to test on older versions ;)), and will deploy a few scenarios... First thing, you might want to have the Ceph version depend on Proxmox's version. Because it's getting annoying, I'd suggest having the Ceph release name as a dictionary, with the keys being Debian ansible_distribution_release values and the values being Ceph release names. Something along the lines of:
pve_ceph_version:
  buster: nautilus
  bullseye: pacific
  bookworm: quincy
pve_ceph_repository_line: "deb http://download.proxmox.com/debian/ceph-{{ pve_ceph_version[ansible_distribution_release] }} {{ ansible_distribution_release }} no-subscription"
in defaults/main.yaml
It has another benefit: one can decide they want another Ceph version and override it without overriding the whole line, though I doubt it's actually useful.
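For example, with that dictionary in place a user could pin a different Ceph release from their inventory without touching the repository line. A minimal sketch (the group_vars path and the reef value are purely illustrative):
# group_vars/all.yml (illustrative override, assuming the pve_ceph_version dictionary above)
pve_ceph_version:
  bookworm: reef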
Ahh, let me clarify, I use btrfs and zfs for storage. I do not have experience with all the other storage providers.
Ah, there's more (I fixed the ceph- prefix in my comment): it seems Proxmox switched from using "main" as the channel to no-subscription… I fixed it, but I don't know if it would work as is for 6.x/7.x… Also, the docs mention pveceph install to set up the repos.
I'm not deploying ceph using ansible myself, but I'm just playing the default test playbook that enables it. It might be that nobody cares…
Have you actually tested it so that nothing using the API or the cluster management breaks?
Yeah, I was personally waiting to test it out but have been busy of late. The Vagrant testing suite also needs to be updated, but I figured I'd do that myself.
I'm not deploying ceph using ansible myself, but I'm just playing the default test playbook that enables it. It might be that nobody cares…
People definitely do care. Ceph still needs to be functional on update. #215 is supposed to be an update to the Ceph repos, but I'd left a comment there regarding behaviour on existing setups. Also, the role does what pveceph install does under the hood, except in a more controlled fashion afaik.
Hey, glad to have you around!
Yeah, I'm getting lazy about the channel name switching from main to no-subscription, and I'm not sure when that happened either (7? 8?).
And as for Vagrant, I'll probably start another conversation. I like how you set it up, it's probably not too hard to update, and at some point we could automate that ;)
I've tested the proposed PR on Vagrant (Debian bookworm from bento as the base) and on a single-node cluster (deployed as Proxmox), and it seems to be working fine. I haven't tested ZFS or Ceph, just the basics, but it looks OK.
I'm not experiencing this on a "real setup", mostly because I don't provision Ceph using the role, but here's what a test in VMs using the role in Vagrant does:
TASK [lae.proxmox : Validate and compress new crushmap] ************************
fatal: [pve-1]: FAILED! => {"changed": true, "cmd": ["crushtool", "-c", "crush_map_decompressed", "-o", "new_crush_map_compressed"], "delta": "0:00:00.021260", "end": "2023-08-29 21:25:06.169639", "msg": "non-zero return code", "rc": 1, "start": "2023-08-29 21:25:06.148379", "stderr": "crush_map_decompressed:44 error: parse error at ''", "stderr_lines": ["crush_map_decompressed:44 error: parse error at ''"], "stdout": "", "stdout_lines": []}
I'm not sure I can debug that easily, given the complexity of the regex, but I may try…
Also, and that's probably a change in the API, the tests fail using Vagrant, as it seems to be forbidden to have a "no-content" dir, that is, one that doesn't define a valid content attribute. This is the error I get:
create storage failed: unknown vtype 'none'
This is a non-issue, I'm pretty sure it makes no sense to try and create that anymore.
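(For anyone hitting this with their own configuration: giving a dir entry in pve_storages a real content list avoids the 'none' vtype. A rough sketch, with values that are purely illustrative:)
pve_storages:
  - name: local-templates
    type: dir
    path: /var/lib/vz/local-templates
    content: [ "iso", "vztmpl" ]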
FYI for anyone coming to this issue and wanting to use this immediately, I'll suggest what I've suggested a few times over the years: you can install this PR as a drop-in replacement for lae.proxmox via ansible-galaxy:
ansible-galaxy install https://github.com/lexxxel/ansible-role-proxmox/archive/feature/add_bookworm_and_debian_12_compatibility.tar.gz,pr-230,lae.proxmox --force
(which you should upgrade from once this is merged and a new release has been cut)
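Equivalently, if you track roles in a requirements file, something along these lines should work (untested sketch; the file name is just a convention):
# requirements.yml
- src: https://github.com/lexxxel/ansible-role-proxmox/archive/feature/add_bookworm_and_debian_12_compatibility.tar.gz
  name: lae.proxmox
then install it with ansible-galaxy install -r requirements.yml --force.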
What I usually do is symlink ~/.ansible/roles/lae.proxmox to my working dir, so I can decide what branch / git magic I'm operating on.
Yeah, that's what I do in my environment too, but that method's really only useful for contributors of the role lol (well, not to say others can't do it either, but for one-off situations installing the git-archive tarball like above is a bit less maintenance).
I'm not experiencing this on a "real setup", mostly because I don't provision Ceph using the role, but here's what a test in VMs using the role in Vagrant does:
TASK [lae.proxmox : Validate and compress new crushmap] ************************ fatal: [pve-1]: FAILED! => {"changed": true, "cmd": ["crushtool", "-c", "crush_map_decompressed", "-o", "new_crush_map_compressed"], "delta": "0:00:00.021260", "end": "2023-08-29 21:25:06.169639", "msg": "non-zero return code", "rc": 1, "start": "2023-08-29 21:25:06.148379", "stderr": "crush_map_decompressed:44 error: parse error at ''", "stderr_lines": ["crush_map_decompressed:44 error: parse error at ''"], "stdout": "", "stdout_lines": []}
OK, found the culprit: min_size and max_size are no longer valid in Quincy crush maps... So if we want to keep that available in tasks/ceph.yml for former versions, we'll need to add a condition there so the big regex replace only edits the crush rules on older releases…
EDIT: fixing that is easy, as we can simply not write the min/max attributes anymore, but maybe we want an upgrade path there…
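For illustration, a rough sketch of what such a guard could look like (task, path, and variable names are assumptions, not the role's actual ones):
# Hypothetical guard: only rewrite min_size on pre-Quincy releases,
# since Quincy's crushtool rejects the min_size/max_size attributes.
- name: Set min_size in decompressed crushmap
  ansible.builtin.replace:
    path: crush_map_decompressed
    regexp: '^(\s*)min_size \d+$'
    replace: '\g<1>min_size {{ pve_ceph_crush_min_size | default(2) }}'
  when: pve_ceph_version[ansible_distribution_release] not in ['quincy', 'reef']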
Something along the lines of:
pve_ceph_version:
  buster: nautilus
  bullseye: pacific
  bookworm: quincy
pve_ceph_repository_line: "deb http://download.proxmox.com/debian/ceph-{{ pve_ceph_version[ansible_distribution_release] }} {{ ansible_distribution_release }} no-subscription"
Actually, I didn't notice we already have that logic in load_variables.yml, so we can simply set the appropriate ceph version in vars/version.yml, and let the user overwrite it if needed…
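For illustration only, such an entry might look roughly like this (the file name and variable names are placeholders, not the role's actual layout):
# vars/debian-bookworm.yml (hypothetical)
pve_ceph_release: quincy
pve_ceph_repository_line: "deb http://download.proxmox.com/debian/ceph-{{ pve_ceph_release }} {{ ansible_distribution_release }} no-subscription"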
A note for users with NFS storage:
I am using a fresh PVE 8 installation (Vagrant based) with NFS storage. My playbook fails when executing the task 'Configure Proxmox Storage'. Debugging the issue showed that the creation of the first NFS mount via pvesh on the shell led to the following result:
$ pvesh create storage --storage nas-downloads --output=json --content=iso,vztmpl --export=/volume1/homeserver-shared --server=10.10.10.10 --type=nfs
Created symlink /run/systemd/system/remote-fs.target.wants/rpc-statd.service → /lib/systemd/system/rpc-statd.service.
{"storage":"nas-data","type":"nfs"}
echo $? gives 0, but the output is interpreted as an error by lae.proxmox/module_utils/pvesh.py#run_command(). I "fixed" this by starting the rpc-statd service prior to executing lae.proxmox:
- name: Start rpc-statd.service
  ansible.builtin.service:
    name: rpc-statd
    enabled: true
    state: started
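For anyone wanting to keep this in the same playbook, it can be wired as a pre_task before the role runs; a minimal sketch (the host group name is just an example):
- hosts: pve
  become: true
  pre_tasks:
    # Work around the rpc-statd activation message being treated as an error
    # when the first NFS storage is created via pvesh.
    - name: Start rpc-statd.service
      ansible.builtin.service:
        name: rpc-statd
        enabled: true
        state: started
  roles:
    - lae.proxmox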
Something along the lines of:
pve_ceph_version:
  buster: nautilus
  bullseye: pacific
  bookworm: quincy
pve_ceph_repository_line: "deb http://download.proxmox.com/debian/ceph-{{ pve_ceph_version[ansible_distribution_release] }} {{ ansible_distribution_release }} no-subscription"
Actually, I didn't notice we already have that logic in load_variables.yml, so we can simply set the appropriate ceph version in vars/version.yml, and let the user overwrite it if needed…
That's a nice proposal to update the defaults with pve_ceph_version. AFAIK, it's not possible for a user to replace a variable defined in vars/, so I would stick with a sane default and allow changing the Ceph release. Our team is currently testing the upgrade from 7.x to 8.x, and we had to upgrade Ceph to Quincy first. We defined a new value for pve_ceph_repository_line, but it would have been more elegant to define pve_ceph_version['bullseye'] = 'quincy'.
@lexxxel do you think you could add this to the PR?
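In other words, with such a default in place, that preparation step would boil down to a one-key override in the inventory (sketch, assuming the proposed dictionary):
# group_vars for the cluster (illustrative) - switch bullseye nodes to Quincy
# ahead of the 7.x -> 8.x upgrade, without redefining pve_ceph_repository_line
pve_ceph_version:
  bullseye: quincy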
With Proxmox VE 7 reaching EOL in 2024-07, is there any plan to merge this PR and release a new version of the role?
FWIW, our team has been able to use this PR to upgrade our lab cluster (5 nodes) from PVE 7.4 to 8.1 and Ceph to 17.2 with the role.
sorry for taking so long on this.
You're here now, that's all that matters! ❤️