ceph-config: fix calculation of `num_osds`
The number of OSDs defined by the `lvm_volumes` variable is added to `num_osds` in the task `Count number of osds for lvm scenario`. Therefore these devices must not be counted again in the task `Set_fact num_osds (add existing osds)`.
There are currently three problems with the existing approach:
- Bluestore DB and WAL devices are counted as OSDs
- `lvm_volumes` supports a second notation to directly specify logical volumes instead of devices when the `data_vg` key exists (see the example after this list). This scenario is not yet accounted for.
- The `difference` filter used to remove devices from `lvm_volumes` returns a list of unique elements, thus not accounting for multiple OSDs on a single device.
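For reference, the two notations look like this (illustrative values, matching the inventory examples later in the thread):

```yaml
lvm_volumes:
  # device notation: ceph-volume creates the logical volumes on the raw device
  - data: /dev/sdb
  # logical volume notation: the data_vg key selects an existing LV directly
  - data: data-lv1
    data_vg: test_group
```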
The first problem is solved by filtering the list of logical volumes for devices used as type `block`.
For the second and third problems, lists are created from `lvm_volumes` containing either paths to plain devices or to logical volume devices. For the second problem, the output of `ceph-volume` is simply filtered for `lv_path`s appearing in the list of logical volume devices described above.
To solve the third problem, the remaining OSDs in the output are compiled into a list of their underlying devices, which is then filtered for devices appearing in the list of devices from `lvm_volumes`.
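A minimal sketch of the resulting calculation (illustrative only — the helper names `_lv_paths`, `_block_lvs`, and `_devices` are mine, not the exact task code in this PR):

```yaml
# Build "/dev/<vg>/<lv>" paths for lvm_volumes entries using the data_vg notation
- name: Build list of LV paths from lvm_volumes (hypothetical helper task)
  ansible.builtin.set_fact:
    _lv_paths: "{{ _lv_paths | default([]) + ['/dev/' ~ item.data_vg ~ '/' ~ item.data] }}"
  loop: "{{ lvm_volumes | default([]) | selectattr('data_vg', 'defined') | list }}"

- name: Set_fact num_osds (add existing osds)
  ansible.builtin.set_fact:
    num_osds: >-
      {{ num_osds | int
         + (_block_lvs
            | rejectattr('lv_path', 'in', _lv_paths | default([]))
            | map(attribute='devices') | flatten
            | reject('in', _devices)
            | list | length) }}
  vars:
    # only 'block' entries are OSD data volumes; 'db' and 'wal' must not count
    _block_lvs: >-
      {{ lvm_list.stdout | default('{}') | from_json | dict2items
         | map(attribute='value') | flatten
         | selectattr('type', 'equalto', 'block') | list }}
    # plain devices from lvm_volumes (first notation, no data_vg key)
    _devices: >-
      {{ lvm_volumes | default([]) | rejectattr('data_vg', 'defined')
         | map(attribute='data') | list }}
```

Because `reject` keeps duplicates (unlike `difference`), two OSDs provisioned on the same device still count twice.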
Fixes: https://github.com/ceph/ceph-ansible/issues/7435
jenkins test centos-container-external_clients
jenkins test centos-container-rbdmirror
@clwluvw that's probably the kind of task which deserves to be converted into a module
@janhorstmann I might be missing something, but with the current implementation:
inventory
[osds]
osd0 lvm_volumes="[{'data': 'data-lv1', 'data_vg': 'test_group'},{'data': 'data-lv2', 'data_vg': 'test_group', 'db': 'journal1', 'db_vg': 'journals'}]"
osd1 lvm_volumes="[{'data': 'data-lv1', 'data_vg': 'test_group'},{'data': 'data-lv2', 'data_vg': 'test_group'}]" dmcrypt=true
osd2 lvm_volumes="[{'data': 'data-lv1', 'data_vg': 'test_group'},{'data': 'data-lv2', 'data_vg': 'test_group', 'db': 'journal1', 'db_vg': 'journals'}]"
osd3 lvm_volumes="[{'data': '/dev/sda'}, {'data': '/dev/sdb'}, {'data': '/dev/sdc'}]"
I see this in the log:
TASK [ceph-config : Set_fact num_osds (add existing osds)] *********************
task path: /home/guillaume/workspaces/ceph-ansible/7502/roles/ceph-config/tasks/main.yml:93
Wednesday 20 March 2024 16:52:31 +0100 (0:00:00.641) 0:01:59.236 *******
ok: [osd0] => changed=false
ansible_facts:
num_osds: '2'
ok: [osd1] => changed=false
ansible_facts:
num_osds: '2'
ok: [osd2] => changed=false
ansible_facts:
num_osds: '2'
ok: [osd3] => changed=false
ansible_facts:
num_osds: '3'
is there anything wrong here?
Thanks for taking the time to look into this, @guits. The output shown is correct.
Is this by chance from a first run of ceph-ansible on a fresh install? In that case, at the time the `ceph-config` role is run, there won't be any OSDs provisioned. Thus the output of `ceph-volume lvm list` will be empty and `num_osds` is only counted from the devices defined in `lvm_volumes`.
On any subsequent run of ceph-ansible OSDs will have been created and are shown by `ceph-volume lvm list`. Then the calculation in task `Set_fact num_osds (add existing osds)` will:

- Sum up the `devices` lists from all OSDs: `lvm_list.stdout | default('{}') | from_json | dict2items | map(attribute='value') | flatten | map(attribute='devices') | sum(start=[])`. From what I have seen, the `devices` of OSDs in the output of `ceph-volume lvm list` are always the underlying disks and never any logical volumes. At this point the resulting list will contain devices from `block`, `db`, and `wal` types, thus counting more OSDs than actually exist if `db` or `wal` types are listed.
- Create an iterable of the `data` values in `lvm_volumes` for the `difference` filter: `lvm_volumes | default([]) | map(attribute='data')`. This iterable now contains devices and logical volumes.
- Apply the `difference` filter to both items: `[...] | difference([...])`. Counterintuitively, the `difference` filter returns a list of unique items, thus ignoring multiple OSDs provisioned on the same device (see the illustration after this list). It will also contain those OSDs provisioned from logical volumes in `lvm_volumes`, as the first list only contains disk devices while the difference is taken from a list containing the logical volume devices.
- Count the items in the resulting list and add it to the existing value in `num_osds`: `num_osds: "{{ num_osds | int + ([...] | difference([...]) | length | int) }}"`. The existing value already contains a count of all items in `lvm_volumes`. This will be the correct value on the first run, as it will only contain the count of `lvm_volumes`. On subsequent runs this number will be different according to the combination of `osds_per_device`, `db` devices, etc.
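To illustrate the pitfall in the third step with made-up values (two OSDs on `/dev/sdb`, e.g. via `osds_per_device: 2`, plus one on `/dev/sdc` already covered by `lvm_volumes`):

```yaml
# ['/dev/sdb', '/dev/sdb', '/dev/sdc'] | difference(['/dev/sdc'])
#   => ['/dev/sdb']   # duplicates collapsed: one of the two OSDs goes uncounted
```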
> Is this by chance from a first run of `ceph-ansible` on a fresh install?
no, this is from a re-run after a fresh install.
Before I dive deeper into this, could you please confirm that the output is actually from the current implementation? I noticed the number `7502` in the task path, which is the exact number of this PR:
TASK [ceph-config : Set_fact num_osds (add existing osds)] ********************* task path: /home/guillaume/workspaces/ceph-ansible/7502/roles/ceph-config/tasks/main.yml:93 [...]
Could this be a run with the version containing the fix? In that case I would hope that it is correct ;)
If that number is unrelated, could you show the output of `ceph-volume lvm list --format json` from an OSD node? Maybe that could help to pinpoint the flaw in my logic.
> Is this by chance from a first run of `ceph-ansible` on a fresh install?
>
> no, this is from a re-run after a fresh install.

> Before I dive deeper into this, could you please confirm that the output is actually from the current implementation? I noticed the number `7502` in the task path, which is the exact number of this PR
I cloned the repo at a new path and named it with the id of your PR, but it was indeed on the `main` branch.
I'm gonna do more tests and double-check I didn't miss a detail.
> I cloned the repo at a new path and named it with the id of your PR, but it was indeed on the `main` branch.
>
> I'm gonna do more tests and double-check I didn't miss a detail.
Thank you for bearing with me here.
I did not exactly reproduce your test environment, but set up a single instance with four volumes on four volume groups on four devices:
pvs && vgs && lvs
PV VG Fmt Attr PSize PFree
/dev/sdb vg_b lvm2 a-- <10.00g 0
/dev/sdc vg_c lvm2 a-- <10.00g 0
/dev/sdd vg_d lvm2 a-- <10.00g 0
/dev/sde vg_e lvm2 a-- <10.00g 0
VG #PV #LV #SN Attr VSize VFree
vg_b 1 1 0 wz--n- <10.00g 0
vg_c 1 1 0 wz--n- <10.00g 0
vg_d 1 1 0 wz--n- <10.00g 0
vg_e 1 1 0 wz--n- <10.00g 0
LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert
lv_b vg_b -wi-ao---- <10.00g
lv_c vg_c -wi-ao---- <10.00g
lv_d vg_d -wi-ao---- <10.00g
lv_e vg_e -wi-ao---- <10.00g
matching this config:
---
lvm_volumes:
- data: lv_b
data_vg: vg_b
- data: lv_c
data_vg: vg_c
- data: lv_d
data_vg: vg_d
db: lv_e
db_vg: vg_e
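For this layout the expected count is 3 on every run (my annotation of the setup, not tool output):

```yaml
# lv_b -> OSD 0, type block    counts
# lv_c -> OSD 1, type block    counts
# lv_d -> OSD 2, type block    counts
# lv_e -> OSD 2, type db       must not count
# => num_osds should stay at 3
```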
Using branch `main` I get the following diff between the first and second run in the output of the relevant parts of `ceph-config`:
TASK [ceph-config : Reset num_osds] ************************* TASK [ceph-config : Reset num_osds] *************************
ok: [localhost] => changed=false ok: [localhost] => changed=false
ansible_facts: ansible_facts:
num_osds: 0 num_osds: 0
TASK [ceph-config : Count number of osds for lvm scenario] ** TASK [ceph-config : Count number of osds for lvm scenario] **
ok: [localhost] => changed=false ok: [localhost] => changed=false
ansible_facts: ansible_facts:
num_osds: '3' num_osds: '3'
TASK [ceph-config : Look up for ceph-volume rejected devices] TASK [ceph-config : Look up for ceph-volume rejected devices]
skipping: [localhost] => changed=false skipping: [localhost] => changed=false
false_condition: devices | default([]) | length > 0 false_condition: devices | default([]) | length > 0
skip_reason: Conditional result was False skip_reason: Conditional result was False
TASK [ceph-config : Set_fact rejected_devices] ************** TASK [ceph-config : Set_fact rejected_devices] **************
skipping: [localhost] => changed=false skipping: [localhost] => changed=false
skipped_reason: No items in the list skipped_reason: No items in the list
TASK [ceph-config : Set_fact _devices] ********************** TASK [ceph-config : Set_fact _devices] **********************
skipping: [localhost] => changed=false skipping: [localhost] => changed=false
false_condition: devices | default([]) | length > 0 false_condition: devices | default([]) | length > 0
skip_reason: Conditional result was False skip_reason: Conditional result was False
TASK [ceph-config : Run 'ceph-volume lvm batch --report' to s TASK [ceph-config : Run 'ceph-volume lvm batch --report' to s
skipping: [localhost] => changed=false skipping: [localhost] => changed=false
false_condition: devices | default([]) | length > 0 false_condition: devices | default([]) | length > 0
skip_reason: Conditional result was False skip_reason: Conditional result was False
TASK [ceph-config : Set_fact num_osds from the output of 'cep TASK [ceph-config : Set_fact num_osds from the output of 'cep
skipping: [localhost] => changed=false skipping: [localhost] => changed=false
false_condition: devices | default([]) | length > 0 false_condition: devices | default([]) | length > 0
skip_reason: Conditional result was False skip_reason: Conditional result was False
TASK [ceph-config : Set_fact num_osds from the output of 'cep TASK [ceph-config : Set_fact num_osds from the output of 'cep
skipping: [localhost] => changed=false skipping: [localhost] => changed=false
false_condition: devices | default([]) | length > 0 false_condition: devices | default([]) | length > 0
skip_reason: Conditional result was False skip_reason: Conditional result was False
TASK [ceph-config : Run 'ceph-volume lvm list' to see how man TASK [ceph-config : Run 'ceph-volume lvm list' to see how man
ok: [localhost] => changed=false ok: [localhost] => changed=false
cmd: cmd:
- docker - docker
- run - run
- --rm - --rm
- --privileged - --privileged
- --net=host - --net=host
- --ipc=host - --ipc=host
- -v - -v
- /run/lock/lvm:/run/lock/lvm:z - /run/lock/lvm:/run/lock/lvm:z
- -v - -v
- /var/run/udev:/var/run/udev:z - /var/run/udev:/var/run/udev:z
- -v - -v
- /dev:/dev - /dev:/dev
- -v - -v
- /etc/ceph:/etc/ceph:z - /etc/ceph:/etc/ceph:z
- -v - -v
- /run/lvm:/run/lvm - /run/lvm:/run/lvm
- -v - -v
- /var/lib/ceph:/var/lib/ceph:ro - /var/lib/ceph:/var/lib/ceph:ro
- -v - -v
- /var/log/ceph:/var/log/ceph:z - /var/log/ceph:/var/log/ceph:z
- --entrypoint=ceph-volume - --entrypoint=ceph-volume
- quay.io/ceph/daemon-base:latest-main - quay.io/ceph/daemon-base:latest-main
- --cluster - --cluster
- ceph - ceph
- lvm - lvm
- list - list
- --format=json - --format=json
delta: '0:00:00.356457' | delta: '0:00:00.350386'
end: '2024-03-25 09:43:04.401330' | end: '2024-03-25 09:45:04.428959'
rc: 0 rc: 0
start: '2024-03-25 09:43:04.044873' | start: '2024-03-25 09:45:04.078573'
stderr: '' stderr: ''
stderr_lines: <omitted> stderr_lines: <omitted>
stdout: '{}' | stdout: |-
> {
> "0": [
> {
> "devices": [
> "/dev/sdb"
> ],
> "lv_name": "lv_b",
> "lv_path": "/dev/vg_b/lv_b",
> "lv_size": "10733223936",
> "lv_tags": "ceph.block_device=/dev/vg_b/lv_b,
> "lv_uuid": "E5KteH-nE2B-6n3p-jVzj-BHjN-kfON-6
> "name": "lv_b",
> "path": "/dev/vg_b/lv_b",
> "tags": {
> "ceph.block_device": "/dev/vg_b/lv_b",
> "ceph.block_uuid": "E5KteH-nE2B-6n3p-jVzj
> "ceph.cephx_lockbox_secret": "",
> "ceph.cluster_fsid": "c29aec7d-cf6c-4cd4-
> "ceph.cluster_name": "ceph",
> "ceph.crush_device_class": "",
> "ceph.encrypted": "0",
> "ceph.osd_fsid": "57d97201-db17-4927-839a
> "ceph.osd_id": "0",
> "ceph.osdspec_affinity": "",
> "ceph.type": "block",
> "ceph.vdo": "0"
> },
> "type": "block",
> "vg_name": "vg_b"
> }
> ],
> "1": [
> {
> "devices": [
> "/dev/sdc"
> ],
> "lv_name": "lv_c",
> "lv_path": "/dev/vg_c/lv_c",
> "lv_size": "10733223936",
> "lv_tags": "ceph.block_device=/dev/vg_c/lv_c,
> "lv_uuid": "63g2QD-3l00-3mIt-YcoL-yfUs-GPPD-L
> "name": "lv_c",
> "path": "/dev/vg_c/lv_c",
> "tags": {
> "ceph.block_device": "/dev/vg_c/lv_c",
> "ceph.block_uuid": "63g2QD-3l00-3mIt-YcoL
> "ceph.cephx_lockbox_secret": "",
> "ceph.cluster_fsid": "c29aec7d-cf6c-4cd4-
> "ceph.cluster_name": "ceph",
> "ceph.crush_device_class": "",
> "ceph.encrypted": "0",
> "ceph.osd_fsid": "5deb4190-7b0d-4170-adcf
> "ceph.osd_id": "1",
> "ceph.osdspec_affinity": "",
> "ceph.type": "block",
> "ceph.vdo": "0"
> },
> "type": "block",
> "vg_name": "vg_c"
> }
> ],
> "2": [
> {
> "devices": [
> "/dev/sdd"
> ],
> "lv_name": "lv_d",
> "lv_path": "/dev/vg_d/lv_d",
> "lv_size": "10733223936",
> "lv_tags": "ceph.block_device=/dev/vg_d/lv_d,
> "lv_uuid": "Rwef6N-ETHv-4TUd-9j2B-0N31-EAtp-c
> "name": "lv_d",
> "path": "/dev/vg_d/lv_d",
> "tags": {
> "ceph.block_device": "/dev/vg_d/lv_d",
> "ceph.block_uuid": "Rwef6N-ETHv-4TUd-9j2B
> "ceph.cephx_lockbox_secret": "",
> "ceph.cluster_fsid": "c29aec7d-cf6c-4cd4-
> "ceph.cluster_name": "ceph",
> "ceph.crush_device_class": "",
> "ceph.db_device": "/dev/vg_e/lv_e",
> "ceph.db_uuid": "mANY8b-MvVI-VaU9-3afv-N7
> "ceph.encrypted": "0",
> "ceph.osd_fsid": "9a470d84-f91a-4e59-b963
> "ceph.osd_id": "2",
> "ceph.osdspec_affinity": "",
> "ceph.type": "block",
> "ceph.vdo": "0"
> },
> "type": "block",
> "vg_name": "vg_d"
> },
> {
> "devices": [
> "/dev/sde"
> ],
> "lv_name": "lv_e",
> "lv_path": "/dev/vg_e/lv_e",
> "lv_size": "10733223936",
> "lv_tags": "ceph.block_device=/dev/vg_d/lv_d,
> "lv_uuid": "mANY8b-MvVI-VaU9-3afv-N7Mw-KWED-m
> "name": "lv_e",
> "path": "/dev/vg_e/lv_e",
> "tags": {
> "ceph.block_device": "/dev/vg_d/lv_d",
> "ceph.block_uuid": "Rwef6N-ETHv-4TUd-9j2B
> "ceph.cephx_lockbox_secret": "",
> "ceph.cluster_fsid": "c29aec7d-cf6c-4cd4-
> "ceph.cluster_name": "ceph",
> "ceph.crush_device_class": "",
> "ceph.db_device": "/dev/vg_e/lv_e",
> "ceph.db_uuid": "mANY8b-MvVI-VaU9-3afv-N7
> "ceph.encrypted": "0",
> "ceph.osd_fsid": "9a470d84-f91a-4e59-b963
> "ceph.osd_id": "2",
> "ceph.osdspec_affinity": "",
> "ceph.type": "db",
> "ceph.vdo": "0"
> },
> "type": "db",
> "vg_name": "vg_e"
> }
> ]
> }
stdout_lines: <omitted> stdout_lines: <omitted>
TASK [ceph-config : Set_fact num_osds (add existing osds)] ** TASK [ceph-config : Set_fact num_osds (add existing osds)] **
ok: [localhost] => changed=false ok: [localhost] => changed=false
ansible_facts: ansible_facts:
num_osds: '3' | num_osds: '7'
So on the second run, in addition to the count of items in `lvm_volumes` we get a count of all items in the output of `ceph-volume lvm list --format json`, so the value for `osd_memory_target` is not calculated from the correct number of provisioned OSDs. In this case it gets reduced, so that resources are not used efficiently.
If we start to bring `osds_per_device > 1` into the equation, then memory might get overcommitted, resulting in OOM situations.
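As a rough back-of-the-envelope example (hypothetical numbers; the exact `osd_memory_target` formula in ceph-ansible may differ):

```yaml
# 32 GiB host, 3 provisioned OSDs, num_osds miscounted as 7:
#   correct:    32 GiB / 3 ≈ 10.7 GiB per OSD
#   miscounted: 32 GiB / 7 ≈  4.6 GiB per OSD   -> memory sits unused
# undercounting instead (e.g. osds_per_device > 1 ignored) would grant each
# OSD too large a target and risk invoking the OOM killer
```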
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in two weeks if no further activity occurs. Thank you for your contributions.
I am still interested in landing this. Let me know if there is anything I can do to move this along
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in two weeks if no further activity occurs. Thank you for your contributions.
@guits did you have time to look into this yet?
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in two weeks if no further activity occurs. Thank you for your contributions.
This pull request has been automatically closed due to inactivity. Please re-open if these changes are still required.