
ceph-config: fix calculation of `num_osds`

Open janhorstmann opened this issue 1 year ago • 14 comments

The number of OSDs defined by the lvm_volumes variable is added to num_osds in the task Count number of osds for lvm scenario. Therefore these devices must not be counted again in the task Set_fact num_osds (add existing osds). There are currently three problems with the existing approach:

  1. Bluestore DB and WAL devices are counted as OSDs
  2. lvm_volumes supports a second notation to directly specify logical volumes instead of devices when the data_vg key exists. This scenario is not yet accounted for.
  3. The difference filter used to remove devices from lvm_volumes returns a list of unique elements, thus not accounting for multiple OSDs on a single device (illustrated in the sketch below)
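
As a quick illustration of the third point: Ansible's difference filter is set-based and collapses duplicate entries, so two OSDs backed by the same device are reduced to one list element. A minimal, self-contained example (the device names are hypothetical):

# Minimal demonstration of the uniqueness behaviour of the difference filter.
- hosts: localhost
  gather_facts: false
  tasks:
    - name: Two OSDs on /dev/sdb collapse into a single entry
      ansible.builtin.debug:
        msg: "{{ ['/dev/sdb', '/dev/sdb', '/dev/sdc'] | difference(['/dev/sdc']) }}"
      # prints ["/dev/sdb"] -- one element, although two OSDs share /dev/sdb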

The first problem is solved by filtering the list of logical volumes for those used as type block. For the second and third problem, two lists are created from lvm_volumes, containing either the paths of raw devices or the paths of the logical volume devices. For the second problem the output of ceph-volume is simply filtered for lv_paths appearing in the list of logical volume devices described above. To solve the third problem the remaining OSDs in the output are compiled into a list of their used devices, which is then filtered for devices appearing in the list of devices from lvm_volumes.
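
In Ansible/Jinja2 terms the idea looks roughly like this (a sketch only, assuming num_osds already holds the count of entries in lvm_volumes; the variable names _lvm_volumes_lv_paths and _existing_block_lvs are illustrative and the actual patch may differ):

- name: Collect LV paths for lvm_volumes entries that use the data_vg notation
  ansible.builtin.set_fact:
    _lvm_volumes_lv_paths: "{{ _lvm_volumes_lv_paths | default([]) + ['/dev/' ~ item.data_vg ~ '/' ~ item.data] }}"
  loop: "{{ lvm_volumes | default([]) | selectattr('data_vg', 'defined') | list }}"

- name: Count existing OSDs that are not already covered by lvm_volumes
  # rejectattr(... 'in' ...) drops OSDs whose data LV is already listed in lvm_volumes (problem 2);
  # reject('in', ...) keeps duplicates, so several OSDs on one device are all counted (problem 3)
  ansible.builtin.set_fact:
    num_osds: "{{ num_osds | int
                  + (_existing_block_lvs
                     | rejectattr('lv_path', 'in', _lvm_volumes_lv_paths | default([]))
                     | map(attribute='devices') | flatten
                     | reject('in', lvm_volumes | default([]) | rejectattr('data_vg', 'defined') | map(attribute='data') | list)
                     | length) }}"
  vars:
    # only LVs of type 'block' represent OSD data; db and wal LVs are excluded (problem 1)
    _existing_block_lvs: "{{ lvm_list.stdout | default('{}') | from_json
                             | dict2items | map(attribute='value') | flatten
                             | selectattr('type', 'equalto', 'block') | list }}"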

Fixes: https://github.com/ceph/ceph-ansible/issues/7435

janhorstmann avatar Mar 15 '24 10:03 janhorstmann

jenkins test centos-container-external_clients

clwluvw avatar Mar 16 '24 21:03 clwluvw

jenkins test centos-container-rbdmirror

clwluvw avatar Mar 16 '24 21:03 clwluvw

@clwluvw that's probably the kind of task which deserves to be converted into a module

guits avatar Mar 20 '24 09:03 guits

@janhorstmann I might be missing something, but with the current implementation:

inventory

[osds]
osd0 lvm_volumes="[{'data': 'data-lv1', 'data_vg': 'test_group'},{'data': 'data-lv2', 'data_vg': 'test_group', 'db': 'journal1', 'db_vg': 'journals'}]"
osd1 lvm_volumes="[{'data': 'data-lv1', 'data_vg': 'test_group'},{'data': 'data-lv2', 'data_vg': 'test_group'}]" dmcrypt=true
osd2 lvm_volumes="[{'data': 'data-lv1', 'data_vg': 'test_group'},{'data': 'data-lv2', 'data_vg': 'test_group', 'db': 'journal1', 'db_vg': 'journals'}]"
osd3 lvm_volumes="[{'data': '/dev/sda'}, {'data': '/dev/sdb'}, {'data': '/dev/sdc'}]"

I see this in the log:

TASK [ceph-config : Set_fact num_osds (add existing osds)] *********************
task path: /home/guillaume/workspaces/ceph-ansible/7502/roles/ceph-config/tasks/main.yml:93
Wednesday 20 March 2024  16:52:31 +0100 (0:00:00.641)       0:01:59.236 *******
ok: [osd0] => changed=false
  ansible_facts:
    num_osds: '2'
ok: [osd1] => changed=false
  ansible_facts:
    num_osds: '2'
ok: [osd2] => changed=false
  ansible_facts:
    num_osds: '2'
ok: [osd3] => changed=false
  ansible_facts:
    num_osds: '3'

is there anything wrong here?

guits avatar Mar 20 '24 16:03 guits

Thanks for taking the time to look into this, @guits. The output shown is correct.

Is this by chance from a first run of ceph-ansible on a fresh install? In that case, at the time the ceph-config role runs, no OSDs are provisioned yet. The output of ceph-volume lvm list is therefore empty and num_osds is counted only from the devices defined in lvm_volumes. On any subsequent run of ceph-ansible the OSDs will have been created and will show up in ceph-volume lvm list. The calculation in the task Set_fact num_osds (add existing osds) will then:

  • Sum up the devices lists from all OSDs: lvm_list.stdout | default('{}') | from_json | dict2items | map(attribute='value') | flatten | map(attribute='devices') | sum(start=[]). From what I have seen, the devices of OSDs in the output of ceph-volume lvm list are always the underlying disks and never any logical volumes.
    At this point the resulting list contains the devices of block, db, and wal volumes alike, thus counting more OSDs than actually exist whenever db or wal volumes are present.
  • Create an iterable of the data values in lvm_volumes for the difference filter: lvm_volumes | default([]) | map(attribute='data'). This iterable contains both devices and logical volumes.
  • Apply the difference filter to the two: [...] | difference([...]). Counterintuitively, the difference filter returns a list of unique items, thus ignoring multiple OSDs provisioned on the same device. The result also still contains the OSDs provisioned from logical volumes in lvm_volumes, because the first list only contains disk devices while the difference is taken against a list containing the logical volumes.
  • Count the items in the resulting list and add them to the existing value of num_osds: num_osds: "{{ num_osds | int + ([...] | difference([...]) | length | int) }}". The existing value already contains a count of all items in lvm_volumes, so the result is only correct on the first run, when it consists of nothing but that count. On subsequent runs the number is off by an amount that depends on the combination of osds_per_device, db devices, etc. (the combined expression is sketched right after this list).
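
Putting those fragments together, the task on the main branch is roughly of this shape (reconstructed from the pipeline pieces quoted above, not copied verbatim from the role):

- name: Set_fact num_osds (add existing osds)
  ansible.builtin.set_fact:
    num_osds: "{{ num_osds | int
                  + (lvm_list.stdout | default('{}') | from_json
                     | dict2items | map(attribute='value') | flatten
                     | map(attribute='devices') | sum(start=[])
                     | difference(lvm_volumes | default([]) | map(attribute='data'))
                     | length | int) }}"

The difference call is where both the uniqueness problem and the logical-volume mismatch enter the count.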

janhorstmann avatar Mar 21 '24 10:03 janhorstmann

Is this by chance from a first run of ceph-ansible on a fresh install?

no, this is from a re-run after a fresh install.

guits avatar Mar 21 '24 12:03 guits

Is this by chance from a first run of ceph-ansible on a fresh install?

no, this is from a re-run after a fresh install.

Before I dive deeper into this, could you please confirm that the output is actually from the current implementation? I noticed the number 7502 in the task path, which is the exact number of this PR:

TASK [ceph-config : Set_fact num_osds (add existing osds)] *********************
task path: /home/guillaume/workspaces/ceph-ansible/7502/roles/ceph-config/tasks/main.yml:93
[...]

Could this be a run with the version containing the fix? In that case I would hope that it is correct ;)

If that number is unrelated, could you show the output of ceph-volume lvm list --format json from an OSD node? Maybe that could help to pinpoint the flaw in my logic.

janhorstmann avatar Mar 21 '24 14:03 janhorstmann

Is this by chance from a first run of ceph-ansible on a fresh install?

no, this is from a re-run after a fresh install.

Before I dive deeper into this, could you please confirm that the output is actually from the current implementation? I noticed the number 7502 in the task path, which is the exact number of this PR.

I cloned the repo to a new path and named it with the id of your PR, but it was indeed on the main branch

guits avatar Mar 21 '24 15:03 guits

I'm gonna do more tests and double-check I didn't miss a detail

guits avatar Mar 21 '24 15:03 guits

I cloned the repo to a new path and named it with the id of your PR, but it was indeed on the main branch

I'm gonna do more tests and double-check I didn't miss a detail

Thank you for bearing with me here.

I did not exactly reproduce your test environment, but set up a single instance with four volumes on four volume groups on four devices:

pvs && vgs && lvs
  PV         VG   Fmt  Attr PSize   PFree
  /dev/sdb   vg_b lvm2 a--  <10.00g    0
  /dev/sdc   vg_c lvm2 a--  <10.00g    0
  /dev/sdd   vg_d lvm2 a--  <10.00g    0
  /dev/sde   vg_e lvm2 a--  <10.00g    0
  VG   #PV #LV #SN Attr   VSize   VFree
  vg_b   1   1   0 wz--n- <10.00g    0
  vg_c   1   1   0 wz--n- <10.00g    0
  vg_d   1   1   0 wz--n- <10.00g    0
  vg_e   1   1   0 wz--n- <10.00g    0
  LV   VG   Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  lv_b vg_b -wi-ao---- <10.00g
  lv_c vg_c -wi-ao---- <10.00g
  lv_d vg_d -wi-ao---- <10.00g
  lv_e vg_e -wi-ao---- <10.00g

matching this config:

---
lvm_volumes:
  - data: lv_b
    data_vg: vg_b
  - data: lv_c
    data_vg: vg_c
  - data: lv_d
    data_vg: vg_d
    db: lv_e
    db_vg: vg_e

Using branch main I get the following diff between the first and second run in the output of the relevant parts of ceph-config:

TASK [ceph-config : Reset num_osds] *************************	TASK [ceph-config : Reset num_osds] *************************
ok: [localhost] => changed=false				ok: [localhost] => changed=false
  ansible_facts:						  ansible_facts:
    num_osds: 0							    num_osds: 0

TASK [ceph-config : Count number of osds for lvm scenario] **	TASK [ceph-config : Count number of osds for lvm scenario] **
ok: [localhost] => changed=false				ok: [localhost] => changed=false
  ansible_facts:						  ansible_facts:
    num_osds: '3'						    num_osds: '3'

TASK [ceph-config : Look up for ceph-volume rejected devices]	TASK [ceph-config : Look up for ceph-volume rejected devices]
skipping: [localhost] => changed=false				skipping: [localhost] => changed=false
  false_condition: devices | default([]) | length > 0		  false_condition: devices | default([]) | length > 0
  skip_reason: Conditional result was False			  skip_reason: Conditional result was False

TASK [ceph-config : Set_fact rejected_devices] **************	TASK [ceph-config : Set_fact rejected_devices] **************
skipping: [localhost] => changed=false				skipping: [localhost] => changed=false
  skipped_reason: No items in the list				  skipped_reason: No items in the list

TASK [ceph-config : Set_fact _devices] **********************	TASK [ceph-config : Set_fact _devices] **********************
skipping: [localhost] => changed=false				skipping: [localhost] => changed=false
  false_condition: devices | default([]) | length > 0		  false_condition: devices | default([]) | length > 0
  skip_reason: Conditional result was False			  skip_reason: Conditional result was False

TASK [ceph-config : Run 'ceph-volume lvm batch --report' to s	TASK [ceph-config : Run 'ceph-volume lvm batch --report' to s
skipping: [localhost] => changed=false				skipping: [localhost] => changed=false
  false_condition: devices | default([]) | length > 0		  false_condition: devices | default([]) | length > 0
  skip_reason: Conditional result was False			  skip_reason: Conditional result was False

TASK [ceph-config : Set_fact num_osds from the output of 'cep	TASK [ceph-config : Set_fact num_osds from the output of 'cep
skipping: [localhost] => changed=false				skipping: [localhost] => changed=false
  false_condition: devices | default([]) | length > 0		  false_condition: devices | default([]) | length > 0
  skip_reason: Conditional result was False			  skip_reason: Conditional result was False

TASK [ceph-config : Set_fact num_osds from the output of 'cep	TASK [ceph-config : Set_fact num_osds from the output of 'cep
skipping: [localhost] => changed=false				skipping: [localhost] => changed=false
  false_condition: devices | default([]) | length > 0		  false_condition: devices | default([]) | length > 0
  skip_reason: Conditional result was False			  skip_reason: Conditional result was False

TASK [ceph-config : Run 'ceph-volume lvm list' to see how man	TASK [ceph-config : Run 'ceph-volume lvm list' to see how man
ok: [localhost] => changed=false				ok: [localhost] => changed=false
  cmd:								  cmd:
  - docker							  - docker
  - run								  - run
  - --rm							  - --rm
  - --privileged						  - --privileged
  - --net=host							  - --net=host
  - --ipc=host							  - --ipc=host
  - -v								  - -v
  - /run/lock/lvm:/run/lock/lvm:z				  - /run/lock/lvm:/run/lock/lvm:z
  - -v								  - -v
  - /var/run/udev:/var/run/udev:z				  - /var/run/udev:/var/run/udev:z
  - -v								  - -v
  - /dev:/dev							  - /dev:/dev
  - -v								  - -v
  - /etc/ceph:/etc/ceph:z					  - /etc/ceph:/etc/ceph:z
  - -v								  - -v
  - /run/lvm:/run/lvm						  - /run/lvm:/run/lvm
  - -v								  - -v
  - /var/lib/ceph:/var/lib/ceph:ro				  - /var/lib/ceph:/var/lib/ceph:ro
  - -v								  - -v
  - /var/log/ceph:/var/log/ceph:z				  - /var/log/ceph:/var/log/ceph:z
  - --entrypoint=ceph-volume					  - --entrypoint=ceph-volume
  - quay.io/ceph/daemon-base:latest-main			  - quay.io/ceph/daemon-base:latest-main
  - --cluster							  - --cluster
  - ceph							  - ceph
  - lvm								  - lvm
  - list							  - list
  - --format=json						  - --format=json
  delta: '0:00:00.356457'				      |	  delta: '0:00:00.350386'
  end: '2024-03-25 09:43:04.401330'			      |	  end: '2024-03-25 09:45:04.428959'
  rc: 0								  rc: 0
  start: '2024-03-25 09:43:04.044873'			      |	  start: '2024-03-25 09:45:04.078573'
  stderr: ''							  stderr: ''
  stderr_lines: <omitted>					  stderr_lines: <omitted>
  stdout: '{}'						      |	  stdout: |-
							      >	    {
							      >	        "0": [
							      >	            {
							      >	                "devices": [
							      >	                    "/dev/sdb"
							      >	                ],
							      >	                "lv_name": "lv_b",
							      >	                "lv_path": "/dev/vg_b/lv_b",
							      >	                "lv_size": "10733223936",
							      >	                "lv_tags": "ceph.block_device=/dev/vg_b/lv_b,
							      >	                "lv_uuid": "E5KteH-nE2B-6n3p-jVzj-BHjN-kfON-6
							      >	                "name": "lv_b",
							      >	                "path": "/dev/vg_b/lv_b",
							      >	                "tags": {
							      >	                    "ceph.block_device": "/dev/vg_b/lv_b",
							      >	                    "ceph.block_uuid": "E5KteH-nE2B-6n3p-jVzj
							      >	                    "ceph.cephx_lockbox_secret": "",
							      >	                    "ceph.cluster_fsid": "c29aec7d-cf6c-4cd4-
							      >	                    "ceph.cluster_name": "ceph",
							      >	                    "ceph.crush_device_class": "",
							      >	                    "ceph.encrypted": "0",
							      >	                    "ceph.osd_fsid": "57d97201-db17-4927-839a
							      >	                    "ceph.osd_id": "0",
							      >	                    "ceph.osdspec_affinity": "",
							      >	                    "ceph.type": "block",
							      >	                    "ceph.vdo": "0"
							      >	                },
							      >	                "type": "block",
							      >	                "vg_name": "vg_b"
							      >	            }
							      >	        ],
							      >	        "1": [
							      >	            {
							      >	                "devices": [
							      >	                    "/dev/sdc"
							      >	                ],
							      >	                "lv_name": "lv_c",
							      >	                "lv_path": "/dev/vg_c/lv_c",
							      >	                "lv_size": "10733223936",
							      >	                "lv_tags": "ceph.block_device=/dev/vg_c/lv_c,
							      >	                "lv_uuid": "63g2QD-3l00-3mIt-YcoL-yfUs-GPPD-L
							      >	                "name": "lv_c",
							      >	                "path": "/dev/vg_c/lv_c",
							      >	                "tags": {
							      >	                    "ceph.block_device": "/dev/vg_c/lv_c",
							      >	                    "ceph.block_uuid": "63g2QD-3l00-3mIt-YcoL
							      >	                    "ceph.cephx_lockbox_secret": "",
							      >	                    "ceph.cluster_fsid": "c29aec7d-cf6c-4cd4-
							      >	                    "ceph.cluster_name": "ceph",
							      >	                    "ceph.crush_device_class": "",
							      >	                    "ceph.encrypted": "0",
							      >	                    "ceph.osd_fsid": "5deb4190-7b0d-4170-adcf
							      >	                    "ceph.osd_id": "1",
							      >	                    "ceph.osdspec_affinity": "",
							      >	                    "ceph.type": "block",
							      >	                    "ceph.vdo": "0"
							      >	                },
							      >	                "type": "block",
							      >	                "vg_name": "vg_c"
							      >	            }
							      >	        ],
							      >	        "2": [
							      >	            {
							      >	                "devices": [
							      >	                    "/dev/sdd"
							      >	                ],
							      >	                "lv_name": "lv_d",
							      >	                "lv_path": "/dev/vg_d/lv_d",
							      >	                "lv_size": "10733223936",
							      >	                "lv_tags": "ceph.block_device=/dev/vg_d/lv_d,
							      >	                "lv_uuid": "Rwef6N-ETHv-4TUd-9j2B-0N31-EAtp-c
							      >	                "name": "lv_d",
							      >	                "path": "/dev/vg_d/lv_d",
							      >	                "tags": {
							      >	                    "ceph.block_device": "/dev/vg_d/lv_d",
							      >	                    "ceph.block_uuid": "Rwef6N-ETHv-4TUd-9j2B
							      >	                    "ceph.cephx_lockbox_secret": "",
							      >	                    "ceph.cluster_fsid": "c29aec7d-cf6c-4cd4-
							      >	                    "ceph.cluster_name": "ceph",
							      >	                    "ceph.crush_device_class": "",
							      >	                    "ceph.db_device": "/dev/vg_e/lv_e",
							      >	                    "ceph.db_uuid": "mANY8b-MvVI-VaU9-3afv-N7
							      >	                    "ceph.encrypted": "0",
							      >	                    "ceph.osd_fsid": "9a470d84-f91a-4e59-b963
							      >	                    "ceph.osd_id": "2",
							      >	                    "ceph.osdspec_affinity": "",
							      >	                    "ceph.type": "block",
							      >	                    "ceph.vdo": "0"
							      >	                },
							      >	                "type": "block",
							      >	                "vg_name": "vg_d"
							      >	            },
							      >	            {
							      >	                "devices": [
							      >	                    "/dev/sde"
							      >	                ],
							      >	                "lv_name": "lv_e",
							      >	                "lv_path": "/dev/vg_e/lv_e",
							      >	                "lv_size": "10733223936",
							      >	                "lv_tags": "ceph.block_device=/dev/vg_d/lv_d,
							      >	                "lv_uuid": "mANY8b-MvVI-VaU9-3afv-N7Mw-KWED-m
							      >	                "name": "lv_e",
							      >	                "path": "/dev/vg_e/lv_e",
							      >	                "tags": {
							      >	                    "ceph.block_device": "/dev/vg_d/lv_d",
							      >	                    "ceph.block_uuid": "Rwef6N-ETHv-4TUd-9j2B
							      >	                    "ceph.cephx_lockbox_secret": "",
							      >	                    "ceph.cluster_fsid": "c29aec7d-cf6c-4cd4-
							      >	                    "ceph.cluster_name": "ceph",
							      >	                    "ceph.crush_device_class": "",
							      >	                    "ceph.db_device": "/dev/vg_e/lv_e",
							      >	                    "ceph.db_uuid": "mANY8b-MvVI-VaU9-3afv-N7
							      >	                    "ceph.encrypted": "0",
							      >	                    "ceph.osd_fsid": "9a470d84-f91a-4e59-b963
							      >	                    "ceph.osd_id": "2",
							      >	                    "ceph.osdspec_affinity": "",
							      >	                    "ceph.type": "db",
							      >	                    "ceph.vdo": "0"
							      >	                },
							      >	                "type": "db",
							      >	                "vg_name": "vg_e"
							      >	            }
							      >	        ]
							      >	    }
  stdout_lines: <omitted>					  stdout_lines: <omitted>

TASK [ceph-config : Set_fact num_osds (add existing osds)] **	TASK [ceph-config : Set_fact num_osds (add existing osds)] **
ok: [localhost] => changed=false				ok: [localhost] => changed=false
  ansible_facts:						  ansible_facts:
    num_osds: '3'					      |	    num_osds: '7'

So on the second run, in addition to the count of items in lvm_volumes, we also count every item in the output of ceph-volume lvm list --format json. As a result the value for osd_memory_target is not calculated for the correct number of provisioned OSDs; in this case it is reduced, so resources are not used efficiently.

If we start to bring osds_per_device > 1 into the equation, then memory might get overcommitted, resulting in OOM situations.
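
To put numbers on this, here is a rough illustration; the exact formula and safety factor used by ceph-config are assumptions for the sake of the example, not quotes from the role:

# Illustration only -- assumed formula: osd_memory_target ~= total RAM * safety_factor / num_osds.
# With 64 GiB of RAM, a safety factor of 0.7 and 3 real OSDs:
#   correct num_osds (3):  64 GiB * 0.7 / 3 ~= 14.9 GiB per OSD
#   inflated num_osds (7): 64 GiB * 0.7 / 7 ~=  6.4 GiB per OSD  -> memory left unused
# If num_osds instead ends up smaller than the real number of OSDs, each OSD is
# granted too much memory and the node risks running out of memory.
- name: Set_fact osd_memory_target (illustrative shape of the calculation)
  ansible.builtin.set_fact:
    osd_memory_target: "{{ (ansible_facts['memtotal_mb'] * 1024 * 1024 * 0.7 / (num_osds | int)) | int }}"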

janhorstmann avatar Mar 25 '24 11:03 janhorstmann

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in two weeks if no further activity occurs. Thank you for your contributions.

github-actions[bot] avatar Apr 09 '24 20:04 github-actions[bot]

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in two weeks if no further activity occurs. Thank you for your contributions.

I am still interested in landing this. Let me know if there is anything I can do to move this along.

janhorstmann avatar Apr 10 '24 13:04 janhorstmann

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in two weeks if no further activity occurs. Thank you for your contributions.

github-actions[bot] avatar Apr 26 '24 20:04 github-actions[bot]

@guits did you have time to look into this yet?

janhorstmann avatar May 02 '24 06:05 janhorstmann

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in two weeks if no further activity occurs. Thank you for your contributions.

github-actions[bot] avatar May 17 '24 20:05 github-actions[bot]

This pull request has been automatically closed due to inactivity. Please re-open if these changes are still required.

github-actions[bot] avatar Jun 01 '24 20:06 github-actions[bot]