DeepSea

Custom OSD configs break Stage 3

Open · smithfarm opened this issue on Jul 27 '18 · 8 comments

On the DeepSea master branch, if I use the OSD configuration from qa/osd-config/ovh/bs_dedicated_db.yaml, Stage 3 fails like this:

(steps 1-27 succeed)

2018-07-27T14:33:23.495 INFO:teuthology.orchestra.run.target217182143060.stdout:[28/29] Executing state ceph.processes.osd...
2018-07-27T14:33:23.495 INFO:teuthology.orchestra.run.target217182143060.stdout:         |_  target217182143060.teuthology: wait for osd processes(cephprocesses.wait) fail
2018-07-27T14:33:23.496 INFO:teuthology.orchestra.run.target217182143060.stdout:         target217182143060.teuthology... fail
2018-07-27T14:33:23.496 INFO:teuthology.orchestra.run.target217182143060.stdout:Finished stage ceph.stage.3: succeeded=27/29 failed=1/29
2018-07-27T14:33:23.496 INFO:teuthology.orchestra.run.target217182143060.stdout:
2018-07-27T14:33:23.496 INFO:teuthology.orchestra.run.target217182143060.stdout:Failures summary:
2018-07-27T14:33:23.496 INFO:teuthology.orchestra.run.target217182143060.stdout:ceph.processes.osd (/srv/salt/ceph/processes/osd):
2018-07-27T14:33:23.496 INFO:teuthology.orchestra.run.target217182143060.stdout:  target217182143060.teuthology:
2018-07-27T14:33:23.497 INFO:teuthology.orchestra.run.target217182143060.stdout:    wait for osd processes: Module function cephprocesses.wait executed
2018-07-27T14:33:23.588 INFO:teuthology.orchestra.run.target217182143060.stdout:deepsea exit status: 1

Note: this is a single-node cluster in OVH with a root disk (/dev/vda) plus four additional drives (/dev/vd[bcde]).

Relevant excerpt from pillar data:

2018-07-27T14:27:08.593 INFO:teuthology.orchestra.run.target217182143060.stdout:    ceph:
2018-07-27T14:27:08.593 INFO:teuthology.orchestra.run.target217182143060.stdout:        ----------
2018-07-27T14:27:08.593 INFO:teuthology.orchestra.run.target217182143060.stdout:        storage:
2018-07-27T14:27:08.593 INFO:teuthology.orchestra.run.target217182143060.stdout:            ----------
2018-07-27T14:27:08.593 INFO:teuthology.orchestra.run.target217182143060.stdout:            osds:
2018-07-27T14:27:08.594 INFO:teuthology.orchestra.run.target217182143060.stdout:                ----------
2018-07-27T14:27:08.594 INFO:teuthology.orchestra.run.target217182143060.stdout:                /dev/vdb:
2018-07-27T14:27:08.594 INFO:teuthology.orchestra.run.target217182143060.stdout:                    ----------
2018-07-27T14:27:08.594 INFO:teuthology.orchestra.run.target217182143060.stdout:                    db:
2018-07-27T14:27:08.594 INFO:teuthology.orchestra.run.target217182143060.stdout:                        /dev/vde
2018-07-27T14:27:08.594 INFO:teuthology.orchestra.run.target217182143060.stdout:                    format:
2018-07-27T14:27:08.594 INFO:teuthology.orchestra.run.target217182143060.stdout:                        bluestore
2018-07-27T14:27:08.594 INFO:teuthology.orchestra.run.target217182143060.stdout:                /dev/vdc:
2018-07-27T14:27:08.595 INFO:teuthology.orchestra.run.target217182143060.stdout:                    ----------
2018-07-27T14:27:08.595 INFO:teuthology.orchestra.run.target217182143060.stdout:                    db:
2018-07-27T14:27:08.595 INFO:teuthology.orchestra.run.target217182143060.stdout:                        /dev/vde
2018-07-27T14:27:08.595 INFO:teuthology.orchestra.run.target217182143060.stdout:                    format:
2018-07-27T14:27:08.595 INFO:teuthology.orchestra.run.target217182143060.stdout:                        bluestore
2018-07-27T14:27:08.595 INFO:teuthology.orchestra.run.target217182143060.stdout:                /dev/vdd:
2018-07-27T14:27:08.595 INFO:teuthology.orchestra.run.target217182143060.stdout:                    ----------
2018-07-27T14:27:08.595 INFO:teuthology.orchestra.run.target217182143060.stdout:                    db:
2018-07-27T14:27:08.596 INFO:teuthology.orchestra.run.target217182143060.stdout:                        /dev/vde
2018-07-27T14:27:08.596 INFO:teuthology.orchestra.run.target217182143060.stdout:                    format:
2018-07-27T14:27:08.596 INFO:teuthology.orchestra.run.target217182143060.stdout:                        bluestore
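
Stripped of the log prefixes, the pillar above defines three bluestore OSDs (/dev/vdb, /dev/vdc, /dev/vdd) that all point their db at the same dedicated device, /dev/vde. A readability sketch of that structure, written as roughly the Python dict that Salt exposes (illustrative only, not a live pillar dump):

```python
# Illustrative restatement of the pillar excerpt above, not a live dump:
# three bluestore data disks, all sharing /dev/vde as their db device.
osds = {
    "/dev/vdb": {"format": "bluestore", "db": "/dev/vde"},
    "/dev/vdc": {"format": "bluestore", "db": "/dev/vde"},
    "/dev/vdd": {"format": "bluestore", "db": "/dev/vde"},
}
```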

Logs: http://137.74.25.20/ubuntu-2018-07-27_14:11:42-suse:smoke:health-ok-wip-qa-osd-config---basic-openstack/91/

smithfarm avatar Jul 27 '18 14:07 smithfarm

It appears to be reproducible. Here's the next run: http://137.74.25.20:8081/ubuntu-2018-07-27_14:21:18-suse:smoke:health-ok-wip-qa-osd-config---basic-openstack/

(the green one is using the default profile)

smithfarm avatar Jul 27 '18 15:07 smithfarm

A relevant failure (from the minion logs):

2018-07-27 14:30:10,067 [salt.loaded.ext.module.helper:64  ][DEBUG   ][41364] stderr of PYTHONWARNINGS=ignore ceph-disk -v prepare --bluestore --data-dev --journal-dev --cluster ceph --cluster-uuid 1efc1033-0565-4586-9f79-36906a01bbb0 --block.db /dev/vde /dev/vdd:

[-- snip --]

command: Running command: /usr/sbin/parted --machine -- /dev/vde print
get_free_partition_index: get_free_partition_index: analyzing
BYT;
/dev/vde:10.7GB:virtblk:512:512:unknown:Virtio Block Device:;

Traceback (most recent call last):
  File "/usr/sbin/ceph-disk", line 11, in <module>
    load_entry_point('ceph-disk==1.0.0', 'console_scripts', 'ceph-disk')()
  File "/usr/lib/python3.6/site-packages/ceph_disk/main.py", line 5753, in run
    main(sys.argv[1:])
  File "/usr/lib/python3.6/site-packages/ceph_disk/main.py", line 5704, in main
    args.func(args)
  File "/usr/lib/python3.6/site-packages/ceph_disk/main.py", line 2121, in main
    Prepare.factory(args).prepare()
  File "/usr/lib/python3.6/site-packages/ceph_disk/main.py", line 2110, in prepare
    self._prepare()
  File "/usr/lib/python3.6/site-packages/ceph_disk/main.py", line 2191, in _prepare
    self.data.prepare(*to_prepare_list)
  File "/usr/lib/python3.6/site-packages/ceph_disk/main.py", line 2965, in prepare
    self.prepare_device(*to_prepare_list)
  File "/usr/lib/python3.6/site-packages/ceph_disk/main.py", line 3142, in prepare_device
    to_prepare.prepare()
  File "/usr/lib/python3.6/site-packages/ceph_disk/main.py", line 2332, in prepare
    self.prepare_device()
  File "/usr/lib/python3.6/site-packages/ceph_disk/main.py", line 2424, in prepare_device
    num=num)
  File "/usr/lib/python3.6/site-packages/ceph_disk/main.py", line 1778, in create_partition
    num = get_free_partition_index(dev=self.path)
  File "/usr/lib/python3.6/site-packages/ceph_disk/main.py", line 1535, in get_free_partition_index
    return max(partition_numbers) + 1
ValueError: max() arg is an empty sequence
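
For context, the ValueError at the bottom comes from get_free_partition_index() taking max() of an empty list: on a blank disk, parted prints only the BYT; header and the device summary line, so there are no partition entries to number. A minimal sketch of that failure mode (illustrative Python mimicking the parse, not the actual ceph-disk source):

```python
# Illustrative sketch, not the real ceph-disk code: why max() raises here.
# On a completely blank /dev/vde, `parted --machine -- /dev/vde print`
# emits only the "BYT;" header and the device summary line seen above.
parted_output = (
    "BYT;\n"
    "/dev/vde:10.7GB:virtblk:512:512:unknown:Virtio Block Device:;\n"
)

# Partition lines would start with a numeric index ("1:...", "2:...", etc.),
# but a blank disk has none, so this list ends up empty.
partition_numbers = [
    int(line.split(":")[0])
    for line in parted_output.splitlines()
    if line.split(":")[0].isdigit()
]

# Reproduces the error from the traceback:
# ValueError: max() arg is an empty sequence
next_index = max(partition_numbers) + 1
```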

jschmid1 avatar Jul 27 '18 15:07 jschmid1

lsblk before Stage 3:

2018-07-27T15:30:14.593 INFO:teuthology.orchestra.run.target149202164067.stdout:    NAME   MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
2018-07-27T15:30:14.594 INFO:teuthology.orchestra.run.target149202164067.stdout:    vda    254:0    0  40G  0 disk
2018-07-27T15:30:14.594 INFO:teuthology.orchestra.run.target149202164067.stdout:    |-vda1 254:1    0   2M  0 part
2018-07-27T15:30:14.594 INFO:teuthology.orchestra.run.target149202164067.stdout:    |-vda2 254:2    0  20M  0 part /boot/efi
2018-07-27T15:30:14.594 INFO:teuthology.orchestra.run.target149202164067.stdout:    `-vda3 254:3    0  40G  0 part /
2018-07-27T15:30:14.595 INFO:teuthology.orchestra.run.target149202164067.stdout:    vdb    254:16   0  10G  0 disk
2018-07-27T15:30:14.595 INFO:teuthology.orchestra.run.target149202164067.stdout:    vdc    254:32   0  10G  0 disk
2018-07-27T15:30:14.595 INFO:teuthology.orchestra.run.target149202164067.stdout:    vdd    254:48   0  10G  0 disk
2018-07-27T15:30:14.595 INFO:teuthology.orchestra.run.target149202164067.stdout:    vde    254:64   0  10G  0 disk

and lsblk, ceph-disk list, and ceph osd tree after Stage 3:

2018-07-27T16:57:01.599 INFO:teuthology.orchestra.run.target149202171021.stdout:NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
2018-07-27T16:57:01.599 INFO:teuthology.orchestra.run.target149202171021.stdout:vda    254:0    0   40G  0 disk
2018-07-27T16:57:01.599 INFO:teuthology.orchestra.run.target149202171021.stdout:├─vda1 254:1    0    2M  0 part
2018-07-27T16:57:01.599 INFO:teuthology.orchestra.run.target149202171021.stdout:├─vda2 254:2    0   20M  0 part /boot/efi
2018-07-27T16:57:01.600 INFO:teuthology.orchestra.run.target149202171021.stdout:└─vda3 254:3    0   40G  0 part /
2018-07-27T16:57:01.600 INFO:teuthology.orchestra.run.target149202171021.stdout:vdb    254:16   0   10G  0 disk
2018-07-27T16:57:01.600 INFO:teuthology.orchestra.run.target149202171021.stdout:└─vdb1 254:17   0  100M  0 part
2018-07-27T16:57:01.600 INFO:teuthology.orchestra.run.target149202171021.stdout:vdc    254:32   0   10G  0 disk
2018-07-27T16:57:01.600 INFO:teuthology.orchestra.run.target149202171021.stdout:└─vdc1 254:33   0  100M  0 part
2018-07-27T16:57:01.600 INFO:teuthology.orchestra.run.target149202171021.stdout:vdd    254:48   0   10G  0 disk
2018-07-27T16:57:01.600 INFO:teuthology.orchestra.run.target149202171021.stdout:└─vdd1 254:49   0  100M  0 part
2018-07-27T16:57:01.600 INFO:teuthology.orchestra.run.target149202171021.stdout:vde    254:64   0   10G  0 disk
2018-07-27T16:57:01.601 INFO:teuthology.orchestra.run.target149202171021.stderr:+ set -e
2018-07-27T16:57:01.601 INFO:teuthology.orchestra.run.target149202171021.stderr:+ lsblk
2018-07-27T16:57:01.601 INFO:teuthology.orchestra.run.target149202171021.stderr:+ ceph-disk list
2018-07-27T16:57:02.685 INFO:teuthology.orchestra.run.target149202171021.stdout:/dev/vda :
2018-07-27T16:57:02.685 INFO:teuthology.orchestra.run.target149202171021.stdout: /dev/vda1 other, 21686148-6449-6e6f-744e-656564454649
2018-07-27T16:57:02.685 INFO:teuthology.orchestra.run.target149202171021.stdout: /dev/vda2 other, vfat
2018-07-27T16:57:02.686 INFO:teuthology.orchestra.run.target149202171021.stdout: /dev/vda3 other, xfs
2018-07-27T16:57:02.686 INFO:teuthology.orchestra.run.target149202171021.stdout:/dev/vdb :
2018-07-27T16:57:02.686 INFO:teuthology.orchestra.run.target149202171021.stdout: /dev/vdb1 other, 89c57f98-2fe5-4dc0-89c1-f3ad0ceff2be
2018-07-27T16:57:02.686 INFO:teuthology.orchestra.run.target149202171021.stdout:/dev/vdc :
2018-07-27T16:57:02.686 INFO:teuthology.orchestra.run.target149202171021.stdout: /dev/vdc1 other, 89c57f98-2fe5-4dc0-89c1-f3ad0ceff2be
2018-07-27T16:57:02.686 INFO:teuthology.orchestra.run.target149202171021.stdout:/dev/vdd :
2018-07-27T16:57:02.686 INFO:teuthology.orchestra.run.target149202171021.stdout: /dev/vdd1 other, 89c57f98-2fe5-4dc0-89c1-f3ad0ceff2be
2018-07-27T16:57:02.686 INFO:teuthology.orchestra.run.target149202171021.stdout:/dev/vde other, unknown
2018-07-27T16:57:02.687 INFO:teuthology.orchestra.run.target149202171021.stderr:/usr/lib/python3.6/site-packages/ceph_disk/main.py:5684: UserWarning:
2018-07-27T16:57:02.687 INFO:teuthology.orchestra.run.target149202171021.stderr:ceph-disk is deprecated and will be removed in the next major version
2018-07-27T16:57:02.687 INFO:teuthology.orchestra.run.target149202171021.stderr:of Ceph (Nautilus). The replacement tool is called ceph-volume. For
2018-07-27T16:57:02.687 INFO:teuthology.orchestra.run.target149202171021.stderr:details see http://docs.ceph.com/docs/master/ceph-volume/#migrating
2018-07-27T16:57:02.687 INFO:teuthology.orchestra.run.target149202171021.stderr:
2018-07-27T16:57:02.687 INFO:teuthology.orchestra.run.target149202171021.stderr:  warnings.warn(DEPRECATION_WARNING)
2018-07-27T16:57:02.722 INFO:teuthology.orchestra.run.target149202171021.stderr:+ ceph osd tree
2018-07-27T16:57:03.084 INFO:teuthology.orchestra.run.target149202171021.stdout:ID CLASS WEIGHT TYPE NAME    STATUS REWEIGHT PRI-AFF
2018-07-27T16:57:03.084 INFO:teuthology.orchestra.run.target149202171021.stdout:-1            0 root default
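
It looks like ceph-disk got as far as creating the small 100M data partitions on vdb/vdc/vdd and then died before ever partitioning /dev/vde, so no OSDs were created at all. Before re-running Stage 3 with another profile, the half-prepared disks probably need to be wiped; a hedged helper sketch (assumes the vd[bcde] devices hold nothing worth keeping):

```python
# Hedged cleanup sketch: zap the half-prepared OSD disks before retrying.
# ceph-disk zap destroys the partition table and contents of each device,
# so only run this against disks you are willing to lose.
import subprocess

for dev in ["/dev/vdb", "/dev/vdc", "/dev/vdd", "/dev/vde"]:
    subprocess.run(["ceph-disk", "zap", dev], check=True)
```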

smithfarm avatar Jul 27 '18 15:07 smithfarm

Hrm, bs_dedicated_wal_crypt.yaml fails in the same way.

after Stage 1:

2018-07-27T18:18:46.672 INFO:teuthology.orchestra.run.target149202181003.stdout:Your randomly chosen storage profile bs_dedicated_wal_crypt.yaml has the following contents:
2018-07-27T18:18:46.672 INFO:teuthology.orchestra.run.target149202181003.stderr:+ echo 'Your randomly chosen storage profile bs_dedicated_wal_crypt.yaml has the following contents:'
2018-07-27T18:18:46.672 INFO:teuthology.orchestra.run.target149202181003.stderr:+ cat /srv/pillar/ceph/proposals/profile-random/stack/default/ceph/minions/target149202181003.teuthology.yml
2018-07-27T18:18:46.673 INFO:teuthology.orchestra.run.target149202181003.stdout:ceph:
2018-07-27T18:18:46.673 INFO:teuthology.orchestra.run.target149202181003.stdout:  storage:
2018-07-27T18:18:46.673 INFO:teuthology.orchestra.run.target149202181003.stdout:    osds:
2018-07-27T18:18:46.673 INFO:teuthology.orchestra.run.target149202181003.stdout:      /dev/vdb:
2018-07-27T18:18:46.673 INFO:teuthology.orchestra.run.target149202181003.stdout:        format: bluestore
2018-07-27T18:18:46.673 INFO:teuthology.orchestra.run.target149202181003.stdout:        encryption: dmcrypt
2018-07-27T18:18:46.673 INFO:teuthology.orchestra.run.target149202181003.stdout:        wal: /dev/vde
2018-07-27T18:18:46.674 INFO:teuthology.orchestra.run.target149202181003.stdout:      /dev/vdc:
2018-07-27T18:18:46.674 INFO:teuthology.orchestra.run.target149202181003.stdout:        format: bluestore
2018-07-27T18:18:46.674 INFO:teuthology.orchestra.run.target149202181003.stdout:        encryption: dmcrypt
2018-07-27T18:18:46.674 INFO:teuthology.orchestra.run.target149202181003.stdout:        wal: /dev/vde
2018-07-27T18:18:46.674 INFO:teuthology.orchestra.run.target149202181003.stdout:      /dev/vdd:
2018-07-27T18:18:46.674 INFO:teuthology.orchestra.run.target149202181003.stdout:        format: bluestore
2018-07-27T18:18:46.674 INFO:teuthology.orchestra.run.target149202181003.stdout:        encryption: dmcrypt
2018-07-27T18:18:46.674 INFO:teuthology.orchestra.run.target149202181003.stdout:        wal: /dev/vde

after Stage 3:

2018-07-27T18:27:52.638 INFO:teuthology.orchestra.run.target149202181003.stdout:NAME   MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
2018-07-27T18:27:52.639 INFO:teuthology.orchestra.run.target149202181003.stdout:vda    254:0    0  40G  0 disk
2018-07-27T18:27:52.639 INFO:teuthology.orchestra.run.target149202181003.stdout:├─vda1 254:1    0   2M  0 part
2018-07-27T18:27:52.639 INFO:teuthology.orchestra.run.target149202181003.stdout:├─vda2 254:2    0  20M  0 part /boot/efi
2018-07-27T18:27:52.639 INFO:teuthology.orchestra.run.target149202181003.stdout:└─vda3 254:3    0  40G  0 part /
2018-07-27T18:27:52.639 INFO:teuthology.orchestra.run.target149202181003.stdout:vdb    254:16   0  10G  0 disk
2018-07-27T18:27:52.639 INFO:teuthology.orchestra.run.target149202181003.stdout:└─vdb5 254:21   0  10M  0 part /var/lib/ceph/osd-lockbox/48acfb33-4116-43ad-8d96-0cb101c477ad
2018-07-27T18:27:52.639 INFO:teuthology.orchestra.run.target149202181003.stdout:vdc    254:32   0  10G  0 disk
2018-07-27T18:27:52.639 INFO:teuthology.orchestra.run.target149202181003.stdout:└─vdc5 254:37   0  10M  0 part /var/lib/ceph/osd-lockbox/28bedee2-e765-407e-a250-fa93563f776b
2018-07-27T18:27:52.639 INFO:teuthology.orchestra.run.target149202181003.stdout:vdd    254:48   0  10G  0 disk
2018-07-27T18:27:52.640 INFO:teuthology.orchestra.run.target149202181003.stdout:└─vdd5 254:53   0  10M  0 part /var/lib/ceph/osd-lockbox/9ec2c4a3-c06b-46ed-b3d0-0043fcb7c9be
2018-07-27T18:27:52.640 INFO:teuthology.orchestra.run.target149202181003.stdout:vde    254:64   0  10G  0 disk
2018-07-27T18:27:52.640 INFO:teuthology.orchestra.run.target149202181003.stderr:+ set -e
2018-07-27T18:27:52.640 INFO:teuthology.orchestra.run.target149202181003.stderr:+ lsblk
2018-07-27T18:27:52.640 INFO:teuthology.orchestra.run.target149202181003.stderr:+ ceph-disk list
2018-07-27T18:27:53.034 INFO:teuthology.orchestra.run.target149202181003.stdout:/dev/vda :
2018-07-27T18:27:53.035 INFO:teuthology.orchestra.run.target149202181003.stdout: /dev/vda1 other, 21686148-6449-6e6f-744e-656564454649
2018-07-27T18:27:53.035 INFO:teuthology.orchestra.run.target149202181003.stdout: /dev/vda2 other, vfat
2018-07-27T18:27:53.035 INFO:teuthology.orchestra.run.target149202181003.stdout: /dev/vda3 other, xfs
2018-07-27T18:27:53.036 INFO:teuthology.orchestra.run.target149202181003.stdout:/dev/vdb :
2018-07-27T18:27:53.036 INFO:teuthology.orchestra.run.target149202181003.stdout: /dev/vdb5 other, ext4
2018-07-27T18:27:53.036 INFO:teuthology.orchestra.run.target149202181003.stdout:/dev/vdc :
2018-07-27T18:27:53.036 INFO:teuthology.orchestra.run.target149202181003.stdout: /dev/vdc5 other, ext4
2018-07-27T18:27:53.036 INFO:teuthology.orchestra.run.target149202181003.stdout:/dev/vdd :
2018-07-27T18:27:53.036 INFO:teuthology.orchestra.run.target149202181003.stdout: /dev/vdd5 other, ext4
2018-07-27T18:27:53.037 INFO:teuthology.orchestra.run.target149202181003.stdout:/dev/vde other, unknown
2018-07-27T18:27:53.037 INFO:teuthology.orchestra.run.target149202181003.stderr:/usr/lib/python3.6/site-packages/ceph_disk/main.py:5684: UserWarning:
2018-07-27T18:27:53.037 INFO:teuthology.orchestra.run.target149202181003.stderr:ceph-disk is deprecated and will be removed in the next major version
2018-07-27T18:27:53.037 INFO:teuthology.orchestra.run.target149202181003.stderr:of Ceph (Nautilus). The replacement tool is called ceph-volume. For
2018-07-27T18:27:53.037 INFO:teuthology.orchestra.run.target149202181003.stderr:details see http://docs.ceph.com/docs/master/ceph-volume/#migrating
2018-07-27T18:27:53.038 INFO:teuthology.orchestra.run.target149202181003.stderr:
2018-07-27T18:27:53.038 INFO:teuthology.orchestra.run.target149202181003.stderr:  warnings.warn(DEPRECATION_WARNING)
2018-07-27T18:27:53.068 INFO:teuthology.orchestra.run.target149202181003.stderr:+ ceph osd tree
2018-07-27T18:27:53.422 INFO:teuthology.orchestra.run.target149202181003.stdout:ID CLASS WEIGHT TYPE NAME    STATUS REWEIGHT PRI-AFF
2018-07-27T18:27:53.422 INFO:teuthology.orchestra.run.target149202181003.stdout:-1            0 root default

Logs: http://137.74.25.20:8081/ubuntu-2018-07-27_18:05:26-suse:smoke:health-ok-wip-qa-osd-config---basic-openstack/102/

smithfarm avatar Jul 27 '18 18:07 smithfarm

I have not tried all 20 profiles in qa/osd-config/ovh, but so far I've run around 5-10 of them at random and they all exhibit this same failure.

smithfarm avatar Jul 28 '18 18:07 smithfarm

> I have not tried all 20 profiles in qa/osd-config/ovh, but so far I've run around 5-10 of them at random and they all exhibit this same failure.

Now we need a way to confirm which configurations cause problems so we can work on a solution. What we really need is a way to run all the configurations sequentially.

jschmid1 avatar Jul 31 '18 09:07 jschmid1

@jschmid1 Can't run them sequentially, but I can run them in parallel :-)
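
For the record, a rough sketch of what fanning the profiles out in parallel could look like (purely illustrative: schedule_run.sh is a hypothetical stand-in for however the teuthology runs actually get scheduled here):

```python
# Illustrative only: launch one test run per OSD profile in qa/osd-config/ovh.
# ./schedule_run.sh is a hypothetical wrapper for the real scheduling command.
import glob
import subprocess
from concurrent.futures import ThreadPoolExecutor

profiles = sorted(glob.glob("qa/osd-config/ovh/*.yaml"))

def schedule(profile):
    # Return the profile together with the exit code of the scheduling call.
    return profile, subprocess.run(["./schedule_run.sh", profile]).returncode

with ThreadPoolExecutor(max_workers=len(profiles) or 1) as pool:
    for profile, rc in pool.map(schedule, profiles):
        print(f"{profile}: {'scheduled ok' if rc == 0 else 'FAILED'}")
```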

smithfarm avatar Jul 31 '18 09:07 smithfarm

That's even better @smithfarm :)

jschmid1 avatar Jul 31 '18 09:07 jschmid1