docker storage limited to 2TB by sfdisk

Open jeremyeder opened this issue 9 years ago • 9 comments

I tried to use docker-storage-setup on a disk that was larger than 2TB:

# lsblk
NAME           MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda              8:0    0 558.4G  0 disk 
├─sda1           8:1    0     1G  0 part /boot
└─sda2           8:2    0 557.4G  0 part 
  ├─rhel7-root 253:0    0 556.3G  0 lvm  /
  └─rhel7-swap 253:1    0     1G  0 lvm  [SWAP]
sdb              8:16   0   2.2T  0 disk 
root@bkr-hv02: ~ # systemctl start docker-storage-setup
Job for docker-storage-setup.service failed because the control process exited with error code. See "systemctl status docker-storage-setup.service" and "journalctl -xe" for details.

It failed to start and threw an error that comes from the sfdisk utility used to create partitions.

root@bkr-hv02: ~ # systemctl status docker-storage-setup -l
● docker-storage-setup.service - Docker Storage Setup
   Loaded: loaded (/usr/lib/systemd/system/docker-storage-setup.service; disabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Wed 2016-04-06 11:36:58 EDT; 7s ago
  Process: 61143 ExecStart=/usr/bin/docker-storage-setup (code=exited, status=1/FAILURE)
 Main PID: 61143 (code=exited, status=1/FAILURE)

Apr 06 11:36:58 bkr-hv02.lab.eng.rdu.redhat.com docker-storage-setup[61143]: /dev/sdb4             0         -          0   0  Empty
Apr 06 11:36:58 bkr-hv02.lab.eng.rdu.redhat.com docker-storage-setup[61143]: Warning: partition 1 has size 2.4 TB (2398201315328 bytes),
Apr 06 11:36:58 bkr-hv02.lab.eng.rdu.redhat.com docker-storage-setup[61143]: which is larger than the 2199023255040 bytes limit imposed
Apr 06 11:36:58 bkr-hv02.lab.eng.rdu.redhat.com docker-storage-setup[61143]: by the DOS partition table for 512-byte sectors
Apr 06 11:36:58 bkr-hv02.lab.eng.rdu.redhat.com docker-storage-setup[61143]: sfdisk: I don't like these partitions - nothing changed.
Apr 06 11:36:58 bkr-hv02.lab.eng.rdu.redhat.com docker-storage-setup[61143]: (If you really want this, use the --force option.)
Apr 06 11:36:58 bkr-hv02.lab.eng.rdu.redhat.com systemd[1]: docker-storage-setup.service: main process exited, code=exited, status=1/FAILURE
Apr 06 11:36:58 bkr-hv02.lab.eng.rdu.redhat.com systemd[1]: Failed to start Docker Storage Setup.
Apr 06 11:36:58 bkr-hv02.lab.eng.rdu.redhat.com systemd[1]: Unit docker-storage-setup.service entered failed state.
Apr 06 11:36:58 bkr-hv02.lab.eng.rdu.redhat.com systemd[1]: docker-storage-setup.service failed.

As suggested in the error, I added --force to /usr/bin/docker-storage-setup:

# diff -pruN docker-storage-setup /usr/bin/docker-storage-setup 
--- docker-storage-setup        2016-04-06 11:41:44.519253366 -0400
+++ /usr/bin/docker-storage-setup       2016-04-06 11:41:49.431230315 -0400
@@ -568,7 +568,7 @@ create_disk_partitions() {
     #   * Error handling when partition(s) already exist
     #   * Deal with loop/nbd device names. See growpart code
     size=$(( $( awk "\$4 ~ /"$( basename $dev )"/ { print \$3 }" /proc/partitions ) * 2 - 2048 ))
-    cat <<EOF | sfdisk $dev
+    cat <<EOF | sfdisk --force $dev
 unit: sectors

 ${dev}1 : start=     2048, size=  ${size}, Id=8e

And then it was able to create the partition:

# systemctl status docker-storage-setup -l
● docker-storage-setup.service - Docker Storage Setup
   Loaded: loaded (/usr/lib/systemd/system/docker-storage-setup.service; disabled; vendor preset: disabled)
   Active: inactive (dead)

Apr 06 11:42:07 bkr-hv02.lab.eng.rdu.redhat.com docker-storage-setup[61555]: Volume group "docker_vg" successfully created
Apr 06 11:42:07 bkr-hv02.lab.eng.rdu.redhat.com docker-storage-setup[61555]: Rounding up size to full physical extent 192.00 MiB
Apr 06 11:42:07 bkr-hv02.lab.eng.rdu.redhat.com docker-storage-setup[61555]: Wiping xfs signature on /dev/docker_vg/docker-poolmeta.
Apr 06 11:42:07 bkr-hv02.lab.eng.rdu.redhat.com docker-storage-setup[61555]: Logical volume "docker-poolmeta" created.
Apr 06 11:42:08 bkr-hv02.lab.eng.rdu.redhat.com docker-storage-setup[61555]: Logical volume "docker-pool" created.
Apr 06 11:42:08 bkr-hv02.lab.eng.rdu.redhat.com docker-storage-setup[61555]: WARNING: Converting logical volume docker_vg/docker-pool and docker_vg/docker-poolmeta to pool's data and metadata volumes.
Apr 06 11:42:08 bkr-hv02.lab.eng.rdu.redhat.com docker-storage-setup[61555]: THIS WILL DESTROY CONTENT OF LOGICAL VOLUME (filesystem etc.)
Apr 06 11:42:08 bkr-hv02.lab.eng.rdu.redhat.com docker-storage-setup[61555]: Converted docker_vg/docker-pool to thin pool.
Apr 06 11:42:08 bkr-hv02.lab.eng.rdu.redhat.com docker-storage-setup[61555]: Logical volume "docker-pool" changed.
Apr 06 11:42:08 bkr-hv02.lab.eng.rdu.redhat.com systemd[1]: Started Docker Storage Setup.
# lsblk
NAME                             MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda                                8:0    0 558.4G  0 disk 
├─sda1                             8:1    0     1G  0 part /boot
└─sda2                             8:2    0 557.4G  0 part 
  ├─rhel7-root                   253:0    0 556.3G  0 lvm  /
  └─rhel7-swap                   253:1    0     1G  0 lvm  [SWAP]
sdb                                8:16   0   2.2T  0 disk 
└─sdb1                             8:17   0 185.5G  0 part 
  ├─docker_vg-docker--pool_tmeta 253:2    0   192M  0 lvm  
  │ └─docker_vg-docker--pool     253:4    0  74.1G  0 lvm  
  └─docker_vg-docker--pool_tdata 253:3    0  74.1G  0 lvm  
    └─docker_vg-docker--pool     253:4    0  74.1G  0 lvm  
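
The 185.5G size reported for sdb1 is not discussed in the thread. One reading, offered here as an assumption rather than a confirmed diagnosis, is that the MBR partition entry stores the sector count in a 32-bit field, so the size requested in the sfdisk warning (2398201315328 bytes) wraps around when forced. The helper name below is hypothetical; the sketch only checks that the arithmetic is consistent with the lsblk output.

```shell
# Hypothetical helper: size in bytes that remains after the requested
# sector count is truncated to 32 bits (assumed MBR behaviour under --force).
mbr_wrapped_bytes() {
    requested_bytes=$1
    sector_size=$2
    sectors=$(( requested_bytes / sector_size ))
    # Keep only the low 32 bits of the sector count.
    echo $(( (sectors % (1 << 32)) * sector_size ))
}

mbr_wrapped_bytes 2398201315328 512    # prints 199178059776
```

199178059776 bytes is roughly 185.5 GiB, which matches the size lsblk shows for sdb1.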

jeremyeder, Apr 06 '16 15:04

If we defaulted to --force, would this cause issues? I.e., would errors on other badly configured systems be ignored?

rhatdan, Apr 06 '16 18:04

I have no idea what issues will be caused if we use --force. I am wondering whether we should switch to "parted" instead of sfdisk, use a different type of partition table, or do something else that allows partitions bigger than 2TB.

rhvgoyal, Apr 06 '16 18:04

So this seems to come from MBR, where the maximum partition size is 2TB (for 512-byte sectors). Should we consider using GPT? I am not sure whether there are any issues with using GPT.
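
As a quick check of the limit quoted in the sfdisk warning: an MBR partition entry holds the sector count in a 32-bit field, so the largest partition is (2^32 - 1) sectors. A minimal sketch in shell arithmetic (the function name is illustrative and not part of docker-storage-setup):

```shell
# Largest partition size, in bytes, that a 32-bit MBR sector count can
# describe for a given sector size.
mbr_max_bytes() {
    sector_size=$1
    echo $(( ((1 << 32) - 1) * sector_size ))
}

mbr_max_bytes 512     # prints 2199023255040, the limit in the sfdisk warning
mbr_max_bytes 4096    # prints 17592186040320
```

The requested partition (2398201315328 bytes) exceeds the 512-byte-sector limit, which is why sfdisk refuses without --force.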

rhvgoyal, Apr 06 '16 19:04

--force seems to have me hitting this:

Apr 06 15:17:22 bkr-hv02.lab.eng.rdu.redhat.com docker-storage-setup[3708]: INFO: Waiting for device /dev/mapper/docker_vg-docker--pool to be available. Wait time remaining is 60 seconds
Apr 06 15:17:27 bkr-hv02.lab.eng.rdu.redhat.com docker-storage-setup[3708]: INFO: Waiting for device /dev/mapper/docker_vg-docker--pool to be available. Wait time remaining is 55 seconds

The partition, PV, and VG look OK, but the LV never shows up, so it sits there waiting until it times out (60 seconds).

jeremyeder, Apr 06 '16 19:04

I can't think of a reason not to use GPT.
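
For illustration only, a GPT-based replacement for the sfdisk step might look like the sketch below. It assumes GNU parted is available and keeps the 1 MiB start offset used by the existing script (2048 sectors of 512 bytes). The helper is hypothetical and deliberately prints the commands rather than running them, since `mklabel` destroys any existing partition table.

```shell
# Hypothetical dry-run helper: prints the parted commands that would create
# a GPT label and a single LVM partition spanning the disk. Nothing is run.
gpt_partition_cmds() {
    dev=$1
    echo "parted -s $dev mklabel gpt"
    echo "parted -s $dev mkpart primary 1MiB 100%"
    echo "parted -s $dev set 1 lvm on"
}

gpt_partition_cmds /dev/sdb
```

Whether this fits docker-storage-setup's growpart handling is not settled in this thread.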

Related to that, one thing that might be interesting is to allocate a GUID in https://www.freedesktop.org/wiki/Specifications/DiscoverablePartitionsSpec/, basically "if the GUID is X, automatically format it as a PV and add it to the VG backing /". That might make it slightly easier for admins to pre-provision disks.

cgwalters, Apr 06 '16 19:04

@cgwalters Why do we need to partition the disk at all? Why can't we add it directly to the volume group?

rhvgoyal, Apr 06 '16 19:04

As for why we use partitions at all: I suspect it was done for the cloud case, where we have one disk that gets magically expanded.

I also can't think of a reason not to skip partitions for raw disks.

cgwalters, Apr 06 '16 19:04

Actually, even with raw disks, we might add the disk to the root volume group (if VG= was not specified), and on the next reboot we might have to grow that partition using growpart.

Right now we are assuming that every PV in the root volume group is partitioned, and I guess that's the reason we partition disks before adding them to the volume group.

rhvgoyal, Apr 06 '16 19:04

"Raw" but non-virtual disks will never grow, right? I am not sure we need to support a scenario where secondary virtual disks are magically grown. A virt user can just as easily add a new disk.

It is, however, critically important to support adding partitions inside the root (first) disk for the IaaS case on first boot.

cgwalters, Apr 06 '16 19:04