How do I see the physical space used on a disk?
I set up VDO on a disk and want to check the actual disk usage with deduplication turned off and then on. What command should I use? Is it sudo vdostats --hu? That seems to report only the size as seen inside VDO.
Hi @Baimax-123, yes. The vdostats utility provides a df-style output that shows the physical usage of the volume.
Here is some example output:
[root@localhost ~]# vdo create --name vdo0 --device /dev/sda --vdoLogicalSize 1T
Creating VDO vdo0
The VDO volume can address 12 GB in 6 data slabs, each 2 GB.
It can grow to address at most 16 TB of physical storage in 8192 slabs.
If a larger maximum size might be needed, use bigger slabs.
Starting VDO vdo0
Starting compression on VDO vdo0
VDO instance 0 volume is ready at /dev/mapper/vdo0
[root@localhost ~]# mkfs.xfs -K /dev/mapper/vdo0
meta-data=/dev/mapper/vdo0 isize=512 agcount=4, agsize=67108864 blks
= sectsz=4096 attr=2, projid32bit=1
= crc=1 finobt=1, sparse=1, rmapbt=0
= reflink=1 bigtime=0 inobtcount=0
data = bsize=4096 blocks=268435456, imaxpct=5
= sunit=0 swidth=0 blks
naming =version 2 bsize=4096 ascii-ci=0, ftype=1
log =internal log bsize=4096 blocks=131072, version=2
= sectsz=4096 sunit=1 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
[root@localhost ~]# mkdir /mnt/vdo
[root@localhost ~]# mount /dev/mapper/vdo0 /mnt/vdo
[root@localhost ~]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 7:0 0 15G 0 disk
└─vdo0 252:0 0 1T 0 vdo /mnt/vdo
vda 253:0 0 20G 0 disk
└─vda1 253:1 0 20G 0 part /
# Note that the starting values show 7.2G physical used and 3G used on the
# filesystem.
[root@localhost ~]# df -h /mnt/vdo
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/vdo0 1.0T 7.2G 1017G 1% /mnt/vdo
[root@localhost ~]# vdostats --human-readable
Device Size Used Available Use% Space saving%
/dev/mapper/vdo0 15.0G 3.0G 12.0G 20% 99%
# Write 1G of unique data and see both values increase by 1G.
[root@localhost ~]# dd if=/dev/urandom of=/mnt/vdo/1G-file bs=1M count=1024 oflag=direct status=progress
1072693248 bytes (1.1 GB, 1023 MiB) copied, 24 s, 44.7 MB/s
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 24.0207 s, 44.7 MB/s
[root@localhost ~]# df -h /mnt/vdo
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/vdo0 1.0T 8.2G 1016G 1% /mnt/vdo
[root@localhost ~]# vdostats --human-readable
Device Size Used Available Use% Space saving%
/dev/mapper/vdo0 15.0G 4.0G 11.0G 26% 33%
# Duplicate that data and see only the df (logical used) increase
[root@localhost ~]# cp -a /mnt/vdo/1G-file /mnt/vdo/1G-file-copied
[root@localhost ~]# sync
[root@localhost ~]# df -h /mnt/vdo
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/vdo0 1.0T 9.2G 1015G 1% /mnt/vdo
[root@localhost ~]# vdostats --human-readable
Device Size Used Available Use% Space saving%
/dev/mapper/vdo0 15.0G 4.0G 11.0G 26% 59%
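To make the jump in the Space saving% column (33% to 59% after the copy) easier to follow, here is a rough sketch of how such a percentage can be derived from logical versus physical block counts. The block counts below are hypothetical, not the internal counters from the run above, and the real vdostats figure also accounts for metadata:

```shell
# Simplified model: savings = (logical blocks written - physical blocks stored) / logical
# Hypothetical 4 KiB block counts, NOT taken from the transcript's internals.
logical_used=524288    # blocks the upper layer has written
physical_used=262144   # blocks actually stored after dedup/compression
awk -v l="$logical_used" -v p="$physical_used" \
    'BEGIN { printf "Space saving: %d%%\n", (l - p) * 100 / l }'
```

With these made-up counts the sketch prints "Space saving: 50%"; the point is only that duplicated writes raise the logical count without raising the physical one, which pushes the percentage up.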
Hi @Baimax-123, what are you specifically trying to find out? df and vdostats are going to give you different metrics than smartctl. If you're just trying to understand usage information (i.e., how full is the device and how much more can I write to it?), then you should be using df and vdostats. But if you're instead interested in understanding wear leveling or something along those lines, then smartctl would be the way to go. I don't think using smartctl to measure usage will tell you much beyond the fact that read or write operations have happened on the device.
Hi @rhawalsh, I have an SSD that does its own compression and can report the physical usage of the disk.
However, that figure is far from what df or vdostats report.
So is there perhaps a gap between the actual physical usage on the raw disk and what the commands above show?
Hi @Baimax-123, I did not realize your SSD was doing compression as well. If you want to compare realistic numbers, it's probably better to look at the df and vdostats outputs without the human-readable flags, since those values are rounded to the nearest GiB (or so).
As you can tell, the state of compression and/or deduplication affects the amount of data that actually gets cycled through the device. It will never be 1:1, because metadata, journal information for recovery, and so on must also be written. Depending on the workload, the ratio will vary up or down.
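For unrounded figures, `df --block-size=1 /mnt/vdo` reports exact bytes and plain `vdostats` (no flags) reports 1K-blocks. A small sketch of pulling the exact physical-used figure out of a default vdostats line; the sample line is illustrative, converted from the 15G volume in the transcript above:

```shell
# On a live system you would pipe the real output instead:
#   vdostats /dev/mapper/vdo0 | tail -n 1
# Sample default (1K-block) line, mirroring the transcript's 15G volume:
sample='/dev/mapper/vdo0 15728640 4194304 11534336 26% 59%'
echo "$sample" | awk '{ printf "physical used: %d KiB (%.2f GiB)\n", $3, $3/1048576 }'
```

This avoids the GiB rounding of --human-readable when comparing against the SSD's own counters.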
Hi @rhawalsh, thanks, I will try your suggestion and hope to find some useful conclusions. Best.
Hi @Baimax-123, Please feel free to ask any questions you might have along the way!
I also meant to mention that inspecting the output of vdostats --verbose may give you some additional clues about the amount of data being written to the underlying storage. Typically you can look at the statistics that mention 'write', 'flush', and/or 'fua' to help tie things together.
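The filtering described above can be sketched like this. On a live system the source would be `vdostats --verbose /dev/mapper/vdo0`; the sample lines below are stand-ins (the 'bios meta completed write' field is the one discussed later in the thread, and the other names and all values are illustrative, not verbatim vdostats output):

```shell
# Live form: vdostats --verbose /dev/mapper/vdo0 | grep -E 'write|flush|fua'
# Illustrative sample (values made up) to show what the filter keeps:
cat <<'EOF' | grep -E 'write|flush|fua'
  bios meta completed write           : 2048
  bios meta completed flush           : 16
  bios meta completed fua             : 16
EOF
```

Comparing these counters before and after a workload gives a rough picture of how much IO actually reached the backing storage.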
Hi @rhawalsh,
Where can I find a detailed explanation of the fields in the vdostats --verbose output?
Some of the data may be useful, but I can't understand its actual meaning.
For example: 'BIOS meta completed write'.
Hi @rhawalsh, I used fio for random writes and iostat to monitor the VDO volume and the hard disk at the same time. The VDO volume shows only write bandwidth, but the hard disk shows both read and write bandwidth (roughly: the disk's read BW equals the VDO volume's write BW, and the disk's write BW is twice the VDO volume's write BW). Can you roughly describe what introduces this additional bandwidth? Thanks.
As mentioned above, the VDO volume is built directly on the hard disk. The fio write command is: sudo fio --filename=/dev/mapper/vdo_2 --bs=4k --output write4k.log --direct=1 --iodepth=128 --rw=randwrite --ioengine=libaio --buffer_compress_percentage=54 --buffer_compress_chunk=4096 --offset=0 --size=100% --runtime=50000s --time_based=1 --group_reporting --numjobs=4 (vdo_2 is the volume name).
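The observed pattern can be summarized as amplification ratios relative to the workload's write bandwidth. A sketch with hypothetical absolute numbers chosen to match the reported pattern (disk write BW roughly 2x the VDO write BW, disk read BW roughly 1x):

```shell
# Bandwidth figures are hypothetical, matching the ratios reported above.
vdo_write_mbs=50     # write BW seen on the VDO volume
disk_read_mbs=50     # read BW seen on the backing disk
disk_write_mbs=100   # write BW seen on the backing disk
awk -v v="$vdo_write_mbs" -v r="$disk_read_mbs" -v w="$disk_write_mbs" \
    'BEGIN { printf "write amplification: %.1fx, read overhead: %.1fx\n", w/v, r/v }'
```

Computing these ratios from iostat samples taken during the fio run makes it easy to see how the overhead changes as deduplication and compression settings change.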
Hi @Baimax-123, I apologize for the delayed response.
To get some information about the output from vdostats --verbose, I'd point you at the RHEL docs, specifically "Table 30.9. vdostats --verbose Output" if the anchor doesn't put you there initially.
The IO for VDO involves doing read-compares when we encounter duplicate data. So when a block comes in, VDO hashes it and asks UDS for advice; if UDS claims it is a duplicate, likely at a particular block, the VDO device then reads that block to make sure it actually is a duplicate. If it turns out not to be, VDO writes it out as it normally would for a unique block. It is for reasons like this that you're seeing a bunch of read traffic despite a purely write workload.
Please keep in mind that my description of the IO pattern is generalized. If you want or need more detail, ask and I can try to get someone more knowledgeable than I am to provide better information. Of course, you're always free to browse the code yourself as well, but that might be more work than it's worth.
Thanks, @rhawalsh, I have seen that. Meanwhile, another question: VDO provides deduplication and compression, and from the names it's clear that read-compares should exist for deduplication. But if deduplication and compression are turned off, do the read-compares still happen?
And I will browse the code and hope to learn more about VDO. The VDO code base is quite large; if I want to trace the specific path from the VDO layer down to the physical storage layer, can you point out where to start? I believe it will save a lot of time. Thanks again!
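As an aside, the experiment with the optimizations off can be set up at runtime with the vdo manager CLI. A sketch, assuming a volume named vdo_2 as in this thread, guarded so it is a no-op on machines where that volume does not exist:

```shell
# Toggle both optimizations off for the comparison run (vdo manager CLI).
# Guarded: does nothing on a machine without the vdo tool or a vdo_2 volume.
if command -v vdo >/dev/null 2>&1 && sudo -n vdo status --name=vdo_2 >/dev/null 2>&1; then
    sudo vdo disableDeduplication --name=vdo_2
    sudo vdo disableCompression --name=vdo_2
    # Re-enable afterwards with:
    #   sudo vdo enableDeduplication --name=vdo_2
    #   sudo vdo enableCompression --name=vdo_2
else
    echo "vdo_2 not present; skipping"
fi
```

Rerunning the same fio workload before and after toggling, and comparing the iostat ratios, should show directly whether the extra read traffic disappears once deduplication is disabled.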