ashift=18 needed for NVMe with physical block size 256k
zfs 2.1.4 on Proxmox 7.2 (kernel 5.15.30-2-pve)
### Describe the problem you're observing
Is it correct that I should use ashift=18 (2^18 = 262144 bytes) for these drives to run them as a mirror? 16 seems to be the highest number that works.
### Describe how to reproduce the problem
```
root@prox1:~# zpool status
  pool: tank
 state: ONLINE
status: One or more devices are configured to use a non-native block size.
        Expect reduced performance.
action: Replace affected devices with devices that support the
        configured block size, or migrate data to a properly configured
        pool.
config:

        NAME           STATE     READ WRITE CKSUM
        tank           ONLINE       0     0     0
          mirror-0     ONLINE       0     0     0
            nvme0n1p1  ONLINE       0     0     0  block size: 65536B configured, 262144B native
            nvme1n1p1  ONLINE       0     0     0  block size: 65536B configured, 262144B native

errors: No known data errors
root@prox1:~# cat /sys/devices/pci0000:5d/0000:5d:00.0/0000:5e:00.0/nvme/nvme0/nvme0n1/queue/physical_block_size
262144
root@prox1:~# cat /sys/devices/pci0000:5d/0000:5d:00.0/0000:5e:00.0/nvme/nvme0/nvme0n1/queue/logical_block_size
512
root@prox1:~# nvme list
Node             SN                   Model                                     Namespace  Usage                      Format           FW Rev
---------------- -------------------- ----------------------------------------  ---------  -------------------------- ---------------- --------
/dev/nvme0n1     22103782****         Micron_7450_MTFDKCB3T8TFR                 1          1.77 TB / 3.84 TB          512 B + 0 B      E2MU110
/dev/nvme1n1     22103783****         Micron_7450_MTFDKCB3T8TFR                 1          1.77 TB / 3.84 TB          512 B + 0 B      E2MU110
```
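For reference (not part of the original report), the 65536B "configured" size in the status output corresponds to ashift=16. A quick sketch of how that could be confirmed, assuming the pool and device names above:

```
# Dump the cached pool config and look for the per-vdev ashift.
zdb -C tank | grep ashift    # expect: ashift: 16  (2^16 = 65536B)

# The pool-level property is also queryable.
zpool get ashift tank
```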
### Include any warning/errors/backtraces from the system logs
You should not want a 256KB ashift; it would be too space-inefficient in most cases. Just recently my https://github.com/openzfs/zfs/pull/13798 was merged to master to improve this area. I think it should do the right thing for you.
The highest ashift that the on-disk format can handle is ashift=17, but going that high would kill the uberblock history (each label reserves 128 KiB for the uberblock ring and each slot is 2^ashift bytes, so at ashift=17 only a single uberblock fits) and would likely require code changes to work semi-reliably, since the current code might not be able to import the pool following a cold boot when ashift=17 is used because it relies on the uberblock history existing. Doing ashift=18 would require an on-disk format change.
As @amotin said, I do not think that ashift=18 is the answer here.
So what is your suggestion? Use it as is, get new drives, or ask the vendor whether it is possible to reformat to a smaller block size?
Recreate the pool with ashift=12 and use it that way. The drive is designed to support 4K random IO, although in terms of pure 4K random writes, it is suboptimal:
https://www.storagereview.com/review/micron-7400-pro-ssd-review
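A minimal sketch of that recreation, assuming the pool name and devices from the output above and that any data has already been backed up elsewhere (this is destructive):

```
# WARNING: destroys the existing pool and all data on it.
zpool destroy tank

# Recreate the mirror with a 4K sector size (2^12 = 4096 bytes).
zpool create -o ashift=12 tank mirror /dev/nvme0n1 /dev/nvme1n1

# Confirm the new setting.
zpool get ashift tank
```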
If you are not doing random IO, I suggest setting a 1M recordsize so that most data writes will be full physical page writes.
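The recordsize suggestion is a one-line dataset property change, shown here on the pool's root dataset; apply it to whichever dataset actually holds the sequential data:

```
# Applies to newly written blocks only; existing data keeps its old recordsize.
zfs set recordsize=1M tank
zfs get recordsize tank
```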
As for redoing the low level formatting, that is not really possible due to how flash works internally. Thankfully, since 4K is such a common IO size, flash drive firmware is designed to handle it in a performant way, despite flash physical page sizes becoming insane.
The issue is solved with firmware E2MU200 from https://www.micron.com/products/ssd/firmware-downloads
With this firmware, the Linux kernel reports a physical block size of 4096 bytes for these drives. You can find example outputs regarding the physical block size here: https://www.thomas-krenn.com/de/wiki/NVMe_physical_block_size#Beispiel_einer_NVMe_SSD (currently German only, an English version might follow)
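A quick way to confirm the update took effect, reusing commands already shown in this thread plus nvme-cli's firmware log (device names assumed to match the original report):

```
# Firmware revision should now read E2MU200.
nvme list
nvme fw-log /dev/nvme0

# The kernel should now report a 4096-byte physical block size.
cat /sys/block/nvme0n1/queue/physical_block_size
cat /sys/block/nvme1n1/queue/physical_block_size
```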