
Volume size problem (metadata write failed, insufficient space on volume)

Open cfillot opened this issue 1 year ago • 8 comments

Describe the bug

When creating (and using) PVCs based on NVMe/TCP backends, I regularly get errors or alerts like these from the NetApp storage arrays:

Message: wafl.vol.full: Insufficient space on volume trident_pvc_d393164d_b0f6_4f4c_acf1_c0bf425fa537@vserver:fed5ad71-27a2-11ed-a1c6-d039ea4eed91 to perform operation. 4.00KB was requested but only 1.00KB was available.
Message: fp.est.scan.catalog.failed: Volume footprint estimator scan catalog update fails on "trident_pvc_cfee9e11_16ef_4bd4_8431_96f5e180b613" - Write to metafile failed.

I can manually resize the volumes directly on the storage arrays to avoid these errors, but of course that does not scale. As far as I understand it, 10% of additional space is added to the volume size for metadata, but it seems that this is not enough.
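
For reference, the manual workaround looks something like this on the array (a sketch; the volume name is taken from the alert above, and the exact syntax may vary by ONTAP release):

storage1-c::> volume size -vserver Virt_c -volume trident_pvc_d393164d_b0f6_4f4c_acf1_c0bf425fa537 -new-size +50m

Or letting ONTAP grow the volume on its own instead:

storage1-c::> volume autosize -vserver Virt_c -volume trident_pvc_d393164d_b0f6_4f4c_acf1_c0bf425fa537 -mode grow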

NVMe backends are configured as follows:

---
apiVersion: trident.netapp.io/v1
kind: TridentBackendConfig
metadata:
  name: storage1-c-nvme
spec:
  version: 1
  backendName: storage1-c-nvme
  storageDriverName: ontap-san
  managementLIF: XXX.XXX.XXX.XXX
  useREST: true
  svm: Virt_c
  sanType: nvme
  storagePrefix: trident
  defaults:
    spaceReserve: volume
    snapshotReserve: '0'
  credentials:
    name: netapp-credentials
  supportedTopologies:
    - topology.kubernetes.io/zone: ZONE-A

Storage Class:

---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: netapp-nvme-c
provisioner: csi.trident.netapp.io
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: True
parameters:
  fsType: "ext4"
allowedTopologies:
  - matchLabelExpressions:
    - key: topology.kubernetes.io/zone
      values:
        - ZONE-A

PVC:

---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: pvc-san
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 300Mi
  storageClassName: netapp-nvme-c

BTW, if I don't specify "spaceReserve: volume" in the backends, filling the ext4 filesystem in a Pod makes the volume read-only (presumably because ext4 remounts read-only on a write error from the thin-provisioned LUN), whereas this should only give a "No space left on device" error.

Environment

  • Trident version: 24.02
  • Kubernetes version: 1.28.4
  • OS: Debian 12
  • NetApp backend types: ONTAP 9.13.1P6 on AFF-A250

cfillot avatar May 01 '24 10:05 cfillot

Just for understanding, are any snapshots taken (either in K8s or directly in ONTAP)?

In any case, the snapshotReserve parameter of the backend can be used to make the hosting volume a bit larger than the PVC itself to accommodate snapshots (hence the name) but also any other metadata. You currently have it set to 0%; try setting it to 10 or 20 (unless you make heavy use of snapshots/clones along with high data change rates, in which case you might need more). Note that this is a percentage calculated with the ONTAP volume size as the base.
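
For example, in the defaults section of the backend you posted (a sketch; this affects newly provisioned volumes only, existing ones keep their size):

  defaults:
    spaceReserve: volume
    snapshotReserve: '10'

Since the reserve is a percentage of the ONTAP volume, a 300Mi PVC with snapshotReserve 10 would get a flexvol of roughly 300/0.9 ≈ 334Mi, plus whatever metadata overhead Trident adds on top.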

wonderland avatar May 02 '24 06:05 wonderland

Thanks a lot for your suggestion! No, I don't use snapshots at all on these PVCs/volumes, so I chose to set it to 0%. I didn't know this snapshot reserve space could also be used for metadata; I'll try that and let you know.

cfillot avatar May 02 '24 10:05 cfillot

We have the same setup except we use Ubuntu 22.04, and I can confirm we have the same problem. For us, the 100% reliable way to trigger it is to fill the drive, then delete the data and try to fill it again.

We have a reproducer using fio random writes of a 16G file to a 20G volume.

The problem happens on overwrite. We can see the same behaviour using iSCSI when the volume is not mounted with the ext4 discard option. With discard active, overwriting works fine for iSCSI. However, discard does not work for us at all with NVMe/TCP, and after reaching out to NetApp we were told that it is not yet supported, but that we should be able to use thick-provisioned volumes. We always had spaceAllocation: "true" on the Trident backend but did not have the ext4 mount option.
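
For anyone else hitting this on iSCSI: the discard option can be set through the StorageClass, something like the following (a sketch based on the class posted above; the name netapp-iscsi-discard is made up):

---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: netapp-iscsi-discard
provisioner: csi.trident.netapp.io
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
parameters:
  fsType: "ext4"
mountOptions:
  - discard   # ext4 issues UNMAP/TRIM as blocks are freed, returning space to the LUN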

NetApp told us to use thick provisioning, but setting spaceReserve: volume is not enough, as it enables thick provisioning only at the volume level. Our internal NetApp storage experts told us that in order to reach 100% overwrite capability we also need thick provisioning for the LUN, but from a peek into the Trident source the LUN is always thin-provisioned. We are changing the parameter manually, roughly as sketched below.
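
The manual change looks roughly like this on the array (a sketch with placeholder names in angle brackets; as far as we can tell the ontap-san driver creates the LUN as /vol/<volume>/lun0, and the equivalent command for an NVMe namespace may differ):

cluster::> lun modify -vserver <svm> -path /vol/<trident_volume>/lun0 -space-reserve enabled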

This is currently a blocker for broader production use, and we use Trident and NetApp only for less important things, as we never understood this issue completely.

Also, I should add that this is all completely without snapshots.

holtakj avatar May 05 '24 15:05 holtakj

Hi @cfillot, could you please provide us with the following information to root-cause this issue?

  1. ONTAP array logs and where exactly you are getting this alert
  2. State of the PVC (Bound / Pending)
  3. More information on the volume in ONTAP which is causing this issue (available storage, metadata, settings)

alloydsa avatar Jul 02 '24 04:07 alloydsa

Hello,

I tried again in an updated environment (Kubernetes 1.30.10, Trident 25.02, ONTAP 9.14.1P6), and the problem is still present. The PVC configuration is exactly the same as above.

Right after provisioning (empty ext4 filesystem), on the storage array:

storage1-c::> df -h -x /vol/trident_pvc_b06ed499_3b6c_4a1d_8768_51205143d5fa/
Filesystem               total       used      avail capacity  Mounted on                 Vserver
/vol/trident_pvc_b06ed499_3b6c_4a1d_8768_51205143d5fa/ 
                         330MB      301MB       28MB      91%  ---                        Virt_c

In the container:

~ $ df -h /data/demo/
Filesystem                Size      Used Available Use% Mounted on
/dev/nvme0n1            264.9M     24.0K    243.9M   0% /data/demo

If I fill the filesystem with "dd if=/dev/urandom of=/data/demo/test.raw":

storage1-c::> df -h -x /vol/trident_pvc_b06ed499_3b6c_4a1d_8768_51205143d5fa/
Filesystem               total       used      avail capacity  Mounted on                 Vserver
/vol/trident_pvc_b06ed499_3b6c_4a1d_8768_51205143d5fa/ 
                         330MB      330MB         0B     100%  ---                        Virt_c

Event log:

storage1-c::> event log show                                                 
Time                Node             Severity      Event
------------------- ---------------- ------------- ---------------------------
3/5/2025 19:06:38   storage1-c-01    ALERT         monitor.volume.full: Volume "trident_pvc_b06ed499_3b6c_4a1d_8768_51205143d5fa@vserver:fed5ad71-27a2-11ed-a1c6-d039ea4eed91" is full (using or reserving 100% of space and 1% of inodes).
3/5/2025 19:06:38   storage1-c-01    ERROR         fp.est.scan.catalog.failed: Volume footprint estimator scan catalog update fails on "trident_pvc_b06ed499_3b6c_4a1d_8768_51205143d5fa" - Write to metafile failed.
3/5/2025 19:06:37   storage1-c-01    ALERT         wafl.vol.full: Insufficient space on volume trident_pvc_b06ed499_3b6c_4a1d_8768_51205143d5fa@vserver:fed5ad71-27a2-11ed-a1c6-d039ea4eed91 to perform operation. 64.0KB was requested but only 1.00KB was available.

Details about the volume:

storage1-c::> volume  show /vol/trident_pvc_b06ed499_3b6c_4a1d_8768_51205143d5fa/ -instance 

                                      Vserver Name: Virt_c
                                       Volume Name: trident_pvc_b06ed499_3b6c_4a1d_8768_51205143d5fa
                                    Aggregate Name: aggr1
     List of Aggregates for FlexGroup Constituents: aggr1
                                   Encryption Type: none
                  List of Nodes Hosting the Volume: storage1-c-01
                                       Volume Size: 330MB
                                Volume Data Set ID: 2194
                         Volume Master Data Set ID: 2163408814
                                      Volume State: online
                                      Volume Style: flex
                             Extended Volume Style: flexvol
                           FlexCache Endpoint Type: none
                            Is Cluster-Mode Volume: true
                             Is Constituent Volume: false
                     Number of Constituent Volumes: -
                                     Export Policy: default
                                           User ID: 0
                                          Group ID: 0
                                    Security Style: unix
                                  UNIX Permissions: ---rwxrwxrwx
                                     Junction Path: -
                              Junction Path Source: -
                                   Junction Active: -
                            Junction Parent Volume: -
                                           Comment: 
                                    Available Size: 0B
                                   Filesystem Size: 330MB
                           Total User-Visible Size: 330MB
                                         Used Size: 330MB
                                   Used Percentage: 100%
              Volume Nearly Full Threshold Percent: 95%
                     Volume Full Threshold Percent: 98%
                                  Maximum Autosize: 396MB
                                  Minimum Autosize: 330MB
                Autosize Grow Threshold Percentage: 85%
              Autosize Shrink Threshold Percentage: 50%
                                     Autosize Mode: off
                          Total User Visible Files: 10029
                           User Visible Files Used: 101
                         Space Guarantee in Effect: true
                               Space SLO in Effect: true
                                         Space SLO: none
                             Space Guarantee Style: volume
                                Fractional Reserve: 100%
                                       Volume Type: RW                         
                 Snapshot Directory Access Enabled: true
                Space Reserved for Snapshot Copies: 0%
                             Snapshot Reserve Used: 0%
                                   Snapshot Policy: none
                                     Creation Time: Wed Mar 05 19:03:36 2025
                                          Language: C.UTF-8
                                      Clone Volume: false
                                         Node name: storage1-c-01
                         Clone Parent Vserver Name: -
                           FlexClone Parent Volume: -
                                     NVFAIL Option: on
                             Volume's NVFAIL State: false
           Force NVFAIL on MetroCluster Switchover: off
                         Is File System Size Fixed: false
                        (DEPRECATED)-Extent Option: off
                     Reserved Space for Overwrites: 280.0MB
                 Primary Space Management Strategy: volume_grow
                          Read Reallocation Option: off
       Naming Scheme for Automatic Snapshot Copies: create_time
                  Inconsistency in the File System: false
                      Is Volume Quiesced (On-Disk): false
                    Is Volume Quiesced (In-Memory): false
         Volume Contains Shared or Compressed Data: true
                 Space Saved by Storage Efficiency: 34.83MB
            Percentage Saved by Storage Efficiency: 10%
Space Saved by Deduplication Along With VBN ZERO Savings: 34.70MB
                 Percentage Saved by Deduplication: 10%
     Unique Data Which Got Shared by Deduplication: 0B
                        Space Saved by Compression: 128KB
             Percentage Space Saved by Compression: 0%
               Volume Size Used by Snapshot Copies: 0B
                                        Block Type: 64-bit
                                  Is Volume Moving: false
                    Flash Pool Caching Eligibility: read-write
     Flash Pool Write Caching Ineligibility Reason: -
                           Constituent Volume Role: -
                             QoS Policy Group Name: -
                    QoS Adaptive Policy Group Name: -
                               Caching Policy Name: -
                   Is Volume Move in Cutover Phase: false
           Number of Snapshot Copies in the Volume: 0
   VBN_BAD may be present in the active filesystem: false
                   Is Volume on a hybrid aggregate: false
                          Total Physical Used Size: 333.4MB
                          Physical Used Percentage: 101%
                                    FlexGroup Name: -                          
                             Is Volume a FlexGroup: false
                                     SnapLock Type: non-snaplock
                             Vserver DR Protection: -
                      Enable or Disable Encryption: false
                               Is Volume Encrypted: false
                                  Encryption State: none
                                 Encryption Key ID: 
                      Encryption Key Creation Time: -
                                       Application: -
                     Is Fenced for Protocol Access: false
                       Protocol Access Fence Owner: -
                                   Is SIDL enabled: off
                             Over Provisioned Size: 0B
                   Available Snapshot Reserve Size: 0B
                                 Logical Used Size: 364.8MB
                           Logical Used Percentage: 111%
                            Logical Available Size: -
            Logical Size Used by Active Filesystem: 364.8MB
                Logical Size Used by All Snapshots: 0B
                           Logical Space Reporting: false
                         Logical Space Enforcement: false
                             Volume Tiering Policy: none
               Performance Tier Inactive User Data: 0B
       Performance Tier Inactive User Data Percent: 0%
Tags to be Associated with Objects Stored on a FabricPool: -
Does the Object Tagging Scanner Need to Run on This Volume: -
                Is File System Analytics Supported: false
     Reason File System Analytics is not Supported: File system analytics is not supported on volumes that contain NVMe namespaces.
                       File System Analytics State: off
               File System Analytics Scan Progress: -
      File System Analytics Files Scanned Progress: -
                 File System Analytics Total Files: -
                           Activity Tracking State: off
                    Is Activity Tracking Supported: false
         Reason Activity Tracking Is Not Supported: Volume activity tracking is not supported on volumes that contain NVMe namespaces.
                                    Is SMBC Master: false
                          Is SMBC Failover Capable: false
                                    SMBC Consensus: -
                             Anti-ransomware State: disabled
                                     Granular data: disabled
                      Enable Snapshot Copy Locking: false
                                       Expiry Time: -
                              ComplianceClock Time: -
          Are Large Size Volumes and Files Enabled: false
     If this Volume is part of a Consistency Group: false

Regards,

Christophe

cfillot avatar Mar 05 '25 19:03 cfillot

@cfillot Could you please create a NetApp support ticket and upload all the necessary logs and steps?

alloydsa avatar Mar 06 '25 09:03 alloydsa

@alloydsa All necessary information is provided. There is a significant metadata space issue, particularly on NVMe/TCP, where filesystem discard from the client is completely non-functional. A similar issue occurs on iSCSI when discard is disabled; with discard enabled, functionality appears normal. Without it, the problems outlined in this ticket arise.

holtakj avatar Mar 06 '25 09:03 holtakj

@holtakj @cfillot We could not reproduce this issue on our side by following the steps that were provided. Could you please confirm whether you have created a NetApp support case and attached Trident logs and all the other necessary information?

alloydsa avatar Mar 13 '25 05:03 alloydsa