
With SSD Tiering, Dragonfly process RSS exceeds configured maxmemory (540 GB) → OOM

Open shahyash2609 opened this issue 4 months ago • 7 comments

Describe the bug

When running Dragonfly with SSD tiering and --maxmemory=540G, the Dragonfly process RSS (resident memory) grows beyond the configured maxmemory during bulk load (and subsequent queries). On my host (840 GB RAM), RSS rises well above 540 GB (e.g., ~700 GB), and the process is eventually killed by OOM, even though a 7–9 TB NVMe SSD is available for tiered storage.

To Reproduce

  1. Start Dragonfly with SSD tiering:

    /home/ubuntu/dragonfly --logtostderr --cache_mode=false --tiered_experimental_cooling=false \
      --dbnum=1 --port=6379 --logbuflevel=-1 --conn_use_incoming_cpu=true \
      --maxmemory=540G --masterauth=${DF_PASSWORD} --requirepass=${DF_PASSWORD} \
      --break_replication_on_master_restart=true \
      --tiered_offload_threshold=0.25 \
      --tiered_prefix /mnt/localDiskSSD/dfssd \
      --dir=/mnt/localDiskSSD/backup \
      --cluster_mode=emulated --lock_on_hashtags --interpreter_per_thread=128
    
  2. Populate data:

    redis-cli DEBUG POPULATE 500000000 key 4096
    
  3. Observe process RSS vs. maxmemory during/after the load (see also the tiering-counter check sketched right after this list):

    PID=$(pidof dragonfly)
    ps -o pid,rss,vsz,cmd -p $PID
    grep -E 'VmRSS|VmSize' /proc/$PID/status
    
    # (optional) compare with Dragonfly’s internal counters
    redis-cli INFO MEMORY | egrep "used_memory_human|maxmemory_human"
    
  4. After crossing the threshold, RSS grows beyond maxmemory and the service is terminated by OOM.
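
A quick way to confirm whether offloading is actually happening during the load: watch the tiering-related counters next to the memory figures. The sketch below greps loosely because the exact INFO field names vary between Dragonfly versions; the auth handling is an assumption to adapt to your setup.

    # Watch tiering counters alongside the memory figures while loading.
    # The grep is deliberately loose (field names differ across versions);
    # assumes DF_PASSWORD is exported in the environment.
    watch -n5 'redis-cli -a "$DF_PASSWORD" INFO ALL | grep -i -E "tiered|used_memory_human|maxmemory_human"'

If the tiering counters stay at zero while RSS climbs, values are not being offloaded at all, which narrows the problem down.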

Expected behavior

With SSD tiering enabled, the Dragonfly process RSS should remain at or below the configured maxmemory (allowing for reasonable overhead), offloading eligible values to SSD to avoid OOM.

Actual behavior

Process RSS exceeds maxmemory by a large margin during/after loading, leading to OOM despite ample free SSD capacity at --tiered_prefix. (In my runs, used_memory also floats above the cap.)

Screenshots / Logs

  • INFO MEMORY around the event:

      redis-cli INFO MEMORY | egrep "used_memory|maxmemory"
      # Memory
      used_memory:521838494624
      used_memory_human:486.00GiB
      used_memory_peak:522897895552
      used_memory_peak_human:486.99GiB
      fibers_stack_vms:59866960
      fibers_count:919
      used_memory_rss:715525791744
      used_memory_rss_human:666.38GiB
      used_memory_peak_rss:522897895552
      maxmemory:579820584960
      maxmemory_human:540.00GiB
      used_memory_lua:0
      object_used_memory:481883549696
      type_used_memory_string:481883549696
      table_used_memory:33876148144
      prime_capacity:880802160
      expire_capacity:107520
      num_entries:509999989
      inline_keys:509999988
      small_string_bytes:0
      pipeline_cache_bytes:0
      dispatch_queue_bytes:0
      dispatch_queue_subscriber_bytes:0
      dispatch_queue_peak_bytes:306642
      client_read_buffer_peak_bytes:16414208
      tls_bytes:26056
      snapshot_serialization_bytes:0
      commands_squashing_replies_bytes:0
      psync_buffer_size:0
      psync_buffer_bytes:0
      cache_mode:store
      maxmemory_policy:noeviction
      replication_streaming_buffer_bytes:0
      replication_full_sync_buffer_bytes:0
      
  • Directory size at the tiered path:

      du -sh /mnt/localDiskSSD/dfssd
      2.9T	/mnt/localDiskSSD
      

Environment

  • OS: Ubuntu 20.04

  • Kernel: 6.5.0-1018-gcp 18-Ubuntu

  • Dragonfly Version: 1.32.0

  • Hardware:

    • RAM: 840 GB
    • SSD for tiering: local NVMe (LSSD), ~9 TB available

Reproducible Code Snippet

# 1) Start DF as above (tiering + 540G maxmemory)
# 2) Populate with 500M keys of ~4KB values
redis-cli DEBUG POPULATE 500000000 key 4096

# 3) Observe process RSS vs cap
PID=$(pidof dragonfly)
watch -n1 "ps -o pid,rss,vsz,cmd -p $PID; echo ----; redis-cli INFO MEMORY | egrep 'used_memory_human|maxmemory_human'"
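
To capture the exact moment RSS crosses the cap, and to confirm afterwards that the kill came from the kernel OOM killer, a rough logging loop like the one below can run alongside the load. The 10-second interval, auth handling, and log path are assumptions to adapt.

    #!/usr/bin/env bash
    # Append a timestamped line with process RSS, Dragonfly's own memory counters,
    # and the on-disk size of the tiered files every 10 seconds.
    PID=$(pidof dragonfly)
    while kill -0 "$PID" 2>/dev/null; do
      rss_kb=$(awk '/VmRSS/ {print $2}' /proc/"$PID"/status)
      mem=$(redis-cli -a "$DF_PASSWORD" INFO MEMORY 2>/dev/null \
              | egrep 'used_memory:|used_memory_rss:|maxmemory:' | tr -d '\r' | paste -sd' ')
      disk=$(du -sh /mnt/localDiskSSD/dfssd* 2>/dev/null | paste -sd' ')
      echo "$(date -Is) rss_kb=$rss_kb $mem disk=$disk" >> /tmp/df_mem_log.txt
      sleep 10
    done

    # Once the process is gone, check whether the kernel OOM killer was the cause
    # (may require sudo):
    dmesg -T | grep -i -E 'out of memory|oom-killer' | tail -n 20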

Additional context

  • Total dataset: ~1.7 TB (actual).
  • Tiered path: /mnt/localDiskSSD/dfssd on a 7–9 TB NVMe; disk IOPS/bandwidth were idle/available.
  • Despite tiering being configured, RSS rises above the 540 GB cap and the process OOMs.
  • After running FLUSHALL, the data was not removed from the disk!
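
For scale, a back-of-envelope check using only the figures already quoted in this report shows why the cap can only hold if the bulk of the values are offloaded to SSD (the numbers below ignore per-key overhead). The sparse-file check at the end is a hedged follow-up to the FLUSHALL observation, not a confirmed explanation.

    # 500M values of 4 KiB each is ~2.05 TB of raw value bytes (roughly 3.5x the
    # 540 GiB cap), so most values must live on the SSD tier for RSS to stay
    # under maxmemory.
    echo $(( 500000000 * 4096 ))          # 2048000000000 bytes ≈ 2.05 TB ≈ 1.86 TiB
    echo $(( 540 * 1024 * 1024 * 1024 ))  # 579820584960 bytes = 540 GiB (matches maxmemory above)

    # FLUSHALL follow-up (assumption: the tiered backing file may be kept sparse).
    # If so, its apparent length can stay large after a flush while the actually
    # allocated blocks shrink; comparing the two shows whether space was really
    # leaked or just not truncated.
    ls -lh /mnt/localDiskSSD/dfssd*
    du -h --apparent-size /mnt/localDiskSSD/dfssd*
    du -h /mnt/localDiskSSD/dfssd*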

shahyash2609 avatar Aug 10 '25 16:08 shahyash2609

@romange, could you please review this and advise on any workarounds that would help us avoid the problem?

shahyash2609 avatar Aug 18 '25 05:08 shahyash2609

I am sorry we are busy with other tasks.

romange avatar Aug 18 '25 06:08 romange

> I am sorry we are busy with other tasks.

thanks for letting us know. We'll keep an eye on this issue as it's of high interest to us.

Tieger avatar Oct 11 '25 09:10 Tieger

@romange are you planning to come back to this bug soon? It makes the feature useless for us, as our goal with SSD tiering is to cap max memory while letting the store grow on disk.

ardigan6 avatar Nov 19 '25 14:11 ardigan6

what was the instance type on GCP? can you please repeat this experiment with version v1.35?

romange avatar Nov 19 '25 15:11 romange
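
For anyone rerunning this on a newer build, a hedged sketch of grabbing a v1.35 release binary; the exact tag and asset name below are assumptions and should be verified against https://github.com/dragonflydb/dragonfly/releases.

    # Assumed tag and asset naming; double-check on the releases page first.
    VER=v1.35.0
    wget "https://github.com/dragonflydb/dragonfly/releases/download/${VER}/dragonfly-x86_64.tar.gz"
    tar xzf dragonfly-x86_64.tar.gz
    ./dragonfly-x86_64 --version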

We fixed a few small bugs and added backpressure for writes, so the instance will throttle them when it reaches memory limits.

dranikpg avatar Nov 19 '25 15:11 dranikpg
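
A rough way to observe that throttling from the client side (assuming a v1.35+ build with the backpressure change) is to watch latency while memory approaches the cap. PING latency is only a loose proxy, though; the clearest signal is the loader's own write throughput dropping as the limit nears.

    # Terminal 1: latency samples over time (built into redis-cli)
    redis-cli -a "$DF_PASSWORD" --latency-history -i 5

    # Terminal 2: memory usage relative to the cap
    watch -n5 'redis-cli -a "$DF_PASSWORD" INFO MEMORY | egrep "used_memory_human|maxmemory_human"'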

Please also set tiered_storage_write_depth to at least a few thousand (it is the max-concurrent-writes setting). Also, DEBUG POPULATE is not optimized to work with tiering: it uses a limited number of concurrent writes, so the total write performance will be quite low.

dranikpg avatar Nov 19 '25 15:11 dranikpg
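
Putting the two suggestions above together, a hedged sketch of what the adjusted run could look like. The write-depth value is only an illustration of "a few thousand at least", and memtier_benchmark is just one example of a loader that keeps many pipelined writes in flight; its flags should be checked against the installed version.

    # 1) Start Dragonfly as in the original report, adding the suggested flag
    #    (5000 is an assumed value, per "a few thousand at least"):
    /home/ubuntu/dragonfly ... \
      --maxmemory=540G \
      --tiered_prefix /mnt/localDiskSSD/dfssd \
      --tiered_offload_threshold=0.25 \
      --tiered_storage_write_depth=5000

    # 2) Populate with a concurrent, pipelined loader instead of DEBUG POPULATE
    #    (memtier_benchmark shown as one option; flag values are assumptions):
    memtier_benchmark -s 127.0.0.1 -p 6379 -a "$DF_PASSWORD" \
      --ratio=1:0 --data-size=4096 \
      --key-prefix=key: --key-maximum=500000000 --key-pattern=P:P \
      --threads=8 --clients=32 --pipeline=32 -n allkeys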