CacheLib qDepth Support for NVM Cache

Hi,

I have been exploring CacheLib for an experiment in which a single cachebench thread sends requests to an underlying SSD (through nvmCache) at a given queue depth (qDepth). However, despite setting navyQDepth in my configuration file, cachebench never goes beyond 1 in-flight request.

Here's my configuration file:

{
  "cache_config" : {
    "cacheSizeMB" : 100,
    "numPools" : 1,
    "nvmCacheSizeMB": 40960,
    "nvmCachePaths": ["/mnt/nvme0n1/cachelib/testfile"],
    "navyQDepth": 8,
    "navyEnableIoUring": false,
    "navyBlockSize": 4096
  },
  "test_config" : {
    "enableLookaside": true,
    "numThreads" : 1,
    "numKeys" : 1000000,
    "numOps" : 5000000,
    "distribution" : "range",
    "generator": "workload",
    "keySizeRange" : [16, 16],
    "keySizeRangeProbability" : [1],
    "valSizeRange" : [128, 128],
    "valSizeRangeProbability" : [1],
    "setRatio" : 0.0,
    "delRatio" : 0.0,
    "loneGetRatio" : 0.0,
    "getRatio" : 1
  }
}

I have also added logging to the submitIo function in cachelib/navy/common/Device.cpp to see the number of in-flight requests. The output is included here for clarity:

I0405 00:07:43.816907 305415 Device.cpp:839] [ctx_0] Submit I/O Pre Submission queue depth: 8 outstanding requests: 0; submitted requests: 162
I0405 00:07:43.816933 305415 Device.cpp:863] [ctx_0] Submit I/O Post Submission: queue depth: 8 outstanding requests: 1; submitted requests: 163
I0405 00:07:43.817930 305415 Device.cpp:799] [ctx_0] Handle Completion: outstanding requests: 1; completed requests: 164
I0405 00:07:43.817938 305415 Device.cpp:839] [ctx_0] Submit I/O Pre Submission queue depth: 8 outstanding requests: 0; submitted requests: 165
I0405 00:07:43.817968 305415 Device.cpp:863] [ctx_0] Submit I/O Post Submission: queue depth: 8 outstanding requests: 1; submitted requests: 166
I0405 00:07:43.818650 305415 Device.cpp:799] [ctx_0] Handle Completion: outstanding requests: 0; completed requests: 167

Any help would be greatly appreciated!

Apr 05 '24 16:04 Alphacode18

Hi @Alphacode18

To use concurrent IOs, you need to enable async multitasking at the Navy layer (navy-async) by enabling NavyRequestScheduler; refer to this and this for details. Note that qdepth is set automatically when you enable navy-async (ref).

If you override the qdepth with a value >1 without enabling navy-async, you would hit this assertion in a debug build.

Apr 05 '24 17:04 jaesoo-fb

Hi @jaesoo-fb

Thank you for your prompt response. I see. I am now able to set qDepth automatically by varying the navyMaxNumReads and navyMaxNumWrites, with navyReaderThreads and navyWriterThreads set to 1.

Would you know if NavyRequestScheduler is a parameter I can set using the config file? Maybe I am missing something, but I can't seem to set it correctly.

Thank you for all your help!

Apr 05 '24 18:04 Alphacode18

@Alphacode18 NavyRequestScheduler (async), as opposed to OrderedThreadPoolScheduler, is activated if you provide non-zero values for navyMaxNumReads and navyMaxNumWrites. See this

Apr 05 '24 18:04 jaesoo-fb

Oh I see! Thanks. Unfortunately, in submitIo, I still see numOutstanding_ oscillate between 0 and 1 (similar to the log above). Is that the correct place to log? If not, could you point me to the right place? I just want to verify that I am indeed seeing numOutstanding_ roughly similar to qDepth.

Here's my updated config:

{
  "cache_config" : {
    "cacheSizeMB" : 100,
    "numPools" : 1,
    "nvmCacheSizeMB": 40960,
    "nvmCachePaths": ["/mnt/nvme0n1/cachelib/testfile"],
    "navyEnableIoUring": false,
    "navyBlockSize": 4096,
    "navyMaxNumReads": 16,
    "navyMaxNumWrites": 16,
    "navyReaderThreads": 1,
    "navyWriterThreads": 1
  },
  "test_config" : {
    "enableLookaside": true,
    "numThreads" : 1,
    "numKeys" : 1000000,
    "numOps" : 5000000,
    "distribution" : "range",
    "generator": "workload",
    "keySizeRange" : [16, 16],
    "keySizeRangeProbability" : [1],
    "valSizeRange" : [4096, 4096],
    "valSizeRangeProbability" : [1],
    "setRatio" : 0.0,
    "delRatio" : 0.0,
    "loneGetRatio" : 0.0,
    "getRatio" : 1
  }
}

Apr 05 '24 18:04 Alphacode18
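
One way to check the observed depth is to scan the log output for the largest "outstanding requests" value instead of reading lines by eye. The sketch below is a minimal helper, assuming the custom log format quoted earlier in this thread (that format comes from the poster's own logging, not from CacheLib itself); the function name max_outstanding is hypothetical.

```python
import re

# Matches the "outstanding requests: N" field of the custom log lines
# quoted earlier in this thread (assumed format, added by the poster).
OUTSTANDING_RE = re.compile(r"outstanding requests: (\d+)")


def max_outstanding(lines):
    """Return the largest outstanding-request count seen, or None if no line matches."""
    counts = [int(m.group(1)) for line in lines if (m := OUTSTANDING_RE.search(line))]
    return max(counts) if counts else None


# Two lines copied from the log excerpt in the first post.
sample = [
    "I0405 00:07:43.816907 305415 Device.cpp:839] [ctx_0] Submit I/O Pre Submission "
    "queue depth: 8 outstanding requests: 0; submitted requests: 162",
    "I0405 00:07:43.816933 305415 Device.cpp:863] [ctx_0] Submit I/O Post Submission: "
    "queue depth: 8 outstanding requests: 1; submitted requests: 163",
]
print(max_outstanding(sample))  # prints 1
```

If the value reported over a full run stays at 1, the device never saw more than one request at a time, whatever the configured queue depth.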

@Alphacode18 The Navy configuration looks correct, but the stressor configuration does not; you are using only 1 thread (numThreads), meaning there will be at most one outstanding cachelib request at a time.

Apr 05 '24 19:04 jaesoo-fb
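
The point about numThreads can be illustrated with a small simulation. This is only a sketch, not cachebench or CacheLib code: it assumes each stressor thread issues one blocking get at a time, and peak_in_flight, blocking_get, and the sleep standing in for a device read are all hypothetical. Under that assumption, the number of requests in flight can never exceed the number of threads, regardless of the queue depth configured below them.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def peak_in_flight(num_threads, ops_per_thread=20, io_seconds=0.002):
    """Run num_threads synchronous workers and return the peak number of in-flight ops."""
    in_flight = 0
    peak = 0
    lock = threading.Lock()

    def blocking_get():
        nonlocal in_flight, peak
        with lock:
            in_flight += 1
            peak = max(peak, in_flight)
        time.sleep(io_seconds)  # stand-in for a blocking device read
        with lock:
            in_flight -= 1

    def worker():
        # Each worker waits for one request to finish before issuing the next.
        for _ in range(ops_per_thread):
            blocking_get()

    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        futures = [pool.submit(worker) for _ in range(num_threads)]
        for f in futures:
            f.result()
    return peak


print(peak_in_flight(1))  # a single synchronous worker never exceeds 1
print(peak_in_flight(8))  # bounded above by 8
```

With one worker the peak is always 1, which matches the log output above; raising the worker count raises the ceiling, but the ceiling is still the number of workers.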

Oh, I see. In that case, may I ask three questions:

1. How do I set qDepth independently of the number of threads, and have cachelib enforce it (i.e., maintain that many requests in flight per thread)?
2. I now observe that qDepth is always upper-bounded by numThreads (i.e., I can't have an I/O depth of 8 if numThreads = 1). How can I get rid of this upper bound?
3. Also, if I launch 8 threads with my configuration, how can I explicitly pin them to particular CPU cores?

Thank you for all your help!

Apr 05 '24 19:04 Alphacode18
