[Feature] Add a `mode` option to the Python kvikio.CuFile API for when cuFile is bypassed and POSIX is used directly.
It would be useful to let users choose, when in compatibility mode, whether the Python kvikio.CuFile read API uses the O_DIRECT flag with POSIX. For users comparing the performance of KvikIO's cuFile and POSIX code paths, page caching might affect the results, especially when averaging multiple runs on the same file path. This option would give users greater control over, and confidence in, their benchmark results.
From https://developer.nvidia.com/blog/gpudirect-storage/?ncid=no-ncid
Currently "when in compatibility mode, FileHandle only opens the file once (without the O_DIRECT flag)". https://github.com/rapidsai/kvikio/pull/410#issue-2424383037
Thanks for raising the issue. Support for O_DIRECT in compatibility mode is being considered. The implementation is likely to require non-trivial code changes: O_DIRECT usually imposes stringent alignment requirements, where the user-provided buffer address, the I/O segment length, and the file offset all have to be aligned to a platform-dependent value. Granted, for simplicity we may choose to overalign everything to the page size, but we still need to cover all the cases where one or more of these three components are unaligned.
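To make the three alignment conditions concrete, here is a minimal sketch of the check. The helper name `is_direct_io_aligned` is hypothetical (it is not part of KvikIO), and using the page size as the alignment is the simplifying assumption mentioned above; the real requirement is platform- and filesystem-dependent.

```python
import ctypes
import mmap


def is_direct_io_aligned(buf_addr: int, nbytes: int, offset: int,
                         alignment: int = mmap.PAGESIZE) -> bool:
    """Return True if buffer address, length, and file offset are all
    multiples of `alignment` (assumed here to be the page size)."""
    return all(value % alignment == 0 for value in (buf_addr, nbytes, offset))


# An anonymous mmap region starts on a page boundary, so it is a
# convenient way to obtain a page-aligned host buffer.
buf = mmap.mmap(-1, 2 * mmap.PAGESIZE)
addr = ctypes.addressof(ctypes.c_char.from_buffer(buf))

print(is_direct_io_aligned(addr, mmap.PAGESIZE, 0))      # all three aligned
print(is_direct_io_aligned(addr + 1, mmap.PAGESIZE, 0))  # buffer unaligned
print(is_direct_io_aligned(addr, 123, 456))              # length and offset unaligned
```

Any request that fails this check would need either a bounce buffer or a buffered fallback, which is where the extra implementation work comes from.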
If your purpose is benchmarking, I'd recommend the new kvikio.cufile.clear_page_cache feature available in the 25.08 nightly. Calling this function before each run of your benchmark program is expected to eliminate the impact of the kernel page cache. If the 25.08 nightly is not a viable option, use these Linux commands, which have the same cache-clearing effect as the new API:
sync
sudo /sbin/sysctl vm.drop_caches=3
./my_benchmark_program
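If the benchmark lives in Python, the same "clear, then measure" pattern can be sketched as a small harness. Everything here is illustrative: `time_cold_runs` and `drop_caches_with_sysctl` are hypothetical helper names, and the harness simply takes the cache-clearing step as a callable, so either the shell commands above or the new KvikIO function could be plugged in.

```python
import statistics
import subprocess
import time
from typing import Callable, List


def drop_caches_with_sysctl() -> None:
    """Run the two shell commands shown above (requires sudo)."""
    subprocess.run(["sync"], check=True)
    subprocess.run(["sudo", "/sbin/sysctl", "vm.drop_caches=3"], check=True)


def time_cold_runs(run: Callable[[], object],
                   clear_cache: Callable[[], object],
                   repeats: int = 5) -> List[float]:
    """Time `run` several times, calling `clear_cache` before each run so
    that no run benefits from pages cached by the previous one."""
    timings = []
    for _ in range(repeats):
        clear_cache()  # executed outside the timed region
        start = time.perf_counter()
        run()
        timings.append(time.perf_counter() - start)
    return timings


# Example with stand-in callables; replace them with the real read and
# with drop_caches_with_sysctl (or the KvikIO cache-clearing function).
timings = time_cold_runs(lambda: sum(range(100_000)), lambda: None, repeats=3)
print(f"mean over {len(timings)} runs: {statistics.mean(timings):.6f} s")
```

Keeping the cache clearing outside the timed region means the reported numbers cover only the I/O call itself.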
Thank you for pointing me to this upcoming feature in the 25.08 nightly. Unfortunately, my computing resource does not allow conda/mamba, and I do not see 25.08 available through PyPI.
(kvikio-venv) [strugf@gw00 ~]$ pip install kvikio-cu12==25.8.0
ERROR: Could not find a version that satisfies the requirement kvikio-cu12==25.8.0 (from versions: 24.8.2, 24.10.0, 24.12.0, 24.12.1, 25.2.0, 25.2.1, 25.4.0, 25.6.0)
ERROR: No matching distribution found for kvikio-cu12==25.8.0
Looking forward to testing this out when 25.8.0 is stable.
https://pypi.anaconda.org/rapidsai-wheels-nightly/simple contains nightly wheels.
pip install "kvikio-cu12>=25.8.0,<25.10.0a0"
You might need --prerelease=allow.
@TomAugspurger I see kvikio-cu11 but not kvikio-cu12. I do see libkvikio-cu12, though. Can I expect the kvikio-cu11 nightly to work with CUDA 12?
kvikio-cu12 is at https://pypi.anaconda.org/rapidsai-wheels-nightly/simple/kvikio-cu12. I think the index page at https://pypi.anaconda.org/rapidsai-wheels-nightly/simple comes from anaconda.org's implementation of https://packaging.python.org/en/latest/specifications/simple-repository-api/. I'm not sure whether it is expected that some packages are missing from the HTML index page, but they are there.
#863 has mostly addressed this issue, with the implication that the I/O will in general consist of a medley of buffered (non-O_DIRECT) and unbuffered (O_DIRECT) parts. If, however, you can ensure that the buffer address, transfer size, file offset, and KvikIO task size are all aligned to the page size, then pure O_DIRECT I/O will be performed, making your test free from the page cache effect.
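To illustrate what a "medley" of buffered and unbuffered parts can look like, here is a rough sketch that splits a byte range into an unaligned head, a page-aligned middle, and an unaligned tail. This is only an illustration of the idea, not the actual implementation in #863: the function name `split_by_alignment` is hypothetical, and the sketch looks only at the file offset and size, ignoring the buffer address and the KvikIO task size.

```python
import mmap
from typing import List, Tuple


def split_by_alignment(offset: int, size: int,
                       alignment: int = mmap.PAGESIZE) -> List[Tuple[str, int, int]]:
    """Split [offset, offset + size) into (kind, offset, size) parts, where
    kind is "direct" for the aligned middle and "buffered" for the rest."""
    if size <= 0:
        return []
    end = offset + size
    aligned_start = -(-offset // alignment) * alignment  # round offset up
    aligned_end = (end // alignment) * alignment         # round end down
    if aligned_start >= aligned_end:
        # The range does not contain a full aligned block.
        return [("buffered", offset, size)]
    parts = []
    if offset < aligned_start:
        parts.append(("buffered", offset, aligned_start - offset))
    parts.append(("direct", aligned_start, aligned_end - aligned_start))
    if aligned_end < end:
        parts.append(("buffered", aligned_end, end - aligned_end))
    return parts


print(split_by_alignment(100, 10000, alignment=4096))
# [('buffered', 100, 3996), ('direct', 4096, 4096), ('buffered', 8192, 1908)]
print(split_by_alignment(0, 8192, alignment=4096))
# [('direct', 0, 8192)]
```

Only the second call, where both the offset and the size are page multiples, yields a single "direct" part, which corresponds to the pure O_DIRECT case described above.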
There is actually another, simple way to perform unbuffered I/O even without the above new feature: after you have opened the file, manipulate the file descriptor and add the O_DIRECT flag. Example:
import kvikio
import kvikio.defaults
import numpy as np
import fcntl
import os
import mmap

kvikio.defaults.set("compat_mode", True)

# file_name is the path of the file to read (supplied by the caller)

# Read the file into a buffer
with kvikio.CuFile(file_name, "r") as file_handle:
    # Enable direct I/O after the file has been opened
    current_flags = fcntl.fcntl(file_handle.fileno(), fcntl.F_GETFL)
    fcntl.fcntl(file_handle.fileno(), fcntl.F_SETFL,
                current_flags | os.O_DIRECT)

    # Direct I/O is now in use, so the alignment requirements must be met.
    # An unaligned read fails:
    # num_bytes = file_handle.read(unaligned_arr, 123, 456)  # POSIX error on pread: Invalid argument

    # An anonymous mmap region is page-aligned
    aligned_buf = mmap.mmap(-1, 4096)
    aligned_arr = np.frombuffer(aligned_buf, dtype=np.uint8)
    num_bytes = file_handle.read(aligned_arr, 4096, 0)
Hope this helps.