Segfault when running nvm-block-bench
Describe the bug
I encounter a segfault when running the following command:
./bin/nvm-block-bench --threads=262144 --blk_size=64 --reqs=1 --pages=262144 --queue_depth=1024 --page_size=512 --num_blks=2097152 --gpu=0 --n_ctrls=1 --num_queues=128 --random=true
gdb shows:
Thread 1 "nvm-block-bench" received signal SIGSEGV, Segmentation fault.
0x00007ffff7fa1b46 in nvm_cq_poll (cq=0x5555556c47d8) at /root/git/bam/include/nvm_queue.h:168
168 if (!_RB(*NVM_CPL_STATUS(cpl), 0, 0) != !cq->phase)
It looks like the issue originates in nvm_ctrl_init() in linux/device.cpp, where ctrl->mm_ptr is allocated with mmap. This memory apparently cannot be accessed.
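For context, line 168 of nvm_queue.h, where gdb stops, is the completion-queue phase check. The following is a minimal sketch of that kind of check as the NVMe specification defines it, not libnvm's actual code: each 16-byte completion entry carries a phase tag in bit 0 of the 16-bit status field at byte offset 14, and the host treats an entry as new only when that bit equals the phase it currently expects. The type cqe_t and the functions cqe_status() and cqe_is_new() are hypothetical names used only for illustration. Since the faulting line dereferences both the completion entry and cq, the segfault suggests that one of them points to memory the process cannot read.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical raw view of one 16-byte NVMe completion queue entry. */
typedef struct {
    uint8_t bytes[16];
} cqe_t;

/* Hypothetical helper: the 16-bit status field occupies bytes 14-15
 * (upper half of DW3). NVMe structures are little-endian, so the word
 * is assembled explicitly to stay independent of host byte order. */
static uint16_t cqe_status(const cqe_t *cqe)
{
    assert(cqe != NULL);
    return (uint16_t)(cqe->bytes[14] | (cqe->bytes[15] << 8));
}

/* Hypothetical helper: an entry is new only when its phase tag
 * (bit 0 of the status field) matches the phase the consumer expects. */
static bool cqe_is_new(const cqe_t *cqe, bool expected_phase)
{
    bool phase_tag = (cqe_status(cqe) & 0x1u) != 0;
    return phase_tag == expected_phase;
}
```

For example, with zero-initialized queue memory and an expected phase of 1, no entry counts as new until the controller writes an entry with the phase bit set; the consumer flips its expected phase each time it wraps around the queue.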
To Reproduce
- Build following the instructions in the README
- Unbind the NVMe device from the nvme driver:
echo -n "0000:03:00.0" > /sys/bus/pci/drivers/nvme/unbind
- Load the kernel module:
cd build/module && make load
- Run the benchmark:
cd build && ./bin/nvm-block-bench --threads=262144 --blk_size=64 --reqs=1 --pages=262144 --queue_depth=1024 --page_size=512 --num_blks=2097152 --gpu=0 --n_ctrls=1 --num_queues=128 --random=true
Expected behavior
I expected the benchmark to run without a segfault.
Machine Setup
I have tried to match the ASPLOS setup:
- Ubuntu 20.04
- NVIDIA Driver 470.256.02, CUDA 11.6, NVIDIA V100 16GB
- Intel Optane 16GB (MEMPEK1W016GA)
Additional context
I'm trying to reproduce the benchmarks from ASPLOS, so I'm on the asplos tag and using the same software versions.
I have compared my build logs to those in asplosaoe/ and they match.
I have two V100s in the server and have verified with p2pBandwidthLatencyTest from cuda-samples that they support P2P with each other, which I assume means that they can also access the NVMe drive via P2P.
For now, I'm trying to get a single V100 to work with a single NVMe drive.