bam Segfault when running nvm-block-bench

Open karlowich opened this issue 4 months ago • 7 comments

Describe the bug

I encounter a segfault when running the following command:

./bin/nvm-block-bench --threads=262144 --blk_size=64 --reqs=1 --pages=262144 --queue_depth=1024 --page_size=512 --num_blks=2097152 --gpu=0 --n_ctrls=1 --num_queues=128 --random=true

gdb shows:

Thread 1 "nvm-block-bench" received signal SIGSEGV, Segmentation fault.
0x00007ffff7fa1b46 in nvm_cq_poll (cq=0x5555556c47d8) at /root/git/bam/include/nvm_queue.h:168
168	    if (!_RB(*NVM_CPL_STATUS(cpl), 0, 0) != !cq->phase)
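The faulting line appears to compare the phase bit of a completion entry against the phase the queue currently expects, so the first thing it does is read from completion queue memory. Below is a minimal sketch of that polling pattern, assuming the standard NVMe layout in which bit 0 of the 16-bit status halfword is the phase tag. The sketch_* types and functions are hypothetical and are not libnvm's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 16-byte completion entry, laid out like an NVMe CQE. */
typedef struct {
    uint32_t dw0;
    uint32_t dw1;
    uint16_t sq_head;
    uint16_t sq_id;
    uint16_t cid;
    uint16_t status;   /* bit 0 = phase tag, bits 15:1 = status field */
} sketch_cqe_t;

_Static_assert(sizeof(sketch_cqe_t) == 16, "completion entry must be 16 bytes");

/* Hypothetical queue descriptor (assumption: not libnvm's nvm_queue_t). */
typedef struct {
    volatile sketch_cqe_t *entries; /* must point at readable memory */
    uint32_t depth;                 /* number of entries in the ring */
    uint32_t head;                  /* next entry the consumer will inspect */
    uint16_t phase;                 /* phase expected for a new entry, starts at 1 */
} sketch_cq_t;

/* Return the entry at head if its phase tag matches, otherwise NULL. */
static volatile sketch_cqe_t *sketch_cq_poll(sketch_cq_t *cq)
{
    assert(cq != NULL && cq->entries != NULL && cq->depth > 0);

    volatile sketch_cqe_t *cpl = &cq->entries[cq->head];

    /* This load is where an inaccessible queue mapping would fault. */
    if ((cpl->status & 1u) != cq->phase)
        return NULL;

    return cpl;
}

/* Consume the entry at head; flip the expected phase on wrap-around. */
static void sketch_cq_advance(sketch_cq_t *cq)
{
    if (++cq->head == cq->depth) {
        cq->head = 0;
        cq->phase ^= 1u;
    }
}
```

If the pointer dereferenced at that step refers to memory the process cannot read, the load itself raises the signal, which would be consistent with a crash on the phase comparison rather than later in the benchmark.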

It looks like the issue originates in nvm_ctrl_init() (linux/device.cpp), where ctrl->mm_ptr is allocated with mmap. The resulting memory apparently cannot be accessed.
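One way to narrow this down is to confirm that the mapping call itself succeeded. mmap reports failure by returning MAP_FAILED, not NULL, so a NULL check alone would let a failed mapping through. The helper below is a hedged sketch of such a check. The name sketch_map_checked, the flags, and the offset are assumptions for illustration, and whether nvm_ctrl_init() already does something equivalent is not confirmed here.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: map `len` bytes of `fd` read/write and shared.
 * Returns the mapping on success, or NULL after printing the reason.
 */
static void *sketch_map_checked(int fd, size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

    /* mmap signals failure with MAP_FAILED ((void *) -1), never NULL. */
    if (p == MAP_FAILED) {
        fprintf(stderr, "mmap failed: %s\n", strerror(errno));
        return NULL;
    }

    return p;
}
```

A successful return only shows that the mapping was created. Accesses can still fault afterwards if the driver behind the file descriptor does not back the range, so this check rules out one cause without confirming the others.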

To Reproduce

1. Build following the instructions in the README.
2. Unbind the NVMe device: echo -n "0000:03:00.0" > /sys/bus/pci/drivers/nvme/unbind
3. Load the kernel module: cd build/module && make load
4. Run the benchmark: cd build && ./bin/nvm-block-bench --threads=262144 --blk_size=64 --reqs=1 --pages=262144 --queue_depth=1024 --page_size=512 --num_blks=2097152 --gpu=0 --n_ctrls=1 --num_queues=128 --random=true

Expected behavior

I expected the benchmark to run without a segfault.

Machine Setup

I have tried to match the ASPLOS setup:

Ubuntu 20.04
NVIDIA Driver 470.256.02, CUDA 11.6, NVIDIA V100 16GB
Intel Optane 16GB (mempek1w016ga)

Additional context

I'm trying to reproduce the benchmarks from ASPLOS, so I'm on the asplos tag and using the same software versions. I have compared my build logs to those in asplosaoe/ and they match.

I have two V100s in the server and have verified with p2pBandwidthLatencyTest from cuda-samples that they support P2P, which I assume means that they can access the NVMe drive via P2P. For now, I'm trying to get a single V100 to work with a single NVMe drive.

Oct 01 '24 10:10 karlowich
