bam

Segfault when running nvm-block-bench

Status: Open. karlowich opened this issue 4 months ago (7 comments).

Describe the bug

I get a segfault when running the following command:

./bin/nvm-block-bench --threads=262144 --blk_size=64 --reqs=1 --pages=262144 --queue_depth=1024 --page_size=512 --num_blks=2097152 --gpu=0 --n_ctrls=1 --num_queues=128 --random=true

gdb shows:

Thread 1 "nvm-block-bench" received signal SIGSEGV, Segmentation fault.
0x00007ffff7fa1b46 in nvm_cq_poll (cq=0x5555556c47d8) at /root/git/bam/include/nvm_queue.h:168
168	    if (!_RB(*NVM_CPL_STATUS(cpl), 0, 0) != !cq->phase)

The problem appears to originate in nvm_ctrl_init() in linux/device.cpp, where ctrl->mm_ptr is set up with mmap. That mapping apparently cannot be accessed.

To Reproduce

  • Build following the instructions in the README.
  • Unbind the NVMe device: echo -n "0000:03:00.0" > /sys/bus/pci/drivers/nvme/unbind
  • Load the kernel module: cd build/module && make load
  • Run the benchmark: cd build && ./bin/nvm-block-bench --threads=262144 --blk_size=64 --reqs=1 --pages=262144 --queue_depth=1024 --page_size=512 --num_blks=2097152 --gpu=0 --n_ctrls=1 --num_queues=128 --random=true

Expected behavior

I expected the benchmark to run without a segfault.

Machine Setup

I have tried to match the ASPLOS setup:

  • Ubuntu 20.04
  • NVIDIA Driver 470.256.02, CUDA 11.6, NVIDIA V100 16 GB
  • Intel Optane 16 GB (mempek1w016ga)

Additional context

I'm trying to reproduce the benchmarks from the ASPLOS paper, so I'm on the asplos tag and using the same software versions. I have compared my build logs with those in asplosaoe/, and they match. The server has two V100s, and I have verified with p2pBandwidthLatencyTest from cuda-samples that they support P2P, which I assume means they can also access the NVMe drive via P2P. For now, I'm trying to get a single V100 working with a single NVMe drive.

karlowich, Oct 01 '24 10:10