Jens Axboe
I'll check in the morning; it's late here. Fio doesn't do proper batching either, which might be a concern. In general, you should not need a thread pool, you can mark...
One thing that is interesting here is that if I run with iodepth=1, I get about 7GB/sec of bandwidth from one thread, but when I run with iodepth=128, then...
shmhuge really helps alleviate pressure, but I think what we really need here is the ring sqe/cqe maps being in a huge page... That'll likely be a nice win overall...
Ran the "always copy to the same page" case for QD=128, and it didn't change anything. Puzzled; maybe this is TLB pressure? So I added iomem=shmhuge to use a huge...
I've added kernel support for using a single huge page for the rings, that should cut down on TLB pressure which I think is what is killing us in this...
Can you try with ```iomem=shmhuge``` added to your fio job file? Curious what kind of difference you'd see with it.
Try adding ```norandommap``` to the global section.
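For reference, here is a minimal sketch of a job file with both suggestions applied (```iomem=shmhuge``` and ```norandommap``` in the global section). Only those two options and ```iodepth=128``` come from the comments above. The io_uring engine, block size, runtime, job name and device path are assumptions chosen for illustration, so adjust them to match your own job.

```shell
#!/bin/sh
# Sketch only: write a fio job file combining the two suggested options.
# Everything except norandommap, iomem=shmhuge and iodepth=128 is a
# placeholder (engine, block size, runtime, job name, device path).
JOB="$(mktemp)"

cat > "$JOB" <<'EOF'
[global]
ioengine=io_uring
direct=1
rw=randread
bs=4k
iodepth=128
norandommap
iomem=shmhuge
runtime=30
time_based

[qd128]
filename=/dev/nvme0n1
EOF

echo "wrote $JOB"
# To run it (needs fio installed and free huge pages available,
# check /proc/sys/vm/nr_hugepages):
#   fio "$JOB"
```

Note that ```iomem=shmhuge``` only works if the system has free huge pages, so fio may fail to allocate its I/O buffers until some are reserved.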
You can try running: ``` # perf record -ag -- sleep 5 ``` while it's starting up, and then do: ``` # perf report -g --no-children ``` and...
For that last one, click the memset and mutex lock/unlock to get an expanded call trace, that'll give us a better idea of where it's happening.
OK, that makes sense, it's around the iostats setup. I'll try and take a look at this.